CN106815194A - Model training method and device and keyword recognition method and device

Info

Publication number: CN106815194A
Application number: CN201510850285.2A
Authority: CN
Inventors: 刘粉香
Original assignee: Beijing Gridsum Technology Co Ltd
Current assignee: Beijing Gridsum Technology Co Ltd
Priority date: 2015-11-27
Filing date: 2015-11-27
Publication date: 2017-06-09

Abstract

This application discloses a model training method and device and a keyword recognition method and device. The model training method includes: obtaining text information carrying part-of-speech tags, wherein the text information includes a plurality of sentences and each word in each sentence carries a part-of-speech tag of the corresponding part-of-speech type; determining the word vector of each word in each sentence, the word vector being a multidimensional array that uniquely represents the corresponding word; and, taking each sentence in the text information as a unit, inputting the part-of-speech tag and the word vector of each word in each sentence into a recurrent neural network and training it to obtain a neural network model, wherein the neural network model is used to tag the words in a sentence. The application addresses the technical problem in the prior art of poor accuracy in identifying keywords in sentences.

Description

Model training method and device and keyword recognition method and device

Technical field

This application relates to the field of text processing, and in particular to a model training method and device and a keyword recognition method and device.

Background

The sentences of a text generally contain the keywords that the sentence is meant to express. For example, in the user statement "I have been a bit tired lately, and I want to go to Yonghe Palace to play", the place "Yonghe Palace" is the keyword. A computer system, however, cannot pick out such keywords as accurately as a person can. Existing computer systems usually identify keywords based on the part of speech of the words or on the sentence structure: the sentence is segmented into words, and the words of the target part of speech are then selected as keywords. This approach depends heavily on the word segmentation tool. Although it is effective for extracting a single part of speech, its recognition accuracy is poor for multiple parts of speech and for the complex, novel, and unstructured sentence patterns and new vocabulary that occur in natural language.

No effective solution to the above problem has yet been proposed.

Summary of the invention

The embodiments of this application provide a model training method and device and a keyword recognition method and device, so as to at least solve the technical problem in the prior art of poor accuracy in identifying keywords in sentences.

According to one aspect of the embodiments of this application, a model training method is provided, including: obtaining text information carrying part-of-speech tags, wherein the text information includes a plurality of sentences and each word in each sentence carries a part-of-speech tag of the corresponding part-of-speech type; determining the word vector of each word in each sentence, the word vector being a multidimensional array that uniquely represents the corresponding word; and, taking each sentence in the text information as a unit, inputting the part-of-speech tag and the word vector of each word in each sentence into a recurrent neural network and training it to obtain a neural network model, wherein the neural network model is used to tag the words in a sentence.

Further, determining the word vector of each word in each sentence includes: performing word segmentation on each sentence in the text information to obtain the word set of the text information; and looking up the word vector corresponding to each word in the word set.

Further, before the word vector of each word in each sentence is determined, the model training method also includes: obtaining text information of a preset data volume to obtain a text information set; and generating, by machine learning, the word vector corresponding to each word in the text information set to obtain a word vector set. Looking up the word vector corresponding to each word in the word set then includes: looking up, in the word vector set, the word vector corresponding to each word in the word set.

Further, the keywords in each sentence of the text information are tagged with a first preset tag and the other words with a second preset tag, so that when the neural network model is used to recognize words, it tags keywords with the first preset tag.

According to another aspect of the embodiments of this application, a keyword recognition method is also provided, including: performing word segmentation on the text to be tested and determining the word vector corresponding to each word; and, taking each sentence in the text to be tested as a unit, inputting the word vector corresponding to each word in each sentence into a neural network model and using the neural network model to tag the keywords in the text to be tested.

According to another aspect of the embodiments of this application, a model training device is also provided, including: a first acquisition unit, configured to obtain text information carrying part-of-speech tags, wherein the text information includes a plurality of sentences and each word in each sentence carries a part-of-speech tag of the corresponding part-of-speech type; a determining unit, configured to determine the word vector of each word in each sentence, the word vector being a multidimensional array that uniquely represents the corresponding word; and a training unit, configured to take each sentence in the text information as a unit, input the part-of-speech tag and the word vector of each word in each sentence into a recurrent neural network, and train it to obtain a neural network model, wherein the neural network model is used to tag the words in a sentence.

Further, the training unit includes: a word segmentation module, configured to perform word segmentation on each sentence in the text information to obtain the word set of the text information; and a query module, configured to look up the word vector corresponding to each word in the word set.

Further, the model training device also includes: a second acquisition unit, configured to obtain text information of a preset data volume, before the word vector of each word in each sentence is determined, to obtain a text information set; and a generation unit, configured to generate, by machine learning, the word vector corresponding to each word in the text information set to obtain a word vector set. The query module is specifically configured to look up, in the word vector set, the word vector corresponding to each word in the word set.

According to another aspect of the embodiments of this application, a keyword recognition device is also provided, including: a vector determination unit, configured to perform word segmentation on the text to be tested and determine the word vector corresponding to each word; and a marking unit, configured to take each sentence in the text to be tested as a unit, input the word vector corresponding to each word in each sentence into a neural network model, and use the neural network model to tag the keywords in the text to be tested.

According to the embodiments of this application, text information carrying part-of-speech tags is obtained, wherein the text information includes a plurality of sentences and each word in each sentence carries a part-of-speech tag of the corresponding part-of-speech type; the word vector of each word in each sentence is determined, the word vector being a multidimensional array that uniquely represents the corresponding word; and, taking each sentence in the text information as a unit, the part-of-speech tag and the word vector of each word in each sentence are input into a recurrent neural network, which is trained to obtain a neural network model. The neural network model can then be used to tag the words in a sentence and thereby identify the keywords in it. This solves the technical problem in the prior art of poor accuracy in identifying keywords in sentences and improves the accuracy of keyword recognition.

Brief description of the drawings

The drawings described here provide a further understanding of this application and form part of it. The illustrative embodiments of this application and their descriptions serve to explain the application and do not unduly limit it. In the drawings:

Fig. 1 is a flow chart of the model training method according to an embodiment of this application;

Fig. 2 is a flow chart of the keyword recognition method according to an embodiment of this application;

Fig. 3 is a schematic diagram of the model training device according to an embodiment of this application;

Fig. 4 is a schematic diagram of the keyword recognition device according to an embodiment of this application.

Detailed description

To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application, without creative effort, shall fall within the scope of protection of this application.

It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of this application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of this application described here can be implemented in orders other than those illustrated or described here. Furthermore, the terms "comprising" and "having" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.

According to an embodiment of this application, a method embodiment of a model training method is provided. It should be noted that the steps shown in the flow charts of the drawings can be executed in a computer system, such as one running a set of computer-executable instructions, and that, although a logical order is shown in the flow charts, in some cases the steps shown or described can be performed in an order different from the one given here.

Fig. 1 is a flow chart of the model training method according to an embodiment of this application. As shown in Fig. 1, the method includes the following steps:

Step S102: obtain text information carrying part-of-speech tags, wherein the text information includes a plurality of sentences and each word in each sentence carries a part-of-speech tag of the corresponding part-of-speech type.

The text information carrying part-of-speech tags in this embodiment can be a sample of text information collected in advance, in which the words of interest in each sentence are annotated manually with their part of speech. The words of interest can belong to one part-of-speech category or to several, such as place nouns and person nouns. One possible annotation scheme is: a place of interest is tagged PLACE, and an invalid word is tagged NUL (empty). For example, the sentence "I want to go to Yonghe Palace to play" might be segmented as "I want / go / Yonghe / Palace / play" and, after manual annotation, become "I want NUL go NUL Yonghe PLACE Palace PLACE play NUL".
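The annotation scheme described above can be sketched as a list of (word, tag) pairs. This is only an illustration: the words are English glosses of the five segmented tokens in the example, and the integer tag ids and the function name `split_words_and_tags` are assumptions introduced here, not part of the application.

```python
# Tag names PLACE and NUL come from the text; the integer ids are a hypothetical choice.
TAG_IDS = {"NUL": 0, "PLACE": 1}

# One manually annotated sentence (English glosses of the segmented tokens).
annotated_sentence = [
    ("I want", "NUL"),
    ("go", "NUL"),
    ("Yonghe", "PLACE"),
    ("Palace", "PLACE"),
    ("play", "NUL"),
]


def split_words_and_tags(sentence):
    """Separate an annotated sentence into its words and integer tag ids."""
    words = [word for word, _ in sentence]
    tag_ids = [TAG_IDS[tag] for _, tag in sentence]
    return words, tag_ids


words, tag_ids = split_words_and_tags(annotated_sentence)
print(words)    # ['I want', 'go', 'Yonghe', 'Palace', 'play']
print(tag_ids)  # [0, 0, 1, 1, 0]
```

Keeping the words and the tags as parallel sequences makes it straightforward to replace each word by its word vector later while the tags serve as the training targets.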

Step S104: determine the word vector of each word in each sentence, the word vector being a multidimensional array that uniquely represents the corresponding word.

After the text information carrying part-of-speech tags is obtained, the word vector corresponding to each word in each sentence of the text is determined. The word vector of each word is represented as a multidimensional array, and different words have different word vectors. The word vectors can be defined in advance, in which case, after the text information is extracted, the vector of each word is looked up among the predefined word vectors. Alternatively, the word vector of each word can be generated according to a preset word vector generation rule. Because each word in the text information carries a part-of-speech tag, the word vector of each word is associated with the same part-of-speech tag as the word.
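The lookup of predefined word vectors can be sketched as a dictionary query. The table below is hypothetical: the three-dimensional vectors and their values are invented for illustration, and the names `WORD_VECTORS`, `lookup_vectors`, and `vectors_are_unique` are assumptions.

```python
# Hypothetical predefined word vectors (3 dimensions, values invented for illustration).
WORD_VECTORS = {
    "I want": [0.1, 0.3, -0.2],
    "go": [0.0, -0.4, 0.5],
    "Yonghe": [0.7, 0.2, 0.1],
    "Palace": [0.6, 0.1, 0.3],
    "play": [-0.3, 0.5, 0.2],
}


def lookup_vectors(words, table):
    """Return the word vector of each word, in sentence order."""
    return [table[word] for word in words]


def vectors_are_unique(table):
    """Check that no two words share the same vector."""
    as_tuples = [tuple(vector) for vector in table.values()]
    return len(set(as_tuples)) == len(as_tuples)


sentence_vectors = lookup_vectors(["go", "Yonghe", "Palace"], WORD_VECTORS)
print(len(sentence_vectors))             # 3
print(vectors_are_unique(WORD_VECTORS))  # True
```

The uniqueness check reflects the requirement that different words have different word vectors; how a real system would handle a word missing from the table is not specified in the text and is left out here.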

Step S106: taking each sentence in the text information as a unit, input the part-of-speech tag and the word vector of each word in each sentence into a recurrent neural network and train it to obtain a neural network model, wherein the neural network model is used to tag the words in a sentence.

In this embodiment, after the word vector of each word in the text information is determined, the sentences in the text information are input one by one into the recurrent neural network for training, taking the sentence as the unit. Each sentence input into the recurrent neural network is replaced by the word vectors of its words; that is, the word vector corresponding to each word in the sentence is input into the recurrent neural network. Training the recurrent neural network on the extracted text information yields the neural network model.
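How a sentence of word vectors passes through a recurrent network can be sketched with a minimal forward pass. The application does not specify the network architecture, so the simple tanh recurrence with a softmax output used here is an assumption, as are all names, dimensions, and the random weights.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Create a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_forward(sentence_vectors, w_in, w_rec, w_out):
    """Run one sentence through the network; return tag probabilities per word."""
    hidden = [0.0] * len(w_rec)  # the hidden state is reset for every sentence
    outputs = []
    for x in sentence_vectors:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]  # carries context to the next word
        outputs.append(softmax(matvec(w_out, hidden)))
    return outputs


rng = random.Random(0)
input_dim, hidden_dim, num_tags = 3, 4, 2  # hypothetical sizes
w_in = make_matrix(hidden_dim, input_dim, rng)
w_rec = make_matrix(hidden_dim, hidden_dim, rng)
w_out = make_matrix(num_tags, hidden_dim, rng)

sentence = [[0.1, 0.3, -0.2], [0.0, -0.4, 0.5], [0.7, 0.2, 0.1]]
probs = rnn_forward(sentence, w_in, w_rec, w_out)
print(len(probs))  # 3
```

Because the hidden state is carried from word to word within a sentence and reset between sentences, the sentence is the natural unit of input, which matches the description above.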

Because the word vectors are input into the recurrent neural network sentence by sentence, the machine can memorize the words in a sentence, their part-of-speech tags, and the ways they combine, and it stores these in the parameters of the neural network model (once determined, the parameters of the neural network model are mostly matrices). The prior art relies on the part of speech of words or on sentence structure and selects words of the target part of speech as keywords after segmenting the sentence. By contrast, this embodiment uses the trained neural network model to identify the keywords in text, can accurately identify keywords in sentences of various structures, and achieves high keyword recognition accuracy.

Preferably, determining the word vector of each word in each sentence includes: performing word segmentation on each sentence in the text information to obtain the word set of the text information; and looking up the word vector corresponding to each word in the word set.

In this embodiment, the word vectors of words are generated in advance to form a word vector set. After the text information serving as the sample is collected, the word vector corresponding to each word in each sentence of the text information is looked up in the pre-generated word vector set. Word segmentation of each sentence can be performed with a word segmentation tool according to certain rules; for example, "I want to go to Yonghe Palace to play" may become "I want / go / Yonghe / Palace / play" after segmentation.

Further, before the word vector of each word in each sentence is determined, the model training method also includes: obtaining text information of a preset data volume to obtain a text information set; and generating, by machine learning, the word vector corresponding to each word in the text information set to obtain a word vector set. Looking up the word vector corresponding to each word in the word set then includes: looking up, in the word vector set, the word vector corresponding to each word in the word set.

In this embodiment, the word vector set is generated before the word vectors of words are determined. Specifically, a large amount of text information is first obtained, where the preset data volume can be a relatively large data volume set in advance. The text information of the preset data volume serves as the text information set for training word vectors, and the word vector of each word in it is then generated by machine learning to obtain the word vector set. When the word vectors of the words in the sample text information are later determined, they can be looked up directly in this word vector set.

The machine learning step can use Google's word2vec to train the word vectors: from the input text, it generates for each word a unique vector of the same dimension, that is, a multidimensional array. The dimension of the array is configurable; for example, the word "happy" might be represented as [0, 1, 0, ...].
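The two properties named above, a fixed configurable dimension and a distinct vector per word, can be illustrated with a toy stand-in. To be clear, this is not word2vec: the sketch derives each vector from a hash of the word, so the vectors carry no learned meaning. The function names and the default dimension are assumptions.

```python
import hashlib


def toy_word_vector(word, dim=8):
    """Derive a fixed-length vector from a hash of the word (NOT word2vec)."""
    if not 1 <= dim <= 32:
        raise ValueError("dim must be between 1 and 32 (SHA-256 yields 32 bytes)")
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    # Map each of the first `dim` bytes (0..255) to a float in [-1, 1].
    return [byte / 127.5 - 1.0 for byte in digest[:dim]]


def build_vector_set(segmented_sentences, dim=8):
    """Build a word -> vector table from sentences that are already segmented."""
    table = {}
    for sentence in segmented_sentences:
        for word in sentence:
            table.setdefault(word, toy_word_vector(word, dim))
    return table


corpus = [
    ["I want", "go", "Yonghe", "Palace", "play"],
    ["Yonghe", "Palace", "reviews", "how"],
]
vector_set = build_vector_set(corpus, dim=8)
print(len(vector_set))  # 7
```

A trained embedding such as word2vec would instead place words used in similar contexts near each other, which is what lets the tagger generalize to words it did not see in the annotated training set.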

Preferably, the keywords in each sentence of the text information are tagged with a first preset tag and the other words with a second preset tag, so that when the neural network model is used to recognize words, it tags keywords with the first preset tag.

In this embodiment, in the sentences of the text information used as training samples, keywords of interest are tagged with the first preset tag and the other, invalid words with the second preset tag. The neural network model obtained by training memorizes these tags. Therefore, when the trained model is used to identify keywords in a sentence, its output tags keywords with the first preset tag and the other, invalid words with the second preset tag.

For example, suppose the words of interest are words denoting places, with places tagged PLACE and invalid words tagged NUL (empty). The sentence "I want to go to Yonghe Palace to play" is annotated after segmentation as "I want NUL go NUL Yonghe PLACE Palace PLACE play NUL".

An optional implementation of the model training method of this embodiment includes:

Step 1: collect a large amount of text information as word vector training text set 1, used for training word vectors.

Step 2: segment text set 1 into words and generate word vectors by machine learning to obtain the word vector set. The machine learning step can use Google's word2vec to train the word vectors: from the input text, it generates for each word a unique vector of the same dimension, that is, a multidimensional array. The dimension of the array is configurable; for example, the word "happy" might be represented as [0, 1, 0, ...].

Step 3: collect business-related text information, segment each sentence into words, and manually annotate each word with a part-of-speech tag to form training set 2, where the tagged parts of speech are the categories of interest. There can be one category of interest or several, such as place nouns and person nouns. One possible annotation scheme is: a place of interest is tagged PLACE, and an invalid word is tagged NUL (empty). For example, "I want to go to Yonghe Palace to play" might be segmented as "I want / go / Yonghe / Palace / play" and, after manual annotation, become "I want NUL go NUL Yonghe PLACE Palace PLACE play NUL".

Step 4: represent the words in training set 2 by the word vectors generated in step 2 and, taking the sentence as the unit, input the word vectors of training set 2 into an RNN (recurrent neural network) for training to obtain the trained RNN model. Because the recurrent neural network takes whole sentences as input, the machine can memorize the words in a sentence, their part-of-speech tags, and the ways they combine, and it stores these in the parameters of the model.
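Step 4 can be sketched as sentence-level training with backpropagation through time. The application does not describe the loss, optimizer, or network layout, so everything below is an assumption: a tanh recurrence without bias terms, a softmax output, cross-entropy loss, plain gradient descent, and invented vectors and sizes.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z):
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(xs, tags, W, lr=0.05):
    """One gradient step on one sentence; returns the loss before the update."""
    Wx, Wh, Wy = W["in"], W["rec"], W["out"]
    n = len(Wh)
    # Forward pass, keeping every hidden state for the backward pass.
    hs, ps, loss = [[0.0] * n], [], 0.0
    for x, tag in zip(xs, tags):
        pre = [a + b for a, b in zip(matvec(Wx, x), matvec(Wh, hs[-1]))]
        hs.append([math.tanh(p) for p in pre])
        ps.append(softmax(matvec(Wy, hs[-1])))
        loss -= math.log(ps[-1][tag])  # cross-entropy against the annotated tag
    # Backward pass through time, from the last word to the first.
    gx = [[0.0] * len(r) for r in Wx]
    gh = [[0.0] * len(r) for r in Wh]
    gy = [[0.0] * len(r) for r in Wy]
    dh_next = [0.0] * n
    for t in reversed(range(len(xs))):
        h, h_prev = hs[t + 1], hs[t]
        dy = [p - (1.0 if k == tags[t] else 0.0) for k, p in enumerate(ps[t])]
        dh = [sum(Wy[k][j] * dy[k] for k in range(len(dy))) + dh_next[j]
              for j in range(n)]
        dz = [(1.0 - h[j] ** 2) * dh[j] for j in range(n)]
        for k in range(len(dy)):
            for j in range(n):
                gy[k][j] += dy[k] * h[j]
        for j in range(n):
            for i in range(len(xs[t])):
                gx[j][i] += dz[j] * xs[t][i]
            for i in range(n):
                gh[j][i] += dz[j] * h_prev[i]
        dh_next = [sum(Wh[j][i] * dz[j] for j in range(n)) for i in range(n)]
    # Gradient descent update of all three weight matrices, in place.
    for m, g in ((Wx, gx), (Wh, gh), (Wy, gy)):
        for r in range(len(m)):
            for c in range(len(m[r])):
                m[r][c] -= lr * g[r][c]
    return loss


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W = {"in": rand_matrix(4, 3), "rec": rand_matrix(4, 4), "out": rand_matrix(2, 4)}
# One training sentence: hypothetical word vectors and tag ids (0 = NUL, 1 = PLACE).
xs = [[0.1, 0.3, -0.2], [0.0, -0.4, 0.5], [0.7, 0.2, 0.1],
      [0.6, 0.1, 0.3], [-0.3, 0.5, 0.2]]
tags = [0, 0, 1, 1, 0]
losses = [train_step(xs, tags, W) for _ in range(200)]
print(losses[-1] < losses[0])
```

After training, the learned matrices in `W` are the model parameters in which, as the text puts it, the words, tags, and their combinations are memorized. A practical implementation would more likely rely on a deep learning library than on hand-written gradients.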

According to this embodiment, model training combines word vectors with a recurrent neural network, so that keyword extraction depends little on the accuracy of the word segmentation tool and is fairly robust (for example, a word that never appeared in the training set can still be assigned a part-of-speech tag at test time, which identifies whether it is a keyword).

According to an embodiment of this application, a keyword recognition method is also provided. The keyword recognition method can use the neural network model trained by the model training method of the above embodiment to identify keywords. As shown in Fig. 2, the keyword recognition method includes:

Step S202: perform word segmentation on the text to be tested and determine the word vector corresponding to each word.

In this embodiment, the text to be tested is segmented and its word vectors are determined in the same way as in the model training method of the above embodiment, so the details are not repeated here.

Step S204: taking each sentence in the text to be tested as a unit, input the word vector corresponding to each word in each sentence into the neural network model, and use the neural network model to tag the keywords in the text to be tested.

The neural network model in this embodiment is the one obtained by the model training method of the above embodiment.

Taking each sentence in the text to be tested as a unit, the word vectors of its words are input into the neural network model, which identifies and tags the keywords in the text to be tested. Specifically, the text to be tested is obtained and segmented, each word is represented by its word vector, and the word vectors are input into the neural network model sentence by sentence to obtain a part-of-speech tag for each word. The words with the part of speech of interest can then be picked out.
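The recognition flow just described (segment, look up vectors, tag sentence by sentence) can be sketched as a small pipeline. The function `recognize` and its three pluggable components are hypothetical, and the `toy_*` functions are placeholders that merely stand in for a real segmentation tool, word vector set, and trained model.

```python
def recognize(sentences, segment, lookup, model):
    """Tag every word of every sentence; return (word, tag) pairs per sentence."""
    results = []
    for sentence in sentences:  # the sentence is the unit of processing
        words = segment(sentence)
        vectors = [lookup(word) for word in words]
        tags = model(vectors)  # one tag per input vector
        results.append(list(zip(words, tags)))
    return results


# Placeholder components, for illustration only.
def toy_segment(sentence):
    """Stand-in for a word segmentation tool: split on whitespace."""
    return sentence.split()


def toy_lookup(word):
    """Stand-in for the word vector set: a 1-dimensional capitalization flag."""
    return [1.0 if word[0].isupper() else 0.0]


def toy_model(vectors):
    """Stand-in for the trained network: threshold the single feature."""
    return ["PLACE" if v[0] > 0.5 else "NUL" for v in vectors]


tagged = recognize(["Yonghe Palace reviews how"], toy_segment, toy_lookup, toy_model)
print(tagged[0])
# [('Yonghe', 'PLACE'), ('Palace', 'PLACE'), ('reviews', 'NUL'), ('how', 'NUL')]
```

Passing the components in as arguments keeps the pipeline independent of any particular segmentation tool or model, which mirrors the way the method reuses the segmentation and vector lookup from training.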

Because the word vectors are input into the recurrent neural network sentence by sentence, the machine can memorize the words in a sentence, their part-of-speech tags, and the ways they combine, and it stores these in the parameters of the neural network model (once determined, the parameters of the neural network model are mostly matrices). The prior art relies on the part of speech of words or on sentence structure and selects words of the target part of speech as keywords after segmenting the sentence. By contrast, this embodiment uses the trained neural network model to identify the keywords in text sentence by sentence, can accurately identify keywords in sentences of various structures, and achieves high keyword recognition accuracy.

For example, "How are the reviews of Yonghe Palace" is segmented as "Yonghe / Palace / reviews / how", and the result computed by the neural network model is "Yonghe PLACE Palace PLACE reviews NUL how NUL". Filtering this result yields the place noun of interest: Yonghe Palace.
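The filtering step in this example can be sketched as follows. The text only says that the result is filtered; merging adjacent words that carry the tag of interest into a single keyword is an assumption made here, suggested by the two PLACE tokens forming one place name. The function name and parameters are hypothetical.

```python
def extract_keywords(tagged_words, target_tag="PLACE", joiner=" "):
    """Merge runs of adjacent words carrying target_tag into keywords."""
    keywords, current = [], []
    for word, tag in tagged_words:
        if tag == target_tag:
            current.append(word)
        elif current:
            # A run of tagged words just ended: emit it as one keyword.
            keywords.append(joiner.join(current))
            current = []
    if current:  # a run that reaches the end of the sentence
        keywords.append(joiner.join(current))
    return keywords


tagged = [("Yonghe", "PLACE"), ("Palace", "PLACE"), ("reviews", "NUL"), ("how", "NUL")]
print(extract_keywords(tagged))  # ['Yonghe Palace']
```

For text in a language written without spaces between words, `joiner` would be the empty string so that the merged keyword matches the original text.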

An embodiment of this application also provides a model training device, which can be used to perform the model training method of the embodiment. As shown in Fig. 3, the device includes a first acquisition unit 301, a determining unit 303, and a training unit 305.

The first acquisition unit 301 is configured to obtain text information carrying part-of-speech tags, wherein the text information includes a plurality of sentences and each word in each sentence carries a part-of-speech tag of the corresponding part-of-speech type.

The text information carrying part-of-speech tags in this embodiment can be a sample of text information collected in advance, in which the words of interest in each sentence are annotated manually with their part of speech. The words of interest can belong to one part-of-speech category or to several, such as place nouns and person nouns. One possible annotation scheme is: a place of interest is tagged PLACE, and an invalid word is tagged NUL (empty). For example, the sentence "I want to go to Yonghe Palace to play" might be segmented as "I want / go / Yonghe / Palace / play" and, after manual annotation, become "I want NUL go NUL Yonghe PLACE Palace PLACE play NUL".

The determining unit 303 is configured to determine the word vector of each word in each sentence, the word vector being a multidimensional array that uniquely represents the corresponding word.

The training unit 305 is configured to take each sentence in the text information as a unit, input the part-of-speech tag and the word vector of each word in each sentence into a recurrent neural network, and train it to obtain a neural network model, wherein the neural network model is used to tag the words in a sentence.

Preferably, the training unit includes: a word segmentation module, configured to perform word segmentation on each sentence in the text information to obtain the word set of the text information; and a query module, configured to look up the word vector corresponding to each word in the word set.

Preferably, the model training device also includes: a second acquisition unit, configured to obtain text information of a preset data volume, before the word vector of each word in each sentence is determined, to obtain a text information set; and a generation unit, configured to generate, by machine learning, the word vector corresponding to each word in the text information set to obtain a word vector set. The query module is specifically configured to look up, in the word vector set, the word vector corresponding to each word in the word set.

The model training device includes a processor and a memory. The first acquisition unit 301, the determining unit 303, the training unit 305, and so on are stored in the memory as program units, and the processor executes these program units stored in the memory.

The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels can be provided, and the neural network model, which is used to identify keywords in sentences, is obtained by training through adjustment of the kernel parameters.

The memory may include computer-readable media in the form of volatile memory, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.

This application also provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: obtaining text information carrying part-of-speech tags, wherein the text information includes a plurality of sentences and each word in each sentence carries a part-of-speech tag of the corresponding part-of-speech type; determining the word vector of each word in each sentence, the word vector being a multidimensional array that uniquely represents the corresponding word; and, taking each sentence in the text information as a unit, inputting the part-of-speech tag and the word vector of each word in each sentence into a recurrent neural network and training it to obtain a neural network model, wherein the neural network model is used to tag the words in a sentence.

An embodiment of this application also provides a keyword recognition device, which can be used to perform the keyword recognition method of the embodiment. As shown in Fig. 4, the device includes a vector determination unit 401 and a marking unit 403.

The vector determination unit 401 is configured to perform word segmentation on the text to be tested and determine the word vector corresponding to each word.

The marking unit 403 is configured to take each sentence in the text to be tested as a unit, input the word vector corresponding to each word in each sentence into the neural network model, and use the neural network model to tag the keywords in the text to be tested.

The keyword recognition device includes a processor and a memory. The vector determination unit 401, the marking unit 403, and so on are stored in the memory as program units, and the processor executes these program units stored in the memory. The above may all be stored in the memory.

The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels can be provided, and by adjusting the kernel parameters the neural network model is used to identify keywords in the text to be tested.

This application also provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: performing word segmentation on the text to be tested and determining the word vector corresponding to each word; and, taking each sentence in the text to be tested as a unit, inputting the word vector corresponding to each word in each sentence into the neural network model and using the neural network model to tag the keywords in the text to be tested.

The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.

In the above embodiments of this application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.

The above are only preferred embodiments of the present application. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present application, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present application.

Claims

1. A model training method, characterized by comprising:

obtaining text information with part-of-speech marks, wherein the text information comprises a plurality of sentences, and each word in each sentence carries a part-of-speech mark of the corresponding part-of-speech type;

determining the term vector of each word in each sentence, wherein the term vector is a multidimensional array that uniquely represents the corresponding word;

taking each sentence in the text information as a unit, inputting the part-of-speech mark corresponding to each word in each sentence and the term vector corresponding to that word into a recurrent neural network, and training to obtain a neural network model, wherein the neural network model is used to mark the words in a sentence.

2. The model training method according to claim 1, characterized in that determining the term vector of each word in each sentence comprises:

performing word segmentation on each sentence in the text information to obtain a word set of the text information;

looking up the term vector corresponding to each word in the word set.

3. The model training method according to claim 2, characterized in that, before determining the term vector of each word in each sentence, the model training method further comprises:

obtaining text information of a preset data volume to obtain a text information set;

generating, by means of machine learning, the term vector corresponding to each word in the text information set to obtain a term vector set;

wherein looking up the term vector corresponding to each word in the word set comprises: looking up, from the term vector set, the term vector corresponding to each word in the word set.

4. The model training method according to any one of claims 1 to 3, characterized in that the keyword in each sentence of the text information is marked with a first preset mark and the other words are marked with a second preset mark, so that when words are recognized using the neural network model, the keyword is marked with the first preset mark.

5. A keyword recognition method, characterized by comprising:

performing word segmentation on a text to be tested and determining the term vector corresponding to each word;

taking each sentence in the text to be tested as a unit, inputting the term vector corresponding to each word in each sentence into the neural network model obtained by training with the model training method according to any one of claims 1 to 4, and marking the keywords in the text to be tested using the neural network model.

6. A model training apparatus, characterized by comprising:

a first obtaining unit, configured to obtain text information with part-of-speech marks, wherein the text information comprises a plurality of sentences, and each word in each sentence carries a part-of-speech mark of the corresponding part-of-speech type;

a determining unit, configured to determine the term vector of each word in each sentence, wherein the term vector is a multidimensional array that uniquely represents the corresponding word;

a training unit, configured to, taking each sentence in the text information as a unit, input the part-of-speech mark corresponding to each word in each sentence and the term vector corresponding to that word into a recurrent neural network, and train to obtain a neural network model, wherein the neural network model is used to mark the words in a sentence.

7. The model training apparatus according to claim 6, characterized in that the training unit comprises:

a word segmentation module, configured to perform word segmentation on each sentence in the text information to obtain a word set of the text information;

a query module, configured to look up the term vector corresponding to each word in the word set.

8. The model training apparatus according to claim 7, characterized in that the model training apparatus further comprises:

a second obtaining unit, configured to obtain, before the term vector of each word in each sentence is determined, text information of a preset data volume to obtain a text information set;

a generating unit, configured to generate, by means of machine learning, the term vector corresponding to each word in the text information set to obtain a term vector set;

wherein the query module is specifically configured to look up, from the term vector set, the term vector corresponding to each word in the word set.

9. The model training apparatus according to any one of claims 6 to 8, characterized in that the keyword in each sentence of the text information is marked with a first preset mark and the other words are marked with a second preset mark, so that when words are recognized using the neural network model, the keyword is marked with the first preset mark.

10. A keyword recognition apparatus, characterized by comprising:

a vector determining unit, configured to perform word segmentation on a text to be tested and determine the term vector corresponding to each word;

a marking unit, configured to, taking each sentence in the text to be tested as a unit, input the term vector corresponding to each word in each sentence into the neural network model obtained by training with the model training method according to any one of claims 1 to 4, and mark the keywords in the text to be tested using the neural network model.
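For illustration only, the training steps recited in claims 1 to 4 might be sketched as below. Several points are assumptions of this sketch rather than statements of the claims: the marks carried by the words are reduced to the two preset marks (1 for a keyword, 0 for any other word) and used as supervised targets; `TERM_VECTORS` is a hypothetical two-dimensional lookup table; and the recurrent neural network is a single-unit network trained by plain gradient descent with backpropagation through time.

```python
import math

# Hypothetical annotated sentence: (word, mark) pairs, where 1 is the first
# preset mark (keyword) and 0 is the second preset mark (other word).
ANNOTATED = [[("I", 0), ("visited", 0), ("Yonghegong", 1), ("yesterday", 0)]]

# Hypothetical 2-dimensional term vectors (assumption for this example).
TERM_VECTORS = {
    "I": [0.1, 0.0],
    "visited": [0.0, 0.2],
    "Yonghegong": [1.0, 0.9],
    "yesterday": [0.1, 0.1],
}


def forward(vectors, p):
    """Return hidden states and keyword probabilities for one sentence."""
    hs, probs, h = [], [], 0.0
    for x in vectors:
        h = math.tanh(p["w_in"][0] * x[0] + p["w_in"][1] * x[1] + p["w_rec"] * h)
        hs.append(h)
        probs.append(1.0 / (1.0 + math.exp(-(p["w_out"] * h + p["bias"]))))
    return hs, probs


def loss(probs, marks):
    """Cross-entropy between predicted probabilities and the preset marks."""
    return -sum(math.log(q if y else 1.0 - q) for q, y in zip(probs, marks))


def train_step(vectors, marks, p, lr):
    """One gradient-descent step on the cross-entropy loss for one sentence."""
    hs, probs = forward(vectors, p)
    g = {"w_in": [0.0, 0.0], "w_rec": 0.0, "w_out": 0.0, "bias": 0.0}
    dh_next = 0.0  # gradient flowing back from the following word
    for t in reversed(range(len(vectors))):
        dz = probs[t] - marks[t]  # gradient at the output pre-activation
        g["w_out"] += dz * hs[t]
        g["bias"] += dz
        da = (dz * p["w_out"] + dh_next) * (1.0 - hs[t] ** 2)
        g["w_in"][0] += da * vectors[t][0]
        g["w_in"][1] += da * vectors[t][1]
        g["w_rec"] += da * (hs[t - 1] if t > 0 else 0.0)
        dh_next = da * p["w_rec"]
    p["w_in"] = [w - lr * d for w, d in zip(p["w_in"], g["w_in"])]
    for k in ("w_rec", "w_out", "bias"):
        p[k] -= lr * g[k]


def train(annotated, epochs=2000, lr=0.1):
    """Train sentence by sentence and return the learned parameters."""
    p = {"w_in": [0.5, 0.5], "w_rec": 0.0, "w_out": 0.5, "bias": 0.0}
    for _ in range(epochs):
        for sentence in annotated:
            vectors = [TERM_VECTORS[w] for w, _ in sentence]
            marks = [m for _, m in sentence]
            train_step(vectors, marks, p, lr)
    return p


model = train(ANNOTATED)
vectors = [TERM_VECTORS[w] for w, _ in ANNOTATED[0]]
print([int(q > 0.5) for q in forward(vectors, model)[1]])
```

The sentence is the unit of training here: the hidden state starts from zero for every sentence, and gradients are accumulated over all words of that sentence before the parameters are updated.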