CN106815193A - Model training method and device and wrong word recognition method and device - Google Patents

Model training method and device and wrong word recognition method and device

Info

Publication number
CN106815193A
CN106815193A
Authority
CN
China
Prior art keywords
word
text
term vector
sentence
wrong
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510850128.1A
Other languages
Chinese (zh)
Inventor
刘粉香
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd filed Critical Beijing Gridsum Technology Co Ltd
Priority to CN201510850128.1A priority Critical patent/CN106815193A/en
Publication of CN106815193A publication Critical patent/CN106815193A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

This application discloses a model training method and device, and a wrong-word recognition method and device. The model training method includes: extracting text information from a preset text data source, where the text contained in the preset text data source is text containing no wrong words; determining the term vector corresponding to each word in the text information, where a term vector is a multidimensional array uniquely representing a word; and, taking the sentences in the text information as units, inputting the term vectors corresponding to the words in each sentence into a memory neural network and training it to obtain a neural network model, where the neural network model is used to recognize wrong words in text. The application solves the technical problem in the prior art of the low recognition rate of wrong words in text.

Description

Model training method and device and wrong word recognition method and device
Technical field
The application relates to the field of text processing, and in particular to a model training method and device and a wrong-word recognition method and device.
Background technology
Text is an important carrier for recording information. Because text is mostly edited by humans, and human editing inevitably produces slips, wrong words appear in text. At present, wrong words in text are generally recognized by manually building a lexicon of correct words and matching the text against it. However, it is difficult to build a comprehensive and correct lexicon in this way, which leads to a high miss rate and in turn a low recognition rate of wrong words in text.
No effective solution to the above problem has yet been proposed.
Summary of the invention
The embodiments of the present application provide a model training method and device and a wrong-word recognition method and device, so as to at least solve the technical problem in the prior art of the low recognition rate of wrong words in text.
According to one aspect of the embodiments of the present application, a model training method is provided, including: extracting text information from a preset text data source, where the text contained in the preset text data source is text containing no wrong words; determining the term vector corresponding to each word in the text information, where a term vector is a multidimensional array uniquely representing a word; and, taking the sentences in the text information as units, inputting the term vectors corresponding to the words in each sentence into a memory neural network and training it to obtain a neural network model, where the neural network model is used to recognize wrong words in text.
Further, before the term vector corresponding to each word in the text information is determined, the model training method also includes: obtaining a target text library, where the text contained in the target text library is text containing no wrong words; and training on the target text library with a term vector model to generate the term vectors corresponding to the words in the target text library, obtaining a first training set.
Further, determining the term vector corresponding to each word in the text information includes: performing word segmentation on the text information to obtain a second training set; and looking up, in the first training set, the term vector corresponding to each word in the second training set.
Further, before the term vectors corresponding to the words in each sentence are input into the memory neural network, the model training method also includes: labeling the term vector corresponding to each word in each sentence with a preset mark, where the preset mark indicates that the word corresponding to the term vector is a non-wrong word, so that when the neural network model recognizes a non-wrong word, the word is labeled with the preset mark.
According to another aspect of the embodiments of the present application, a wrong-word recognition method is also provided, including: performing word segmentation on a text to be tested and determining the term vector corresponding to each word; and, taking the sentences in the text to be tested as units, inputting the term vectors corresponding to the words in each sentence into a neural network model, and identifying the wrong words in the text to be tested with the neural network model.
According to another aspect of the embodiments of the present application, a model training device is also provided, including: an extraction unit for extracting text information from a preset text data source, where the text contained in the preset text data source is text containing no wrong words; a determining unit for determining the term vector corresponding to each word in the text information, where a term vector is a multidimensional array uniquely representing a word; and a training unit for, taking the sentences in the text information as units, inputting the term vectors corresponding to the words in each sentence into a memory neural network and training it to obtain a neural network model, where the neural network model is used to recognize wrong words in text.
Further, the model training device also includes: an acquiring unit for obtaining, before the term vector corresponding to each word in the text information is determined, a target text library, where the text contained in the target text library is text containing no wrong words; and a generation unit for training on the target text library with a term vector model to generate the term vectors corresponding to the words in the target text library, obtaining a first training set.
Further, the determining unit includes: a word segmentation module for performing word segmentation on the text information to obtain a second training set; and a query module for looking up, in the first training set, the term vector corresponding to each word in the second training set.
Further, the model training device also includes: a marking unit for labeling, before the term vectors corresponding to the words in each sentence are input into the memory neural network, the term vector corresponding to each word in each sentence with a preset mark, where the preset mark indicates that the word corresponding to the term vector is a non-wrong word, so that when the neural network model recognizes a non-wrong word, the word is labeled with the preset mark.
According to another aspect of the embodiments of the present application, a wrong-word recognition device is also provided, including: a vector determining unit for performing word segmentation on a text to be tested and determining the term vector corresponding to each word; and a recognition unit for, taking the sentences in the text to be tested as units, inputting the term vectors corresponding to the words in each sentence into a neural network model and identifying the wrong words in the text to be tested with the neural network model.
According to the embodiments of the present application, text information is extracted from a preset text data source, where the text contained in the preset text data source is text containing no wrong words; the term vector corresponding to each word in the text information is determined, where a term vector is a multidimensional array uniquely representing a word; and, taking the sentences in the text information as units, the term vectors corresponding to the words in each sentence are input into a memory neural network, with training obtaining a neural network model. The neural network model can then be used to recognize wrong words in text, which improves the recognition rate of wrong words in text and solves the technical problem in the prior art of the low recognition rate of wrong words in text.
Brief description of the drawings
The accompanying drawings described herein are provided for further understanding of the present application and constitute a part of the present application. The schematic embodiments of the present application and their descriptions are used to explain the present application and do not constitute an improper limitation of the present application. In the drawings:
Fig. 1 is a flow chart of a model training method according to an embodiment of the present application;
Fig. 2 is a flow chart of a wrong-word recognition method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a model training device according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a wrong-word recognition device according to an embodiment of the present application.
Specific embodiments
In order that those skilled in the art may better understand the solution of the present application, the technical solution in the embodiments of the present application is described clearly and completely below in conjunction with the accompanying drawings of the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of the present application.
It should be noted that the terms "first", "second", etc. in the description, claims, and above accompanying drawings of the present application are used to distinguish similar objects, not to describe a specific order or precedence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described here. In addition, the terms "comprising" and "having" and any variations of them are intended to cover non-exclusive inclusion: for example, a process, method, system, product, or device containing a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or are inherent to the process, method, product, or device.
According to an embodiment of the present application, a method embodiment of a model training method is provided. It should be noted that the steps illustrated in the flow chart of the accompanying drawing may be performed in a computer system such as a set of computer-executable instructions, and, although a logical order is shown in the flow chart, in some cases the steps shown or described may be performed in an order different from that given here.
Fig. 1 is a flow chart of a model training method according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
Step S102: text information is extracted from a preset text data source, where the text contained in the preset text data source is text containing no wrong words.
The preset text data source may be a resource website such as People's Daily or the Chinese government website, or may be a proofread text data source containing no wrong words. The preset text data source contains a large amount of text without wrong words, from which the text information is extracted.
Step S104: the term vector corresponding to each word in the text information is determined, where a term vector is a multidimensional array uniquely representing a word.
For the text information extracted above, the term vector corresponding to each word in it is determined. The term vector of each word is a multidimensional array representing that word, and different words correspond to different term vectors. The term vectors of words may be predefined, so that after the text information is extracted, the vector of each word in the text information is queried from the predefined term vectors. The term vector of each word may also be generated according to a preset term-vector generation rule.
Step S106: taking the sentences in the text information as units, the term vectors corresponding to the words in each sentence are input into a memory neural network, with training obtaining a neural network model, where the neural network model is used to recognize wrong words in text.
In this embodiment, after the term vector of each word contained in the text information is determined, the sentences in the text information, taken as units, are sequentially input into the memory neural network for training, with each sentence represented by the term vectors corresponding to its words; that is, the term vectors corresponding to the words in a sentence are input into the memory neural network. The memory neural network may preferably be a long short-term memory network based on a recurrent neural network (i.e., LSTM + bidirectional RNN). The extracted text information is trained with the memory neural network to obtain the neural network model. Because the term vectors of the words are input into the memory neural network in units of sentences, the machine can memorize the words in the sentences and their combinations, and memorize them with the parameters of the neural network model (the parameters of the neural network model are determined by training and are mostly matrices). Compared with the prior-art approach of manually building a lexicon of correct words and matching text against it to recognize wrong words, this embodiment trains a neural network model on text without wrong words through a memory neural network and then uses the neural network model to recognize wrong words in text. Without manually building a lexicon, wrong words can be recognized according to word combinations and sentences, based on the context and semantics, so that wrong words in text are identified effectively and quickly.
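The paragraph above names a long short-term memory network (LSTM + bidirectional RNN) but the patent gives no implementation details. Purely as an illustration, the following minimal sketch shows the standard forward computation of a single LSTM cell step consuming one term vector at a time; the tiny dimensions and constant weights are invented for the example and are not trained values from the patent.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step: consume one term vector x, update hidden/cell state.

    x, h_prev, c_prev are lists of floats; W maps each gate name to a weight
    matrix of size hidden x (input + hidden); b maps each gate name to a bias
    vector. All weights here are illustrative constants, not trained values.
    """
    z = x + h_prev            # concatenated input [x; h_prev]
    hidden = len(h_prev)

    def affine(gate):
        return [sum(W[gate][i][j] * z[j] for j in range(len(z))) + b[gate][i]
                for i in range(hidden)]

    i_g = [sigmoid(v) for v in affine("i")]    # input gate
    f_g = [sigmoid(v) for v in affine("f")]    # forget gate
    o_g = [sigmoid(v) for v in affine("o")]    # output gate
    g_g = [math.tanh(v) for v in affine("g")]  # candidate cell state

    c = [f_g[k] * c_prev[k] + i_g[k] * g_g[k] for k in range(hidden)]
    h = [o_g[k] * math.tanh(c[k]) for k in range(hidden)]
    return h, c

# Run a 2-word "sentence" of 3-dimensional term vectors through a 2-unit cell.
W = {g: [[0.1] * 5, [0.2] * 5] for g in ("i", "f", "o", "g")}
b = {g: [0.0, 0.0] for g in ("i", "f", "o", "g")}
h, c = [0.0, 0.0], [0.0, 0.0]
for word_vec in [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]:
    h, c = lstm_step(word_vec, h, c, W, b)
```

A bidirectional variant, as named in the patent, would run a second such cell over the sentence in reverse order and combine both hidden sequences; training the weights is done by backpropagation in practice and is omitted here.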
According to this embodiment of the present application, text information is extracted from a preset text data source, where the text contained in the preset text data source is text containing no wrong words; the term vector corresponding to each word in the text information is determined, where a term vector is a multidimensional array uniquely representing a word; and, taking the sentences in the text information as units, the term vectors corresponding to the words in each sentence are input into a memory neural network, with training obtaining a neural network model. The neural network model can then be used to recognize wrong words in text, which improves the recognition rate of wrong words in text and solves the technical problem in the prior art of the low recognition rate of wrong words in text.
For example, consider the phrase "indignant position" (a corruption of "angrily leaving") appearing in text. A lexicon built in the prior art contains the words "angrily leaving", "indignant", "leaving", and "position"; when the phrase is checked, its component words are each matched in the lexicon, so the phrase is judged to contain no wrong word. In the embodiment of the present application, however, the neural network model is trained in units of sentences; that is, "angrily leaving" is input into the neural network as a whole and memorized by the parameters of the neural network model. Therefore, when "indignant position" is input into the neural network model, the wrong character in it is recognized.
Preferably, before the term vector corresponding to each word in the text information is determined, the model training method also includes: obtaining a target text library, where the text contained in the target text library is text containing no wrong words; and training on the target text library with a term vector model to generate the term vectors corresponding to the words in the target text library, obtaining a first training set.
The target text library of this embodiment may be a text library containing no wrong words, such as a dictionary containing various words (for example the Xinhua dictionary or an idiom dictionary) or a collection of articles; the target text library is obtained to serve as the term-vector training set. The term vector model may be an existing mature model that, according to the input text, generates for each word a multidimensional array of the same dimension, i.e., a term vector. The dimension of the term vectors may be defined according to the term-vector training set; for example, "one" might be marked as [1, 0, 0, ...] and "happiness" as [0, 1, 0, ...].
In this embodiment of the present application, the term vector of each word in the term-vector training set can be obtained by training in advance, so that the term vectors of the words in the text information used for neural network model training can be queried from it.
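The description above only gives the shape of the result ("one" maps to [1, 0, 0, ...]); the actual term vector model is left unspecified. As a minimal sketch, the simplest scheme satisfying the stated property of one unique multidimensional array per word is one-hot indexing over a vocabulary; a real mature model (e.g. word2vec-style training) would produce dense vectors instead. The function name is invented for illustration.

```python
def build_term_vectors(vocabulary):
    """Assign each word a unique fixed-dimension array (one-hot by index).

    This is a stand-in for a trained term vector model: one-hot indexing is
    the simplest scheme in which every word gets a distinct multidimensional
    array of identical dimension, as described above.
    """
    dim = len(vocabulary)
    table = {}
    for idx, word in enumerate(vocabulary):
        vec = [0] * dim
        vec[idx] = 1
        table[word] = vec
    return table

vectors = build_term_vectors(["one", "happiness", "China"])
# "one" maps to [1, 0, 0], "happiness" to [0, 1, 0]; every vector is unique.
```

The resulting table is the "first training set" in the patent's terms: a lookup from each word to its vector, queried later when preparing sentences for the neural network.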
It should be noted that in this embodiment of the present application, a corresponding term vector may also be generated for each punctuation mark.
Further, determining the term vector corresponding to each word in the text information includes: performing word segmentation on the text information to obtain a second training set; and looking up, in the first training set, the term vector corresponding to each word in the second training set.
For the text information used for neural network model training, word segmentation is first performed on it to obtain a word set, i.e., the second training set; the term vector corresponding to each word in the second training set is then queried in the first training set obtained above, so that the term vector of each word in each sentence of the text information is determined.
Specifically, an existing word segmentation tool may be used to perform word segmentation on the text information, where the segmented text is composed of words; for example, "I am a Chinese person" may be segmented as "I / am / one / Chinese person" or "I / am / one / Chinese / person".
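The patent does not name the segmentation tool it relies on. As a toy illustration of how such a tool divides text against a lexicon, and why segmentation granularity can vary, here is a greedy longest-match segmenter over a hypothetical lexicon; production segmenters for Chinese (e.g. jieba) use far richer statistical models.

```python
def segment(text, lexicon):
    """Greedy longest-match word segmentation.

    Scans left to right, always taking the longest lexicon entry that matches
    at the current position; characters not covered by the lexicon become
    single-character tokens.
    """
    words, i = [], 0
    while i < len(text):
        match = text[i]  # fall back to a single character
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        words.append(match)
        i += len(match)
    return words

lexicon = {"ab", "abc", "cd", "d"}
print(segment("abcd", lexicon))  # longest match takes "abc", then "d"
```

Under a different lexicon the same string splits differently, which mirrors the two alternative segmentations of "I am a Chinese person" given in the text.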
Preferably, before the term vectors corresponding to the words in each sentence are input into the memory neural network, the model training method also includes: labeling the term vector corresponding to each word in each sentence with a preset mark, where the preset mark indicates that the word corresponding to the term vector is a non-wrong word, so that when the neural network model recognizes a non-wrong word, the word is labeled with the preset mark.
In this embodiment of the present application, each word in each sentence input into the memory neural network is labeled with a mark, such as "1". In this way, when the text is trained to obtain the neural network model, the parameters of the neural network model can memorize that these words carry the preset mark. When the neural network model is then used to recognize a text to be tested, the words without wrong words in the text are labeled with the preset mark in the output result, while words containing wrong words are left unmarked or labeled with another mark, so that the wrong words in the text to be tested can be quickly screened out.
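The labeling step above reduces to pairing each word's term vector with the preset mark before it enters the network. A minimal sketch under that reading follows; the mark value 1 comes from the example in the text, while the function name is invented for illustration.

```python
DEFAULT_MARK = 1  # preset mark: every word in the clean training text is non-wrong

def label_sentence(sentence_vectors, mark=DEFAULT_MARK):
    """Pair each word's term vector in a sentence with the preset mark.

    The training corpus contains no wrong words, so every term vector is
    labeled as a non-wrong word; the trained model later emits this mark for
    words it recognizes and a different value for suspected wrong words.
    """
    return [(vec, mark) for vec in sentence_vectors]

sentence = [[1, 0, 0], [0, 1, 0]]   # term vectors of one (tiny) sentence
samples = label_sentence(sentence)
```

Each `(vector, mark)` pair is one training sample for the sentence, matching the uniform non-wrong labeling the paragraph describes.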
An optional implementation of the model training method of this embodiment of the present application includes:
Step 1: obtain a reliable text library (i.e., a text library containing no wrong words, such as the Xinhua dictionary, an idiom dictionary, or a collection of articles) as the target text library, which serves as term-vector training set 1, i.e., the first training set.
Step 2: train on training set 1 with a term vector model to obtain the term vector of each word (including punctuation marks) in training set 1. The term vector model may be an existing mature model that, according to the input text, generates for each word a unique multidimensional array of the same dimension, i.e., a term vector. The dimension of the term vectors may be predefined; for example, "one" might be marked as [1, 0, 0, ...] and "happiness" as [0, 1, 0, ...].
Step 3: extract text information from a reliable text data source composed of a large number of sentences, as the text training set. Here, a reliable text data source composed of a large number of sentences means a text data source without wrong words, obtained for example from channels such as People's Daily or the Chinese government website.
Step 4: using an existing word segmentation tool, perform word segmentation on the above text training set to obtain training set 2, i.e., the second training set. The segmented text is composed of words; for example, "I am a Chinese person" may be segmented as "I / am / one / Chinese person" or "I / am / one / Chinese / person".
Step 5: taking the sentences of training set 2 as units, find the term vector of each word in a sentence from training set 1, label each word as a non-wrong word (for example, representing a non-wrong word with "1"), and input the resulting term vectors into a long short-term memory network based on a recurrent neural network (i.e., LSTM + bidirectional RNN); training obtains the neural network model (the model parameters are determined by training and are mostly matrices). Because the neural network is fed in units of sentences, the machine can memorize the words in the sentences and their combinations, and memorize these combinations with the parameters of the model.
By using the neural network model, analysis can be performed according to the word combinations, sentences, and paragraphs in an article, which improves the recognition accuracy and reduces the miss rate.
According to an embodiment of the present application, a wrong-word recognition method is also provided. The wrong-word recognition method can use the neural network model obtained by training with the model training method of the above embodiments of the present application to recognize wrong words. As shown in Fig. 2, the wrong-word recognition method includes:
Step S202: word segmentation is performed on a text to be tested, and the term vector corresponding to each word is determined.
For each word obtained by word segmentation, its corresponding term vector can be queried from the first training set of the embodiments of the present application.
Step S204: taking the sentences in the text to be tested as units, the term vectors corresponding to the words in each sentence are input into the neural network model, and the wrong words in the text to be tested are identified with the neural network model.
The neural network model in this embodiment is the neural network model obtained by training with the model training method of the above embodiments of the present application.
Because the neural network model is obtained by training a memory neural network on text without wrong words, the parameters of the neural network model (determined by training, mostly matrices) can memorize the words without wrong words in the training text and their combinations. Without manually building a lexicon, wrong words can be recognized according to word combinations and sentences, based on the context and semantics, so that wrong words in text are identified effectively and quickly.
The term vectors of the text to be tested are input into the trained neural network model; through the calculation of the neural network model, each word is marked in the output result. For example, a non-wrong word is marked as 1 and a wrong word is marked as -1, so that the wrong words can be screened out.
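Once the model has emitted a per-word mark (1 or -1 in the example above), the final screening step is a simple filter. The sketch below assumes the marks are already available as a sequence parallel to the words; both sequences here are hypothetical stand-ins for real model output.

```python
def screen_wrong_words(words, marks, wrong_mark=-1):
    """Filter out the words the model marked as wrong.

    words and marks are parallel sequences: per the example in the text, the
    model emits 1 for a word it recognizes as correct and -1 for a suspected
    wrong word.
    """
    return [w for w, m in zip(words, marks) if m == wrong_mark]

words = ["I", "am", "a", "Chinse", "person"]   # "Chinse" is the typo
marks = [1, 1, 1, -1, 1]                        # hypothetical model output
print(screen_wrong_words(words, marks))         # ['Chinse']
```

Keeping the marks parallel to the segmented words also preserves each wrong word's position in the sentence, which is useful if corrections are to be suggested downstream.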
An embodiment of the present application also provides a model training device, which can be used to perform the model training method of the embodiments of the present application. As shown in Fig. 3, the model training device includes: an extraction unit 301, a determining unit 303, and a training unit 305.
The extraction unit 301 is used to extract text information from a preset text data source, where the text contained in the preset text data source is text containing no wrong words.
The preset text data source may be a resource website such as People's Daily or the Chinese government website, or may be a proofread text data source containing no wrong words. The preset text data source contains a large amount of text without wrong words, from which the text information is extracted.
The determining unit 303 is used to determine the term vector corresponding to each word in the text information, where a term vector is a multidimensional array uniquely representing a word.
For the text information extracted above, the term vector corresponding to each word in it is determined. The term vector of each word is a multidimensional array representing that word, and different words correspond to different term vectors. The term vectors of words may be predefined, so that after the text information is extracted, the vector of each word in the text information is queried from the predefined term vectors. The term vector of each word may also be generated according to a preset term-vector generation rule.
The training unit 305 is used to, taking the sentences in the text information as units, input the term vectors corresponding to the words in each sentence into a memory neural network, with training obtaining a neural network model, where the neural network model is used to recognize wrong words in text.
In this embodiment, after the term vector of each word contained in the text information is determined, the sentences in the text information, taken as units, are sequentially input into the memory neural network for training, with each sentence represented by the term vectors corresponding to its words; that is, the term vectors corresponding to the words in a sentence are input into the memory neural network. The memory neural network may preferably be a long short-term memory network based on a recurrent neural network (i.e., LSTM + bidirectional RNN). The extracted text information is trained with the memory neural network to obtain the neural network model. Because the term vectors of the words are input into the memory neural network in units of sentences, the machine can memorize the words in the sentences and their combinations, and memorize them with the parameters of the neural network model (the parameters of the neural network model are determined by training and are mostly matrices). Compared with the prior-art approach of manually building a lexicon of correct words and matching text against it to recognize wrong words, this embodiment trains a neural network model on text without wrong words through a memory neural network and then uses the neural network model to recognize wrong words in text. Without manually building a lexicon, wrong words can be recognized according to word combinations and sentences, based on the context and semantics, so that wrong words in text are identified effectively and quickly.
According to this embodiment of the present application, text information is extracted from a preset text data source, where the text contained in the preset text data source is text containing no wrong words; the term vector corresponding to each word in the text information is determined, where a term vector is a multidimensional array uniquely representing a word; and, taking the sentences in the text information as units, the term vectors corresponding to the words in each sentence are input into a memory neural network, with training obtaining a neural network model. The neural network model can then be used to recognize wrong words in text, which improves the recognition rate of wrong words in text and solves the technical problem in the prior art of the low recognition rate of wrong words in text.
For example, consider the phrase "indignant position" (a corruption of "angrily leaving") appearing in text. A lexicon built in the prior art contains the words "angrily leaving", "indignant", "leaving", and "position"; when the phrase is checked, its component words are each matched in the lexicon, so the phrase is judged to contain no wrong word. In the embodiment of the present application, however, the neural network model is trained in units of sentences; that is, "angrily leaving" is input into the neural network as a whole and memorized by the parameters of the neural network model. Therefore, when "indignant position" is input into the neural network model, the wrong character in it is recognized.
Preferably, model training apparatus also include:Acquiring unit, for each word pair in text message is determined Before the term vector answered, target text storehouse is obtained, the text that target text place is included is not comprising the text for having wrong word This;Generation unit, for being trained to target text storehouse using term vector model, with generating target text storehouse The corresponding term vector of word, obtains the first training set.
The target text storehouse of the present embodiment, can be the dictionary for including various words, such as xinhua dictionary, into words and phrases The text library not comprising wrong word such as allusion quotation, article, obtains target text storehouse with as term vector training set.Term vector Model can be existing maturity model, and the model can generate a dimension phase according to input text to each word Same Multidimensional numerical, i.e. term vector, the dimension of the term vector such as will for that can be defined according to term vector training set It is 1,0,0 that " one " may mark ... ...], it is 0,1,0 that " happiness " may be marked ... ...].
In the embodiment of the present application, the word vector of each word in the text information used for neural network model training can be queried from the word vector training set obtained by training in advance.
It should be noted that, in the embodiment of the present application, a corresponding word vector may also be generated for each punctuation mark.
Preferably, the determining unit includes: a word segmentation module, configured to perform word segmentation on the text information to obtain a second training set; and a query module, configured to look up, in the first training set, the word vector corresponding to each word in the second training set.
For the text information used for neural network model training, word segmentation is first performed on it to obtain a word set, namely the second training set. The word vector corresponding to each word in the second training set is then queried from the first training set obtained above, thereby determining the word vector of each word in every sentence of the text information.
Specifically, an existing word segmentation tool may be used to perform word segmentation on the text information, where the segmented text consists of words. For example, "我是一个中国人" ("I am Chinese") may be segmented as "我 / 是 / 一个 / 中国人" or as "我 / 是 / 一 / 个 / 中国 / 人".
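As a hedged sketch of how such a segmentation tool can work, the function below implements dictionary-based forward maximum matching, one common segmentation strategy: at each position it greedily takes the longest lexicon word, falling back to a single character. The mini-lexicon is hypothetical; production tools use much larger dictionaries and statistical models.

```python
def segment(text, lexicon, max_word_len=4):
    """Forward maximum matching: longest lexicon word at each position, else one character."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_word_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])  # unknown character kept as a single-character word
            i += 1
    return words

lexicon = {"我", "是", "一个", "中国人", "中国"}
print(segment("我是一个中国人", lexicon))  # ['我', '是', '一个', '中国人']
```

Note that with a different lexicon (e.g. lacking "中国人"), the same sentence segments differently, which is exactly the ambiguity the "我 / 是 / 一 / 个 / 中国 / 人" alternative above illustrates.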
Preferably, the model training apparatus further includes: a marking unit, configured to mark the word vector corresponding to each word in every sentence with a preset mark before the word vectors are input into the memory neural network, where the preset mark indicates that the word corresponding to the word vector is not a wrong word, so that when the neural network model identifies a word as not being a wrong word, that word is marked with the preset mark.
In the embodiment of the present application, each word in every sentence input into the memory neural network is marked, for example with "1". In this way, when the text is used for training to obtain the neural network model, the parameters of the neural network model memorize these words as carrying the preset mark. When the neural network model is used to recognize a text to be tested, its output marks the words that are not wrong words with the preset mark, while words identified as wrong words are left unmarked or given a different mark, so that the wrong words in the text to be tested can be filtered out quickly.
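The marking step described above can be sketched as follows: every word vector in a sentence of the error-free training text is paired with the preset mark 1 ("not a wrong word") before being fed to the network. The placeholder vectors below are hypothetical, not trained embeddings.

```python
PRESET_MARK = 1  # preset mark: the word is not a wrong word

def label_sentence(word_vectors):
    """Attach the preset mark to every word vector in a sentence (training texts are error-free)."""
    return [(vec, PRESET_MARK) for vec in word_vectors]

sentence_vectors = [[1, 0, 0], [0, 1, 0]]  # two placeholder word vectors
labeled = label_sentence(sentence_vectors)
print(labeled)  # [([1, 0, 0], 1), ([0, 1, 0], 1)]
```

Since every training word carries the same mark, the network effectively learns what "correct" sentences look like, and words that deviate from those learned patterns at recognition time fail to receive the preset mark.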
The model training apparatus includes a processor and a memory. The above extraction unit 301, determining unit 303, training unit 305 and the like are stored in the memory as program units, and the processor executes the program units stored in the memory.
The processor contains a kernel, and the kernel retrieves the corresponding program units from the memory. One or more kernels may be provided, and the neural network model is obtained by training while adjusting kernel parameters.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application also provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: extracting text information from a preset text data source, where the texts included in the preset text data source contain no wrong words; determining the word vector corresponding to each word in the text information, where a word vector is a multidimensional array that uniquely represents a word; and, in units of the sentences in the text information, inputting the word vector corresponding to each word in every sentence into a memory neural network, and training to obtain a neural network model, where the neural network model is used to recognize the wrong words in a text.
According to an embodiment of the present application, a wrong word identifying device is also provided, which can be used to execute the wrong word recognition method provided by the embodiment of the present application. As shown in Fig. 4, the wrong word identifying device includes a vector determining unit 401 and a recognition unit 403.
The vector determining unit 401 is configured to perform word segmentation on the text to be tested and determine the word vector corresponding to each word.
In the embodiment of the present application, the word vector corresponding to each word obtained after word segmentation can be queried from the first training set.
The recognition unit 403 is configured to, in units of the sentences in the text to be tested, input the word vector corresponding to each word in every sentence into the neural network model, and identify the wrong words in the text to be tested using the neural network model.
The neural network model in this embodiment is the neural network model obtained by training with the model training method of the above embodiments of the present application.
Because the neural network model is obtained by training a memory neural network on texts without wrong words, the parameters of the neural network model (which are determined during training and are mostly matrices) can memorize the error-free words and word combinations in the training texts. Without manually establishing a lexicon, wrong words can be recognized from word combinations and sentences; based on the context semantics, the wrong words in a text can be identified effectively and quickly.
The word vectors of the text to be tested are input into the trained neural network model, and through the computation of the neural network model each word is marked in the output result, for example a non-wrong word is marked 1 and a wrong word is marked -1, so that the wrong words can be screened out.
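The screening step described above can be sketched as simple post-processing of the model output: keep the words whose mark is -1. The marks below are hand-written stand-ins for real neural network output, and the example words are hypothetical.

```python
def screen_wrong_words(words, marks):
    """Return the words whose output mark is -1 (identified wrong words)."""
    return [word for word, mark in zip(words, marks) if mark == -1]

words = ["愤然", "离", "位"]
marks = [1, 1, -1]  # hypothetical per-word output of the neural network model
print(screen_wrong_words(words, marks))  # ['位']
```

In a full pipeline, `marks` would come from running the sentence's word vectors through the trained model; the filtering itself stays this simple.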
The wrong word identifying device includes a processor and a memory. The above vector determining unit 401, recognition unit 403 and the like are stored in the memory as program units, and the processor executes the program units stored in the memory.
The processor contains a kernel, and the kernel retrieves the corresponding program units from the memory. One or more kernels may be provided, and the wrong words in a text are recognized by adjusting kernel parameters.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application also provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: performing word segmentation on the text to be tested, and determining the word vector corresponding to each word; and, in units of the sentences in the text to be tested, inputting the word vector corresponding to each word in every sentence into the neural network model, and identifying the wrong words in the text to be tested using the neural network model.
The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the application, the description of each embodiment has its own emphasis; for parts not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content can be realized in other ways. The device embodiments described above are only schematic; for example, the division of the units may be a division of logical functions, and there may be other division modes in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
In addition, the functional units in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit can be realized in the form of hardware or in the form of a software functional unit.
If the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical scheme of the present application, in essence or in the part contributing to the prior art, or all or part of the technical scheme, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods described in each embodiment of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk or an optical disc.
The above are only preferred embodiments of the present application. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications can also be made without departing from the principles of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A model training method, characterized by comprising:
extracting text information from a preset text data source, wherein the texts included in the preset text data source contain no wrong words;
determining the word vector corresponding to each word in the text information, wherein a word vector is a multidimensional array that uniquely represents a word;
in units of the sentences in the text information, inputting the word vector corresponding to each word in every sentence into a memory neural network, and training to obtain a neural network model, wherein the neural network model is used to recognize the wrong words in a text.
2. The model training method according to claim 1, characterized in that, before the word vector corresponding to each word in the text information is determined, the model training method further comprises:
obtaining a target text library, wherein the texts included in the target text library contain no wrong words;
training on the target text library using a word vector model, so as to generate the word vector corresponding to each word in the target text library and obtain a first training set.
3. The model training method according to claim 2, characterized in that determining the word vector corresponding to each word in the text information comprises:
performing word segmentation on the text information to obtain a second training set;
looking up, in the first training set, the word vector corresponding to each word in the second training set.
4. The model training method according to claim 1, characterized in that, before the word vector corresponding to each word in every sentence is input into the memory neural network, the model training method further comprises:
marking the word vector corresponding to each word in every sentence with a preset mark, wherein the preset mark indicates that the word corresponding to the word vector is not a wrong word, so that when a word is identified as not being a wrong word by the neural network model, the word is marked with the preset mark.
5. A wrong word recognition method, characterized by comprising:
performing word segmentation on a text to be tested, and determining the word vector corresponding to each word;
in units of the sentences in the text to be tested, inputting the word vector corresponding to each word in every sentence into the neural network model obtained by training with the model training method according to any one of claims 1 to 4, and identifying the wrong words in the text to be tested using the neural network model.
6. A model training apparatus, characterized by comprising:
an extraction unit, configured to extract text information from a preset text data source, wherein the texts included in the preset text data source contain no wrong words;
a determining unit, configured to determine the word vector corresponding to each word in the text information, wherein a word vector is a multidimensional array that uniquely represents a word;
a training unit, configured to, in units of the sentences in the text information, input the word vector corresponding to each word in every sentence into a memory neural network and train to obtain a neural network model, wherein the neural network model is used to recognize the wrong words in a text.
7. The model training apparatus according to claim 6, characterized in that the model training apparatus further comprises:
an acquiring unit, configured to obtain a target text library before the word vector corresponding to each word in the text information is determined, wherein the texts included in the target text library contain no wrong words;
a generating unit, configured to train on the target text library using a word vector model, so as to generate the word vector corresponding to each word in the target text library and obtain a first training set.
8. The model training apparatus according to claim 7, characterized in that the determining unit comprises:
a word segmentation module, configured to perform word segmentation on the text information to obtain a second training set;
a query module, configured to look up, in the first training set, the word vector corresponding to each word in the second training set.
9. The model training apparatus according to claim 6, characterized in that the model training apparatus further comprises:
a marking unit, configured to mark the word vector corresponding to each word in every sentence with a preset mark before the word vectors are input into the memory neural network, wherein the preset mark indicates that the word corresponding to the word vector is not a wrong word, so that when a word is identified as not being a wrong word by the neural network model, the word is marked with the preset mark.
10. A wrong word identifying device, characterized by comprising:
a vector determining unit, configured to perform word segmentation on a text to be tested and determine the word vector corresponding to each word;
a recognition unit, configured to, in units of the sentences in the text to be tested, input the word vector corresponding to each word in every sentence into the neural network model obtained by training with the model training method according to any one of claims 1 to 4, and identify the wrong words in the text to be tested using the neural network model.
CN201510850128.1A 2015-11-27 2015-11-27 Model training method and device and wrong word recognition methods and device Pending CN106815193A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510850128.1A CN106815193A (en) 2015-11-27 2015-11-27 Model training method and device and wrong word recognition methods and device


Publications (1)

Publication Number Publication Date
CN106815193A true CN106815193A (en) 2017-06-09

Family

ID=59155338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510850128.1A Pending CN106815193A (en) 2015-11-27 2015-11-27 Model training method and device and wrong word recognition methods and device

Country Status (1)

Country Link
CN (1) CN106815193A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101847140A (en) * 2009-03-23 2010-09-29 中国科学院计算技术研究所 Wrongly-written or mispronounced character processing method and system
CN104573046A (en) * 2015-01-20 2015-04-29 成都品果科技有限公司 Comment analyzing method and system based on term vector
CN104615589A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Named-entity recognition model training method and named-entity recognition method and device
CN104899304A (en) * 2015-06-12 2015-09-09 北京京东尚科信息技术有限公司 Named entity identification method and device


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451106A (en) * 2017-07-26 2017-12-08 阿里巴巴集团控股有限公司 Text method and device for correcting, electronic equipment
CN109213843A (en) * 2018-07-23 2019-01-15 北京密境和风科技有限公司 A kind of detection method and device of rubbish text information
CN109543022A (en) * 2018-12-17 2019-03-29 北京百度网讯科技有限公司 Text error correction method and device
US11080492B2 (en) 2018-12-17 2021-08-03 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and device for correcting error in text
WO2020132985A1 (en) * 2018-12-26 2020-07-02 深圳市优必选科技有限公司 Self-training method and apparatus for model, computer device, and storage medium
CN110310083A (en) * 2019-06-04 2019-10-08 南方电网科学研究院有限责任公司 Submitting system of science and technology project data report
CN110765996A (en) * 2019-10-21 2020-02-07 北京百度网讯科技有限公司 Text information processing method and device
CN110765996B (en) * 2019-10-21 2022-07-29 北京百度网讯科技有限公司 Text information processing method and device
CN112599129A (en) * 2021-03-01 2021-04-02 北京世纪好未来教育科技有限公司 Speech recognition method, apparatus, device and storage medium

Similar Documents

Publication Publication Date Title
CN106815193A (en) Model training method and device and wrong word recognition methods and device
CN106815192B (en) Model training method and device and sentence emotion recognition method and device
CN106815194A (en) Model training method and device and keyword recognition method and device
CN104503998B (en) For the kind identification method and device of user query sentence
CN103970765B (en) Correct mistakes model training method, device and text of one is corrected mistakes method, device
CN113707300B (en) Search intention recognition method, device, equipment and medium based on artificial intelligence
CN109033282B (en) Webpage text extraction method and device based on extraction template
CN105243055A (en) Multi-language based word segmentation method and apparatus
CN112035675A (en) Medical text labeling method, device, equipment and storage medium
CN107491536B (en) Test question checking method, test question checking device and electronic equipment
CN110008473B (en) Medical text named entity identification and labeling method based on iteration method
CN111506696A (en) Information extraction method and device based on small number of training samples
CN112632278A (en) Labeling method, device, equipment and storage medium based on multi-label classification
CN110569335A (en) triple verification method and device based on artificial intelligence and storage medium
CN110610180A (en) Method, device and equipment for generating recognition set of wrongly-recognized words and storage medium
CN110321549B (en) New concept mining method based on sequential learning, relation mining and time sequence analysis
CN104007836A (en) Handwriting input processing method and terminal device
CN112445915A (en) Document map extraction method and device based on machine learning and storage medium
CN111723870A (en) Data set acquisition method, device, equipment and medium based on artificial intelligence
CN107436931B (en) Webpage text extraction method and device
CN113282754A (en) Public opinion detection method, device, equipment and storage medium for news events
CN113850081B (en) Text processing method, device, equipment and medium based on artificial intelligence
CN107506349A (en) A kind of user's negative emotions Forecasting Methodology and system based on network log
CN113761137B (en) Method and device for extracting address information
CN104408036B (en) It is associated with recognition methods and the device of topic

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20170609