CN106649294A

CN106649294A - Training of a classification model for English subordinate clauses, and method and device for recognizing subordinate clauses based on the classification model

Info

Publication number: CN106649294A
Application number: CN201611250331.6A
Authority: CN
Inventors: 郭祥; 杨君; 赵博洋; 田东东; 王思月; 柴静
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd
Priority date: 2016-12-29
Filing date: 2016-12-29
Publication date: 2017-05-10
Anticipated expiration: 2036-12-29
Also published as: CN106649294B

Abstract

The embodiments of the invention provide a method for training a classification model and a method and device for recognizing subordinate clauses with that model. The training method includes: setting English sentences that contain English subordinate clauses as training samples; converting the training samples into feature text sequences; and training, with the feature text sequences, a classification model for recognizing English subordinate clauses. The types of subordinate clauses contained in an English sentence can thus be recognized automatically. This enriches the information available about the sentence and reduces the need for users to look up other references and compare sentences manually, which saves time, improves efficiency, and lowers the probability of errors when the user's knowledge is limited.

Description

Training of a classification model, and method and device for recognizing subordinate clauses with the model

Technical field

The present invention relates to the technical field of computer processing, and in particular to a method for training a classification model for English subordinate clauses, a method for recognizing English subordinate clauses based on a classification model, a corresponding training device for the classification model, and a device for recognizing English subordinate clauses based on a classification model.

Background

With globalization, English, as one of the international languages, has become one of the basic subjects that people study.

When people encounter English sentences they do not understand while reading English articles or watching English films, most of them use a translation application to translate those sentences.

Current translation applications usually translate an English sentence to obtain its meaning. However, people who are learning English, students in particular, may have other needs. They then have to look up other references and compare the English sentence manually, which not only takes more time and lowers efficiency but is also error-prone when their knowledge is limited.

Summary of the invention

In view of the above problems, the present invention provides a method for training a classification model for English subordinate clauses, a method for recognizing English subordinate clauses based on a classification model, a corresponding training device for the classification model, and a device for recognizing English subordinate clauses based on a classification model, which overcome the above problems or at least partially solve them.

According to one aspect of the present invention, a method for training a classification model for English subordinate clauses is provided, including:

setting English sentences that contain English subordinate clauses as training samples;

converting the training samples into feature text sequences;

training, with the feature text sequences, a classification model for recognizing English subordinate clauses.

Optionally, the step of converting the training samples into feature text sequences includes:

recognizing the constituent structure of the training samples;

forming the feature text sequences from the constituent structure.
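The patent does not say which parser or label set produces the constituent structure. As a minimal sketch, assuming a Penn Treebank-style bracketed parse is already available from some external parser, the structure can be flattened into a feature text sequence by a pre-order walk over the node labels. The function name `parse_tree_to_features` and the option to keep leaf words are hypothetical choices for illustration.

```python
import re
from typing import List


def parse_tree_to_features(tree: str, keep_words: bool = False) -> List[str]:
    """Flatten a bracketed constituency parse into a feature text sequence.

    Hypothetical feature design: constituent and POS labels are emitted in
    pre-order, optionally interleaved with the lower-cased leaf words.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    features: List[str] = []
    expect_label = False
    for tok in tokens:
        if tok == "(":
            expect_label = True           # the next token names a node
        elif tok == ")":
            expect_label = False
        elif expect_label:
            features.append(tok)          # constituent (e.g. SBAR) or POS tag
            expect_label = False
        elif keep_words:
            features.append(tok.lower())  # leaf word, kept only on request
    return features


# Hand-written parse of an attributive-clause example; SBAR marks the clause.
tree = ("(S (NP (NP (DT The) (NN dog)) (SBAR (WHNP (WDT that)) "
        "(S (VP (VBD was) (VP (VBN lost)))))) "
        "(VP (VBZ has) (VP (VBN been) (VP (VBN found)))))")
print(parse_tree_to_features(tree))
```

Emitting labels rather than raw words keeps the sequence focused on structure: the SBAR and WHNP labels that mark the relative clause appear directly in the output, whatever the specific nouns and verbs are.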

Optionally, the step of training, with the feature text sequences, the classification model for recognizing English subordinate clauses includes:

inputting the feature text sequences into a convolutional neural network;

training, in the convolutional neural network and based on the order of the words in the training samples, the classification model for recognizing English subordinate clauses with the feature text sequences.
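Neither the network architecture nor its hyperparameters are specified. The sketch below assumes a common 1-D text CNN (embedding lookup, convolution over consecutive tokens, ReLU, max-over-time pooling, linear output with softmax) and shows only the forward pass, in pure Python with randomly initialized weights; in practice the weights would be learned by backpropagation in a deep-learning framework. The class and parameter names are hypothetical. What it illustrates is how word order enters the model: each filter scores a window of `width` consecutive feature tokens, so the same tokens in a different order generally produce different features.

```python
import math
import random
from typing import Dict, List, Sequence


class TinyTextCNN:
    """Forward pass of a 1-D convolutional text classifier (training omitted)."""

    def __init__(self, vocab: Sequence[str], labels: Sequence[str],
                 emb_dim: int = 8, num_filters: int = 4, width: int = 3,
                 seed: int = 0) -> None:
        rng = random.Random(seed)
        self.labels = list(labels)
        self.width = width
        # One random embedding per known token; unknown tokens map to zeros.
        self.emb: Dict[str, List[float]] = {
            tok: [rng.uniform(-1, 1) for _ in range(emb_dim)] for tok in vocab}
        self.unk = [0.0] * emb_dim
        # Each filter holds `width` weight vectors, one per window position.
        self.filters = [[[rng.uniform(-1, 1) for _ in range(emb_dim)]
                         for _ in range(width)] for _ in range(num_filters)]
        # Linear output layer: one weight row per label.
        self.out_w = [[rng.uniform(-1, 1) for _ in range(num_filters)]
                      for _ in self.labels]

    def conv_features(self, tokens: Sequence[str]) -> List[float]:
        """Convolve over consecutive tokens, apply ReLU, max-pool over time."""
        vecs = [self.emb.get(t, self.unk) for t in tokens]
        vecs += [self.unk] * max(0, self.width - len(vecs))  # pad short input
        pooled = []
        for filt in self.filters:
            best = 0.0  # starting at 0 makes max-pooling also act as ReLU
            for start in range(len(vecs) - self.width + 1):
                score = sum(w * x
                            for k in range(self.width)
                            for w, x in zip(filt[k], vecs[start + k]))
                best = max(best, score)
            pooled.append(best)
        return pooled

    def predict_proba(self, tokens: Sequence[str]) -> Dict[str, float]:
        """Softmax over labels for one feature text sequence."""
        h = self.conv_features(tokens)
        logits = [sum(w * x for w, x in zip(row, h)) for row in self.out_w]
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return {label: e / total for label, e in zip(self.labels, exps)}


model = TinyTextCNN(vocab=["S", "NP", "SBAR", "WHNP", "WDT", "VP"],
                    labels=["attributive", "object", "adverbial"])
print(model.predict_proba(["S", "NP", "SBAR", "WHNP", "WDT", "VP"]))
```

With untrained weights the probabilities are meaningless; the sketch only fixes the data flow from feature text sequence to per-label scores.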

According to another aspect of the present invention, a method for recognizing English subordinate clauses based on a classification model is provided, including:

determining an English sentence to be recognized;

converting the English sentence into a feature text sequence;

inputting the feature text sequence into a preset classification model to recognize the types of subordinate clauses contained in the English sentence.

Optionally, the step of converting the English sentence into a feature text sequence includes:

recognizing the constituent structure of the English sentence;

forming the feature text sequence from the constituent structure.

Optionally, the step of inputting the feature text sequence into the preset classification model to recognize the types of subordinate clauses contained in the English sentence includes:

inputting the feature text sequence into a classification model trained with a convolutional neural network;

recognizing, in the classification model and based on the order of the words in the English sentence, the types of subordinate clauses contained in the English sentence with the feature text sequence.
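Because one sentence can contain several subordinate clauses, the recognized types may form a set rather than a single label. One possible decision rule, assuming the model emits one independent (one-vs-rest) logit per clause type rather than a single softmax, is to keep every type whose sigmoid score exceeds a threshold. The label names follow the six clause types listed later in this description; the function name and the 0.5 default threshold are hypothetical.

```python
import math
from typing import Dict, List

# The six subordinate clause types described in this document.
CLAUSE_TYPES = ["subject", "object", "predicative",
                "appositive", "attributive", "adverbial"]


def recognize_clause_types(logits: Dict[str, float],
                           threshold: float = 0.5) -> List[str]:
    """Return every clause type whose sigmoid score exceeds `threshold`.

    sigmoid(z) > t is equivalent to z > log(t / (1 - t)), so the check is
    done on the raw logit, which avoids overflow for large |z|.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    cutoff = math.log(threshold / (1.0 - threshold))
    return [t for t in CLAUSE_TYPES if t in logits and logits[t] > cutoff]


print(recognize_clause_types({"attributive": 2.0, "object": -1.0}))
```

Raising the threshold trades recall for precision; a single-label softmax with argmax would be the simpler alternative if each training sentence were annotated with only one clause type.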

According to another aspect of the present invention, a training device for a classification model for English subordinate clauses is provided, including:

a training sample setting module, adapted to set English sentences that contain English subordinate clauses as training samples;

a training sample conversion module, adapted to convert the training samples into feature text sequences;

a classification model training module, adapted to train, with the feature text sequences, a classification model for recognizing English subordinate clauses.

Optionally, the training sample conversion module includes:

a sample structure recognition submodule, adapted to recognize the constituent structure of the training samples;

a sample feature forming submodule, adapted to form the feature text sequences from the constituent structure.

Optionally, the classification model training module includes:

a convolutional neural network input submodule, adapted to input the feature text sequences into a convolutional neural network;

a convolutional neural network training submodule, adapted to train, in the convolutional neural network and based on the order of the words in the training samples, the classification model for recognizing English subordinate clauses with the feature text sequences.

According to another aspect of the present invention, a device for recognizing English subordinate clauses based on a classification model is provided, including:

an English sentence determining module, adapted to determine an English sentence to be recognized;

an English sentence conversion module, adapted to convert the English sentence into a feature text sequence;

a subordinate clause type recognition module, adapted to input the feature text sequence into a preset classification model to recognize the types of subordinate clauses contained in the English sentence.

Optionally, the English sentence conversion module includes:

a sentence structure recognition submodule, adapted to recognize the constituent structure of the English sentence;

a sentence feature forming submodule, adapted to form the feature text sequence from the constituent structure.

Optionally, the subordinate clause type recognition module includes:

a classification model input submodule, adapted to input the feature text sequence into a classification model trained with a convolutional neural network;

a classification model recognition submodule, adapted to recognize, in the classification model and based on the order of the words in the English sentence, the types of subordinate clauses contained in the English sentence with the feature text sequence.

In the embodiments of the present invention, English sentences containing English subordinate clauses are set as training samples and converted into feature text sequences, and a classification model for recognizing English subordinate clauses is trained with these feature text sequences. The types of subordinate clauses contained in an English sentence can then be recognized automatically, which enriches the information about the sentence and reduces the need for users to look up other references and compare sentences manually. This saves time, improves efficiency, and lowers the probability of errors when the user's knowledge is limited.

In the embodiments of the present invention, an English sentence is converted into a feature text sequence and input into a preset classification model to recognize the types of subordinate clauses it contains. The subordinate clause types in the sentence are thus recognized automatically, which enriches the information about the sentence and reduces the need for users to look up other references and compare sentences manually. This saves time, improves efficiency, and lowers the probability of errors when the user's knowledge is limited.

The above is only an overview of the technical solution of the present invention. To make the technical means of the invention clearer so that it can be implemented according to this specification, and to make the above and other objects, features, and advantages of the invention more apparent, specific embodiments of the invention are described below.

Description of the drawings

Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the present invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:

Fig. 1 is a flowchart of the steps of a method for recognizing English information according to an embodiment of the present invention;

Figs. 2A-2E are example diagrams of operations for recognizing an English sentence according to an embodiment of the present invention;

Fig. 3 is a flowchart of the steps of another method for recognizing English information according to an embodiment of the present invention;

Fig. 4 is a flowchart of the steps of a method for training a classification model for English subordinate clauses according to an embodiment of the present invention;

Fig. 5 is an example diagram of constituent structure recognition according to an embodiment of the present invention;

Fig. 6 is a flowchart of the steps of a method for recognizing English subordinate clauses based on a classification model according to an embodiment of the present invention;

Fig. 7 is a structural block diagram of a device for recognizing English information according to an embodiment of the present invention;

Fig. 8 is a structural block diagram of another device for recognizing English information according to an embodiment of the present invention;

Fig. 9 is a structural block diagram of a training device for a classification model for English subordinate clauses according to an embodiment of the present invention; and

Fig. 10 is a structural block diagram of a device for recognizing English subordinate clauses based on a classification model according to an embodiment of the present invention.

Detailed description of the embodiments

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.

Referring to Fig. 1, a flowchart of the steps of a method for recognizing English information according to an embodiment of the present invention is shown. The method may specifically include the following steps:

Step 101: select target image data.

In a specific implementation, the embodiments of the present invention may be applied to mobile terminals such as mobile phones, PDAs (Personal Digital Assistants), laptop computers, and palmtop computers; the embodiments of the present invention do not limit this.

These mobile terminals may run operating systems such as Windows, Android, iOS, and Windows Phone, on which an English recognition application can be installed to recognize English information. The English recognition application may be a system application of the operating system or a third-party application.

In the embodiments of the present invention, the English recognition application may, according to the user's operation instruction, select target image data in which English information is recorded, for subsequent recognition.

In a specific implementation, the English recognition application may select the target image data in the following ways:

1. Taking a photo

In this mode, the mobile terminal is equipped with a camera. As shown in Fig. 2A, after starting the English recognition application, the user taps the "Take a photo to recognize sentences" control in its interface; the menu bar shown in Fig. 2B then pops up, and the user can tap the "Take photo" control.

In response to the "Take photo" control, the English recognition application calls the camera to capture preview image data.

Taking the Android system as an example, the English recognition application first declares, in its manifest file (AndroidManifest.xml), the use of the camera and other related features (for example, auto-focus).

In the main activity (the activity component) of the English recognition application, an intent (for example, MediaStore.ACTION_IMAGE_CAPTURE) is used to invoke the operating system's built-in camera application, and the intent is launched with the startActivityForResult() method. After the user takes a photo, the preview image data is returned to the main activity, which implements a method for receiving it (for example, onActivityResult()) and processes the returned preview image data.

Because the English information may occupy only a small part of the image, a preview frame can be overlaid on the preview image data to reduce interference from other objects and improve recognition accuracy, for example, the rectangle with white dots at its four corners shown in Fig. 2C. By adjusting the shape, position, and size of the preview frame, the user can make it enclose the English information while excluding other objects.

Of course, the user may also directly select the whole frame of preview image data as the target image data; the embodiments of the present invention do not limit this.

If the user taps the "√" control shown in Fig. 2C, the preview image data inside the preview frame is extracted as the target image data.
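Extracting the image data inside the preview frame amounts to a rectangular crop clamped to the image bounds. A minimal sketch, assuming the image is a row-major grid of pixel values and the frame is given as left/top/width/height in pixel coordinates (a hypothetical representation; on Android the crop would operate on the platform's bitmap type instead):

```python
from typing import List, Sequence, TypeVar

Pixel = TypeVar("Pixel")


def crop_to_frame(image: Sequence[Sequence[Pixel]], left: int, top: int,
                  width: int, height: int) -> List[List[Pixel]]:
    """Return the pixels inside the preview frame, clamped to the image bounds."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    x0, y0 = max(0, left), max(0, top)
    x1, y1 = min(cols, left + width), min(rows, top + height)
    if x0 >= x1 or y0 >= y1:
        return []  # frame lies entirely outside the image
    return [list(row[x0:x1]) for row in image[y0:y1]]


image = [[r * 10 + c for c in range(5)] for r in range(4)]
print(crop_to_frame(image, left=1, top=1, width=2, height=2))
```

Clamping matters because the user can drag the frame partly off the preview; without it, slicing would silently return misaligned rows.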

2. Uploading a local image

In this mode, as shown in Fig. 2A, after starting the English recognition application, the user taps the "Take a photo to recognize sentences" control in its interface; the menu bar shown in Fig. 2B then pops up, and the user can tap the "Select from phone album" control to select local image data.

The English recognition application imports the locally stored image data selected by the user as the target image data.

It should be noted that the image data stored locally on the mobile terminal may have been obtained by taking a photo earlier, by taking a screenshot, or in other ways; the embodiments of the present invention do not limit this.

Of course, the above ways of selecting target image data are only examples. When implementing the embodiments of the present invention, other ways of selecting target image data may be set according to actual conditions, and those skilled in the art may also adopt other ways according to actual needs; the embodiments of the present invention do not limit this.

Step 102: recognize English information from the target image data and split it into one or more English sentences.

English information can be recognized from the target image data by OCR (Optical Character Recognition).

In this approach, the target image data can first be preprocessed, including binarization, noise removal, and skew correction, to improve recognition accuracy.
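The patent names binarization without fixing a method. As one common choice, swapped in here rather than taken from the source, Otsu's method picks the global threshold that maximizes the between-class variance of the grayscale histogram; dark text pixels are then marked 1 and background pixels 0. A pure-Python sketch assuming 8-bit grayscale input and dark text on a light background:

```python
from typing import List, Sequence


def otsu_threshold(gray: Sequence[Sequence[int]]) -> int:
    """Return the threshold t (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(256):
        weight_low += hist[t]            # pixels with value <= t
        if weight_low == 0:
            continue
        weight_high = total - weight_low  # pixels with value > t
        if weight_high == 0:
            break
        sum_low += t * hist[t]
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        between_var = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(gray: Sequence[Sequence[int]]) -> List[List[int]]:
    """Mark dark (ink) pixels as 1 and light background pixels as 0."""
    t = otsu_threshold(gray)
    return [[1 if value <= t else 0 for value in row] for row in gray]


print(binarize([[10, 10, 200, 200], [10, 12, 198, 200]]))
```

A single global threshold suits evenly lit photos; unevenly lit camera images would typically need an adaptive (local) threshold instead.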

Character features can then be extracted from the preprocessed target image data. They generally fall into the following two kinds:

1. Statistical features. For example, the ratio of black to white pixels in the character area: when a character is divided into several regions, the black/white ratios of the individual regions are combined into a numeric vector in feature space.

2. Structural features. For example, after the character image is thinned, the number and positions of stroke endpoints and crossing points are obtained, or stroke segments are used as features.
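The statistical features in item 1 can be sketched as zoning: split the binarized character image into a grid and record the proportion of black pixels in each cell. This sketch uses the black-pixel fraction (black pixels / all pixels) instead of the black/white ratio so that cells without white pixels do not divide by zero; the grid size and function name are assumptions.

```python
from typing import List, Sequence


def zoning_features(binary: Sequence[Sequence[int]],
                    grid_rows: int, grid_cols: int) -> List[float]:
    """Black-pixel fraction of each grid cell, read row by row into a vector."""
    height, width = len(binary), len(binary[0])
    features: List[float] = []
    for gr in range(grid_rows):
        r0, r1 = gr * height // grid_rows, (gr + 1) * height // grid_rows
        for gc in range(grid_cols):
            c0, c1 = gc * width // grid_cols, (gc + 1) * width // grid_cols
            cell = [binary[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


# A 4x4 glyph whose left half is ink (1) and right half is background (0).
print(zoning_features([[1, 1, 0, 0]] * 4, grid_rows=2, grid_cols=2))
```

The resulting fixed-length vector is what makes the distance-based comparison against stored letter features possible, regardless of the original character image size.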

The extracted features are compared with the stored features of all English letters to be recognized in a database, using methods such as Euclidean-space comparison, relaxation matching, or dynamic programming (DP) matching, to identify the English letter that corresponds to the features.
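Of the matching methods listed, Euclidean-space comparison is the simplest to illustrate: rank the stored letter templates by Euclidean distance to the extracted feature vector. Returning the closest few candidates, rather than only the best one, leaves room for the context-based correction described next. The function name and the template values below are made up for illustration.

```python
import math
from typing import Dict, List, Sequence


def rank_letters(features: Sequence[float],
                 templates: Dict[str, Sequence[float]],
                 k: int = 3) -> List[str]:
    """Return the k stored letters whose templates are closest in Euclidean distance."""
    ranked = sorted(templates,
                    key=lambda letter: math.dist(features, templates[letter]))
    return ranked[:k]


# Made-up 4-value feature templates, for illustration only.
templates = {"I": [1.0, 0.0, 1.0, 0.0],
             "L": [1.0, 0.0, 1.0, 1.0],
             "O": [1.0, 1.0, 1.0, 1.0]}
print(rank_letters([1.0, 0.0, 1.0, 0.9], templates, k=2))
```

`math.dist` requires Python 3.8 or later and raises `ValueError` if the two vectors differ in length, which catches a mismatch between extracted features and stored templates early.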

Afterwards, the matched English letter and its possible similar candidates can be checked against the surrounding letters to find the most plausible letter and correct the result.

In the embodiments of the present invention, the target image data may contain one or more English sentences; each sentence can then be identified and split out based on, for example, full stops.
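Splitting on full stops can be sketched with a regular expression that breaks the OCR text after ".", "!" or "?" followed by whitespace. This naive rule is an assumption, not the patent's exact procedure. It also collapses the line breaks OCR tends to insert mid-sentence, but it would mis-split abbreviations such as "Mr." and would need an exception list in practice.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split OCR text into sentences at ., ! or ? followed by whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip()  # join OCR line breaks
    if not normalized:
        return []
    return re.split(r"(?<=[.!?]) ", normalized)


print(split_sentences("He runs quickly. They made\nthe girl angry! Is it right?"))
```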

In practical applications, to save resources on the mobile terminal, the recognition of English information and the splitting of English sentences can be performed by a server.

In this mode, the English recognition application sends the target image data to the server. The server recognizes the English information from the target image data by OCR, splits one or more English sentences out of the English information, and returns them to the English recognition application.

The English recognition application receives the one or more English sentences returned by the server, which the server split out of the English information it recognized from the target image data by OCR.

As shown in Fig. 2D, because the server needs some time to recognize the English information and split the English sentences, the interface of the English recognition application displays a prompt such as "Recognizing..." asking the user to wait.

Of course, the recognition of English information and the splitting of English sentences may also be performed by the English recognition application itself; the embodiments of the present invention do not limit this.

Step 103: split the English sentence into clickable interactive elements, one per word, and recognize the sentence factors of the English sentence.

In the embodiments of the present invention, the words that make up an English sentence can be split out, and clickable interactive elements, such as JSON (JavaScript Object Notation, a lightweight data interchange format) data, can then be generated.

An independent interactive element can be generated for each word; that is, each interactive element represents its word by recording it in some form. Laid out according to the positions of the words, these interactive elements make up the complete English sentence.

By clicking or similar operations, the user can select one or more interactive elements, and thereby one or more words, in order to translate or otherwise operate on the selected words.

For example, as shown in Fig. 2E, for the English sentence "The question whether it is right or wrong depends on the result", clickable interactive elements can be generated for "The", "question", "whether", "it", "is", "right", "or", "wrong", "depends", "on", "the", and "result" respectively.
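The per-word interactive elements can be sketched as a JSON array with one object per word. The field names (`index`, `word`, `start`, `end`) and the word pattern are hypothetical; recording character offsets lets the client lay the elements out in sentence order and map a tap back to the word.

```python
import json
import re


def to_interactive_elements(sentence: str) -> str:
    """Serialize each word of the sentence as one clickable JSON element."""
    elements = [
        {"index": i, "word": m.group(), "start": m.start(), "end": m.end()}
        for i, m in enumerate(re.finditer(r"[A-Za-z]+(?:'[A-Za-z]+)*", sentence))
    ]
    return json.dumps(elements)


sentence = "The question whether it is right or wrong depends on the result"
print(to_interactive_elements(sentence))
```

The pattern keeps contractions such as "doesn't" as a single element and skips punctuation, so only words become clickable.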

In addition, the sentence factors of the English sentence, that is, its English grammatical attributes, can be recognized so that the user can look them up easily.

In the embodiments of the present invention, the sentence factors may include one or more of the following:

1. Sentence structure

The structure of an English sentence may include one or more of the following:

1.1 Subject + verb (SV): the predicate is an intransitive verb, for example, He runs quickly.

1.2 Subject + linking verb + complement (SVC): the predicate is a linking verb, for example, He is older than he looks.

1.3 Subject + verb + object (SVO): the predicate is a transitive verb and therefore takes an object, for example, I saw a film yesterday.

1.4 Subject + verb + indirect object + direct object (SVOO): the predicate is a transitive verb that takes two objects, for example, He gave me a book / He gave a book to me.

1.5 Subject + verb + object + object complement (SVOC): the predicate is a transitive verb that takes an object complement, for example, They made the girl angry.

2. Subordinate clause type

A subordinate clause is defined relative to the main clause: in a complex sentence, the subordinate clause depends on a main clause and cannot stand alone as a sentence, although it has its own subject and predicate. It is introduced by a connective such as that, who, or when.

English has three main kinds of subordinate clauses: noun clauses (subject, object, predicative, and appositive clauses), adjective clauses (attributive clauses), and adverbial clauses (clauses of time, condition, result, purpose, reason, concession, place, manner, and so on).

Specifically:

2.1 Subject clause: a clause used as the subject of a complex sentence.

For example, That he finished writing the composition in such a short time surprised us all.

2.2 Object clause: a clause used as the object of a complex sentence.

For example, Tell him which class you are in.

2.3 Predicative clause: a clause used as the predicative of a complex sentence.

For example, China is no longer what she used to be.

2.4 Appositive clause: a clause used as an appositive in a complex sentence.

For example, I heard the news that our team had won.

2.5 Attributive clause: a clause used as an attribute in a complex sentence.

For example, The dog that/which was lost has been found.

2.6 Adverbial clause: a clause used as an adverbial in a complex sentence.

For example, I will not go to her party if she doesn't invite me.

In an embodiment of the present invention, the subordinate clause type can be recognized as follows:

Sub-step S1031: determine the English sentence to be recognized;

Sub-step S1032: convert the English sentence into a feature text sequence;

Sub-step S1033: input the feature text sequence into a preset classification model to recognize the types of subordinate clauses contained in the English sentence.

In the embodiments of the present invention, sub-steps S1031, S1032, and S1033 are applied in basically the same way as steps 501, 502, and 503, so they are described only briefly here; for related details, refer to the description of steps 501, 502, and 503, which is not repeated here.

3. Sentence tense

The tense of an English sentence may include one or more of the following:

3.1 Simple present: expresses regular events, habitual actions, or general truths.

For example, She doesn't often write to her family, only once a month.

3.2 Simple past: describes an action that occurred or a state that existed at a certain time in the past; it can also express a habitual action repeated over a period in the past.

For example, He got his driving license last month.

3.3 Simple future: describes an action that will occur or a state that will exist in the future.

For example, He will arrive here this evening.

3.4 Present progressive: describes an action happening at the moment of speaking or writing, or one that is ongoing during the present period.

For example, They are having a football match.

3.5 Past progressive: expresses an action that was in progress at a certain point in the past.

For example, At this moment yesterday, I was packing for camp.

3.6 Past perfect: expresses an action that had already happened or been completed before a certain time in the past.

For example, When I woke up, it had stopped raining.

4. Part of speech

Parts of speech classify English words according to their function in the sentence and may include one or more of the following:

4.1 Noun (n.), for example, student.

4.2 Pronoun (pron.), for example, you.

4.3 Adjective (adj.), for example, happy.

4.4 Adverb (adv.), for example, quickly.

4.5 Verb (v.), for example, cut.

4.6 Numeral (num.), for example, three.

4.7 Article (art.), for example, a.

4.8 Preposition (prep.), for example, at.

4.9 Conjunction (conj.), for example, and.

4.10 Interjection (interj.), for example, oh.

It should be noted that an English word may have several parts of speech. In the embodiments of the present invention, the part of speech refers to the part of speech the word has in the English sentence to be recognized, which can be determined with the help of contextual information.

Of course, the above sentence factors are only examples. When implementing the embodiments of the present invention, other sentence factors may be set according to actual conditions, and those skilled in the art may also adopt other sentence factors according to actual needs; the embodiments of the present invention do not limit this.

Because the sentence factors may involve a large amount of data, they can be recognized and displayed in batches, or recognized all at once and displayed in batches; the embodiments of the present invention do not limit this.

For example, in the interface shown in Fig. 2E, if the user taps the "Sentence analysis" control, the sentence structure and subordinate clause types are displayed; if the user taps the "Tense analysis" control, the sentence tense is displayed; and if the user taps the "Part-of-speech analysis" control, the parts of speech are displayed.

In practical applications, to save resources on the mobile terminal, the splitting of English words and the recognition of sentence factors can be performed by a server.

In this mode, the English recognition application sends the English sentence to the server. The server splits each word out of the English sentence, recognizes one or more of the sentence structure, subordinate clause types, and sentence tense of the English sentence and the part of speech of each word in the sentence, and returns the results to the English recognition application.

The English recognition application receives from the server the words split out of the English sentence, together with one or more of the recognized sentence structure, subordinate clause information, sentence tense, and parts of speech of the words in the English sentence.

The English recognition application then generates clickable interactive elements for the words in its interface.

Of course, the splitting of English words and the recognition of sentence factors may also be performed by the English recognition application itself; the embodiments of the present invention do not limit this.

In the embodiments of the present invention, English information is recognized from the selected target image data and split into one or more English sentences; each English sentence is split into clickable interactive elements, one per word, and its sentence factors are recognized. On the one hand, the user can select the needed words through the interactive elements for subsequent operations such as translation. On the other hand, the sentence factors of the English sentence are recognized automatically, which enriches the information about the sentence and reduces the need for users to look up other references and compare sentences manually. This saves time, improves efficiency, and lowers the probability of errors when the user's knowledge is limited.

Referring to Fig. 3, a flowchart of the steps of another English information recognition method according to an embodiment of the present invention is shown; the method may specifically include the following steps:

Step 301: select target image data.

Step 302: recognize English information from the target image data, and split it into one or more English sentences.

Step 303: split each English sentence into clickable interactive elements, one per word, and recognize the sentence attributes of the English sentence.

Step 304: select one or more target English sentences from the one or more English sentences.

Step 305: translate the one or more target English sentences to obtain target-language information.

In this embodiment, the user can select target English sentences from the recognized English sentences for translation, obtaining the required target-language information, such as a Chinese, Korean, or Portuguese translation.

For example, as shown in Fig. 2E, the English sentence "The question whether it is right or wrong depends on the result" can be translated into Chinese as a sentence meaning roughly "whether the question is right or wrong depends on the result".

It should be noted that the translation may cover a single English sentence or multiple English sentences.

In practical applications, to reduce resource consumption on the mobile terminal, the translation of the target English sentences can be performed by the server.

In this approach, the English recognition application can send the one or more target English sentences to the server; the server translates them to obtain target-language information and returns it to the English recognition application.

The English recognition application receives the target-language information returned by the server, obtained by translating the one or more target English sentences.

Of course, the translation of the target English sentences can also be performed by the English recognition application itself; the embodiments of the present invention place no limitation on this.

Step 306: select a target word from the words of the English sentence based on the interactive elements.

Step 307: translate the target word to obtain target-language information.

In this embodiment, the user can select target words within a given English sentence for translation, obtaining the required target-language information, such as a Chinese, Korean, or Portuguese translation.

For example, as shown in Fig. 2E, for the English sentence "The question whether it is right or wrong depends on the result", the user can click to select "question", "depends", and "on" as target words and then click the "Translate" control to translate them.

In practical applications, to reduce resource consumption on the mobile terminal, the translation of the target word can be performed by the server.

In this approach, the English recognition application can send the target word to the server; the server translates the target word to obtain target-language information and returns it to the English recognition application.

The English recognition application receives the target-language information returned by the server, obtained by translating the target word.

Of course, the translation of the target word can also be performed by the English recognition application itself; the embodiments of the present invention place no limitation on this.

Referring to Fig. 4, a flowchart of the steps of a training method for a classification model of English subordinate clauses according to an embodiment of the present invention is shown; the method may specifically include the following steps:

Step 401: set English sentences containing English subordinate clauses as training samples.

In this embodiment, English subordinate clauses can be collected as training samples for the classification model.

A subordinate clause is defined relative to a main clause: in a complex sentence, a subordinate clause depends on a main clause and cannot stand alone as a sentence, although it has its own subject and predicate. It is introduced by a connective such as that, who, or when.

Specifically:

Subject clause: a clause that serves as the subject of a complex sentence.

Object clause: a clause that serves as the object of a complex sentence.

Predicative clause: a clause that serves as the predicative of a complex sentence.

Appositive clause: a clause that serves as an appositive in a complex sentence.

Attributive clause: a clause that serves as an attribute (a noun modifier) in a complex sentence.

Adverbial clause: a clause that serves as an adverbial in a complex sentence.
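A minimal sketch, assuming the six clause categories above are used directly as the classifier's output labels (the patent does not fix a label encoding). `ClauseType`, `Sample`, and `one_hot_label` are hypothetical names. The two labelled sentences are the example sentences used elsewhere in this description, with labels assigned by ordinary English grammar.

```python
from enum import Enum
from typing import List, Tuple


class ClauseType(Enum):
    """The six subordinate clause categories, used as class labels."""
    SUBJECT = 0
    OBJECT = 1
    PREDICATIVE = 2
    APPOSITIVE = 3
    ATTRIBUTIVE = 4
    ADVERBIAL = 5


# A training sample pairs an English sentence with the type of clause it contains.
Sample = Tuple[str, ClauseType]

samples: List[Sample] = [
    # "who is presenting the powerpoint" modifies "boy": attributive clause
    ("The boy who is presenting the powerpoint is the most handsome man.", ClauseType.ATTRIBUTIVE),
    # "whether it is right or wrong" restates "question": appositive clause
    ("The question whether it is right or wrong depends on the result.", ClauseType.APPOSITIVE),
]


def one_hot_label(label: ClauseType) -> List[float]:
    """Target output vector for training: 1.0 at the label's index, 0.0 elsewhere."""
    vec = [0.0] * len(ClauseType)
    vec[label.value] = 1.0
    return vec
```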

Step 402: convert the training samples into feature text sequences.

In a specific implementation, the features of a training sample (i.e., a sentence containing an English subordinate clause) can be recognized and substituted for the sample itself, forming a feature text sequence.

In one embodiment of the present invention, step 402 may include the following sub-steps:

Sub-step S4021: recognize the constituent structure of the training sample;

Sub-step S4022: form a feature text sequence using the constituent structure.

In this embodiment, the Stanford parser can be configured in advance. The Stanford parser is a lexicalized probabilistic context-free grammar (PCFG) parser that also performs dependency analysis.

Using the Stanford parser, the training samples (i.e., sentences containing English subordinate clauses) can be parsed to output the dependency relations of the English sentence.

The Stanford parser is used for natural language processing and mainly provides the following functions:

1) identifying and tagging the part of speech of each word in a sentence;

2) producing the grammatical relations (Stanford Dependencies) between pairs of words in a sentence;

3) obtaining the syntactic structure of a sentence.

Furthermore, the Stanford parser can provide the parse tree of a sentence, together with the part of speech and phrasal constituent of each word.

For English subordinate clauses, the individual English words carry little discriminative information, whereas the constituent structure of the sentence is a strong feature. The embodiments of the present invention therefore extract the strong features and discard the uninformative ones.

In one example, as shown in Fig. 5, parsing the English sentence "The boy who is presenting the powerpoint is the most handsome man." with the Stanford parser converts it into the feature text sequence "ROOT S NP DT NN SBAR WHNP WP S VP VBZ VP VBG NP DT JJ VP VBZ NP DT RBS JJ NN .", where ROOT denotes the root of the sentence being processed, NP a noun phrase, DT a determiner, NN a singular or mass noun, and so on.
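The conversion in this example amounts to keeping only the node labels of the parse tree, in preorder, and discarding the words. The sketch below assumes the parser output is available as a Penn-Treebank-style bracketed string (a format the Stanford parser can print) and that literal brackets in the text have already been escaped, as the parser does with -LRB- and -RRB-. `feature_sequence` is a hypothetical helper, not part of any parser API.

```python
from typing import List


def feature_sequence(tree: str) -> List[str]:
    """Return the node labels of a bracketed constituency tree in preorder,
    dropping the leaf words so that only the sentence structure remains."""
    # Pad brackets with spaces so that split() yields '(', ')', labels and words.
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    labels: List[str] = []
    for i, tok in enumerate(tokens):
        # A node label always follows an opening bracket; a leaf word follows
        # its part-of-speech label instead, so it is skipped.
        if i > 0 and tokens[i - 1] == "(" and tok not in ("(", ")"):
            labels.append(tok)
    return labels


tree = "(ROOT (S (NP (DT The) (NN boy)) (VP (VBZ runs)) (. .)))"
print(" ".join(feature_sequence(tree)))  # ROOT S NP DT NN VP VBZ .
```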

Besides the Stanford parser, other methods may also be used to recognize the constituent structure of the training samples; the embodiments of the present invention place no limitation on this.

Step 403: train the classification model for recognizing English subordinate clauses using the feature text sequences.

In practical applications, a machine learning method can be trained on the feature text sequences to obtain a classification model for recognizing English subordinate clauses.

In one embodiment of the present invention, step 403 may include the following sub-steps:

Sub-step S4031: input the feature text sequence into a convolutional neural network;

Sub-step S4032: in the convolutional neural network, train the classification model for recognizing English subordinate clauses using the feature text sequence, based on the order of the words in the training sample.

A convolutional neural network (CNN) is a feed-forward neural network. It can extract topological structure from a two-dimensional image, uses the back-propagation algorithm to optimize the network, and solves for the network's unknown parameters.

For natural language processing (NLP), the input to the convolutional neural network is no longer pixels but the feature text sequence represented in a form such as a matrix; this matrix plays the role of an "image".
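One simple way to turn a feature text sequence into the matrix "image" described above is one-hot encoding: one row per position, one column per tag, so the row order preserves the word order. This is an assumption for illustration (a learned embedding table is a common alternative), and the padding token, unknown-tag handling, and fixed length `max_len` are hypothetical choices.

```python
from typing import Dict, List, Sequence

PAD = "<pad>"   # fills rows past the end of a short sequence
UNK = "<unk>"   # stands for tags never seen during training


def build_vocab(sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign an integer column index to every tag seen in the training sequences."""
    vocab = {PAD: 0, UNK: 1}
    for seq in sequences:
        for tag in seq:
            vocab.setdefault(tag, len(vocab))
    return vocab


def to_matrix(seq: Sequence[str], vocab: Dict[str, int], max_len: int) -> List[List[float]]:
    """Encode a tag sequence as a max_len x |vocab| one-hot matrix.
    Row i corresponds to position i; longer sequences are truncated and
    shorter ones are padded with PAD rows."""
    rows = []
    for i in range(max_len):
        tag = seq[i] if i < len(seq) else PAD
        row = [0.0] * len(vocab)
        row[vocab.get(tag, vocab[UNK])] = 1.0
        rows.append(row)
    return rows
```

Fixing `max_len` gives every sentence a matrix of the same shape, which is convenient when sentences are batched for training.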

During classification, the convolutional neural network can take into account the order of the words in the English sentence and thereby learn the sentence structure of English subordinate clauses.

In a specific implementation, the convolutional neural network comprises convolutional layers, down-sampling layers, and fully connected layers. Each layer has multiple feature maps; each feature map extracts one kind of input feature through one convolution filter, and each feature map has multiple neurons.

Convolutional layer: convolutional layers are used because an important property of the convolution operation is that it can enhance features of the original signal and reduce noise.
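For text, the convolution is usually one-dimensional along the sequence: a filter covering a few consecutive rows (tags) slides down the matrix, so each output responds to a local tag pattern. The plain-Python sketch below illustrates that operation with a ReLU activation; the filter width, the ReLU choice, and the name `conv1d` are assumptions, not details given in the patent. Like most deep-learning libraries, it computes cross-correlation (the kernel is not flipped).

```python
from typing import List

Matrix = List[List[float]]


def conv1d(x: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Slide a filter spanning len(kernel) consecutive rows over the input
    (stride 1, no padding) and apply ReLU.

    x:      seq_len x dim input, one row per tag position
    kernel: width x dim filter weights
    Returns one activation per window position, i.e. one feature map.
    """
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        s = bias
        for k in range(width):
            for d in range(len(kernel[k])):
                s += x[start + k][d] * kernel[k][d]
        out.append(max(0.0, s))  # ReLU non-linearity
    return out
```

For example, with rows encoding tags A and B as `[1, 0]` and `[0, 1]`, the kernel `[[1, 0], [0, 1]]` fires only on windows where A is immediately followed by B.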

Down-sampling layer: down-sampling is used because, by the principle of local correlation in images, sub-sampling an image reduces the amount of computation while preserving a degree of invariance to small shifts and distortions of the image.

The main purpose of sampling is to blur the exact position of a feature: once a feature has been found, its exact position matters less than its position relative to other features. Take the digit "8": once we have found an "o" in the upper part, we do not need to know its exact position in the image; knowing that there is another "o" below it is enough to conclude that it is an "8", because whether the "8" sits toward the left or the right of the picture does not affect recognizing it. This strategy of blurring exact positions allows deformed and distorted images to be recognized.
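In text CNNs the down-sampling step is commonly max pooling, often taken globally over the whole sequence ("max-over-time"). The patent does not say which variant it uses, so the sketch below shows both under that assumption. Moving a detected pattern to a different position leaves the global maximum unchanged, which is the "blur the exact position" effect described above.

```python
from typing import List


def max_pool(feature_map: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling with window `size`; a trailing partial window is kept."""
    return [max(feature_map[i:i + size]) for i in range(0, len(feature_map), size)]


def max_over_time(feature_map: List[float]) -> float:
    """Global max pooling: keep only the strongest response, discarding its position."""
    return max(feature_map)


# The same pattern detected at position 1 or at position 3 pools to the same value.
print(max_over_time([0.0, 3.0, 0.0, 0.0, 1.0]), max_over_time([0.0, 0.0, 0.0, 3.0, 1.0]))  # 3.0 3.0
```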

Fully connected layer: a fully connected layer with softmax is used; the activation values obtained are the image features extracted by the convolutional neural network.
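The final stage can be sketched as a fully connected layer that maps the pooled features to one score per clause type, followed by softmax to turn the scores into probabilities. This is a generic illustration; the names and layer sizes are assumptions.

```python
import math
from typing import List


def dense(features: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Fully connected layer: one output score (logit) per class."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, biases)]


def softmax(logits: List[float]) -> List[float]:
    """Convert scores to probabilities; subtracting the max avoids overflow in exp()."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With six output rows in `weights`, the resulting probabilities correspond to the six clause types, and the largest one is taken as the prediction.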

After the convolutional neural network has been constructed, it is trained. Training mainly comprises four steps, divided into two stages:

First stage, forward propagation:

1) take a sample from the sample set and input it into the convolutional neural network;

2) compute the corresponding actual output; in this stage, information is transformed layer by layer as it passes from the input layer to the output layer.

Second stage, back-propagation:

1) compute the difference between the actual output and the corresponding ideal output;

2) adjust the weight matrices so as to minimize the error.
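The two stages above can be illustrated on the simplest trainable case: a single softmax output layer with cross-entropy loss, where the error signal is just the actual output minus the ideal (one-hot) output. This is a simplified stand-in; a full CNN propagates the same error further back through the pooling and convolution layers, which this sketch omits. The name `train_step` and the learning rate are assumptions.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_step(W: List[List[float]], b: List[float], x: List[float],
               target: int, lr: float) -> float:
    """One forward and backward pass for a single softmax layer (updates W, b in place).

    Forward:  p = softmax(W x + b), loss = -log p[target]
    Backward: for softmax with cross-entropy, dLoss/dlogit_k = p_k - y_k,
              so dW[k][j] = (p_k - y_k) * x_j and db[k] = p_k - y_k.
    Returns the loss computed before the update.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    p = softmax(logits)                                 # stage 1: actual output
    loss = -math.log(p[target])
    for k in range(len(W)):
        err = p[k] - (1.0 if k == target else 0.0)      # stage 2: actual minus ideal
        for j in range(len(x)):
            W[k][j] -= lr * err * x[j]                  # step against the gradient
        b[k] -= lr * err
    return loss
```

Calling `train_step` twice on the same sample returns a smaller loss the second time, since the first update moved the weights toward the target class.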

More specifically, the training process of the network is as follows:

(1) select the training set: randomly draw N samples from the sample set as the training set;

(2) set each weight and threshold to a small random value close to 0, and initialize the accuracy control parameter and the learning rate;

(3) take an input pattern from the training set, feed it to the network, and provide its target output vector;

(4) compute the output vectors of the intermediate layers and the actual output vector of the network;

(5) compare the elements of the output vector with those of the target vector and compute the output error; errors must also be computed for the hidden units of the intermediate layers;

(6) compute the adjustment for each weight and each threshold in turn;

(7) adjust the weights and thresholds;

(8) after M iterations, check whether the performance index meets the accuracy requirement; if not, return to (3) and continue iterating; if so, proceed to the next step;

(9) training ends, and the weights and thresholds are saved to a file. At this point each weight can be considered to have stabilized and the classifier has been formed. When training again, the weights and thresholds can be loaded directly from the file, with no need for re-initialization.
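Procedure (1) to (9) can be sketched end to end for the same simplified single-layer softmax classifier, again as a stand-in for the full CNN. The initialization range, learning rate, check interval `check_every` (playing the role of M), accuracy target, and JSON serialization are all illustrative assumptions; step (1), drawing the training set, is left to the caller.

```python
import json
import math
import random
from typing import List, Tuple

Dataset = List[Tuple[List[float], int]]  # (feature vector, class index) pairs


def _logits(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def _softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(W, b, x) -> int:
    z = _logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)


def accuracy(W, b, data: Dataset) -> float:
    return sum(predict(W, b, x) == y for x, y in data) / len(data)


def train(data: Dataset, n_classes: int, lr: float = 0.5, target_acc: float = 1.0,
          check_every: int = 10, max_epochs: int = 1000, seed: int = 0):
    """Steps (2) to (8) for a single softmax layer; `data` is the training set from step (1)."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    # (2) weights and thresholds (biases) start as small random values near 0
    W = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(n_classes)]
    b = [rng.uniform(-0.01, 0.01) for _ in range(n_classes)]
    for epoch in range(1, max_epochs + 1):
        for x, y in data:
            p = _softmax(_logits(W, b, x))                  # (3)-(4) forward pass
            for k in range(n_classes):
                err = p[k] - (1.0 if k == y else 0.0)       # (5) output error
                for j in range(dim):
                    W[k][j] -= lr * err * x[j]              # (6)-(7) adjust weights
                b[k] -= lr * err                            # (6)-(7) adjust thresholds
        # (8) every `check_every` epochs (M), stop once the accuracy requirement is met
        if epoch % check_every == 0 and accuracy(W, b, data) >= target_acc:
            break
    return W, b


def to_json(W, b) -> str:
    """(9) serialize the trained weights and thresholds."""
    return json.dumps({"W": W, "b": b})


def from_json(text: str):
    """Reload saved parameters so prediction or further training needs no re-initialization."""
    d = json.loads(text)
    return d["W"], d["b"]
```

Writing the string from `to_json` to a file and later passing the file's contents to `from_json` corresponds to the save and reload described in step (9).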

Besides convolutional neural networks, other machine learning methods can also be used to train the classification model for recognizing English subordinate clauses, for example SVM (Support Vector Machine) or AdaBoost; the embodiments of the present invention place no limitation on this.

Referring to Fig. 6, a flowchart of the steps of a method for recognizing English subordinate clauses based on a classification model according to an embodiment of the present invention is shown; the method may specifically include the following steps:

Step 601: determine the English sentence to be recognized.

In a specific implementation, in the interface shown in Fig. 2E, if the user clicks the "Clause analysis" control for an English sentence, that sentence can be taken as the English sentence to be recognized, so that its sentence structure and subordinate clause type are recognized.

In this case, if the recognition of the sentence attributes (including the subordinate clause type) is performed by the server, the server can receive the English sentence uploaded by the English recognition application as the English sentence to be recognized.

Of course, if the recognition of the sentence attributes (including the subordinate clause type) is performed by the English recognition application, the application can directly take the extracted English sentence as the English sentence to be recognized.

In addition, the English sentence to be recognized can also be determined in other ways, for example by the user entering it directly; the embodiments of the present invention place no limitation on this.

Step 602: convert the English sentence into a feature text sequence.

In a specific implementation, the features of the English sentence can be recognized and substituted for the sentence itself, forming a feature text sequence.

In one embodiment of the present invention, step 602 may include the following sub-steps:

Sub-step S6021: recognize the constituent structure of the English sentence;

Sub-step S6022: form a feature text sequence using the constituent structure.

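As in step 402, the Stanford parser can be used for this, for example to:

1) identify and tag the part of speech of each word in the sentence;

2) produce the grammatical relations (Stanford Dependencies) between pairs of words in the sentence;

3) obtain the syntactic structure of the sentence.

Besides the Stanford parser, other methods may also be used to recognize the constituent structure of the English sentence; the embodiments of the present invention place no limitation on this.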

Step 603: input the feature text sequence into the preset classification model to recognize the subordinate clause type contained in the English sentence.

With the embodiments of the present invention, a machine learning method can be trained on the feature text sequences converted from the training samples to obtain a classification model for recognizing English subordinate clauses.

In one embodiment of the present invention, the classification model can be trained as follows:

Sub-step S6031: set English sentences containing English subordinate clauses as training samples;

Sub-step S6032: convert the training samples into feature text sequences;

Sub-step S6033: train the classification model for recognizing English subordinate clauses using the feature text sequences.

In this embodiment, since sub-steps S6031, S6032, and S6033 are essentially similar to steps 401, 402, and 403, they are described only briefly; for related details, refer to the description of steps 401 to 403, which is not repeated here.

In a specific implementation, the feature text sequence can be input into the classification model to identify the subordinate clause type contained in the English sentence.

In one embodiment of the present invention, step 603 may include the following sub-steps:

Sub-step S6034: input the feature text sequence into the classification model trained with a convolutional neural network;

Sub-step S6035: in the classification model, recognize the subordinate clause type contained in the English sentence using the feature text sequence, based on the order of the words in the English sentence.

In this embodiment, the classification model is trained based on a convolutional neural network.

During classification, the convolutional neural network can take into account the order of the words in the English sentence, thereby learning the sentence structure of English subordinate clauses and recognizing the subordinate clause type contained in the English sentence.
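Steps 602 and 603 chain together roughly as in the sketch below. It assumes the sentence has already been parsed into a bracketed string and that the trained classifier is available as a callable returning one probability per clause type in the order of `CLAUSE_TYPES`; both that interface and the label order are hypothetical.

```python
from typing import Callable, List, Sequence

# Assumed output order of the trained classifier (hypothetical).
CLAUSE_TYPES = ["subject", "object", "predicative", "appositive", "attributive", "adverbial"]


def tree_to_tags(tree: str) -> List[str]:
    """Step 602: keep only the constituent labels of a bracketed parse, in preorder."""
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    return [tok for i, tok in enumerate(tokens)
            if i > 0 and tokens[i - 1] == "(" and tok not in ("(", ")")]


def recognize_clause(tree: str, model: Callable[[Sequence[str]], List[float]]) -> str:
    """Step 603: score the feature text sequence and return the most probable clause type."""
    probs = model(tree_to_tags(tree))
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLAUSE_TYPES[best]
```

A toy `model`, for instance one that returns a fixed probability list whenever the tags contain WHNP, is enough to test this plumbing before a trained CNN is available; it is not a substitute for the trained classifier.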

For simplicity of description, the method embodiments are expressed as a series of combined actions, but those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.

Referring to Fig. 7, a structural block diagram of an English information recognition device according to an embodiment of the present invention is shown; the device may specifically include the following modules:

Target image data selection module 701, adapted to select target image data;

Sentence splitting module 702, adapted to recognize English information from the target image data and split it into one or more English sentences;

Sentence attribute recognition module 703, adapted to split each English sentence into clickable interactive elements, one per word, and to recognize the sentence attributes of the English sentence.

In one embodiment of the present invention, the target image data selection module 701 includes:

a preview image data acquisition submodule, adapted to call the camera to acquire preview image data;

a preview frame loading submodule, adapted to load a preview frame onto the preview image data;

a preview image data extraction submodule, adapted to extract the preview image data within the preview frame as the target image data;

and/or,

an image data import submodule, adapted to import locally stored image data as the target image data.

In one embodiment of the present invention, the sentence splitting module 702 includes:

a target image data sending submodule, adapted to send the target image data to the server;

a splitting information receiving submodule, adapted to receive, as returned by the server, the English information recognized from the target image data by optical character recognition (OCR), and the one or more English sentences split out of that English information.

In one embodiment of the present invention, the sentence attribute recognition module 703 includes:

an English sentence sending submodule, adapted to send the English sentence to the server;

a sentence attribute receiving submodule, adapted to receive, as returned by the server, the individual words split out of the English sentence, together with one or more of the following identified from the English sentence: sentence structure, subordinate clause type, sentence tense, and the part of speech of each word in the English sentence;

Clickable interactive elements are generated for each word.

Referring to Fig. 8, a structural block diagram of another English information recognition device according to an embodiment of the present invention is shown; the device may specifically include the following modules:

Target image data selection module 801, adapted to select target image data;

Sentence splitting module 802, adapted to recognize English information from the target image data and split it into one or more English sentences;

Sentence attribute recognition module 803, adapted to split each English sentence into clickable interactive elements, one per word, and to recognize the sentence attributes of the English sentence;

Target English sentence selection module 804, adapted to select one or more target English sentences from the one or more English sentences;

Target English sentence translation module 805, adapted to translate the one or more target English sentences to obtain target-language information;

Target word selection module 806, adapted to select a target word from the words of the English sentence based on the interactive elements;

Target word translation module 807, adapted to translate the target word to obtain target-language information.

In one embodiment of the present invention, the target English sentence translation module 805 includes:

a target English sentence sending submodule, adapted to send the one or more target English sentences to the server;

a target English sentence translation information receiving submodule, adapted to receive the target-language information returned by the server, obtained by translating the one or more target English sentences.

In one embodiment of the present invention, the target word translation module 807 includes:

a target word sending submodule, adapted to send the target word to the server;

a target word translation information receiving submodule, adapted to receive the target-language information returned by the server, obtained by translating the target word.

Referring to Fig. 9, a structural block diagram of a training device for a classification model of English subordinate clauses according to an embodiment of the present invention is shown; the device may specifically include the following modules:

Training sample setting module 901, adapted to set English sentences containing English subordinate clauses as training samples;

Training sample conversion module 902, adapted to convert the training samples into feature text sequences;

Classification model training module 903, adapted to train the classification model for recognizing English subordinate clauses using the feature text sequences.

In one embodiment of the present invention, the training sample conversion module 902 includes:

In one embodiment of the present invention, the classification model training module 903 includes:

Referring to Fig. 10, a structural block diagram of a device for recognizing English subordinate clauses based on a classification model according to an embodiment of the present invention is shown; the device may specifically include the following modules:

English sentence determination module 1001, adapted to determine the English sentence to be recognized;

English sentence conversion module 1002, adapted to convert the English sentence into a feature text sequence;

Subordinate clause type recognition module 1003, adapted to input the feature text sequence into a preset classification model to recognize the subordinate clause type contained in the English sentence.

In one embodiment of the present invention, the English sentence conversion module 1002 includes:

In one embodiment of the present invention, the subordinate clause type recognition module 1003 includes:

Since the device embodiments are essentially similar to the method embodiments, they are described only briefly; for related details, refer to the description of the method embodiments.

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such systems is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that various programming languages may be used to implement the invention described herein, and the above description of a specific language is given to disclose the best mode of the invention.

Numerous specific details are set forth in the description provided here. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be appreciated that, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof in order to streamline the disclosure and aid in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art will understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple submodules, subunits, or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.

Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the training device for a classification model of English subordinate clauses and of the device for recognizing English subordinate clauses based on a classification model according to embodiments of the present invention. The present invention may also be implemented as device or apparatus programs (e.g., computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the invention, and those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.

Claims

1. A training method for a classification model of English subordinate clauses, comprising:

converting training samples into feature text sequences;

2. The method of claim 1, wherein the step of converting the training samples into feature text sequences comprises:

recognizing the constituent structure of the training sample;

forming a feature text sequence using the constituent structure.

3. The method of claim 1 or 2, wherein the step of training the classification model for recognizing English subordinate clauses using the feature text sequences comprises:

inputting the feature text sequence into a convolutional neural network;

training, in the convolutional neural network and based on the order of words in the training sample, the classification model for recognizing English subordinate clauses using the feature text sequence.

4. A method for recognizing English subordinate clauses based on a classification model, comprising:

determining an English sentence to be recognized;

converting the English sentence into a feature text sequence;

inputting the feature text sequence into a preset classification model to recognize the subordinate clause type contained in the English sentence.

5. The method of claim 4, wherein the step of converting the English sentence into a feature text sequence comprises:

recognizing the constituent structure of the English sentence;

forming a feature text sequence using the constituent structure.

6. The method of claim 4 or 5, wherein the step of inputting the feature text sequence into a preset classification model to recognize the subordinate clause type contained in the English sentence comprises:

recognizing, in the classification model and based on the order of words in the English sentence, the subordinate clause type contained in the English sentence using the feature text sequence.

7. A training device for a classification model of English subordinate clauses, comprising:

a classification model training module, adapted to train the classification model for recognizing English subordinate clauses using the feature text sequences.

8. The device of claim 7, wherein the training sample conversion module comprises:

9. The device of claim 7 or 8, wherein the classification model training module comprises:

a convolutional neural network training submodule, adapted to train, in the convolutional neural network and based on the order of words in the training sample, the classification model for recognizing English subordinate clauses using the feature text sequence.

10. A device for recognizing English subordinate clauses based on a classification model, comprising:

a subordinate clause type recognition module, adapted to input the feature text sequence into a preset classification model to recognize the subordinate clause type contained in the English sentence.