CN106293114A - Method and device for predicting a user's word to be entered - Google Patents

Method and device for predicting a user's word to be entered

Info

Publication number
CN106293114A
CN106293114A (application CN201510296610.5A; granted publication CN106293114B)
Authority
CN
China
Prior art keywords
word
word vector
user
to be entered
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510296610.5A
Other languages
Chinese (zh)
Other versions
CN106293114B (en)
Inventor
李齐周
操颖平
盛子夏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201510296610.5A priority Critical patent/CN106293114B/en
Publication of CN106293114A publication Critical patent/CN106293114A/en
Application granted granted Critical
Publication of CN106293114B publication Critical patent/CN106293114B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to a method and device for predicting a user's word to be entered. The method includes: predicting the user's word to be entered from the word vectors corresponding to the words the user has already entered and the word vectors of the multiple words recorded in a word/word-vector correspondence table, and displaying the predicted words so that the user can directly select the word actually to be entered from those displayed. The prediction thus takes the surrounding context into account, which raises the hit rate of the predicted words and can effectively increase the input speed of an input method.

Description

Method and device for predicting a user's word to be entered
Technical field
The present application relates to the field of input methods, and in particular to a method and device for predicting a user's word to be entered.
Background technology
When a traditional input method receives, via the keyboard, only part of the characters relevant to the word the user intends to enter (for example part of its pinyin or some of its strokes), the hit rate of the candidate words it pushes to the user is relatively low; only when all of the relevant characters have been received can it correctly push the intended word to the user. This limits the input speed of the input method, and receiving all of these characters also consumes a large amount of the computer's memory resources.
Summary of the invention
The embodiments of the present application provide a method and device for predicting a user's word to be entered, which can effectively increase the input speed of an input method.
In a first aspect, a method for predicting a user's word to be entered is provided. The method includes:
reading the natural language entered by the user from the current statistical unit, where a statistical unit is a semantic unit between predetermined punctuation marks in the natural language entered by the user;
performing natural language processing on the natural language to obtain the entered words;
looking up, in a word/word-vector correspondence table, the word vectors corresponding to the entered words, where the table records multiple words, including the entered words, together with the word vector corresponding to each of them;
predicting the user's word to be entered from the word vectors corresponding to the entered words and the word vectors corresponding to the multiple words;
displaying the predicted words to be entered.
In a second aspect, a device for predicting a user's word to be entered is provided. The device includes a reading unit, a processing unit, a query unit, a prediction unit and a display unit:
the reading unit reads the natural language entered by the user from the current statistical unit, where a statistical unit is a semantic unit between predetermined punctuation marks in the natural language entered by the user;
the processing unit performs natural language processing on the natural language read by the reading unit to obtain the entered words;
the query unit looks up, in a word/word-vector correspondence table, the word vectors corresponding to the entered words obtained by the processing unit, where the table records multiple words, including the entered words, and the word vector corresponding to each of them;
the prediction unit predicts the user's word to be entered from the word vectors found by the query unit for the entered words and the word vectors corresponding to the multiple words;
the display unit displays the words to be entered predicted by the prediction unit.
With the method and device provided by the present application, the user's word to be entered is predicted from the word vectors corresponding to the words the user has already entered and the word vectors of the multiple words recorded in the word/word-vector correspondence table, and the predicted words are displayed so that the user can directly select the word actually to be entered from those displayed. The prediction thus takes the surrounding context into account, which raises the hit rate of the predicted words and can effectively increase the input speed of the input method.
Brief description of the drawings
Fig. 1 is a flowchart of the method for predicting a user's word to be entered provided by one embodiment of the present application;
Fig. 2 is a first schematic diagram of predicted words to be entered;
Fig. 3 is a second schematic diagram of predicted words to be entered;
Fig. 4 is a third schematic diagram of predicted words to be entered;
Fig. 5 is a schematic diagram of the device for predicting a user's word to be entered provided by another embodiment of the present application;
Fig. 6 is a schematic diagram of the device provided by yet another embodiment of the present application;
Fig. 7 is a schematic diagram of the device provided by a further embodiment of the present application.
Detailed description of the invention
To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the present application without creative effort fall within the scope of protection of the present application.
To facilitate understanding of the embodiments of the present application, a further explanation is given below through specific embodiments in conjunction with the drawings; these embodiments do not constitute a limitation on the embodiments of the present application.
The method for predicting a user's word to be entered provided by the embodiments of the present application can be combined with any input method. The user's word to be entered is predicted from the word vectors corresponding to the words the user has already entered and the word vectors of the multiple words recorded in a word/word-vector correspondence table, and the predicted words are displayed so that the user can directly select the word actually to be entered from those displayed. The prediction thus takes the surrounding context into account, raising the hit rate of the predicted words and effectively increasing the input speed of the input method.
Here a word vector is a vector of fixed length, and the length can be set arbitrarily. In this embodiment, words are mapped to word vectors, which serve as the basis for the quantitative computation performed while predicting the user's word to be entered.
The word/word-vector correspondence table is a collection of word-to-word-vector correspondences obtained by training on a large amount of normalized text. In other words, performing the prediction method of the present application requires this table to have been trained, and the training is completed before the prediction method is executed.
Fig. 1 is a flowchart of the method for predicting a user's word to be entered provided by one embodiment of the present application. The method may be executed by any equipment with processing capability: a server, a system or a device. As shown in Fig. 1, the method may specifically include:
Step 110: read the natural language entered by the user from the current statistical unit, where a statistical unit is a semantic unit between predetermined punctuation marks in the natural language entered by the user.
Optionally, a step of training the word/word-vector correspondence table may be included before step 110. This step improves the accuracy of the word-to-word-vector correspondences recorded in the table, and thereby the correctness of predicting the user's word to be entered from the word vectors of the entered words.
The training proceeds as follows:
Step A: select a sample from a corpus and perform natural language processing on it, obtaining multiple consecutive words.
Note that "consecutive" means the selected words all lie within one statistical unit, i.e. a semantic unit between predetermined punctuation marks. The predetermined punctuation marks include: the comma, full stop, question mark, semicolon, exclamation mark, ellipsis, enumeration comma and colon.
The samples in the corpus may be text collected in advance from web pages by a server or a client, e.g. the many articles and comments on the web; in such texts, the span between every two adjacent predetermined punctuation marks forms one statistical unit. In one embodiment, one such statistical unit serves as one sample. Alternatively, the samples may be collected in advance, by a server or a client, from text previously entered by users. Natural language processing includes normalization and word segmentation, among others. Normalization converts, for example, English capital letters in the sample to lower case, or traditional Chinese characters to simplified characters. Word segmentation splits a sample into multiple consecutive words; for example, segmenting the sample "we need to control city haze" yields the consecutive words "we", "need", "control", "city" and "haze".
Step B: look up, in the word/word-vector correspondence table, the word vectors corresponding to all words of the sample except the last one; for any word for which the table records no word vector, randomly assign one.
In this embodiment, all words of the sample except the last one are called training words, and the last word of the sample is called the verification word.
A word vector is a vector that characterizes a word. It can have several dimensions, each holding a numerical value in [-1, 1]. Each word corresponds to exactly one word vector, and each word vector likewise corresponds to exactly one word.
The word/word-vector correspondence table records multiple words and their corresponding word vectors; the multiple words may be all or part of the words obtained by segmenting the samples in the corpus. Initially the word vectors are empty, i.e. the table stores no vector for any of the words; word vectors can then be assigned to the words at random, for example by selecting, for each dimension of each word's vector, a random value from [-1, 1].
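The random initialization just described can be sketched as follows. This is a minimal illustration under the text's stated assumptions (fixed dimensionality m, values drawn uniformly from [-1, 1]); the function name and vocabulary are our own:

```python
import random

def init_word_vector_table(words, m, seed=None):
    """Build a word/word-vector correspondence table: each word is mapped
    to a vector of m values drawn uniformly at random from [-1, 1]."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(m)] for w in words}

table = init_word_vector_table(["we", "need", "control", "city", "haze"], m=4, seed=0)
```

Training then refines these random vectors; the table only becomes useful after the adjustment steps below have been run over many samples.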
The embodiments of the present application place no particular limit on the dimensionality of the word vectors; more dimensions means longer vectors. During training, the longer the word vectors, the more accurate the trained word-to-word-vector correspondences.
Continuing the earlier example: after natural language processing of one corpus sample, we obtain the consecutive words "we", "need", "control", "city" and "haze". By the definitions above, the training words are "we", "need", "control" and "city", and the verification word is "haze".
Looking up the word/word-vector correspondence table gives the word vectors of the training words:
the word vector of "we" is [C11, C12, C13, … C1m];
the word vector of "need" is [C21, C22, C23, … C2m];
the word vector of "control" is [C31, C32, C33, … C3m];
the word vector of "city" is [C41, C42, C43, … C4m].
Step C: apply a predetermined linear transformation or nonlinear transformation to the word vectors corresponding to all words of the sample except the last one, obtaining a prediction word vector.
That is, the prediction word vector is computed from the word vectors of the training words by applying a predetermined linear or nonlinear transformation to them.
Following the earlier example, suppose the prediction word vector is written [y1, y2, y3, … ym]. Under the linear transformation, y1 = C11 + C21 + C31 + C41, and likewise y2 = C12 + C22 + C32 + C42, …, ym = C1m + C2m + C3m + C4m. Alternatively, under the nonlinear transformation, y1 = (C11² + C21² + C31² + C41²) / 3, and likewise y2 = (C12² + C22² + C32² + C42²) / 3, …, ym = (C1m² + C2m² + C3m² + C4m²) / 3.
When computing the prediction word vector from the training words' vectors, the same operation is applied to every dimension: if the linear transformation is used, it is used for all dimensions; if the nonlinear transformation is used, it is likewise used for all dimensions.
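The two transformations of the example can be sketched dimension-by-dimension as below. This is our own sketch of the formulas as printed (in particular, the nonlinear variant divides the sum of squared components by a constant 3, following the printed example):

```python
def predict_vector_linear(vectors):
    """Linear transform: y_j is the sum of the j-th components of all
    training words' vectors."""
    return [sum(cs) for cs in zip(*vectors)]

def predict_vector_nonlinear(vectors):
    """Nonlinear transform (as printed): y_j = (sum of squared j-th
    components) / 3."""
    return [sum(c * c for c in cs) / 3 for cs in zip(*vectors)]

# Four training-word vectors of dimension m = 2
training = [[1.0, 0.0], [0.5, -0.5], [0.0, 1.0], [0.5, 0.5]]
linear = predict_vector_linear(training)        # [2.0, 1.0]
nonlinear = predict_vector_nonlinear(training)  # [0.5, 0.5]
```

Either function may be chosen, but as the text notes, the same one must be applied to every dimension and reused unchanged at prediction time.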
Step D: feed the prediction word vector into the language training model and judge whether the output matches the last word, where the language training model is a machine learning model predefined to derive, from an input word vector, the word corresponding to that vector.
The role of the language training model is to determine, from an input word vector, the word corresponding to that vector. The embodiments of the present application place no particular limit on the model used; it may be a deep learning model such as word2vec or an RNN (Recurrent Neural Network).
After the prediction word vector has been fed into the language training model, judge whether the output word is the verification word: for example, feed the prediction word vector [y1, y2, y3, … ym] into the model and judge whether the output word is "haze".
If the output word matches the verification word of the sample, the operation ends; otherwise step E is performed.
Step E: if they do not match, adjust the word vectors corresponding to all words except the last one, until the output matches the last word.
Specifically, if the output word does not match the verification word of the sample, adjust the prediction word vector until the language training model, given the adjusted prediction vector, outputs a word matching the verification word; then adjust the training words' vectors correspondingly according to the adjusted prediction vector. The concrete adjustment magnitude depends on the predetermined linear or nonlinear transformation used.
Step F: use the adjusted word vectors of all words except the last one to update the corresponding entries in the word/word-vector correspondence table.
That is, the adjusted training-word vectors are written back into the table: each adjusted word-vector/word correspondence replaces the table's existing correspondence for that training word.
It should be understood that the above flow handles a single sample; the actual training repeats it over a large number of samples, and the more samples, the more accurate the word-to-word-vector correspondences in the trained table. The word/word-vector correspondence tables referred to below in the present application may all be obtained by training with this method.
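The per-sample flow of steps A through F can be condensed into the loop below. This is a schematic only: the segmented samples are given directly, and the language training model and the vector-adjustment rule are toy stand-ins of our own (a nearest-neighbour lookup over the table, and a small step of each training word's vector toward the verification word's vector), not the patent's actual components:

```python
import random

def train_table(samples, table, m, steps=50, lr=0.1, seed=0):
    """Schematic of steps A-F: for each (pre-segmented) sample, compute a
    prediction vector from the training words, check it against the
    verification word, and nudge the training words' vectors until the
    nearest table entry to the prediction is that word."""
    rng = random.Random(seed)

    def nearest(vec):  # toy stand-in for the language training model
        return min(table, key=lambda w: sum((a - b) ** 2
                                            for a, b in zip(table[w], vec)))

    for words in samples:                       # step A: one segmented sample
        train, verify = words[:-1], words[-1]
        for w in words:                         # step B: random-assign missing vectors
            table.setdefault(w, [rng.uniform(-1, 1) for _ in range(m)])
        for _ in range(steps):
            # step C: linear transform (per-dimension sum) of training vectors
            pred = [sum(cs) for cs in zip(*(table[w] for w in train))]
            if nearest(pred) == verify:         # step D: does the model output match?
                break
            target = table[verify]              # step E: adjust toward the target
            for w in train:
                table[w] = [c + lr * (t - p) / len(train)
                            for c, t, p in zip(table[w], target, pred)]
        # step F: the table entries were updated in place above
    return table
```

Real training would run this over a large corpus; here the loop is only meant to make the control flow of the six steps concrete.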
Returning to step 110: a statistical unit is a semantic unit between predetermined punctuation marks in the natural language entered by the user, with the predetermined punctuation marks defined as in step A. That is, the embodiments of the present application record the natural language the user has entered within the statistical unit containing the user's current input operation (the current statistical unit).
Step 120: perform natural language processing on the natural language to obtain the entered words.
Specifically, step 120 may include:
performing natural language processing on the natural language to obtain the n words most recently entered by the user, where n is the smaller of the number of words the user has entered in the current statistical unit and a predetermined number N, with n and N positive integers.
Here natural language processing again includes normalization and word segmentation. The natural language entered by users is often irregular: the input may mix upper and lower case, or contain traditional Chinese characters, among other situations. It therefore first needs to be normalized so that the machine can recognize it, and then segmented to obtain the n words most recently entered by the user. The number of entered words that are recorded can be capped at the predetermined number N (N a positive integer); the number actually recorded, n (a positive integer), is then the smaller of N and the number of words the user has recently entered in the current statistical unit.
For example, if N is 5 and natural language processing yields 4 entered words, those 4 words are recorded; if it yields 6 entered words, only the 5 most recently entered are recorded, i.e. the first word of the recorded current statistical unit is deleted. If the current statistical unit has finished, for example because the user has typed a comma, the next statistical unit begins, recording restarts from its first word, and the words recorded for the previous statistical unit are deleted.
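The record-keeping in step 120 — keep at most the N most recent words and reset when a predetermined punctuation mark arrives — can be sketched as follows. The punctuation set is a simplified stand-in, segmentation is assumed already done, and `update_context` is our own name:

```python
PUNCT = set(",.?;!:…")  # simplified stand-in for the predetermined punctuation marks

def update_context(context, token, N=5):
    """Maintain the recorded words of the current statistical unit:
    a punctuation token starts a new unit (previous words are deleted);
    otherwise keep at most the N most recently entered words."""
    if token in PUNCT:
        return []                 # previous unit's recorded words are deleted
    context = context + [token]
    return context[-N:]           # drop the oldest word once more than N are held

ctx = []
for tok in ["we", "need", "control", "city", "haze", "today"]:
    ctx = update_context(ctx, tok, N=5)
# ctx now holds the 5 most recent words; a comma would reset it to []
```

The sixth word pushes out the first, matching the N = 5 example in the text.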
Step 130: look up, in the word/word-vector correspondence table, the word vectors corresponding to the entered words, where the table records multiple words, including the entered words, and the word vector corresponding to each of them.
The word vectors corresponding to the words the user has entered in the recorded current statistical unit are obtained by querying the table. The multiple words recorded in the table are as defined in step B. Note that the table used here was obtained by the training of steps A to F, so each of the multiple words recorded in it corresponds to exactly one word vector.
Step 140: predict the user's word to be entered from the word vectors corresponding to the entered words and the word vectors corresponding to the multiple words.
Specifically, step 140 may include:
Step 1401: apply the predetermined linear transformation or nonlinear transformation to the word vectors of the entered words, obtaining a target word vector.
Note that the transformation used here (predetermined linear or nonlinear) must be the same one used when training the word/word-vector correspondence table.
Step 1402: compute the similarity between the target word vector and the word vector corresponding to each of the multiple words.
Step 1403: predict the user's word to be entered according to the similarities.
The embodiments of the present application place no particular limit on the method used to compute the similarity between the target word vector and the word vectors of the multiple words recorded in the table; any existing method for computing the similarity between two vectors can be used.
A higher similarity indicates a higher hit rate, i.e. a higher probability that the user will enter that word. Predicting the user's word to be entered according to similarity can then mean: determining, as the words to be entered, the words whose vectors' similarity exceeds a predetermined threshold; and/or sorting the multiple words' vectors by similarity and determining, as the words to be entered, the words corresponding to the top-ranked vectors. That is, all words whose similarity exceeds the predetermined threshold can be shown to the user; or a specified number of those above the threshold; or, after sorting the vectors by descending similarity, the words corresponding to the top-ranked vectors are shown to the user.
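Steps 1402 and 1403 leave the similarity measure open; cosine similarity is one common choice. A sketch under that assumption (the function names and toy vectors are ours):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(target, table, k=2, threshold=None):
    """Rank the table's words by similarity to the target vector; keep
    either the top k, or all those above the threshold."""
    ranked = sorted(table, key=lambda w: cosine(table[w], target), reverse=True)
    if threshold is not None:
        return [w for w in ranked if cosine(table[w], target) > threshold]
    return ranked[:k]

table = {"haze": [1.0, 0.1], "pollution": [0.9, 0.3], "trees": [-1.0, 0.5]}
top = predict([1.0, 0.0], table, k=2)   # ["haze", "pollution"]
```

Both selection rules the text names — a fixed threshold and a top-k cut of the sorted list — are covered by the two branches of `predict`.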
Step 150: display the predicted words to be entered.
When displaying them, the predicted words can be ordered by descending similarity.
The present embodiment predicts the user's word to be entered in combination with the surrounding context. Fig. 2 and Fig. 3 are schematic diagrams of the predicted words. When the words the user has entered in one statistical unit are "we need to control the city's", the words predicted by the method of this embodiment include "haze", "pollution", "garbage", "energy shortage" and so on; when the entered words are "we need to plan the city's", the predicted words include "construction", "traffic", "streets", "development direction" and so on. The predicted words are thus clearly related to the surrounding context. Since not all predicted words can be displayed at once, the left and right arrows in Fig. 2 and Fig. 3 allow paging or scrolling to display the other predicted words.
If the currently displayed predictions contain the word the user actually intends to enter, the user can select it directly without entering any further information. If they do not, the displayed predictions can be updated according to further information entered by the user. This further information consists of characters relevant to the actual word to be entered, such as part of its pinyin, some of its strokes, or its kana. Updating the displayed predictions according to the further information entered by the user includes:
receiving the further information entered by the user and, according to it, updating the display order of the predicted words; and/or
receiving the further information entered by the user, filtering target words to be entered out of the predicted words according to it, and displaying those target words.
Take the case where the further information is the first pinyin letter of the actual word to be entered. Suppose the initially displayed predictions are as shown in Fig. 2; when the user types the pinyin letter "s", the updated predictions are as shown in Fig. 4. Because the first pinyin letter of "water pollution" among the displayed predictions is "s" while the other predictions' first letters are not, "water pollution" is moved forward in the order — that is, the predictions matching the user's further information are promoted. Alternatively, the predictions that do not match the further information can simply be hidden, and the matching ones — e.g. "water pollution", "city appearance", "cityscape" and "trees" in Fig. 4 — screened as the target words to be entered and displayed. So if the user's actual word is "water pollution", it can be selected directly after typing the single pinyin initial "s", without entering the word's full pinyin, which increases input speed.
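The two update behaviours just described — promote matching predictions, or screen them as target words — can be sketched as follows. The mapping from candidate word to pinyin initial is assumed given (in practice it would come from a pinyin dictionary), and all names are our own:

```python
def refine(candidates, initials, typed):
    """Promote candidates whose pinyin initials start with what the user
    typed (reordering), and also return only the matching ones (the
    screened target words)."""
    match = [w for w in candidates if initials.get(w, "").startswith(typed)]
    rest = [w for w in candidates if w not in match]
    return match + rest, match   # (reordered display list, target words)

candidates = ["haze", "water pollution", "garbage", "cityscape"]
# hypothetical pinyin initials: wumai -> "w", shuiwuran -> "s", laji -> "l", shirong -> "s"
initials = {"haze": "w", "water pollution": "s", "garbage": "l", "cityscape": "s"}
reordered, targets = refine(candidates, initials, "s")
```

With "s" typed, "water pollution" and "cityscape" move to the front of the display order, and the same two words form the screened target list — matching the Fig. 4 behaviour described above.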
With the method for predicting a user's word to be entered provided by the present application, the user's word to be entered is predicted from the word vectors corresponding to the words the user has entered and the word vectors of the multiple words recorded in the word/word-vector correspondence table, and the predicted words are displayed so that the user can directly select the word actually to be entered from those displayed. The prediction takes the surrounding context into account, raising the hit rate of the predicted words and effectively increasing the input speed of the input method.
Corresponding to the above method for predicting a user's word to be entered, an embodiment of the present application also provides a device for predicting a user's word to be entered. As shown in Fig. 5, this device can be installed in any existing input method system, predicting the user's word to be entered from the entered words during input. The device includes: a reading unit 501, a processing unit 502, a query unit 503, a prediction unit 504 and a display unit 505.
The reading unit 501 reads the natural language entered by the user from the current statistical unit, where a statistical unit is a semantic unit between predetermined punctuation marks in the natural language entered by the user.
The predetermined punctuation marks are defined as in step A. That is, the embodiments of the present application record the natural language the user has entered within the statistical unit containing the user's current input operation (the current statistical unit).
Processing unit 502, for carrying out at natural language the described natural language reading unit 501 reading Reason, has been inputted word.
Processing unit 502 specifically for: described natural language is carried out natural language processing, obtains user N the word recently input, wherein, n be in current statistic unit the number of the word that user has inputted with predetermined Smaller in number N, n and N is positive integer.
Herein, natural language processing includes standardization and word segmentation processing etc..Due to user input from So language is the most nonstandard, as may be the natural language of input may not only included capitalizing but also including small letter, It is also possible that the situations such as the complex form of Chinese characters, accordingly, it would be desirable to be standardized natural language processing, in order to machine Device is capable of identify that.After being standardized natural language processing, in addition it is also necessary to carry out word segmentation processing, just N the word that user recently inputs can be obtained.Wherein, the user of record has inputted the number of word and can set Less than predetermined number N (N is positive integer).The user then needing record has inputted the number n (n of word For positive integer) be: in current statistic unit in the number of the word that user recently inputs and predetermined number N Smaller.
For example, suppose N is 5. If, after natural language processing, 4 words have been input, those 4 words are recorded; if 6 words have been input, only the 5 most recently input words are recorded, that is, the first word recorded for the current statistic unit is deleted. If the current statistic unit has ended (for example, the user has typed a comma), the next statistic unit begins: recording starts from the first word of that unit, and the words recorded for the previous statistic unit are deleted.
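The recording logic above (a capped window of recent words, cleared when a punctuation mark ends the statistic unit) can be sketched as follows; the delimiter set and the cap of 5 words are illustrative assumptions, not values fixed by the embodiment.

```python
from collections import deque

# Assumed delimiter set; the embodiment only says "predetermined punctuation marks".
PUNCTUATION = set("，。！？；,.!?;")

class ContextBuffer:
    """Keeps at most N most recently input words of the current statistic unit."""

    def __init__(self, max_words=5):
        self.words = deque(maxlen=max_words)  # oldest word falls off when full

    def add_token(self, token):
        if token in PUNCTUATION:
            self.words.clear()  # a delimiter ends the current statistic unit
        else:
            self.words.append(token)

    def context(self):
        return list(self.words)
```

Feeding the buffer the segmented tokens of "we need to administer the city" keeps only the last 5 words; a following comma empties it for the next statistic unit.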
Query unit 503 is configured to query, in the word-to-word-vector correspondence table, the word vectors corresponding to the input words obtained by processing unit 502, where the word-to-word-vector correspondence table records multiple words, including the input words, together with the word vector corresponding to each of the multiple words.
The word vectors of the words the user has input in the recorded current statistic unit are obtained by querying the word-to-word-vector correspondence table. The multiple words recorded in the table are defined as in step B. Note that the table here is obtained by the training method of steps A through F, so each of the multiple words recorded in it corresponds to exactly one word vector.
Predicting unit 504 is configured to predict the user's word to be input according to the word vectors of the input words queried by query unit 503 and the word vectors of the multiple words.
Specifically, predicting unit 504: applies a predetermined linear or nonlinear transformation to the word vectors of the input words to obtain a target word vector;
computes the similarity between the target word vector and the word vector of each of the multiple words; and
predicts the user's word to be input according to the similarities.
Note that the transformation used here (the predetermined linear or nonlinear transformation) is the same as the transformation used when training the word-to-word-vector correspondence table.
The embodiment of the present application places no particular restriction on how the similarity between the target word vector and the word vector of each of the multiple words in the table is computed; any existing method for computing the similarity between two vectors may be used.
A higher similarity indicates a higher hit rate, that is, a higher probability that the user intends to input that word.
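A minimal sketch of this prediction step, assuming the predetermined linear transformation is a simple average of the context vectors and the similarity measure is cosine similarity (the embodiment leaves both choices open):

```python
import numpy as np

def predict_candidates(input_words, word_table, top_k=4):
    """Rank vocabulary words by cosine similarity to a target vector.

    word_table: dict mapping word -> 1-D numpy vector (the word/vector table).
    The "predetermined linear transformation" is assumed here to be an
    average of the context vectors; cosine similarity is one of the
    existing vector-similarity measures the text allows.
    """
    vectors = [word_table[w] for w in input_words if w in word_table]
    target = np.mean(vectors, axis=0)              # linear transform (assumed)
    target = target / np.linalg.norm(target)       # normalize the target vector

    scores = {}
    for word, vec in word_table.items():
        if word in input_words:
            continue                               # skip already-typed words
        scores[word] = float(np.dot(target, vec) / np.linalg.norm(vec))
    # words sorted by similarity, highest first
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

With a toy table in which "haze" lies near the context vectors for "administer" and "city", "haze" is ranked first, mirroring the Fig. 2 example.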
Optionally, predicting unit 504 is further specifically configured to:
determine, as the user's words to be input, the words corresponding to one or more word vectors whose similarity exceeds a predetermined threshold; and/or
sort the word vectors of the multiple words according to the similarities; and
determine, as the user's words to be input, the words corresponding to the one or more top-ranked word vectors.
That is, all of the words whose word-vector similarity exceeds the predetermined threshold may be shown to the user; or a specified number of words whose similarity exceeds the threshold may be shown; or, after the word vectors are sorted by similarity from high to low, the words corresponding to the one or more top-ranked word vectors may be shown to the user.
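The two selection rules just described (similarity threshold, top-ranked words, or both combined) can be sketched as:

```python
def select_candidates(scored, threshold=None, top_k=None):
    """Select words to show from (word, similarity) pairs.

    Either rule, or both, may apply: filter by a similarity threshold,
    then keep the top-ranked words after sorting high to low.
    """
    if threshold is not None:
        scored = [(w, s) for w, s in scored if s > threshold]
    scored = sorted(scored, key=lambda ws: ws[1], reverse=True)
    if top_k is not None:
        scored = scored[:top_k]
    return [w for w, _ in scored]
```

Passing only `threshold` shows every word above it; passing only `top_k` shows a fixed number of the highest-similarity words.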
Display unit 505 is configured to show the user's words to be input predicted by predicting unit 504.
When the predicted words to be input are shown, they may be shown sorted by similarity from high to low.
This embodiment predicts the user's words to be input in combination with the surrounding linguistic context. Fig. 2 shows one schematic diagram of the predicted words to be input, and Fig. 3 shows another. When the words the user has input in a statistic unit are "we need to administer the city", the words predicted by the method of this embodiment include "haze", "pollution", "garbage", "energy shortage", and so on; when the words input in a statistic unit are "we need to plan the city", the predicted words include "construction", "traffic", "streets", "development direction", and so on. Clearly, the predicted words to be input depend on the surrounding context. Because not all of the predicted words can be shown at once, the left and right arrows in Fig. 2 and Fig. 3 allow the user to page or scroll to show the other predicted words.
If the words to be input currently shown include the word the user actually intends to input, the user can select that word directly without entering any further information. If they do not, the currently shown words can be updated according to additional information input by the user. The additional information consists of characters related to the word the user actually intends to input, such as part of its pinyin, its strokes, or its kana.
As shown in Fig. 6, the device of another embodiment may further include a receiving unit 506, configured to: receive additional information input by the user and, according to that information, update the display order of the predicted user's words to be input; and/or
receive additional information input by the user and, according to that information, filter target words to be input out of the predicted user's words to be input and show the target words to be input.
Take the additional information to be the first pinyin letter of the word actually to be input. If the initially shown predicted words are as in Fig. 2, then after the user types the pinyin letter "s" the updated words to be input are as shown in Fig. 4. Among the shown words, the first pinyin letter of "water pollution" is "s" while those of the other words are not, so "water pollution" is moved forward; that is, the words consistent with the user's additional information are ranked first. Alternatively, the words inconsistent with the additional information can simply be hidden, and the words consistent with it filtered out as target words to be input; for example, "water pollution", "city appearance", "cityscape", and "trees" in Fig. 4 can be filtered as the target words to be input and shown. Thus, if the word the user actually intends to input is "water pollution", the user can select it directly after typing the single pinyin initial "s", without typing the word's full pinyin, which improves input speed.
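The pinyin-initial update can be sketched as below. The initials map here is a small hand-written assumption for illustration; a real input method would derive initials from a pinyin dictionary (e.g. a library such as pypinyin) rather than hard-code them.

```python
# Hypothetical pinyin-initial map (assumed for this sketch).
PINYIN_INITIALS = {
    "水污染": "s",  # shui wu ran  (water pollution)
    "市容": "s",    # shi rong     (city appearance)
    "市貌": "s",    # shi mao      (cityscape)
    "树木": "s",    # shu mu       (trees)
    "雾霾": "w",    # wu mai       (haze)
    "垃圾": "l",    # la ji        (garbage)
}

def filter_by_initial(candidates, typed_initial, hide_mismatches=False):
    """Re-rank (or filter) candidates by the typed pinyin initial.

    Matching words move to the front; with hide_mismatches=True the
    non-matching words are dropped entirely, as the text also allows.
    """
    matched = [w for w in candidates if PINYIN_INITIALS.get(w) == typed_initial]
    if hide_mismatches:
        return matched
    rest = [w for w in candidates if w not in matched]
    return matched + rest
```

So after typing "s", "水污染" moves ahead of words whose initial is not "s", matching the Fig. 4 behavior.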
As shown in Fig. 7, the device of another embodiment may further include a training unit 507, configured to repeatedly perform the following process:
selecting a sample from a corpus and performing natural language processing on the sample to obtain multiple consecutive words;
querying the word-to-word-vector correspondence table for the word vectors of all words in the sample other than the last word and, for any word whose word vector is not recorded in the table, randomly assigning it a word vector;
applying the predetermined linear or nonlinear transformation to the word vectors of all words in the sample other than the last word to obtain a predicted word vector;
inputting the predicted word vector into a language training model and judging whether the output result matches the last word, where the language training model is a machine learning model predefined to derive, from an input word vector, the word corresponding to that vector;
if the output result does not match, adjusting the word vectors of all words other than the last word until the output result matches the last word; and
using the adjusted word vectors of all words other than the last word to update the corresponding word vectors in the word-to-word-vector correspondence table.
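A rough sketch of one round of this training process follows, under assumptions the embodiment does not fix: the linear transformation is an average of the context vectors, the "language training model" is nearest-neighbour lookup in the same table, and the adjustment nudges the context vectors by the prediction error until the output matches the last word.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_step(sample, table, dim=8, lr=0.1, max_iters=500):
    """One round of the described training process (a sketch, not the
    embodiment's concrete model).

    sample: consecutive words; the last word is the prediction target.
    table:  the word/word-vector table; words missing from it receive a
            random vector, as the text describes.
    """
    for w in sample:
        if w not in table:
            table[w] = rng.normal(size=dim)    # random vector for unseen word
    context, target = sample[:-1], sample[-1]
    for _ in range(max_iters):
        pred = np.mean([table[w] for w in context], axis=0)  # linear transform
        # the assumed model "outputs" the word whose vector is nearest to pred
        best = min(table, key=lambda w: np.linalg.norm(table[w] - pred))
        if best == target:
            break                              # output matches the last word
        # otherwise adjust the context vectors by the prediction error
        err = table[target] - pred
        for w in context:
            table[w] += lr * err
    return table
```

After a round on a sample, the transformed context vectors point at the sample's last word, and the updated vectors are written back into the table for the next round.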
The functions of the functional modules of the device of the embodiment of the present application can be realized through the steps of the above method embodiment, so the specific working process of the device provided by the present application is not repeated here.
In the device for predicting a user's word to be input provided by the present application: reading unit 501 reads the natural language input by the user from the current statistic unit, the statistic unit being a semantic unit delimited by predetermined punctuation marks within the user's natural-language input; processing unit 502 performs natural language processing on the natural language to obtain the input words; query unit 503 queries the word-to-word-vector correspondence table for the word vectors corresponding to the input words, the table recording multiple words, including the input words, together with the word vector corresponding to each of the multiple words; predicting unit 504 predicts the user's word to be input according to the word vectors of the input words queried by query unit 503 and the word vectors of the multiple words; and display unit 505 shows the predicted word to be input. The device thus predicts the user's words to be input in combination with the surrounding context, improving the hit rate of the predicted words and thereby effectively increasing the input speed of the input method.
Those skilled in the art should further appreciate that the objects and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present application.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present application in detail. It should be understood that the foregoing is merely specific embodiments of the present application and is not intended to limit its scope of protection; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present application shall be included within the scope of protection of the present application.

Claims (12)

1. A method for predicting a user's word to be input, characterized in that the method comprises:
reading the natural language input by the user from a current statistic unit, the statistic unit being a semantic unit, delimited by predetermined punctuation marks, within the natural language the user inputs;
performing natural language processing on the natural language to obtain input words;
querying a word-to-word-vector correspondence table for the word vectors corresponding to the input words, wherein the table records multiple words, including the input words, together with the word vector corresponding to each of the multiple words;
predicting the user's word to be input according to the word vectors of the input words and the word vectors of the multiple words; and
showing the predicted user's word to be input.
2. The method according to claim 1, characterized in that performing natural language processing on the natural language to obtain the input words comprises:
performing natural language processing on the natural language to obtain the n words most recently input by the user, wherein n is the smaller of the number of words the user has input in the current statistic unit and a predetermined number N, n and N being positive integers.
3. The method according to claim 1, characterized in that predicting the user's word to be input according to the word vectors of the input words and the word vectors of the multiple words comprises:
applying a predetermined linear or nonlinear transformation to the word vectors of the input words to obtain a target word vector;
computing the similarity between the target word vector and the word vector of each of the multiple words; and
predicting the user's word to be input according to the similarities.
4. The method according to claim 3, characterized in that predicting the user's word to be input according to the similarities comprises:
determining, as the user's words to be input, the words corresponding to one or more word vectors whose similarity exceeds a predetermined threshold; and/or
sorting the word vectors of the multiple words according to the similarities; and
determining, as the user's words to be input, the words corresponding to the one or more top-ranked word vectors.
5. The method according to any one of claims 1-4, characterized in that the method further comprises:
receiving additional information input by the user and, according to the additional information, updating the display order of the predicted user's words to be input; and/or
receiving additional information input by the user and, according to the additional information, filtering target words to be input out of the predicted user's words to be input and showing the target words to be input.
6. The method according to any one of claims 1-4, characterized in that the method further comprises a step of training the word-to-word-vector correspondence table, specifically comprising repeatedly performing the following process:
selecting a sample from a corpus and performing natural language processing on the sample to obtain multiple consecutive words;
querying the word-to-word-vector correspondence table for the word vectors of all words in the sample other than the last word and, for any word whose word vector is not recorded in the table, randomly assigning it a word vector;
applying the predetermined linear or nonlinear transformation to the word vectors of all words in the sample other than the last word to obtain a predicted word vector;
inputting the predicted word vector into a language training model and judging whether the output result matches the last word, wherein the language training model is a machine learning model predefined to derive, from an input word vector, the word corresponding to that vector;
if the output result does not match, adjusting the word vectors of all words other than the last word until the output result matches the last word; and
using the adjusted word vectors of all words other than the last word to update the corresponding word vectors in the word-to-word-vector correspondence table.
7. A device for predicting a user's word to be input, characterized in that the device comprises: a reading unit, a processing unit, a query unit, a predicting unit, and a display unit;
the reading unit is configured to read the natural language input by the user from a current statistic unit, the statistic unit being a semantic unit, delimited by predetermined punctuation marks, within the natural language the user inputs;
the processing unit is configured to perform natural language processing on the natural language read by the reading unit to obtain input words;
the query unit is configured to query a word-to-word-vector correspondence table for the word vectors corresponding to the input words obtained by the processing unit, wherein the table records multiple words, including the input words, together with the word vector corresponding to each of the multiple words;
the predicting unit is configured to predict the user's word to be input according to the word vectors of the input words queried by the query unit and the word vectors of the multiple words; and
the display unit is configured to show the user's word to be input predicted by the predicting unit.
8. The device according to claim 7, characterized in that the processing unit is specifically configured to:
perform natural language processing on the natural language to obtain the n words most recently input by the user, wherein n is the smaller of the number of words the user has input in the current statistic unit and a predetermined number N, n and N being positive integers.
9. The device according to claim 7, characterized in that the predicting unit is specifically configured to:
apply a predetermined linear or nonlinear transformation to the word vectors of the input words to obtain a target word vector;
compute the similarity between the target word vector and the word vector of each of the multiple words; and
predict the user's word to be input according to the similarities.
10. The device according to claim 9, characterized in that the predicting unit is further specifically configured to:
determine, as the user's words to be input, the words corresponding to one or more word vectors whose similarity exceeds a predetermined threshold; and/or
sort the word vectors of the multiple words according to the similarities; and
determine, as the user's words to be input, the words corresponding to the one or more top-ranked word vectors.
11. The device according to any one of claims 7-10, characterized in that the device further comprises a receiving unit configured to: receive additional information input by the user and, according to the additional information, update the display order of the predicted user's words to be input; and/or
receive additional information input by the user and, according to the additional information, filter target words to be input out of the predicted user's words to be input and show the target words to be input.
12. The device according to any one of claims 7-10, characterized in that the device further comprises a training unit configured to repeatedly perform the following process:
selecting a sample from a corpus and performing natural language processing on the sample to obtain multiple consecutive words;
querying the word-to-word-vector correspondence table for the word vectors of all words in the sample other than the last word and, for any word whose word vector is not recorded in the table, randomly assigning it a word vector;
applying the predetermined linear or nonlinear transformation to the word vectors of all words in the sample other than the last word to obtain a predicted word vector;
inputting the predicted word vector into a language training model and judging whether the output result matches the last word, wherein the language training model is a machine learning model predefined to derive, from an input word vector, the word corresponding to that vector;
if the output result does not match, adjusting the word vectors of all words other than the last word until the output result matches the last word; and
using the adjusted word vectors of all words other than the last word to update the corresponding word vectors in the word-to-word-vector correspondence table.
CN201510296610.5A 2015-06-02 2015-06-02 Predict the method and device of user's word to be entered Active CN106293114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510296610.5A CN106293114B (en) 2015-06-02 2015-06-02 Predict the method and device of user's word to be entered


Publications (2)

Publication Number Publication Date
CN106293114A true CN106293114A (en) 2017-01-04
CN106293114B CN106293114B (en) 2019-03-29

Family

ID=57655704


Country Status (1)

Country Link
CN (1) CN106293114B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107621891A (en) * 2017-09-28 2018-01-23 北京新美互通科技有限公司 A kind of text entry method, device and electronic equipment
CN109656385A (en) * 2018-12-28 2019-04-19 北京金山安全软件有限公司 Input prediction method and device based on knowledge graph and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101432722A (en) * 2006-04-21 2009-05-13 泰吉克通讯股份有限公司 Contextual prediction of user words and user actions
US20110167340A1 (en) * 2010-01-06 2011-07-07 Bradford Allen Moore System and Method for Issuing Commands to Applications Based on Contextual Information
CN102253929A (en) * 2011-06-03 2011-11-23 北京搜狗科技发展有限公司 Method and device for prompting user to input characters
CN104391963A (en) * 2014-12-01 2015-03-04 北京中科创益科技有限公司 Method for constructing correlation networks of keywords of natural language texts


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
代贤俊 (Dai Xianjun): "面向写作辅助的中文智能输入法***" (A Chinese intelligent input method system*** for writing assistance), China Master's Theses Full-Text Database, Information Science and Technology *




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201012

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201012

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.

TR01 Transfer of patent right