CN102915299B - Word segmentation method and device - Google Patents


Info

Publication number
CN102915299B
Authority
CN
China
Prior art keywords
matching result
value
phrase
numerical value
dictionary library
Legal status
Active
Application number
CN201210407529.6A
Other languages
Chinese (zh)
Other versions
CN102915299A (en)
Inventor
李成华
王勇进
王峰
Current Assignee
Hisense Group Co Ltd
Original Assignee
Hisense Group Co Ltd
Application filed by Hisense Group Co Ltd
Priority to CN201510179584.8A (CN104765838A)
Priority to CN201510179858.3A (CN104765724A)
Priority to CN201210407529.6A (CN102915299B)
Publication of CN102915299A
Application granted
Publication of CN102915299B
Status: Active
Anticipated expiration


Landscapes

  • Machine Translation (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a word segmentation method intended to improve segmentation accuracy. The method comprises: acquiring a character string to be processed; matching the string against a general dictionary library by forward maximum matching to obtain a first matching result; matching the string against the general dictionary library by reverse maximum matching to obtain a second matching result; and judging whether the first matching result is consistent with the second matching result and, if so, outputting the first or the second matching result as the segmentation result. The invention also discloses a device for implementing the method.

Description

Word segmentation method and device
Technical field
The present invention relates to the field of word segmentation, and in particular to a word segmentation method and device.
Background
With the spread of the Internet and the maturing of electronic technology, televisions are gradually becoming "high-definition", "networked" and "intelligent".
Video-on-demand search over the Internet has become a major requirement and application of smart TVs. To find, among the massive volume of video on the Internet, exactly the content a user wants to watch, text information must be extracted effectively; how to do so has therefore become an important problem in the field of information retrieval. Chinese word segmentation, as a key technology for information processing and retrieval, has attracted wide attention, and applications in different fields place ever higher demands on it. The quality of the segmentation technique directly affects the results of information processing and retrieval.
The prior art contains many word segmentation methods, among which string-based methods are common because they are relatively simple.
Existing string-based segmentation methods mainly include forward maximum matching and reverse maximum matching. For example, one string-based method mechanically segments the string using either forward or reverse maximum matching and then applies place-name and street-name recognition to the single characters left unrecognized. Its purpose is to recognize place names, street names and the like, thereby expanding a place-name lexicon.
In the course of implementing the technical solutions of the embodiments of the present application, the inventors found at least the following technical problems in the prior art:
1. Existing segmentation systems use only one segmentation method (either forward or reverse maximum matching). The segmentation process is therefore coarse, the results are not accurate enough, and segmentation accuracy suffers;
2. Existing segmentation methods only address segmentation in the place-name domain and still cannot effectively recognize strings from other domains.
Summary of the invention
Embodiments of the present invention provide a word segmentation method and device that solve the technical problem of low segmentation accuracy in the prior art and thereby improve segmentation accuracy.
One aspect of the present invention provides a word segmentation method, comprising the following steps:
obtaining a character string to be processed;
matching the character string against a general dictionary library by forward maximum matching to obtain a first matching result, and matching the character string against the general dictionary library by reverse maximum matching to obtain a second matching result, wherein the first matching result contains first phrases, whose number is a first value, and the second matching result contains second phrases, whose number is a second value, each value being determined from its own matching result; and the first matching result contains single characters, whose number is a third value, and the second matching result contains single characters, whose number is a fourth value, each value likewise being determined from its own matching result;
determining whether the first value equals the second value;
when the first value equals the second value, determining whether the third value is greater than the fourth value, the third value being the number of single characters in the first matching result and the fourth value the number of single characters in the second matching result;
when the third value equals the fourth value, outputting the first phrases.
Another aspect of the present invention provides a word segmentation device, comprising:
an acquisition module, configured to obtain a character string to be processed;
a matching module, configured to match the character string against a general dictionary library by forward maximum matching to obtain a first matching result, and against the general dictionary library by reverse maximum matching to obtain a second matching result, wherein the first matching result contains first phrases, whose number is a first value, and the second matching result contains second phrases, whose number is a second value, each value being determined from its own matching result; and the first matching result contains single characters, whose number is a third value, and the second matching result contains single characters, whose number is a fourth value, each value likewise being determined from its own matching result;
a first judgment module, configured to determine whether the first value equals the second value;
a second judgment module, configured to determine, when the first value equals the second value, whether the third value is greater than the fourth value, the third value being the number of single characters in the first matching result and the fourth value the number of single characters in the second matching result;
an output module, configured to output the first phrases when the third value equals the fourth value.
The word segmentation method in the embodiments of the present invention comprises: obtaining a character string to be processed; matching it against a general dictionary library by forward maximum matching to obtain a first matching result, and by reverse maximum matching to obtain a second matching result, wherein the first matching result contains first phrases, whose number is a first value, and single characters, whose number is a third value, and the second matching result contains second phrases, whose number is a second value, and single characters, whose number is a fourth value; determining whether the first value equals the second value; when it does, determining whether the third value is greater than the fourth value; and, when the third value equals the fourth value, outputting the first phrases.
In the embodiments, forward maximum matching and reverse maximum matching are each applied to the same string to be processed. Once matching is complete, if the two results are identical, the result can be output directly. Because two matching methods are used and their results are compared before anything is output, segmentation accuracy is clearly improved. Moreover, if the two results differ, a degree of disambiguation can be applied to them so that the output is as accurate as possible, securing segmentation accuracy in several respects.
Brief description of the drawings
Fig. 1 is the main flowchart of the word segmentation method in an embodiment of the present invention;
Fig. 2 is a detailed structural diagram of the word segmentation device in an embodiment of the present invention.
Detailed description of the embodiments
The word segmentation method in the embodiments comprises: obtaining a character string to be processed; matching it against the general dictionary library by forward maximum matching to obtain a first matching result, and by reverse maximum matching to obtain a second matching result; determining whether the first matching result is consistent with the second matching result; and, if they are consistent, outputting the first or the second matching result as the segmentation result.
Referring to Fig. 1, the word segmentation method in an embodiment of the present invention may comprise the following steps:
Step 101: obtain the character string to be processed.
In this embodiment, a passage of text is obtained first, and a dictionary is loaded after the passage is obtained. In the prior art the loaded dictionary would be an ordinary general dictionary library; in this embodiment a domain-specific dictionary library can also be built. It may cover any domain, for example film and television, construction or electrical engineering; this embodiment uses a film-and-television dictionary as the example. Such a dictionary may contain video-related information such as actor names, director names, titles, genres and languages. Searching and matching against this domain dictionary makes the segmentation device more effective for video search.
In this embodiment, a stop-word extension dictionary library can also be built. It contains words, such as modal particles and conjunctions, that contribute nothing to understanding a sentence. For example, in the sentence "I am going to have a meal together with you", the subject is "I, you", the predicate is "going" and the object is "have a meal", while "with" is a conjunction that carries no meaning for understanding the sentence as a whole; "with" can therefore be included in the stop-word extension dictionary library.
In this embodiment, the domain-specific dictionary library and the stop-word extension dictionary library can both be included in a general dictionary library. The general dictionary library of this embodiment therefore differs from that of the prior art: it contains the domain-specific dictionary library and the stop-word extension dictionary library. With the film-and-television dictionary used as the example here, the general dictionary library contains the film-and-television dictionary and the stop-word extension dictionary.
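The composition of the general dictionary library can be sketched as a plain set union. This is a minimal illustration that assumes each lexicon is a list of strings; the function name `build_general_dictionary` and every sample entry are hypothetical, since the patent publishes neither its dictionary contents nor its storage format.

```python
from typing import Iterable, Set


def build_general_dictionary(
    base_words: Iterable[str],
    domain_words: Iterable[str],
    stop_words: Iterable[str],
) -> Set[str]:
    """Merge a base lexicon, a domain lexicon and a stop-word lexicon
    into one lookup set shared by both matching passes (hypothetical helper)."""
    general: Set[str] = set()
    for words in (base_words, domain_words, stop_words):
        # Skip empty strings so they can never match a zero-length slice.
        general.update(w for w in words if w)
    return general


# Illustrative entries only; not taken from the patent.
base = ["我", "你", "一起", "吃饭"]
film_tv = ["导演", "演员", "喜剧片"]  # director, actor, comedy film
stop = ["和", "的", "了"]             # conjunction and particles

general_dictionary = build_general_dictionary(base, film_tv, stop)
print(len(general_dictionary))  # 10
```

A set gives constant-time membership tests, which is all the maximum matching passes below need from the dictionary.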
After the general dictionary library containing the domain-specific dictionary and the stop-word extension dictionary is loaded, the obtained passage can first be roughly split, using punctuation and similar cues, into several sentences. Each sentence can then serve as a character string to be processed.
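The rough pre-split can be sketched with a regular expression. The set of break characters in `_SENTENCE_BREAKS` is an assumption: the text only says the passage is cut according to punctuation and similar information, and `rough_split` is a hypothetical name.

```python
import re
from typing import List

# Assumed cut points: common Chinese and ASCII sentence punctuation plus whitespace.
_SENTENCE_BREAKS = re.compile(r"[。！？；，,.!?;\s]+")


def rough_split(passage: str) -> List[str]:
    """Cut a passage into the character strings to be processed."""
    # re.split leaves empty strings at the edges; drop them.
    return [piece for piece in _SENTENCE_BREAKS.split(passage) if piece]


print(rough_split("我一个人吃饭。你呢？"))  # ['我一个人吃饭', '你呢']
```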
Step 102: match the string to be processed against the general dictionary library by forward maximum matching to obtain a first matching result, and by reverse maximum matching to obtain a second matching result.
In this embodiment, the string can first be matched by forward maximum matching to obtain the first matching result, which may correspond to first phrases, whose number is the first value. The string can then be matched by reverse maximum matching to obtain the second matching result, which may correspond to second phrases, whose number is the second value. The first value is the number of first phrases in the first matching result and the second value is the number of second phrases in the second matching result, so each value is determined from its own matching result. A "phrase" in this embodiment may be either a multi-character word or a single character.
Alternatively, the string can first be matched by reverse maximum matching to obtain the second matching result (the second phrases), and then by forward maximum matching to obtain the first matching result (the first phrases).
Alternatively, the string can be matched by forward and reverse maximum matching at the same time to obtain the first and second matching results. In other words, the order in which the two methods are applied to the string is arbitrary.
The forward maximum matching (MM) procedure may be as follows:
First set a maximum word length. It must not exceed the length of the string to be processed and is preferably less than it; in general it is chosen empirically. Suppose the maximum word length is n. Take n characters from the left of the string and look them up in the general dictionary library. If the entry exists, the match succeeds: cut these n characters off the string and continue taking n characters from the left of the remaining string, until the whole string has been processed. If the lookup fails, remove the last of the n characters and look up the remaining characters again; if that still fails, remove the last character of these n-1 characters and look up again, and so on. If the length of the string to be processed is m, n should be a natural number greater than 1 and not greater than m.
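The forward loop described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the dictionary is a set of strings and that a single character is always accepted once no longer entry matches (the worked example in this section treats single characters that way); `forward_max_match` and the toy dictionary are hypothetical.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching (MM): scan left to right, longest entry first."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words: List[str] = []
    start = 0
    while start < len(text):
        # The window never runs past the end of the remaining string.
        size = min(max_len, len(text) - start)
        # Drop the last character until the candidate is a dictionary entry;
        # a single character is accepted as-is (assumption noted above).
        while size > 1 and text[start:start + size] not in dictionary:
            size -= 1
        words.append(text[start:start + size])
        start += size
    return words


toy_dictionary = {"一个", "个人", "吃饭"}  # deliberately lacks "一个人"
print(forward_max_match("我一个人吃饭", toy_dictionary, 5))  # ['我', '一个', '人', '吃饭']
```

With a maximum word length of 5 this reproduces the four cutting steps of the example that follows.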
Reverse maximum matching (RMM) works on the same principle as forward maximum matching, but in the opposite direction: matching scans from the end of the string, each time taking the last maximum-word-length characters as the matching field. If the match fails, the first character of the matching field is removed and matching continues.
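The reverse pass mirrors the forward sketch: the window is anchored at the right end and shrinks from its front. As before, this is a sketch under the assumption that a lone character is always accepted; `reverse_max_match` and the toy dictionary are hypothetical.

```python
from typing import List, Set


def reverse_max_match(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Reverse maximum matching (RMM): scan right to left, longest entry first."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words: List[str] = []
    end = len(text)
    while end > 0:
        size = min(max_len, end)
        # Remove the first character of the matching field until it matches;
        # a single character is accepted as-is.
        while size > 1 and text[end - size:end] not in dictionary:
            size -= 1
        words.append(text[end - size:end])
        end -= size
    words.reverse()  # pieces were collected right to left
    return words


toy_dictionary = {"一个", "个人", "吃饭"}
print(reverse_max_match("我一个人吃饭", toy_dictionary, 5))  # ['我', '一', '个人', '吃饭']
```

On the same toy dictionary the two passes disagree, which is exactly the ambiguity the later steps detect.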
Forward maximum matching is illustrated below.
For example, let the string to be processed be "我一个人吃饭" ("I eat alone").
Step 1: set the maximum word length to 5. The first candidate is the 5 characters "我一个人吃". Looking them up in the general dictionary library finds no match, so the last character is removed, leaving "我一个人". These 4 characters do not match either, so the last character is removed, leaving "我一个"; these 3 characters do not match, leaving "我一"; these 2 characters do not match, leaving "我". This single character matches successfully.
Step 2: cut from the remaining string to obtain "一个人吃饭". These 5 characters do not match, so the last character is removed, leaving "一个人吃"; these 4 characters do not match, leaving "一个人"; these 3 characters do not match, leaving "一个". These 2 characters match successfully.
Step 3: cut from the remaining string to obtain "人吃饭". These 3 characters do not match, so the last character is removed, leaving "人吃"; these 2 characters do not match, leaving "人". This single character matches successfully.
Step 4: cut from the remaining string to obtain "吃饭". These 2 characters match successfully.
The forward maximum matching segmentation of "我一个人吃饭" is therefore 我/一个/人/吃饭 (I/one/person/eat): four phrases, two of which (我 and 人) are single characters.
Segmenting the same sentence by reverse maximum matching gives 我/一/个人/吃饭 (I/one/individual/eat).
Matching the string by forward maximum matching yields the first matching result, which corresponds to the first phrases; in the example above the first value is 4. Matching it by reverse maximum matching yields the second matching result, which corresponds to the second phrases; in the example above the second value is also 4.
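The four counts used by the method (phrases and single characters in each result) can be computed directly from the token lists. A minimal sketch; `phrase_counts` is a hypothetical helper, and the two lists are the example results above.

```python
from typing import List, Tuple


def phrase_counts(result: List[str]) -> Tuple[int, int]:
    """Return (number of phrases, number of single-character phrases)."""
    return len(result), sum(1 for word in result if len(word) == 1)


first_result = ["我", "一个", "人", "吃饭"]   # forward maximum matching
second_result = ["我", "一", "个人", "吃饭"]  # reverse maximum matching

first_value, third_value = phrase_counts(first_result)
second_value, fourth_value = phrase_counts(second_result)
print(first_value, second_value, third_value, fourth_value)  # 4 4 2 2
```

The example is a tie on both counts, so neither count alone distinguishes the two results.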
Step 103: determine whether the first matching result is consistent with the second matching result.
In this embodiment, once the first and second matching results are obtained, it can be determined whether they are consistent. "Consistent" here means not only that the numbers of phrases agree but also that the phrase contents are completely identical. For example, for the sentence "我一个人吃饭", the first matching result obtained by forward maximum matching is 我/一个/人/吃饭, whereas reverse maximum matching may give the second matching result 我/一/个人/吃饭. The first value is 4 and the second value is also 4; although the two values are equal, the phrases obtained are not identical, so the first and second matching results are still judged to be inconsistent.
For example, determining whether the first matching result is consistent with the second may be done as follows:
Determine whether the first value equals the second value.
If the first value does not equal the second value, there is ambiguity between the first and second matching results.
If the first value equals the second value, determine whether the first phrases are identical to the second phrases, where "identical" means that their contents are completely the same. For example, if the first value is 4 with first phrases 我/一个/人/吃饭 and the second value is 4 with second phrases 我/一/个人/吃饭, then although the two values are equal, the contents of the first and second phrases differ, so the first phrases and the second phrases are not identical. If instead the first value is 4 with first phrases 我/一个/人/吃饭 and the second value is 4 with second phrases 我/一个/人/吃饭, the first phrases and the second phrases are identical.
If the first phrases are identical to the second phrases, there is no ambiguity between the first and second matching results; if they are not completely identical, there is ambiguity between them.
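The two-stage consistency check can be sketched as follows. This is an illustration, not the patent's code; `results_consistent` is a hypothetical name. The explicit two steps mirror the text, although in Python `first == second` on lists already checks both length and content.

```python
from typing import List


def results_consistent(first: List[str], second: List[str]) -> bool:
    """True when the forward and reverse results are identical."""
    # Stage 1: compare the first and second values (phrase counts).
    if len(first) != len(second):
        return False  # different counts: the results are ambiguous
    # Stage 2: counts agree, so compare the phrases position by position.
    return all(a == b for a, b in zip(first, second))


print(results_consistent(["我", "一个", "人", "吃饭"], ["我", "一", "个人", "吃饭"]))  # False
```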
Preferably, in this embodiment, the general dictionary library containing the domain-specific dictionary library is loaded before step 101, and the domain-specific dictionary library is classified before the general dictionary library is loaded. Then, after determining whether the first and second matching results are consistent, the phrases of the first or the second matching result can be matched, category by category, against the phrases of the classified domain-specific dictionary library. The consistency check determines which matching result is to be output: if it is the first matching result, the phrases of the first matching result are matched category by category against the classified domain dictionary; if it is the second matching result, the phrases of the second matching result are matched instead.
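The category-by-category matching can be sketched as a lookup of each output phrase in every category of a classified dictionary. The patent does not name its categories; the two used here (genre, language) are hypothetical, loosely following the information types listed for the film-and-television dictionary, and `tag_by_category` is an illustrative name.

```python
from typing import Dict, List, Set

# Hypothetical classified film-and-television dictionary: category -> phrases.
CLASSIFIED_FILM_TV: Dict[str, Set[str]] = {
    "genre": {"喜剧片", "动作片"},   # comedy, action film
    "language": {"国语", "粤语"},    # Mandarin, Cantonese
}


def tag_by_category(
    phrases: List[str], classified: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Group the output phrases by the dictionary category they match."""
    tags: Dict[str, List[str]] = {}
    for phrase in phrases:
        for category, entries in classified.items():
            if phrase in entries:
                tags.setdefault(category, []).append(phrase)
    return tags


print(tag_by_category(["我", "想", "看", "粤语", "喜剧片"], CLASSIFIED_FILM_TV))
# {'language': ['粤语'], 'genre': ['喜剧片']}
```

Tagged phrases like these could then feed a structured video search query, which is the use case the description motivates.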
Step 104: if they are consistent, output the first or the second matching result as the segmentation result.
If the first and second matching results are determined to be consistent, that is, the first value equals the second value and the contents of the first phrases and the second phrases are identical, the first or the second matching result can be output as the segmentation result.
In this embodiment, if the first and second matching results are determined to be inconsistent, ambiguity elimination can be performed on them, and the first or second matching result retained after disambiguation is output as the segmentation result.
In this embodiment, the ambiguity elimination process may be as follows:
First, determine whether the first value differs from the second value. If they differ, determine whether the first value is greater than the second value. If it is, the second-value words, that is, the words obtained by the reverse maximum matching method, are output. If the first value is less than the second value, the first-value words, that is, the words obtained by the forward maximum matching method, are output.
If the first value equals the second value, further checks may follow. The first-value words contain a third value of single-character words, and the second-value words contain a fourth value of single-character words. Determine whether the third value differs from the fourth value. If they differ, determine whether the third value is greater than the fourth value. If it is, the second-value words obtained by the reverse maximum matching method are output; if the third value is less than the fourth value, the first-value words obtained by the forward maximum matching method are output. Here the third value is the number of single-character words in the first matching result and the fourth value is the number of single-character words in the second matching result; each is determined from its own matching result.
If the first value equals the second value and the third value also equals the fourth value, the first-value words, that is, the words obtained by the forward maximum matching method, are output.
In other words, in this embodiment, if the first value of the first matching result differs from the second value of the second matching result, the result with fewer words is output. If the first and second values are equal but the third value differs from the fourth value, the result with fewer single-character words is output. This handling is adopted mainly to improve the accuracy of ambiguity elimination.
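The selection rules above can be sketched as a single function. This is a minimal illustration, assuming both results are lists of word strings and that a "single-character word" is a word of length 1; the function names are hypothetical.

```python
from typing import List


def count_single_chars(words: List[str]) -> int:
    """Number of single-character words in a segmentation (third/fourth value)."""
    return sum(1 for w in words if len(w) == 1)


def disambiguate(forward: List[str], backward: List[str]) -> List[str]:
    """Choose between two inconsistent segmentations using the rules above."""
    n1, n2 = len(forward), len(backward)  # first and second values
    if n1 != n2:
        # Fewer words wins.
        return backward if n1 > n2 else forward
    n3 = count_single_chars(forward)   # third value
    n4 = count_single_chars(backward)  # fourth value
    if n3 > n4:
        return backward
    # n3 < n4, or a complete tie: keep the forward-matching result.
    return forward
```

The final `return forward` covers both the "fewer single-character words" case and the full tie, which the description resolves in favour of the forward maximum matching result.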
Thus, in this embodiment, ambiguity elimination is performed on the first and second matching results, and the disambiguated first or second matching result is output as the word segmentation result.
Preferably, in this embodiment, the general dictionary library, which includes the domain-specific dictionary library, may be loaded before step 101, and the domain-specific dictionary library may be classified before the general dictionary library is loaded. After ambiguity elimination has been performed on the first and second matching results, the words contained in the disambiguated segmentation result can then be matched, category by category, against the words in the classified domain-specific dictionary library. Ambiguity elimination determines which matching result is to be output. If it is the disambiguated first matching result, the words of that result are matched category by category against the classified domain-specific dictionary library. If it is the disambiguated second matching result, the words of the second matching result are matched instead.
For example, if the domain-specific dictionary library for the film and television domain is divided into five categories (actor names, director names, film/TV titles, film/TV genres and film/TV languages), each word may be matched against each category in turn. The order in which the categories are tried can be configured freely, or may be arbitrary.
For instance, suppose the film and television library is divided into those five categories and the matching order is set as: actor names, film/TV titles, director names, film/TV genres, film/TV languages. If one word in the disambiguated segmentation result is "hiding", it is first matched against the actor-name category. No entry matches, so it is then matched against the film/TV-title category, where the match succeeds. The segmentation result can then be output with the word explicitly marked as a film/TV title.
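The category-by-category lookup can be sketched as follows. This is a minimal illustration, assuming the classified domain-specific library is a mapping from category name to a set of words; the dictionary entries (other than "hiding", taken from the example) are hypothetical placeholders, as are the function and variable names.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical classified film/TV dictionary. Category names follow the
# example in the text; the entries are placeholders, not real dictionary data.
FILM_TV_DICT: Dict[str, Set[str]] = {
    "actor name": {"Actor A"},
    "director name": {"Director B"},
    "film/TV title": {"hiding"},
    "film/TV genre": {"thriller"},
    "film/TV language": {"Mandarin"},
}

# Matching order from the example: actor, title, director, genre, language.
MATCH_ORDER: List[str] = [
    "actor name", "film/TV title", "director name",
    "film/TV genre", "film/TV language",
]


def tag_words(
    words: Sequence[str],
    categories: Dict[str, Set[str]],
    order: Sequence[str],
) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the first category (in `order`) that contains it."""
    tagged: List[Tuple[str, Optional[str]]] = []
    for word in words:
        label: Optional[str] = None
        for category in order:
            if word in categories.get(category, set()):
                label = category  # stop at the first matching category
                break
        tagged.append((word, label))  # None means no category matched
    return tagged
```

For example, `tag_words(["I", "want to watch", "hiding"], FILM_TV_DICT, MATCH_ORDER)` returns `[("I", None), ("want to watch", None), ("hiding", "film/TV title")]`. Because the loop stops at the first hit, a word listed in several categories receives the label of whichever category comes first in the configured order.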
In this embodiment, before it is determined whether the first matching result is consistent with the second matching result, words of the first kind may also be deleted from both the first and second matching results according to the extended stop-word dictionary library. Before the consistency check it is not yet known whether the first or the second matching result will be output, so words of the first kind are deleted from both.
Alternatively, in this embodiment, after it is determined whether the first matching result is consistent with the second matching result, words of the first kind may be deleted, according to the extended stop-word dictionary library, only from the matching result to be output, which is either the first or the second matching result. Once this determination has been made, it is known which result will be output. If the first matching result is to be output, words of the first kind are deleted from it according to the extended stop-word dictionary library and the second matching result need not be processed; if the second matching result is to be output, words of the first kind are deleted from it and the first matching result need not be processed. This saves a processing step.
In this embodiment, words of the first kind are words that contribute nothing to understanding the meaning of the character string being processed. For example, given a segmentation result of the form "[modal particle]/I/do not know", the leading modal particle is clearly irrelevant to understanding the string; when it is matched against the extended stop-word dictionary library the match succeeds, and it can be deleted. Specifically, in this embodiment, words of the first kind may be function words, such as auxiliary words, conjunctions, adverbs, prepositions, interjections and onomatopoeic words. Preferably, the kinds of words included in the extended stop-word dictionary library may vary with the domain to which the character string belongs; which kinds are included can be decided according to actual needs, and the present invention places no limit on this.
That is, in this embodiment, each of the first-value words in the first matching result may be matched against the extended stop-word dictionary library and deleted if the match succeeds; likewise, each of the second-value words in the second matching result may be matched against the extended stop-word dictionary library and deleted if the match succeeds.
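Stop-word deletion reduces to filtering each word against a set. This is a minimal sketch, assuming the extended stop-word library is held as a set of strings; the sample entries are a hypothetical selection of common Chinese function words, and the function names are also hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical extended stop-word library: a few common Chinese function
# words (modal and structural particles), chosen for illustration only.
STOP_WORDS: Set[str] = {"啊", "吧", "的", "了"}


def remove_stop_words(words: List[str], stop_words: Set[str]) -> List[str]:
    """Drop every word that matches the stop-word library (words of the first kind)."""
    return [w for w in words if w not in stop_words]


def remove_from_both(
    first: List[str], second: List[str], stop_words: Set[str]
) -> Tuple[List[str], List[str]]:
    """Variant applied before the consistency check: filter both results."""
    return remove_stop_words(first, stop_words), remove_stop_words(second, stop_words)
```

For example, `remove_stop_words(["啊", "我", "不知道"], STOP_WORDS)` returns `["我", "不知道"]` ("I / do not know"). The variant applied after the consistency check simply calls `remove_stop_words` on the result chosen for output, which is the step the description says can be saved.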
Referring to Fig. 2, the present invention also provides a word segmentation device, which may include an acquisition module 201, a matching module 202, a judgment module 203 and an output module 204. The device may further include a disambiguation module 205, a loading module 206, a classification module 207 and a processing module 208.
The acquisition module 201 may be used to obtain the character string to be processed.
The matching module 202 may be used to match the character string against the general dictionary library by the forward maximum matching method to obtain a first matching result, and to match it against the general dictionary library by the reverse maximum matching method to obtain a second matching result.
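The two passes of the matching module can be sketched as follows. This is a minimal illustration, assuming the general dictionary library is an in-memory set of words with a known maximum word length `max_len`; the function names are hypothetical, and falling back to a single character when no dictionary word matches is the usual convention for maximum matching rather than something the text specifies.

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching: scan left to right, taking the longest dictionary word."""
    max_len = max(1, max_len)  # guard against a zero window
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Reverse maximum matching: scan right to left, taking the longest dictionary word."""
    max_len = max(1, max_len)
    words: List[str] = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the string
    return words
```

With the toy lexicon `{"研究", "研究生", "生命", "起源"}` and `max_len=3`, the string "研究生命起源" ("study the origin of life") is segmented by the forward pass as `["研究生", "命", "起源"]` and by the reverse pass as `["研究", "生命", "起源"]`. The two results are inconsistent. Both contain three words, but the forward result has one single-character word and the reverse result has none, so the rules described earlier would output the reverse result.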
The matching module 202 may also be used to match the words contained in the first or second matching result, category by category, against the words in the classified domain-specific dictionary library.
The matching module 202 may also be used to match the words contained in the disambiguated first or second matching result, category by category, against the words in the classified domain-specific dictionary library.
The judgment module 203 may be used to determine whether the first matching result is consistent with the second matching result.
The first matching result contains a first value of first words and the second matching result contains a second value of second words. The first value is the number of first words in the first matching result, determined from the first matching result, and the second value is the number of second words in the second matching result, determined from the second matching result. The judgment module 203 may specifically be used to: determine whether the first value equals the second value; if the two values are unequal, conclude that the first and second matching results are ambiguous; if they are equal, determine whether the first words are identical to the second words; if they are identical, conclude that there is no ambiguity between the first and second matching results; and if they are not entirely identical, conclude that the two results are ambiguous.
The output module 204 may be used to output the first or second matching result as the word segmentation result when the two are consistent.
The output module 204 may also be used to output the disambiguated first or second matching result as the word segmentation result.
The output module 204 may specifically be used to output the second words when the first value is greater than the second value, and to output the first words when the first value is less than the second value.
The output module 204 may specifically be used to output the second words when the third value is greater than the fourth value, to output the first words when the third value is less than the fourth value, and to output the first words when the third value equals the fourth value.
The disambiguation module 205 may be used, when the two results are inconsistent, to perform ambiguity elimination on the first and second matching results, so that the disambiguated first or second matching result is output as the word segmentation result.
The disambiguation module 205 may specifically be used to determine, when the first value and the second value are unequal, whether the first value is greater than the second value.
The first matching result contains a third value of single-character words and the second matching result contains a fourth value of single-character words. The third value is the number of single-character words in the first matching result, determined from the first matching result, and the fourth value is the number of single-character words in the second matching result, determined from the second matching result. The disambiguation module 205 may specifically be used to determine, when the first value equals the second value, whether the third value is greater than the fourth value.
The loading module 206 may be used to load the general dictionary library, which includes the domain-specific dictionary library.
The loading module 206 may also be used to load the general dictionary library, which includes the extended stop-word dictionary library.
The classification module 207 may be used to classify the domain-specific dictionary library.
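The loading and classification modules can be sketched together as follows. This is a minimal illustration, assuming the domain-specific entries arrive as hypothetical `(word, category)` pairs and that "the general library includes" the other libraries means their entries are merged into the lexicon used for matching. The container and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class Dictionaries:
    lexicon: Set[str] = field(default_factory=set)                 # general library used for matching
    categories: Dict[str, Set[str]] = field(default_factory=dict)  # classified domain-specific library
    stop_words: Set[str] = field(default_factory=set)              # extended stop-word library
    max_len: int = 1                                               # longest entry, bounds the matching window


def load_dictionaries(
    general_words: Iterable[str],
    domain_entries: Iterable[Tuple[str, str]],
    stop_words: Iterable[str],
) -> Dictionaries:
    d = Dictionaries(stop_words=set(stop_words))
    # Classification step: group (word, category) pairs by category first.
    for word, category in domain_entries:
        d.categories.setdefault(category, set()).add(word)
    # Loading step: the general library includes the domain-specific and
    # stop-word libraries, so their entries can be matched as whole words.
    d.lexicon.update(general_words)
    for words in d.categories.values():
        d.lexicon.update(words)
    d.lexicon.update(d.stop_words)
    d.max_len = max((len(w) for w in d.lexicon), default=1)
    return d
```

Classifying before merging keeps the per-category sets available for the later category-by-category lookup, while the merged `lexicon` and `max_len` are what a maximum-matching pass would consume.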
The processing module 208 may be used to delete, according to the extended stop-word dictionary library, words of the first kind from both the first and second matching results.
The processing module 208 may also be used to delete, according to the extended stop-word dictionary library, words of the first kind from the matching result to be output, which is either the first or the second matching result.
In this embodiment, words of the first kind may be function words, such as auxiliary words, conjunctions, adverbs, prepositions, interjections and onomatopoeic words. Preferably, the kinds of words included in the extended stop-word dictionary library may vary with the domain to which the character string belongs, and which kinds are included can be decided according to actual needs.
The word segmentation method in this embodiment comprises: obtaining a character string to be processed; matching the character string against the general dictionary library by the forward maximum matching method to obtain a first matching result, and by the reverse maximum matching method to obtain a second matching result; determining whether the first matching result is consistent with the second matching result; and, if they are consistent, outputting the first or second matching result as the word segmentation result.
In this embodiment, the forward and reverse maximum matching methods are each applied to the same character string, and if the two matching results are identical the result can be output directly. Because two matching methods are applied and their results are compared before anything is output, segmentation accuracy is improved. Moreover, if the matching results differ, ambiguity elimination can be applied to them so that the output is as accurate as possible, which safeguards segmentation accuracy in several respects.
This embodiment describes the ambiguity elimination process in detail, so a person skilled in the art can readily implement the technical solution of the invention from this description, and the disclosure is sufficient. Using the disambiguation method of this embodiment also improves segmentation accuracy.
This embodiment also builds a dedicated domain-specific dictionary library against which segmentation results can be matched, making the output better targeted. A domain-specific dictionary library can be built for any domain, so that the segmentation device of this embodiment can better segment character strings from each domain. For example, if the domain-specific dictionary library is that of the film and television domain, the segmentation device can be applied more effectively to video search.
This embodiment also builds a dedicated extended stop-word dictionary library, so that meaningless words can be deleted before the matching result is output. This does not affect the segmentation output, while reducing subsequent processing and saving steps.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in that computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (22)

1. A word segmentation method, characterized by comprising the following steps:
obtaining a character string to be processed;
matching the character string against a general dictionary library by a forward maximum matching method to obtain a first matching result, and matching the character string against the general dictionary library by a reverse maximum matching method to obtain a second matching result, wherein the first matching result contains a first value of first words and the second matching result contains a second value of second words, the first value being the number of first words in the first matching result, determined from the first matching result, and the second value being the number of second words in the second matching result, determined from the second matching result; and wherein the first matching result contains a third value of single-character words and the second matching result contains a fourth value of single-character words, the third value being the number of single-character words in the first matching result, determined from the first matching result, and the fourth value being the number of single-character words in the second matching result, determined from the second matching result;
determining whether the first value is equal to the second value;
when the first value is equal to the second value, determining whether the third value is greater than the fourth value, wherein the first matching result contains a third value of single-character words and the second matching result contains a fourth value of single-character words;
when the third value is equal to the fourth value, outputting the first words.
2. The method of claim 1, characterized in that, after determining whether the first value is equal to the second value, the method further comprises:
when the first value and the second value are unequal, determining whether the first value is greater than the second value;
when the first value is greater than the second value, outputting the second words;
when the first value is less than the second value, outputting the first words.
3. The method of claim 2, characterized in that, after determining, when the first value is equal to the second value, whether the third value is greater than the fourth value, the method further comprises:
when the third value is greater than the fourth value, outputting the second words;
when the third value is less than the fourth value, outputting the first words.
4. The method of claim 1, characterized by further comprising, before obtaining the character string to be processed, the step of loading the general dictionary library, the general dictionary library including a domain-specific dictionary library.
5. The method of claim 4, characterized by further comprising, before loading the general dictionary library, the step of classifying the domain-specific dictionary library.
6. The method of claim 5, characterized by further comprising, after determining whether the first value is equal to the second value, the step of matching the words contained in the first or second matching result, category by category, against the words in the classified domain-specific dictionary library.
7. The method of claim 2, characterized by further comprising, after determining, when the first value and the second value are unequal, whether the first value is greater than the second value, the step of matching the second words or the first words, category by category, against the words in the classified domain-specific dictionary library.
8. The method of claim 1, characterized by further comprising, before obtaining the character string to be processed, the step of loading the general dictionary library, the general dictionary library including an extended stop-word dictionary library.
9. The method of claim 8, characterized by further comprising, before determining whether the first value is equal to the second value, the step of deleting words of a first kind from the first matching result and the second matching result according to the extended stop-word dictionary library.
10. The method of claim 8, characterized by further comprising, after determining whether the first value is equal to the second value, the step of deleting words of a first kind from a matching result to be output according to the extended stop-word dictionary library, the matching result to be output being the first matching result or the second matching result.
11. The method of claim 9 or 10, characterized in that the words of the first kind are function words.
12. A word segmentation device, characterized by comprising:
an acquisition module, configured to obtain a character string to be processed;
a matching module, configured to match the character string against a general dictionary library by a forward maximum matching method to obtain a first matching result, and to match the character string against the general dictionary library by a reverse maximum matching method to obtain a second matching result, wherein the first matching result contains a first value of first words and the second matching result contains a second value of second words, the first value being the number of first words in the first matching result, determined from the first matching result, and the second value being the number of second words in the second matching result, determined from the second matching result; and wherein the first matching result contains a third value of single-character words and the second matching result contains a fourth value of single-character words, the third value being the number of single-character words in the first matching result, determined from the first matching result, and the fourth value being the number of single-character words in the second matching result, determined from the second matching result;
a first judgment module, configured to determine whether the first value is equal to the second value;
a second judgment module, configured to determine, when the first value is equal to the second value, whether the third value is greater than the fourth value, wherein the first matching result contains a third value of single-character words and the second matching result contains a fourth value of single-character words;
an output module, configured to output the first words when the third value is equal to the fourth value.
13. devices as claimed in claim 12, is characterized in that, described second judge module, also for:
Described first numerical value and described second value unequal time, judge whether described first numerical value is greater than described second value;
Described output module specifically for:
When described first numerical value is greater than described second value, export a described second value phrase;
When described first numerical value is less than described second value, export a described first numerical value phrase.
14. The device according to claim 13, wherein the output module is further configured to:
When the third value is greater than the fourth value, output the second phrases of the second matching result;
When the third value is less than the fourth value, output the first phrases of the first matching result.
15. The device according to claim 12, further comprising a loading module configured to load the universal dictionary library, wherein the universal dictionary library comprises a special dictionary library.
16. The device according to claim 15, further comprising a classification module configured to classify the special dictionary library.
17. The device according to claim 16, wherein the matching module is further configured to match the phrases contained in the first matching result or the second matching result, category by category, against the phrases in the classified special dictionary library.
18. The device according to claim 13, wherein the matching module is further configured to, after judging whether the first value is greater than the second value when the two values are unequal, match the second phrases or the first phrases, category by category, against the phrases in the classified special dictionary library.
19. The device according to claim 12, further comprising a loading module configured to load the universal dictionary library, wherein the universal dictionary library comprises a stop-word extension dictionary library.
20. The device according to claim 19, further comprising a processing module configured to delete, according to the stop-word extension dictionary library, phrases of a first type from the first matching result and the second matching result.
21. The device according to claim 19, further comprising a processing module configured to delete, according to the stop-word extension dictionary library, phrases of a first type from a matching result to be output, the matching result to be output being the first matching result or the second matching result.
22. The device according to claim 20 or 21, wherein the phrases of the first type are function-word phrases.
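The matching module of claim 12 can be illustrated with a minimal Python sketch of forward and reverse maximum matching. It is not the patentee's implementation, only one plausible reading: the dictionary is assumed to be a plain `set` of words, `max_len` (the longest word length tried) is a hypothetical parameter, any character with no dictionary match is emitted as a single character, and the "phrase" count (first/second value) is assumed to be the total number of segments, while the third/fourth value counts single-character segments.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Scan left to right, taking the longest dictionary word starting at each position.

    max_len is a hypothetical parameter (must be >= 1).
    """
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it; a single
        # character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                result.append(piece)
                i += size
                break
    return result


def reverse_max_match(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Scan right to left, taking the longest dictionary word ending at each position."""
    result = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                result.append(piece)
                j -= size
                break
    result.reverse()  # segments were collected back to front
    return result


def counts(segments: list[str]) -> tuple[int, int]:
    """Return (number of segments, number of single-character segments)."""
    return len(segments), sum(1 for s in segments if len(s) == 1)
```

On the classic string 研究生命起源 with the toy dictionary {研究, 研究生, 生命, 起源} and `max_len=3`, forward matching yields 研究生 / 命 / 起源 and reverse matching yields 研究 / 生命 / 起源. Both results contain three segments, but the forward result contains one single character and the reverse result none.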
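Claims 12 to 14 together define how the device chooses between the two matching results. The decision order can be sketched as follows, assuming a matching result is a list of strings and a single character is any segment of length 1; the function name `choose_segmentation` is hypothetical.

```python
def choose_segmentation(first: list[str], second: list[str]) -> list[str]:
    """Choose between the forward (first) and reverse (second) matching results.

    Decision order read from claims 12-14:
      1. the result with fewer segments wins;
      2. on a tie, the result with fewer single characters wins;
      3. if both counts tie, the first result is output.
    """
    first_value, second_value = len(first), len(second)
    third_value = sum(1 for w in first if len(w) == 1)
    fourth_value = sum(1 for w in second if len(w) == 1)

    if first_value != second_value:
        # Claim 13: output the result containing fewer segments.
        return second if first_value > second_value else first
    if third_value > fourth_value:
        # Claim 14: the second result has fewer single characters.
        return second
    if third_value < fourth_value:
        return first
    # Claim 12: all counts are equal, so output the first result.
    return first
```

The rule prefers fewer segments first and fewer single characters second, which is the commonly described heuristic for bidirectional maximum matching. When the two results are identical (the case singled out in the abstract), every count ties and the first result is returned.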
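Claims 15 to 22 add a special dictionary library, classified by category, and a stop-word extension dictionary library used to delete function words. A minimal sketch follows, assuming the special dictionary is supplied as (word, category) pairs and the stop-word dictionary as a set. The category labels and the sample function words 的, 了, 和 and 在 are illustrative and not taken from the patent, and all function names are hypothetical.

```python
from collections import defaultdict

# Illustrative function words (particles, a conjunction, a preposition); not from the patent.
SAMPLE_STOP_WORDS = {"的", "了", "和", "在"}


def load_universal_dictionary(general_words, special_entries):
    """Claims 15-16: merge a general word list with a special dictionary
    and index the special dictionary by category.

    special_entries is an iterable of (word, category) pairs.
    """
    universal = set(general_words)
    by_category = defaultdict(set)
    for word, category in special_entries:
        universal.add(word)  # special words also belong to the universal dictionary
        by_category[category].add(word)
    return universal, dict(by_category)


def match_by_category(segments, by_category):
    """Claims 17-18: match each segment against the classified special dictionary.

    Returns (segment, categories) pairs; an empty list means no special match.
    """
    return [
        (seg, sorted(cat for cat, words in by_category.items() if seg in words))
        for seg in segments
    ]


def remove_stop_words(segments, stop_words=SAMPLE_STOP_WORDS):
    """Claims 20-22: delete segments found in the stop-word extension dictionary."""
    return [seg for seg in segments if seg not in stop_words]
```

Claim 20 applies the deletion to both matching results, so `remove_stop_words` would be called on each of them; claim 21 applies it only to the result selected for output. The claims do not say at which point in the comparison the deletion happens, so this sketch leaves that choice to the caller.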
CN201210407529.6A 2012-10-23 2012-10-23 Word segmentation method and device Active CN102915299B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201510179584.8A CN104765838A (en) 2012-10-23 2012-10-23 Word segmenting method and device
CN201510179858.3A CN104765724A (en) 2012-10-23 2012-10-23 Word segmenting method and device
CN201210407529.6A CN102915299B (en) 2012-10-23 2012-10-23 Word segmentation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210407529.6A CN102915299B (en) 2012-10-23 2012-10-23 Word segmentation method and device

Related Child Applications (2)

Application Number Title Priority Date Filing Date
CN201510179584.8A Division CN104765838A (en) 2012-10-23 2012-10-23 Word segmenting method and device
CN201510179858.3A Division CN104765724A (en) 2012-10-23 2012-10-23 Word segmenting method and device

Publications (2)

Publication Number Publication Date
CN102915299A CN102915299A (en) 2013-02-06
CN102915299B true CN102915299B (en) 2015-04-08

Family

ID=47613670

Family Applications (3)

Application Number Title Priority Date Filing Date
CN201210407529.6A Active CN102915299B (en) 2012-10-23 2012-10-23 Word segmentation method and device
CN201510179858.3A Pending CN104765724A (en) 2012-10-23 2012-10-23 Word segmenting method and device
CN201510179584.8A Pending CN104765838A (en) 2012-10-23 2012-10-23 Word segmenting method and device

Family Applications After (2)

Application Number Title Priority Date Filing Date
CN201510179858.3A Pending CN104765724A (en) 2012-10-23 2012-10-23 Word segmenting method and device
CN201510179584.8A Pending CN104765838A (en) 2012-10-23 2012-10-23 Word segmenting method and device

Country Status (1)

Country Link
CN (3) CN102915299B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018201600A1 (en) * 2017-05-05 2018-11-08 平安科技(深圳)有限公司 Information mining method and system, electronic device and readable storage medium

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544309B (en) * 2013-11-04 2017-03-15 北京中搜网络技术股份有限公司 Method for splitting retrieval strings in Chinese vertical search
CN103593338B (en) * 2013-11-15 2016-05-11 北京锐安科技有限公司 Information processing method and device
CN104077275A (en) * 2014-06-27 2014-10-01 北京奇虎科技有限公司 Method and device for performing word segmentation based on context
CN105630807B (en) * 2014-10-31 2020-02-07 高德软件有限公司 Method and device for analyzing incidence relation between unknown road and known road
CN104461056B (en) * 2014-12-22 2018-06-01 联想(北京)有限公司 Information processing method and electronic device
CN105138514B (en) * 2015-08-24 2018-11-09 昆明理工大学 Dictionary-based Chinese word segmentation method using forward incremental one-word maximum matching
CN105243055B (en) * 2015-09-28 2018-07-31 北京橙鑫数据科技有限公司 Multilingual word segmentation method and device
CN105335488A (en) * 2015-10-16 2016-02-17 中国南方电网有限责任公司电网技术研究中心 Knowledge base construction method
CN106649251B (en) * 2015-10-30 2019-07-09 北京国双科技有限公司 Chinese word segmentation method and device
CN105550170B (en) * 2015-12-14 2018-10-12 北京锐安科技有限公司 Chinese word segmentation method and device
CN106202040A (en) * 2016-06-28 2016-12-07 邓力 Chinese word segmentation method for a PDA translation system
CN107622044A (en) * 2016-07-13 2018-01-23 阿里巴巴集团控股有限公司 Character string segmentation method, apparatus and device
CN107092590A (en) * 2017-03-17 2017-08-25 贵州恒昊软件科技有限公司 Sentence segmentation method and system
CN107680689A (en) * 2017-05-05 2018-02-09 平安科技(深圳)有限公司 Method and system for estimating potential diseases from medical text, and readable storage medium
CN108009153A (en) * 2017-12-08 2018-05-08 北京明朝万达科技股份有限公司 Search method and system based on word segmentation results of search statements
CN110222335A (en) * 2019-05-20 2019-09-10 平安科技(深圳)有限公司 Text segmentation method and device
CN112215010A (en) * 2019-07-10 2021-01-12 北京猎户星空科技有限公司 Semantic recognition method and equipment
CN113302683B (en) * 2019-12-24 2023-08-04 深圳市优必选科技股份有限公司 Multi-tone word prediction method, disambiguation method, device, apparatus, and computer-readable storage medium
CN112287108B (en) * 2020-10-29 2022-08-16 四川长虹电器股份有限公司 Intention recognition optimization method in field of Internet of things
CN113342989B (en) * 2021-05-24 2022-12-20 北京航空航天大学 Knowledge graph construction method and device of patent data, storage medium and terminal
CN113221552A (en) * 2021-06-02 2021-08-06 浙江百应科技有限公司 Multi-model word segmentation method and device based on deep learning and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101122900A (en) * 2007-09-25 2008-02-13 中兴通讯股份有限公司 Word segmentation system and method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101042692B (en) * 2006-03-24 2010-09-22 富士通株式会社 Translation obtaining method and apparatus based on semantic prediction
CN102394061B (en) * 2011-11-08 2013-01-02 中国农业大学 Text-to-speech method and system based on semantic retrieval

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
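A Chinese word segmentation algorithm based on dictionary and statistics; Zhang Xu; China Master's Theses Full-text Database, Information Science and Technology; 20080215 (No. 2); p. 19 Sec. 2.4, p. 20 Sec. 3.2, p. 37 Sec. 3.3.2.1, p. 40 Sec. 4.1 *
Basic principles of Chinese word segmentation and an analysis of the forward maximum matching, reverse maximum matching and bidirectional maximum matching methods; hfgang; Sina Weibo; http://blog.sina.com.cn/s/blog_53daccf401011t74.html; 20120509; p. 1 para. 2 to the last paragraph of p. 2 *
Fast text classification *** based on a new keyword extraction method; Luo Jie et al.; Application Research of Computers; 20060430; Abstract, p. 32 Sec. 1 to p. 34 Sec. 4, Table 1 *
Research on key technologies of automatic text classification; Zhang Donghui et al.; Microcomputer Information; 20080630; Abstract, p. 197 Secs. 1 and 2, p. 198 Sec. 4.2 to p. 199 Sec. 5.2 *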

Also Published As

Publication number Publication date
CN102915299A (en) 2013-02-06
CN104765724A (en) 2015-07-08
CN104765838A (en) 2015-07-08

Similar Documents

Publication Publication Date Title
CN102915299B (en) Word segmentation method and device
CN110543574B (en) Knowledge graph construction method, device, equipment and medium
US8577882B2 (en) Method and system for searching multilingual documents
US20020174095A1 (en) Very-large-scale automatic categorizer for web content
US8126897B2 (en) Unified inverted index for video passage retrieval
US20050138018A1 (en) Information retrieval system, search result processing system, information retrieval method, and computer program product therefor
CN105930362B (en) Search for target identification method, device and terminal
CN101826099B (en) Method and system for identifying similar documents and determining document diffusance
CN111444330A (en) Method, device and equipment for extracting short text keywords and storage medium
CN107784110B (en) Index establishing method and device
CN102339294B (en) Searching method and system for preprocessing keywords
WO2007136861A2 (en) Annotation by search
CN107577663B (en) Key phrase extraction method and device
CN106777261A (en) Data query method and device based on multi-source heterogeneous data set
CN105468584A (en) Filtering method and system for bad literal information in text
CN102789464A (en) Natural language processing method, device and system based on semanteme recognition
CN109933216B (en) Word association prompting method, device and equipment for intelligent input and computer storage medium
CN109446313B (en) Sequencing system and method based on natural language analysis
CN109522396B (en) Knowledge processing method and system for national defense science and technology field
US20040122660A1 (en) Creating taxonomies and training data in multiple languages
JP7395377B2 (en) Content search methods, devices, equipment, and storage media
Celikyilmaz et al. Leveraging web query logs to learn user intent via bayesian latent variable model
Watrin et al. An N-gram frequency database reference to handle MWE extraction in NLP applications
CN106776590A (en) A kind of method and system for obtaining entry translation
CN110705285A (en) Government affair text subject word bank construction method, device, server and readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant