CN105786793A

CN105786793A - Method and device for analyzing semanteme of spoken language text information

Info

Publication number: CN105786793A
Application number: CN201510977813.0A
Authority: CN
Inventors: 陈由之; 时培轩
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2015-12-23
Filing date: 2015-12-23
Publication date: 2016-07-20
Anticipated expiration: 2035-12-23
Also published as: CN105786793B

Abstract

The invention discloses a method and device for parsing the semantics of spoken-language text information. One specific implementation of the method comprises: performing word segmentation on received spoken-language text information to extract features; determining an associated field of the spoken-language text information according to the nouns among the extracted features; in response to an extracted feature matching a preset feature of the associated field in a preset database, determining the weight value of the preset feature in the associated field as the weight value of the extracted feature in the associated field, wherein the preset database includes the weight values of preset features in multiple fields, and the multiple fields include the associated field; determining, based on the weight values of the extracted features in the associated field, the scores of the regular expressions of the text information in the associated field; sorting the scores and obtaining a preset number of regular expressions according to the sorting result; and taking the obtained regular expressions as the parsed text of the spoken-language text information. This implementation can improve the accuracy of the semantic parsing result.

Description

Method and apparatus for parsing the semantics of spoken-language text information

Technical field

The application relates to the field of computer technology, specifically to the technical field of speech recognition, and in particular to a method and apparatus for parsing the semantics of spoken-language text information.

Background technology

Spoken-language semantic parsing means understanding the information carried by a spoken voice signal. After the voice signal input by a user has been semantically parsed, retrieval can be performed according to the parsed text of the spoken-language text information, which improves the speed of information retrieval and the ability to update information.

The most common spoken-language semantic parsing method at present recognizes the spoken voice signal as spoken-language text information and then parses that text by rule matching to obtain the parsed text of the spoken-language text information.

However, when current spoken-language semantic parsing methods parse by rule matching, the same spoken-language text information often yields multiple parsed texts, and it cannot be determined which of these parsed texts is closest to the intention the user wants to express.

Summary of the invention

The purpose of the application is to propose an improved method and apparatus for parsing the semantics of spoken-language text information, so as to solve the technical problem mentioned in the Background section above.

In a first aspect, the application provides a method for parsing the semantics of spoken-language text information, the method including: performing word segmentation on received spoken-language text information to extract features; determining the associated field of the spoken-language text information from the nouns among the extracted features; in response to an extracted feature matching a preset feature of the associated field in a preset database, determining the weight value of the preset feature in the associated field as the weight value of the extracted feature in the associated field, wherein the preset database includes the weight values of preset features in multiple fields, and the multiple fields include the associated field; determining, based on the weight values of the extracted features in the associated field, the score of each regular expression of the text information in the associated field; sorting the scores and obtaining a preset number of regular expressions according to the sorting result; and taking the obtained regular expressions as the parsed text of the spoken-language text information.

In some embodiments, the weight values of the preset features in the multiple fields are determined by the following process: dividing the number of times a preset feature occurs in each of the multiple fields by the total number of words of the text information samples in which the preset feature occurs, to obtain the frequency with which the preset feature occurs in each field; dividing the number of text information samples in which the preset feature occurs by the total number of text information samples, to obtain the inverse document frequency of the preset feature, wherein the text information samples in which the preset feature occurs and the total text information samples are obtained from historical data of spoken-language text information whose semantics have been parsed; and multiplying the frequency with which the preset feature occurs in each field by the inverse document frequency of the preset feature, to obtain the weight value of the preset feature in each field, and obtaining the weight values of the preset feature in the multiple fields from its weight value in each field.

In some embodiments, determining, based on the weight values of the extracted features in the associated field, the score of each regular expression of the text information in the associated field includes: in the associated field, adding up the weight values of those extracted features that hit a regular expression, to obtain the score of that regular expression of the text information in the associated field.

In some embodiments, determining, in response to an extracted feature matching a preset feature of the associated field in the preset database, the weight value of the preset feature in the associated field as the weight value of the extracted feature in the associated field includes: filtering out the extracted features that hit a preset filter vocabulary, to obtain the filtered features; and, in response to a filtered feature matching a preset feature of the associated field in the preset database, determining the weight value of the preset feature in the associated field as the weight value of the filtered feature in the associated field. Correspondingly, adding up, in the associated field, the weight values of those extracted features that hit a regular expression to obtain the score of that regular expression includes: in the associated field, adding up the weight values of those filtered features that hit a regular expression, to obtain the score of that regular expression of the text information.

In some embodiments, determining, based on the weight values of the extracted features in the associated field, the score of each regular expression of the text information in the associated field further includes obtaining the regular expressions of the text information in the associated field by the following steps: identifying the type labels of entity information from the extracted features; and, in response to the identified type labels matching the preset type labels of the regular expressions of the associated field in the preset database, taking the regular expressions having the preset type labels as the regular expressions of the text information in the associated field, wherein the preset database includes, in the multiple fields, regular expressions having preset type labels.

In some embodiments, identifying the type labels of entity information from the extracted features includes: identifying, from the extracted features, the verbs and nouns of the entity information and the positional relationship between the verbs and the nouns. Correspondingly, taking the regular expressions having the preset type labels as the regular expressions of the text information in the associated field includes: in response to the identified verbs, nouns, and verb-noun positional relationship matching the preset verbs, nouns, and verb-noun positional relationship of the regular expressions of the associated field in the preset database, taking the regular expressions having the preset verbs, nouns, and verb-noun positional relationship as the regular expressions of the text information in the associated field.

In a second aspect, the application provides a device for parsing the semantics of spoken-language text information, the device including: a feature extraction module, for performing word segmentation on received spoken-language text information to extract features; a field determination module, for determining the associated field of the spoken-language text information from the nouns among the extracted features; a weight determination module, for determining, in response to an extracted feature matching a preset feature of the associated field in a preset database, the weight value of the preset feature in the associated field as the weight value of the extracted feature in the associated field, wherein the preset database includes the weight values of preset features in multiple fields, and the multiple fields include the associated field; a score determination module, for determining, based on the weight values of the extracted features in the associated field, the score of each regular expression of the text information in the associated field; an expression acquisition module, for sorting the scores and obtaining a preset number of regular expressions according to the sorting result; and a parsed-text module, for taking the obtained regular expressions as the parsed text of the spoken-language text information.

In some embodiments, the weight values of the preset features in the multiple fields used by the weight determination module are determined by the following modules: an occurrence frequency acquisition module, for dividing the number of times a preset feature occurs in each of the multiple fields by the total number of words of the text information samples in which the preset feature occurs, to obtain the frequency with which the preset feature occurs in each field; an inverse document frequency acquisition module, for dividing the number of text information samples in which the preset feature occurs by the total number of text information samples, to obtain the inverse document frequency of the preset feature, wherein the text information samples in which the preset feature occurs and the total text information samples are obtained from historical data of spoken-language text information whose semantics have been parsed; and a weight value acquisition module, for multiplying the frequency with which the preset feature occurs in each field by the inverse document frequency of the preset feature, to obtain the weight value of the preset feature in each field, and for obtaining the weight values of the preset feature in the multiple fields from its weight value in each field.

In some embodiments, the score determination module includes: an addition submodule, for adding up, in the associated field, the weight values of those extracted features that hit a regular expression, to obtain the score of that regular expression of the text information in the associated field.

In some embodiments, the weight determination module includes: a feature filtering submodule, for filtering out the extracted features that hit a preset filter vocabulary, to obtain the filtered features; and a weight determination submodule, for determining, in response to a filtered feature matching a preset feature of the associated field in the preset database, the weight value of the preset feature in the associated field as the weight value of the filtered feature in the associated field. Correspondingly, the addition submodule is used for: in the associated field, adding up the weight values of those filtered features that hit a regular expression, to obtain the score of that regular expression of the text information.

In some embodiments, the score determination module further includes an expression determination module, which includes: a type label identification module, for identifying the type labels of entity information from the extracted features; and an expression matching module, for taking, in response to the identified type labels matching the preset type labels of the regular expressions of the associated field in the preset database, the regular expressions having the preset type labels as the regular expressions of the text information in the associated field, wherein the preset database includes, in the multiple fields, regular expressions having preset type labels.

In some embodiments, the type label identification module is further used for: identifying, from the extracted features, the verbs and nouns in the entity information and the positional relationship between the verbs and the nouns. The expression matching module is further used for: in response to the identified verbs, nouns, and verb-noun positional relationship matching the preset verbs, nouns, and verb-noun positional relationship of the regular expressions of the associated field in the preset database, taking the regular expressions having the preset verbs, nouns, and verb-noun positional relationship as the regular expressions of the text information in the associated field.

The method and apparatus for parsing the semantics of spoken-language text information provided by the application perform word segmentation on the received spoken-language text information to extract features; then determine the associated field of the spoken-language text information from the nouns among the extracted features; then, in response to an extracted feature matching a preset feature of the associated field in the preset database, determine the weight value of the preset feature in the associated field as the weight value of the extracted feature in the associated field; then determine, based on the weight values of the extracted features in the associated field, the score of each regular expression of the text information in the associated field; then sort the scores and obtain a preset number of regular expressions based on the sorting result; and finally take the obtained regular expressions as the parsed text of the spoken-language text information. In this method, the weight value of an extracted feature in the associated field represents the importance of that feature in the associated field, and the parsed text of the spoken-language text information is obtained according to this importance, which improves the accuracy of the semantic parsing result.
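The final sorting and selection step can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `top_rules`, the rule identifiers, and the scores are hypothetical, and descending order is an assumption (the text says only that the scores are sorted).

```python
def top_rules(rule_scores, n):
    """Sort rules by score, highest first, and return the first `n` rules."""
    ranked = sorted(rule_scores.items(), key=lambda item: item[1], reverse=True)
    return [rule for rule, _ in ranked[:n]]

# Hypothetical scores for three candidate regular expressions.
scores = {"rule_a": 0.040, "rule_b": 0.012, "rule_c": 0.027}
print(top_rules(scores, 2))
```

Here `n` plays the role of the preset number of regular expressions kept as the parsed text.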

Brief description of the drawings

Other features, objects, and advantages of the application will become more apparent by reading the detailed description of non-limiting embodiments made with reference to the following drawings:

Fig. 1 is a diagram of an exemplary system architecture to which the application can be applied;

Fig. 2 is a schematic flowchart of an embodiment of the method for parsing the semantics of spoken-language text information according to the application;

Fig. 3 is an exemplary block diagram of an embodiment of the device for parsing the semantics of spoken-language text information according to the application;

Fig. 4 is a schematic structural diagram of a computer system suitable for implementing the terminal device or server of the embodiments of the application.

Detailed description of the invention

The application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are used only to explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.

It should be noted that, provided they do not conflict, the embodiments of the application and the features of the embodiments can be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.

Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method or the device for parsing the semantics of spoken-language text information of the application can be applied.

As shown in Fig. 1, the system architecture 100 can include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 can include various connection types, such as wired or wireless communication links, or fiber-optic cables.

A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, to receive or send messages and so on. Various client applications that support spoken voice recognition can be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.

The terminal devices 101, 102, 103 can be various electronic devices that have a display screen and support spoken voice recognition, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.

The server 105 can be a server that provides various services, for example a background server that supports the various client applications with spoken voice recognition on the terminal devices 101, 102, 103. The background server can analyze and otherwise process data such as the received spoken voice signal, and feed the processing result (for example, the spoken-language semantic parsing result) back to the terminal device.

It should be noted that the method for parsing the semantics of spoken-language text information provided by the embodiments of the application is generally performed by the server 105; correspondingly, the device for parsing the semantics of spoken-language text information is generally arranged in the server 105.

It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers, as the implementation requires.

With continued reference to Fig. 2, Fig. 2 shows the flow 200 of an embodiment of the method for parsing the semantics of spoken-language text information according to the application. The method comprises the following steps:

Step 201: perform word segmentation on the received spoken-language text information to extract features.

In this embodiment, if the electronic device that receives the user's spoken voice signal has sufficient data-processing capability, the method for parsing the semantics of spoken-language text information can run directly on that electronic device (for example, the terminal device or the server shown in Fig. 1). If the electronic device that receives the user's spoken voice signal (for example, the terminal device shown in Fig. 1) does not have sufficient data-processing capability, the received spoken voice signal can be transmitted to an electronic device with greater processing capability (for example, the server shown in Fig. 1), where the spoken voice signal is recognized as spoken-language text information and the method for parsing the semantics of spoken-language text information is then run. The spoken-language text information is obtained by recognizing the spoken voice signal. The method for recognizing the spoken voice signal may be any such method in the prior art or in technology developed in the future; the application does not limit this. The wireless connection modes mentioned above may include, but are not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra wideband), and other wireless connection modes that are currently known or developed in the future.

In this embodiment, performing word segmentation on the received spoken-language text information means cutting the spoken-language text information into multiple individual words. The segmentation method may be any method that is currently known or developed in the future; the application does not limit this. For example, existing segmentation algorithms fall into three broad classes: methods based on string matching, methods based on understanding, and methods based on statistics. Depending on whether segmentation is combined with part-of-speech tagging, they can further be divided into pure segmentation methods and integrated methods that combine segmentation with tagging. Taking string-matching-based segmentation as an example, the spoken-language text information is matched, according to a certain strategy, against the entries of a sufficiently large machine dictionary; if a character string is found in the dictionary, the match succeeds, and multiple individual words are obtained in this way.
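As one possible illustration of string-matching-based segmentation, the following sketch uses forward maximum matching, a common dictionary-matching strategy; the application does not state which strategy it uses. The function name, the `max_len` parameter, the toy dictionary, and the Chinese rendering of the example sentence are all assumptions made for illustration.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment `text` by greedily taking the longest dictionary entry at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary entry is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                # A single character is accepted as a fallback so the scan always advances.
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical toy dictionary; a real machine dictionary would be far larger.
# Glosses: 我 "I", 要 "want", 去 "go", 百度大厦 "Baidu Building".
toy_dictionary = {"我", "要", "去", "百度", "大厦", "百度大厦"}
print(forward_max_match("我要去百度大厦", toy_dictionary))
```

Preferring the longest entry keeps "百度大厦" together as one word rather than splitting it into "百度" and "大厦".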

After word segmentation has been performed on the received spoken-language text information, feature extraction can be performed on the resulting individual words to obtain the extracted features. Here a feature is a basic unit used to represent text, and features generally have the following properties: a feature can truly identify the content of the text; a feature can distinguish the target text from other texts; the number of features should not be too large; and features are relatively easy to separate. In Chinese text, characters, words, or phrases can all serve as features representing the text. By comparison, words are more expressive than characters, and words are much less difficult to cut than phrases. Therefore, most current Chinese text classification systems use words as features, referred to as feature words. These feature words serve as the intermediate representation of a document and are used to compute the similarity between documents, and between a document and a user's target. If all words were used as features, the dimension of the feature vector would be enormous and the amount of computation very large, so feature extraction is needed. The main function of feature extraction is to reduce the number of words to be processed as far as possible without damaging the core information of the text, thereby reducing the dimension of the vector space, simplifying computation, and improving the speed and efficiency of text processing. It should be understood that features may be extracted with any feature extraction method in the prior art or in future technology; the application does not limit this. Taking the prior art as an example, the ways of performing feature extraction include at least the following: transforming the original features into a smaller number of new features by mapping or transformation; picking some of the most representative features from the original features; selecting the most influential features according to expert knowledge; and selecting by mathematical methods, finding the features that carry the most classification information. In a specific application, the score of each feature can be calculated according to some feature evaluation function, the features are then sorted by score, and the several highest-scoring features are chosen as the extracted features.
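The score-and-select variant of feature extraction can be sketched as follows. The evaluation function is left open by the text, so the scores below are hypothetical placeholders, and `select_features` is an illustrative name rather than anything defined in the application.

```python
def select_features(words, score_fn, top_k):
    """Keep the `top_k` distinct words with the highest evaluation score."""
    unique_words = list(dict.fromkeys(words))  # drop repeats, keep first-seen order
    ranked = sorted(unique_words, key=score_fn, reverse=True)
    return ranked[:top_k]

# Hypothetical scores standing in for a real feature evaluation function.
toy_scores = {"weather": 0.9, "today": 0.6, "the": 0.1, "is": 0.05}
words = ["the", "weather", "today", "is", "the"]
print(select_features(words, lambda w: toy_scores.get(w, 0.0), top_k=2))
```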

As an example, word segmentation can be performed on the short sentence "I want to go to Baidu Building" to extract features. Segmentation yields the words "I", "want", "go", and "Baidu Building", which gives the four features "I", "want", "go", and "Baidu Building".

Step 202: determine the associated field of the spoken-language text information from the nouns among the extracted features.

In this embodiment, the associated field of the spoken-language text information can be determined based on the nouns among the features extracted in step 201. For example, based on the noun "Baidu Building" above, the associated field of "I want to go to Baidu Building" can be determined to be the map field.
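A minimal sketch of this step, assuming the nouns are mapped to fields through a lookup table and that part-of-speech tags are already available from segmentation. The table `NOUN_FIELDS`, its entries, and the tag names are hypothetical; the application does not specify how the mapping is stored.

```python
# Hypothetical noun-to-field lexicon.
NOUN_FIELDS = {
    "Baidu Building": "map",
    "weather": "weather",
    "match": "sports",
}

def determine_fields(features, pos_tags, noun_fields=NOUN_FIELDS):
    """Return the fields associated with the nouns among the extracted features."""
    fields = []
    for feature in features:
        if pos_tags.get(feature) != "noun":
            continue  # only nouns are used to decide the associated field
        field = noun_fields.get(feature)
        if field is not None and field not in fields:
            fields.append(field)
    return fields

features = ["I", "want", "go", "Baidu Building"]
pos_tags = {"I": "pronoun", "want": "verb", "go": "verb", "Baidu Building": "noun"}
print(determine_fields(features, pos_tags))
```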

In this embodiment, the associated field of the spoken-language text information can include, but is not limited to, one or more of the following fields: the music field, the map field, the address book field, the TV program field, the movie news field, and the television command field.

Step 203: in response to an extracted feature matching a preset feature of the associated field in the preset database, determine the weight value of the preset feature in the associated field as the weight value of the extracted feature in the associated field.

In this embodiment, the preset database includes, but is not limited to, the weight values of preset features in multiple fields, where the multiple fields include, but are not limited to, the associated field.

When the electronic device running the method for parsing the semantics of spoken-language text information matches the extracted features against the preset features in the preset database, it can first locate the associated field in the preset database, which narrows the scope of matching and so improves matching efficiency. It then matches the extracted features one by one against the preset features of the associated field in the preset database. If an extracted feature matches a preset feature of the associated field in the preset database, the weight value of that preset feature in the associated field is determined as the weight value of the extracted feature in the associated field.
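The lookup described above can be sketched with a nested dictionary standing in for the preset database. The structure `PRESET_WEIGHTS` and every weight in it are hypothetical values chosen for illustration.

```python
# Hypothetical preset database: field -> {preset feature -> weight value}.
PRESET_WEIGHTS = {
    "map": {"go": 0.02, "Baidu Building": 0.05},
    "music": {"play": 0.04},
}

def lookup_weights(features, field, preset_weights=PRESET_WEIGHTS):
    """Give each extracted feature the weight of the matching preset feature in `field`."""
    field_weights = preset_weights.get(field, {})  # narrow the search to one field first
    return {f: field_weights[f] for f in features if f in field_weights}

print(lookup_weights(["I", "want", "go", "Baidu Building"], "map"))
```

Features with no matching preset feature in the associated field simply receive no weight in this sketch.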

In some optional implementations of this embodiment, the weight values of the preset features in the multiple fields in the preset database are determined by the following process: dividing the number of times a preset feature occurs in each of the multiple fields by the total number of words of the text information samples in which the preset feature occurs, to obtain the frequency with which the preset feature occurs in each field; dividing the number of text information samples in which the preset feature occurs by the total number of text information samples, to obtain the inverse document frequency of the preset feature, wherein the text information samples in which the preset feature occurs and the total text information samples are obtained from historical data of spoken-language text information whose semantics have been parsed; and multiplying the frequency with which the preset feature occurs in each field by the inverse document frequency of the preset feature, to obtain the weight value of the preset feature in each field, and obtaining the weight values of the preset feature in the multiple fields from its weight value in each field.

In this implementation, the weight values of the preset features in the multiple fields are determined by calculating the term frequency-inverse document frequency (TF-IDF). TF denotes the frequency with which a preset feature occurs in the text information samples of each field, and is obtained by dividing the number of times the preset feature occurs by the total number of words of the text information samples in which it occurs; the more often the preset feature occurs in a document, the larger its TF value. IDF denotes the inverse document frequency, and is obtained by dividing the number of text information samples in which the preset feature occurs by the total number of text information samples; it means that, across the multiple fields, the fewer text information samples a preset feature occurs in, the larger the IDF value of that feature. The product of TF and IDF is the weight value of the preset feature, that is, the weight value of the preset feature in the multiple fields. For example, the weight of the preset feature "match" in the short sentence "please help me look up the schedule of the Warriors' match tomorrow" is greater than its weight in the short sentence "remind me to watch the match tomorrow"; that is, the weight of the feature "match" in the sports competition field is greater than its weight in the reminder field.
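A per-field TF-IDF computation can be sketched as follows. Two points are assumptions rather than statements of the application: the text describes the IDF ratio as samples containing the feature over total samples while also stating that rarer features receive a larger IDF, so this sketch uses the conventional form log(total samples / samples containing the feature), which satisfies the latter; and the sample data, field names, and function name are hypothetical.

```python
import math

def tf_idf_by_field(feature, samples_by_field):
    """Compute a per-field weight for `feature`.

    `samples_by_field` maps a field name to a list of samples, each a list of words.
    """
    all_samples = [s for samples in samples_by_field.values() for s in samples]
    containing_all = sum(1 for s in all_samples if feature in s)
    if containing_all == 0:
        return {}
    # Assumption: conventional IDF, log(total samples / samples containing the feature).
    idf = math.log(len(all_samples) / containing_all)

    weights = {}
    for field, samples in samples_by_field.items():
        containing = [s for s in samples if feature in s]
        if not containing:
            continue
        occurrences = sum(s.count(feature) for s in containing)
        total_words = sum(len(s) for s in containing)
        tf = occurrences / total_words  # frequency of the feature within this field
        weights[field] = tf * idf
    return weights

# Hypothetical historical samples, already segmented into words.
samples = {
    "sports": [["query", "warriors", "match", "schedule"], ["match", "score"]],
    "reminder": [["remind", "me", "watch", "the", "match", "tomorrow"],
                 ["remind", "me", "meeting"]],
}
weights = tf_idf_by_field("match", samples)
print(weights["sports"] > weights["reminder"])
```

With these toy samples the feature is denser in the sports samples than in the reminder samples, so its sports weight comes out larger, mirroring the comparison made in the text.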

In some optional implementations of this embodiment, determining, in response to an extracted feature matching a preset feature of the associated field in the preset database, the weight value of the preset feature in the associated field as the weight value of the extracted feature in the associated field can include, but is not limited to: filtering out the extracted features that hit a preset filter vocabulary, to obtain the filtered features; and, in response to a filtered feature matching a preset feature of the associated field in the preset database, determining the weight value of the preset feature in the associated field as the weight value of the filtered feature in the associated field.
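The filtering step can be sketched as a simple set-membership test. The contents of `FILTER_VOCABULARY` are hypothetical; the application does not list which words the preset filter vocabulary contains.

```python
# Hypothetical filter vocabulary.
FILTER_VOCABULARY = {"I", "please", "um"}

def filter_features(features, filter_vocabulary=FILTER_VOCABULARY):
    """Drop features that hit the preset filter vocabulary, preserving order."""
    return [f for f in features if f not in filter_vocabulary]

print(filter_features(["I", "want", "go", "Baidu Building"]))
```

Only the features that survive this filter would then be matched against the preset database and counted toward a rule's score.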

Step 204: based on the weight values of the extracted features in the associated field, determine the score of each regular expression of the text information in the associated field.

In this embodiment, after the weight value of the preset feature in the associated field has been determined as the weight value of the extracted feature in the associated field, in response to the extracted feature matching a preset feature of the associated field in the preset database, the score of each regular expression of the text information in the associated field can be determined based on the weight values of the extracted features in the associated field.

In some optional implementations of this embodiment, determining, based on the weight values of the extracted features in the associated field, the score of each regular expression of the text information in the associated field can include, but is not limited to: in the associated field, adding up the weight values of those extracted features that hit a regular expression, to obtain the score of that regular expression of the text information in the associated field.

In this implementation, the score of a regular expression rule is the sum of the weight values of the extracted features that hit it, namely:

$$\mathrm{Weight\_Rule} = \sum_{i=1}^{n} \mathrm{Weight\_Feature}_i$$

where $\mathrm{Weight\_Rule}$ denotes the weight value of the regular expression rule, $\mathrm{Weight\_Feature}_i$ denotes the weight value of the i-th feature, i ranges from 1 to n, and n is the number of extracted features that hit the regular expression.

Taking the weather field as an example, the weight values of different features in the weather field are approximately as shown in the following table:

The regular expressions of the weather field are expressed as follows:

First regular expression: (weather) (how | OK | how)?

Second regular expression: (temperature) (how much)?(degrees)?

Then the short sentence "weather how" matches the first regular expression, (weather) (how | OK | how)?, and the score of this rule is:

0.0328802+0.00745463=0.04033483.
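The arithmetic above can be reproduced with a short sketch. Several things here are assumptions made only for illustration: the table of weather-field weights is not reproduced in this text, so the assignment of 0.0328802 to "weather" and 0.00745463 to "how" is hypothetical; the English patterns are simplified stand-ins for the rules listed above; and a feature is treated as hitting a rule when it equals one of the groups captured by a full match, which is one possible reading of "hit".

```python
import re

# Hypothetical assignment of the two weights used in the worked example.
FEATURE_WEIGHTS = {"weather": 0.0328802, "how": 0.00745463}

# Simplified stand-ins for the two weather-field rules.
FIRST_RULE = r"(weather)\s*(how|OK)?"
SECOND_RULE = r"(temperature)\s*(how much)?"


def rule_score(pattern, features, weights):
    """Sum the weights of the features that hit the rule.

    Returns None when the sentence does not match the rule at all.
    """
    match = re.fullmatch(pattern, " ".join(features))
    if match is None:
        return None
    hit = [f for f in features if f in match.groups()]
    return sum(weights.get(f, 0.0) for f in hit)


features = ["weather", "how"]
first_score = rule_score(FIRST_RULE, features, FEATURE_WEIGHTS)
second_score = rule_score(SECOND_RULE, features, FEATURE_WEIGHTS)
```

Under these assumptions the first rule receives the sum of the two weights and the second rule does not match the sentence, so it receives no score.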

In some optional implementations of the present embodiment, corresponding to the above filtering out of the extracted features that hit the preset filter vocabulary to obtain filtered features, and to the above determining, in response to the filtered features matching preset features of the associated field in the preset database, of the weight values of the preset features in the associated field as the weight values of the filtered features in the associated field, the above adding, in the associated field, of the weight values of those extracted features that hit a regular expression to obtain the score of the regular expression of the text information in the associated field may include, but is not limited to: in the associated field, adding the weight values of those filtered features that hit the regular expression, to obtain the score of the regular expression of the text information.

In some optional implementations of the present embodiment, determining the scores of the regular expressions of the text information in the associated field based on the weight values of the extracted features in the associated field may further include, but is not limited to, obtaining the regular expressions of the text information in the associated field through the following steps: identifying type labels of entity information from the extracted features; and, in response to the identified type labels matching preset type labels possessed by regular expressions of the associated field in the preset database, taking the regular expressions having the preset type labels as the regular expressions of the text information in the associated field, wherein the preset database may include, but is not limited to, regular expressions having preset type labels in the multiple fields.
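The selection of regular expressions by type label can be sketched as follows. The database layout, the label names, and the matching criterion (a rule is kept when it shares at least one label with the identified labels) are all assumptions made for illustration; the text only states that the identified labels are matched against the preset labels of the rules in the associated field.

```python
# Hypothetical preset database: each rule of a field carries preset type labels.
RULE_DATABASE = {
    "weather": [
        {"pattern": r"(weather)\s*(how|OK)?", "type_labels": {"WEATHER"}},
        {"pattern": r"(temperature)\s*(how much)?", "type_labels": {"TEMPERATURE"}},
    ],
}


def candidate_rules(identified_labels, field, rule_database):
    """Return the patterns of the field whose preset type labels
    overlap the labels identified in the extracted features."""
    identified = set(identified_labels)
    return [
        rule["pattern"]
        for rule in rule_database.get(field, [])
        if not identified.isdisjoint(rule["type_labels"])
    ]


selected = candidate_rules({"WEATHER"}, "weather", RULE_DATABASE)
```

Only the rules selected in this way would then be scored, which narrows the set of regular expressions before the weights are summed.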

The above identifying of type labels of entity information from the extracted features may include, but is not limited to: identifying, from the extracted features, the verbs and nouns of the entity information and the positional relationship between the verbs and the nouns. Correspondingly, the above taking, in response to the identified type labels matching preset type labels possessed by regular expressions of the associated field in the preset database, of the regular expressions having the preset type labels as the regular expressions of the text information in the associated field may include, but is not limited to: in response to the identified verbs, nouns, and positional relationship between the verbs and nouns matching the preset verbs, nouns, and positional relationship between verbs and nouns possessed by regular expressions of the associated field in the preset database, taking the regular expressions having the preset verbs, nouns, and positional relationship between verbs and nouns as the regular expressions of the text information in the associated field.

The above identifying of type labels of entity information from the extracted features can be implemented using a recognition method known in the prior art or one developed in the future, and the present application places no limitation on this. For example, a conditional random field (CRF) algorithm can be used to identify the type labels of entity information from the extracted features.

Step 205: sort the scores, and obtain a preset number of regular expressions according to the sorting result.

In the present embodiment, the scores of the regular expressions of the text information in the associated field determined in step 204 can be sorted. The preset number can be one or more, and the number of regular expressions to obtain can be determined according to a setting made by the user or by technical developers. For example, the preset number can be set to three, so that after the scores are sorted from high to low, the three highest-ranked regular expressions are obtained; the preset number can also be set to one, so that after the scores are sorted from high to low, only the highest-ranked regular expression is obtained.
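The sorting and selection of step 205 can be sketched as follows. The function name `top_rules`, the rule names, and the scores are hypothetical.

```python
def top_rules(scores, preset_number):
    """Sort rules by score from high to low and keep the first preset_number."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [rule for rule, _ in ranked[:preset_number]]


# Hypothetical scores of four regular expressions in the associated field.
scores = {"rule_a": 0.04, "rule_b": 0.01, "rule_c": 0.09, "rule_d": 0.02}

best_three = top_rules(scores, 3)
best_one = top_rules(scores, 1)
```

If fewer rules were scored than the preset number, the slice simply returns all of them.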

Step 206: take the obtained regular expressions as the parsed text of the spoken language text information.

In the present embodiment, the preset number of regular expressions obtained according to the sorting result in step 205 can be taken as the parsed text of the spoken language text information. For example, the three highest-ranked regular expressions obtained above are taken as the parsed text of the spoken language text information, or the highest-ranked regular expression is taken as the parsed text of the spoken language text information.

When multiple regular expressions are obtained, they can further be presented to the user so that the user can select the parsed text, thereby improving the accuracy of parsing and the user experience.

The method provided by the above embodiment of the present application performs word segmentation on the received spoken language text information to extract features; then determines the associated field of the spoken language text information from the nouns among the extracted features; then, in response to the extracted features matching preset features of the associated field in the preset database, determines the weight values of the preset features in the associated field as the weight values of the extracted features in the associated field; then determines, based on those weight values, the scores of the regular expressions of the text information in the associated field; then sorts the scores and obtains a preset number of regular expressions based on the sorting result; and finally takes the obtained regular expressions as the parsed text of the spoken language text information, thereby improving the accuracy of the semantic parsing result obtained.
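The overall flow summarized above can be sketched end to end as follows. Everything in this sketch is a simplifying assumption: whitespace splitting stands in for word segmentation, the noun-to-field table, weights, and patterns are hypothetical, and a feature is treated as hitting a rule when it equals a group captured by a full match.

```python
import re

# Hypothetical preset data.
NOUN_TO_FIELD = {"weather": "weather", "temperature": "weather", "match": "sports"}
PRESET_WEIGHTS = {
    "weather": {"weather": 0.0328802, "how": 0.00745463, "temperature": 0.02},
}
RULES = {
    "weather": [r"(weather)\s*(how|OK)?", r"(temperature)\s*(how much)?"],
}


def parse(text, preset_number=1):
    """Return the preset_number highest-scoring rules for the text."""
    # Extract features (stand-in for word segmentation).
    features = text.split()
    # Determine the associated field from the first known noun.
    field = next((NOUN_TO_FIELD[f] for f in features if f in NOUN_TO_FIELD), None)
    if field is None:
        return []
    # Take preset weights for the features that match the field.
    field_weights = PRESET_WEIGHTS.get(field, {})
    weights = {f: field_weights[f] for f in features if f in field_weights}
    # Score each rule of the field by summing the weights of its hit features.
    sentence = " ".join(features)
    scores = {}
    for pattern in RULES.get(field, []):
        match = re.fullmatch(pattern, sentence)
        if match is None:
            continue
        hit = [f for f in features if f in match.groups()]
        scores[pattern] = sum(weights.get(f, 0.0) for f in hit)
    # Sort the scores and return the top rules as the parsed text.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:preset_number]


result = parse("weather how")
```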

With further reference to Fig. 3, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a device for parsing the semantics of spoken language text information. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device can be applied in various electronic devices.

As shown in Fig. 3, the device 300 for parsing the semantics of spoken language text information described in the present embodiment may include, but is not limited to: a feature extraction module 310, a field determination module 320, a weight determination module 330, a score determination module 340, an expression acquisition module 350, and a parsing text module 360.

The feature extraction module 310 is configured to perform word segmentation on received spoken language text information to extract features. The field determination module 320 is configured to determine the associated field of the spoken language text information from the nouns among the extracted features. The weight determination module 330 is configured to determine, in response to the extracted features matching preset features of the associated field in the preset database, the weight values of the preset features in the associated field as the weight values of the extracted features in the associated field, wherein the preset database may include, but is not limited to, the weight values of preset features in multiple fields, and the multiple fields may include, but are not limited to, the associated field. The score determination module 340 is configured to determine, based on the weight values of the extracted features in the associated field, the scores of the regular expressions of the text information in the associated field. The expression acquisition module 350 is configured to sort the scores and obtain a preset number of regular expressions according to the sorting result. The parsing text module 360 is configured to take the obtained regular expressions as the parsed text of the spoken language text information.

In some optional implementations of the present embodiment, the weight values of the preset features in the multiple fields used in the weight determination module 330 are determined by the following modules (not shown in the figure): an occurrence frequency acquisition module, an inverse document frequency acquisition module, and a weight value acquisition module. The occurrence frequency acquisition module is configured to divide, in each of the multiple fields, the number of times a preset feature occurs by the total number of words in the text information samples in which the preset feature occurs, to obtain the frequency with which the preset feature occurs in each field. The inverse document frequency acquisition module is configured to divide the number of text information samples in which the preset feature occurs by the total number of text information samples, to obtain the inverse document frequency of the preset feature, wherein the text information samples in which the preset feature occurs and the total text information samples are obtained from historical data of spoken language text information whose semantics have been parsed. The weight value acquisition module is configured to multiply the frequency with which the preset feature occurs in each field by the inverse document frequency of the preset feature, to obtain the weight value of the preset feature in each field, and to obtain, from the weight values of the preset feature in each field, the weight values of the preset feature in the multiple fields.

In some optional implementations of the present embodiment, the score determination module 340 may include, but is not limited to (not shown in the figure): an addition submodule, configured to add, in the associated field, the weight values of those extracted features that hit a regular expression, to obtain the score of the regular expression of the text information in the associated field.

In some optional implementations of the present embodiment, the weight determination module 330 may include, but is not limited to (not shown in the figure): a feature filtering submodule, configured to filter out, from the extracted features, the features that hit a preset filter vocabulary, to obtain filtered features; and a weight determination submodule, configured to determine, in response to the filtered features matching preset features of the associated field in the preset database, the weight values of the preset features in the associated field as the weight values of the filtered features in the associated field. The addition submodule may be further configured to: in the associated field, add the weight values of those filtered features that hit a regular expression, to obtain the score of the regular expression of the text information.

In some optional implementations of the present embodiment, the score determination module 340 may further include, but is not limited to (not shown in the figure), an expression determination module, which may include, but is not limited to: a type label identification module, configured to identify type labels of entity information from the extracted features; and an expression matching module, configured to take, in response to the identified type labels matching preset type labels possessed by regular expressions of the associated field in the preset database, the regular expressions having the preset type labels as the regular expressions of the text information in the associated field, wherein the preset database may include, but is not limited to, regular expressions having preset type labels in the multiple fields.

In some optional implementations of the present embodiment, the type label identification module is further configured to: identify, from the extracted features, the verbs and nouns in the entity information and the positional relationship between the verbs and the nouns; and the expression matching module is further configured to: in response to the identified verbs, nouns, and positional relationship between the verbs and nouns matching the preset verbs, nouns, and positional relationship between verbs and nouns possessed by regular expressions of the associated field in the preset database, take the regular expressions having the preset verbs, nouns, and positional relationship between verbs and nouns as the regular expressions of the text information in the associated field.

Those skilled in the art will understand that the above device 300 for parsing the semantics of spoken language text information also includes some other well-known structures, such as a processor and a memory.

It should be understood that the modules described in the device 300 correspond to the respective steps of the method described with reference to Fig. 2. Thus, the operations and features described above for the method for parsing the semantics of spoken language text information apply equally to the device 300 and the modules it contains, and are not repeated here. The corresponding modules in the device 300 can cooperate with modules in a terminal device and/or a server to implement the solutions of the embodiments of the present application.

Referring now to Fig. 4, it shows a schematic structural diagram of a computer system 400 suitable for implementing a terminal device or a server of the embodiments of the present application.

As shown in Fig. 4, the computer system 400 includes a central processing unit (CPU) 401, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage portion 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the system 400. The CPU 401, the ROM 402, and the RAM 403 are connected to one another via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.

The following components are connected to the I/O interface 405: an input portion 406 including a keyboard, a mouse, and the like; an output portion 407 including a cathode-ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a loudspeaker and the like; a storage portion 408 including a hard disk and the like; and a communication portion 409 including a network interface card such as a LAN card or a modem. The communication portion 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 410 as needed, so that a computer program read from it can be installed into the storage portion 408 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product, which includes a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 409, and/or installed from the removable medium 411.

The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules involved in the embodiments of the present application may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, it may be described as: a processor including a feature extraction module, a field determination module, a weight determination module, a score determination module, an expression acquisition module, and a parsing text module. The names of these modules do not, under certain circumstances, constitute a limitation on the modules themselves; for example, the feature extraction module may also be described as "a module that performs word segmentation on received spoken language text information to extract features".

In another aspect, the present application also provides a non-volatile computer storage medium, which may be the non-volatile computer storage medium included in the device described in the above embodiments, or may be a non-volatile computer storage medium that exists separately and is not assembled into a terminal. The non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: perform word segmentation on received spoken language text information to extract features; determine the associated field of the spoken language text information from the nouns among the extracted features; in response to the extracted features matching preset features of the associated field in the preset database, determine the weight values of the preset features in the associated field as the weight values of the extracted features in the associated field, wherein the preset database may include, but is not limited to, the weight values of preset features in multiple fields, and the multiple fields may include, but are not limited to, the associated field; determine, based on the weight values of the extracted features in the associated field, the scores of the regular expressions of the text information in the associated field; sort the scores and obtain a preset number of regular expressions according to the sorting result; and take the obtained regular expressions as the parsed text of the spoken language text information.

The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the particular combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims

1. A method for parsing the semantics of spoken language text information, comprising:

performing word segmentation on received spoken language text information to extract features;

determining an associated field of the spoken language text information from nouns among the extracted features;

in response to the extracted features matching preset features of the associated field in a preset database, determining weight values of the preset features in the associated field as weight values of the extracted features in the associated field, wherein the preset database includes weight values of preset features in multiple fields, and the multiple fields include the associated field;

determining, based on the weight values of the extracted features in the associated field, scores of regular expressions of the text information in the associated field;

sorting the scores, and obtaining a preset number of regular expressions according to the sorting result; and

taking the obtained regular expressions as parsed text of the spoken language text information.

2. The method according to claim 1, characterized in that the weight values of the preset features in the multiple fields are determined by the following process:

dividing, in each of the multiple fields, the number of times a preset feature occurs by the total number of words in the text information samples in which the preset feature occurs, to obtain the frequency with which the preset feature occurs in each field;

dividing the number of text information samples in which the preset feature occurs by the total number of text information samples, to obtain the inverse document frequency of the preset feature, wherein the text information samples in which the preset feature occurs and the total text information samples are obtained from historical data of spoken language text information whose semantics have been parsed; and

multiplying the frequency with which the preset feature occurs in each field by the inverse document frequency of the preset feature, to obtain the weight value of the preset feature in each field, and obtaining, from the weight values of the preset feature in each field, the weight values of the preset feature in the multiple fields.

3. The method according to either of claims 1 or 2, characterized in that determining, based on the weight values of the extracted features in the associated field, the scores of the regular expressions of the text information in the associated field includes:

in the associated field, adding the weight values of those extracted features that hit a regular expression, to obtain the score of the regular expression of the text information in the associated field.

4. The method according to claim 3, characterized in that determining, in response to the extracted features matching preset features of the associated field in the preset database, the weight values of the preset features in the associated field as the weight values of the extracted features in the associated field includes: filtering out, from the extracted features, the features that hit a preset filter vocabulary, to obtain filtered features; and, in response to the filtered features matching preset features of the associated field in the preset database, determining the weight values of the preset features in the associated field as the weight values of the filtered features in the associated field; and

adding, in the associated field, the weight values of those extracted features that hit a regular expression, to obtain the score of the regular expression of the text information in the associated field includes: in the associated field, adding the weight values of those filtered features that hit a regular expression, to obtain the score of the regular expression of the text information.

5. The method according to claim 4, characterized in that determining, based on the weight values of the extracted features in the associated field, the scores of the regular expressions of the text information in the associated field further includes:

obtaining the regular expressions of the text information in the associated field through the following steps:

identifying type labels of entity information from the extracted features; and

in response to the identified type labels matching preset type labels possessed by regular expressions of the associated field in the preset database, taking the regular expressions having the preset type labels as the regular expressions of the text information in the associated field, wherein the preset database includes regular expressions having preset type labels in the multiple fields.

6. The method according to claim 5, characterized in that identifying type labels of entity information from the extracted features includes: identifying, from the extracted features, the verbs and nouns of the entity information and the positional relationship between the verbs and the nouns; and

taking, in response to the identified type labels matching preset type labels possessed by regular expressions of the associated field in the preset database, the regular expressions having the preset type labels as the regular expressions of the text information in the associated field includes: in response to the identified verbs, nouns, and positional relationship between the verbs and nouns matching the preset verbs, nouns, and positional relationship between verbs and nouns possessed by regular expressions of the associated field in the preset database, taking the regular expressions having the preset verbs, nouns, and positional relationship between verbs and nouns as the regular expressions of the text information in the associated field.

7. A device for parsing the semantics of spoken language text information, comprising:

a feature extraction module, configured to perform word segmentation on received spoken language text information to extract features;

a field determination module, configured to determine an associated field of the spoken language text information from nouns among the extracted features;

a weight determination module, configured to determine, in response to the extracted features matching preset features of the associated field in a preset database, weight values of the preset features in the associated field as weight values of the extracted features in the associated field, wherein the preset database includes weight values of preset features in multiple fields, and the multiple fields include the associated field;

a score determination module, configured to determine, based on the weight values of the extracted features in the associated field, scores of regular expressions of the text information in the associated field;

an expression acquisition module, configured to sort the scores and obtain a preset number of regular expressions according to the sorting result; and

a parsing text module, configured to take the obtained regular expressions as parsed text of the spoken language text information.

8. The device according to claim 7, characterized in that the weight values of the preset features in the multiple fields in the weight determination module are determined by the following modules:

an occurrence frequency acquisition module, configured to divide, in each of the multiple fields, the number of times a preset feature occurs by the total number of words in the text information samples in which the preset feature occurs, to obtain the frequency with which the preset feature occurs in each field;

an inverse document frequency acquisition module, configured to divide the number of text information samples in which the preset feature occurs by the total number of text information samples, to obtain the inverse document frequency of the preset feature, wherein the text information samples in which the preset feature occurs and the total text information samples are obtained from historical data of spoken language text information whose semantics have been parsed; and

a weight value acquisition module, configured to multiply the frequency with which the preset feature occurs in each field by the inverse document frequency of the preset feature, to obtain the weight value of the preset feature in each field, and to obtain, from the weight values of the preset feature in each field, the weight values of the preset feature in the multiple fields.

9. The device according to either of claims 7 or 8, characterized in that the score determination module includes:

an addition submodule, configured to add, in the associated field, the weight values of those extracted features that hit a regular expression, to obtain the score of the regular expression of the text information in the associated field.

10. The device according to claim 9, characterized in that the weight determination module includes: a feature filtering submodule, configured to filter out, from the extracted features, the features that hit a preset filter vocabulary, to obtain filtered features; and a weight determination submodule, configured to determine, in response to the filtered features matching preset features of the associated field in the preset database, the weight values of the preset features in the associated field as the weight values of the filtered features in the associated field; and

the addition submodule is configured to: in the associated field, add the weight values of those filtered features that hit a regular expression, to obtain the score of the regular expression of the text information.

11. The device according to claim 10, characterized in that the score determination module further includes:

an expression determination module, including:

a type label identification module, configured to identify type labels of entity information from the extracted features; and

an expression matching module, configured to take, in response to the identified type labels matching preset type labels possessed by regular expressions of the associated field in the preset database, the regular expressions having the preset type labels as the regular expressions of the text information in the associated field, wherein the preset database includes regular expressions having preset type labels in the multiple fields.

12. The device according to claim 11, characterized in that the type label identification module is further configured to: identify, from the extracted features, the verbs and nouns in the entity information and the positional relationship between the verbs and the nouns; and

the expression matching module is further configured to: in response to the identified verbs, nouns, and positional relationship between the verbs and nouns matching the preset verbs, nouns, and positional relationship between verbs and nouns possessed by regular expressions of the associated field in the preset database, take the regular expressions having the preset verbs, nouns, and positional relationship between verbs and nouns as the regular expressions of the text information in the associated field.