Detailed Description of Embodiments
The disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related content and do not limit the disclosure. It should also be noted that, for ease of description, only the parts relevant to the disclosure are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments of the disclosure and the features in those embodiments may be combined with one another. The disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
A dialogue system, such as a task-oriented human-computer interaction system, may include a speech recognition module, a natural language understanding module, a dialogue management module, a natural language generation module, a speech synthesis module, and so on.
The natural language understanding module may be used to semantically parse the natural language text output by the speech recognition module, converting unstructured natural language text into structured knowledge that conforms to a natural language understanding protocol.
The natural language understanding protocol may include three categories of information: vertical domain, domain intent, and semantic slot.
A dialogue system may include one or more vertical domains. A vertical domain indicates the domain to which a natural language text belongs. For example, the text "play Daphne Odora by Zhou Jielun" belongs to the music domain, the text "check tomorrow's weather in Beijing" belongs to the weather domain, and the text "navigate to Tiananmen" belongs to the navigation domain. Each vertical domain has a corresponding training corpus for training a vertical-domain classification model.
A vertical domain may include one or more domain intents. A domain intent indicates the specific intention of a natural language text within its vertical domain. For example, in the weather domain, the text "will it rain in Beijing tomorrow" expresses the intent of asking whether it will rain, the text "how is the air quality today" expresses the intent of asking about air quality, and the text "is it windy in Beijing" expresses the intent of asking whether it is windy.
A vertical domain includes one or more semantic slots. A semantic slot indicates a concrete condition that a natural language text imposes within its vertical domain. For example, the weather domain generally includes two semantic slots, "time" and "place". The text "will it rain in Beijing tomorrow" sets the "time" condition to "tomorrow" and the "place" condition to "Beijing"; the text "how is the air quality today" sets the "time" condition to "today"; and the text "is it windy in Beijing" sets the "place" condition to "Beijing".
Many vertical domains include address-related semantic slots, for example the navigation domain, the restaurant domain, and the hotel domain. In the navigation domain, "navigate to [No. 9 Zhongguancun Street, Haidian District: location]" contains the semantic slot [No. 9 Zhongguancun Street, Haidian District: location]. In the restaurant domain, "find a restaurant at [the intersection of Xin Zhongguan Street and Haidian Street: location]" contains the semantic slot [the intersection of Xin Zhongguan Street and Haidian Street: location]. In the hotel domain, "find hotels near [7th floor, New Zhongguan Building, Haidian District: location]" contains the semantic slot [7th floor, New Zhongguan Building, Haidian District: location].
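The three categories of information in the natural language understanding protocol (vertical domain, domain intent, semantic slots) can be pictured as one structured record per utterance. The following is a minimal Python sketch under stated assumptions: the class name `NLUResult`, its field names, and the intent labels `navigate_to` and `ask_rain` are hypothetical and do not come from the disclosure; only the slot names and the example utterances are taken from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NLUResult:
    """Structured NLU output: one vertical domain, one intent, and its slots.

    Hypothetical layout for illustration; the disclosure does not fix one.
    """
    domain: str
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


# Navigation example: the address text sits in the "location" slot.
navigation = NLUResult(
    domain="navigation",
    intent="navigate_to",  # assumed intent label
    slots={"location": "No. 9 Zhongguancun Street, Haidian District"},
)

# Weather example: "will it rain in Beijing tomorrow" sets time and place.
weather = NLUResult(
    domain="weather",
    intent="ask_rain",  # assumed intent label
    slots={"time": "tomorrow", "place": "Beijing"},
)
```

Keeping the slots in a plain mapping means a later stage can look for an address-related slot by name without knowing which vertical domain produced the record.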
According to a first embodiment of the disclosure, an address resolution method is provided. As shown in Fig. 1, the method includes a word segmentation step S11, an annotation step S12, an extraction step S13, and a training step S14.
In step S11, word segmentation is performed on an acquired corpus, such as an original corpus, to obtain the words that form a segmented corpus. The acquired corpus may come from a corpus repository. For example, this step may be preceded by a corpus collection step that obtains original corpus entries in text format from the repository, such as "Duofu Street and Huxi Road intersection", "Beijing Chaoyang District", and "Beijing City Haidian District Suzhou Street No. 3 Daheng Technology Building".
Fig. 2 shows an example of word segmentation for the entries above. In Fig. 2, the original entry "Duofu Street and Huxi Road intersection" is segmented into four words: "Duofu Street", "and", "Huxi Road", and "intersection". The original entry "Beijing Chaoyang District" is segmented into two words: "Beijing" and "Chaoyang District". The original entry "Beijing City Haidian District Suzhou Street No. 3 Daheng Technology Building" is segmented into six words: "Beijing City", "Haidian District", "Suzhou Street", "3", "No." (the number marker, which follows the numeral in the original text), and "Daheng Technology Building".
In an optional embodiment of the disclosure, Stanford CoreNLP may be used as the word segmentation tool. Of course, those skilled in the art will understand that other word segmentation tools may also be chosen; the disclosure places no limitation on this.
In step S12, each word is labeled with one semantic slot according to an address division scheme.
Fig. 3 shows an example of a fine-grained address division scheme suitable for dialogue systems, for example task-oriented human-computer interaction systems. Fig. 3 schematically divides an address by level into 11 types: "country", "province/autonomous region/autonomous prefecture (Western countries)", "ordinary city/municipality directly under the central government/special administrative region/cities and counties of Taiwan", "county-level city/county seat/municipal district", "business district", "street/national highway/provincial highway", "street building number", "specific name of a place/building/institution/shop/residential community", "details, such as door number, building number, floor, and orientation information", "township", and "village". It should be noted that this division is only exemplary, and those skilled in the art may adopt other division schemes according to actual conditions. Fig. 3 also shows the semantic slot matched to each address type after division. For the division above, the matched semantic slots are, respectively, "country", "province", "city", "county", "business_district", "street", "street_number", "name", "detail", "town", and "village". Of course, the semantic slots may also be represented in other ways.
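The eleven slot names listed above can be kept in a single lookup table so that later stages validate labels against one source. The following Python sketch is only an illustration: the constant name `ADDRESS_SLOTS`, the helper `is_address_slot`, and the shortened English descriptions are assumptions, while the slot names themselves are those given in the text.

```python
# Slot name -> address level it covers, in the order listed for Fig. 3.
ADDRESS_SLOTS = {
    "country": "country",
    "province": "province / autonomous region / autonomous prefecture",
    "city": "ordinary city / municipality / special administrative region",
    "county": "county-level city / county seat / municipal district",
    "business_district": "business district",
    "street": "street / national highway / provincial highway",
    "street_number": "street building number",
    "name": "specific name of a place, building, institution, shop, or community",
    "detail": "door number, building number, floor, orientation",
    "town": "township",
    "village": "village",
}


def is_address_slot(label: str) -> bool:
    """Return True if `label` is one of the 11 slots of the division scheme."""
    return label in ADDRESS_SLOTS
```

Because the text notes that other division schemes are possible, a table like this would be the one place to change if a different scheme were adopted.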
The address hierarchy of the fine-grained address division protocol corresponding to the division above is shown in Fig. 4. As shown in Fig. 4, the address levels run from high to low, top to bottom. In line with how users of dialogue systems habitually express addresses, the levels below county-level city/county seat/municipal district split into two protocol branches: "business_district (business district)" - "street (street/national highway/provincial highway)" - "street_number (street building number)" - "name (specific name of a place/building/institution/shop/residential community)", and "town (township)" - "village (village)".
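The hierarchy described above, with four shared upper levels and two branches below the county level, can be sketched as plain lists. This is an illustrative Python sketch, not the protocol's definition: the names `COMMON_LEVELS`, `URBAN_BRANCH`, `RURAL_BRANCH`, and `level_path` are hypothetical, and the "detail" slot is left out because the text does not state where it sits in either branch.

```python
# Levels shared by every address, from highest to lowest.
COMMON_LEVELS = ["country", "province", "city", "county"]

# The two protocol branches below the county level, as listed in the text.
URBAN_BRANCH = ["business_district", "street", "street_number", "name"]
RURAL_BRANCH = ["town", "village"]


def level_path(slot: str) -> list:
    """Return the chain of levels from "country" down to `slot`."""
    if slot in COMMON_LEVELS:
        return COMMON_LEVELS[: COMMON_LEVELS.index(slot) + 1]
    for branch in (URBAN_BRANCH, RURAL_BRANCH):
        if slot in branch:
            return COMMON_LEVELS + branch[: branch.index(slot) + 1]
    raise ValueError("slot is not part of the sketched hierarchy: " + slot)
```

For example, `level_path("village")` walks the shared levels and then the township branch, while `level_path("street")` walks the shared levels and then the business-district branch.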
In an embodiment of the disclosure, each word may be labeled with one semantic slot according to the address division protocol above. As shown in Fig. 5, the semantic slot of "Duofu Street" is labeled "street". Words that do not belong to any semantic slot in the address division protocol are also labeled. For example, "intersection" in Fig. 5 does not belong to any semantic slot in the division of Figs. 3 and 4, so such words are labeled separately, for instance as "other" (the word "No." in Fig. 5 is also labeled "other"). The labeling may be done manually or automatically.
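The labeling rule described above (one slot per word, with "other" for words outside the division scheme) can be sketched as follows. This Python sketch assumes the annotations arrive as a word-to-slot mapping, which stands in for either manual or automatic labeling; the function name `label_tokens` and the mapping are hypothetical. Only the labels for "Duofu Street" (street) and "intersection" (other) are stated in the text; the label for "Huxi Road" is an assumption made for illustration.

```python
ADDRESS_SLOTS = frozenset({
    "country", "province", "city", "county", "business_district",
    "street", "street_number", "name", "detail", "town", "village",
})


def label_tokens(tokens, annotations):
    """Attach exactly one semantic slot to every token.

    Tokens without an annotation, or annotated with a label outside the
    division scheme, fall back to "other".
    """
    labeled = []
    for token in tokens:
        slot = annotations.get(token, "other")
        if slot not in ADDRESS_SLOTS:
            slot = "other"
        labeled.append((token, slot))
    return labeled


tokens = ["Duofu Street", "and", "Huxi Road", "intersection"]
annotations = {"Duofu Street": "street", "Huxi Road": "street"}  # "Huxi Road" is assumed
labeled = label_tokens(tokens, annotations)
```

Giving every word a label, including "other", keeps the label sequence the same length as the word sequence, which sequence-labeling training generally requires.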
In step S13, feature extraction is performed on each word, including extracting named entity features through named entity recognition and extracting part-of-speech tag features through part-of-speech tagging.
According to an optional embodiment of the disclosure, Stanford CoreNLP may be used as both the named entity recognition tool and the part-of-speech tagging tool. When Stanford CoreNLP is used for named entity recognition, LOC (location), NUM (number), O (the word is not part of a named entity), and so on are tags in the Stanford CoreNLP named entity tag set. When Stanford CoreNLP is used for part-of-speech tagging, NR (proper noun), NN (other noun), LC (localizer), CC (coordinating conjunction), OD (ordinal number), and so on are tags in the Stanford CoreNLP part-of-speech tag set. As shown in Fig. 6, words in address form, such as "Duofu Street", "Beijing", and "Beijing City", have the named entity tag LOC; words that are not part of a named entity, such as "and", "intersection", "No.", and "Daheng Technology Building", have the named entity tag O; and words in numeric form, such as "3", have the named entity tag NUM. Proper nouns such as "Duofu Street", "Beijing", and "Beijing City" have the part-of-speech tag NR; coordinating conjunctions such as "and" have the part-of-speech tag CC; other nouns such as "intersection", "No.", and "Daheng Technology Building" have the part-of-speech tag NN; and ordinal-type words such as "3" have the part-of-speech tag OD.
In step S14, the processed segmented corpus is trained on to obtain the address resolution model. For example, the disclosure may use the open-source tool CRF++ to train on the training corpus processed in step S13 and thereby generate the address resolution model.
According to one embodiment of the disclosure, as shown in Fig. 7, the address resolution method includes a word segmentation step S71, an annotation step S72, an extraction step S73, a conversion step S74, and a training step S75.
Compared with the address resolution method above, this embodiment adds the conversion step S74. The remaining steps, namely the word segmentation step S71, annotation step S72, extraction step S73, and training step S75, may be the same as the word segmentation step S11, annotation step S12, extraction step S13, and training step S14 of the method above, respectively.
In some cases, the corpus produced by the extraction step needs to be converted so that it meets the format required by the training step.
For example, the example above notes that the disclosure may use the open-source tool CRF++ to train on the segmented corpus to obtain the address resolution model. In that case, step S74 may convert the data generated by step S73 into a format supported by CRF++. Fig. 8 shows the data format obtained by converting Fig. 6. In Fig. 8, the first column is the segmented corpus, the second column is the named entity tag, the third column is the part-of-speech tag, and the fourth column is the labeled semantic slot.
The converted data are then trained on in the training step S75. For example, once the data have been converted into the format supported by CRF++, training is performed with CRF++.
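The conversion step can be illustrated with a small formatter that writes the four columns described for Fig. 8. CRF++ training files hold one token per line with whitespace-separated columns, the label in the last column, and an empty line between sequences. The Python sketch below follows that layout; the function name `to_crfpp` is hypothetical, spaces inside the English glosses are replaced with underscores so that each token stays in a single column, and the tags and slots shown for the sample rows are assumptions made for illustration rather than values taken from Fig. 8.

```python
def to_crfpp(sentences):
    """Render labeled sentences as CRF++ training text.

    Each sentence is a list of (token, ner, pos, slot) rows. Columns are
    tab-separated and sentences are separated by one empty line.
    """
    blocks = []
    for rows in sentences:
        lines = []
        for token, ner, pos, slot in rows:
            # Whitespace separates columns, so tokens must not contain any.
            columns = (token.replace(" ", "_"), ner, pos, slot)
            lines.append("\t".join(columns))
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Assumed example rows for the entry "Beijing Chaoyang District".
sample = [
    ("Beijing", "LOC", "NR", "city"),
    ("Chaoyang District", "LOC", "NR", "county"),
]
training_text = to_crfpp([sample])
```

A file written this way would typically be passed, together with a feature template file, to the CRF++ `crf_learn` command to produce a model file.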
According to a second embodiment of the disclosure, an address resolution method is also provided. As shown in Fig. 9, the method may include an understanding step S91, a word segmentation step S92, an extraction step S93, and a resolution step S94.
In step S91, natural language understanding is performed on the natural language text.
It is then determined whether the natural language understanding result involves an address-related semantic slot, such as location. If it does, the subsequent address resolution processing is carried out. If it does not, the subsequent address resolution processing is not carried out. For example, if the natural language understanding result is "navigate to [No. 9 Zhongguancun Street, Haidian District: location]", it is considered to involve an address-related semantic slot; if the understanding result is "what time is it now", no address-related semantic slot is involved.
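The check described above can be sketched as a filter over the slots of the understanding result. In this Python sketch, the slots are assumed to arrive as a name-to-text mapping, and the set `ADDRESS_SLOT_NAMES` and both function names are hypothetical; only the slot name `location` comes from the text.

```python
# Slot names treated as address-related. Only "location" appears in the
# text; a real system might list more (assumption).
ADDRESS_SLOT_NAMES = frozenset({"location"})


def address_texts(slots):
    """Return the text of every address-related slot in an NLU result."""
    return [text for name, text in slots.items() if name in ADDRESS_SLOT_NAMES]


def needs_address_resolution(slots):
    """Decide whether the subsequent address resolution steps should run."""
    return bool(address_texts(slots))


navigate = {"location": "No. 9 Zhongguancun Street, Haidian District"}
ask_time = {}  # "what time is it now" carries no address-related slot
```

Only when `needs_address_resolution` returns True would the word segmentation, feature extraction, and resolution steps be run on the returned texts.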
In step S92, word segmentation is performed on the text of the address-related semantic slot. The specific operation may follow the approach of step S11 above.
In step S93, for each word obtained by segmentation, named entity features are extracted through named entity recognition and part-of-speech tag features are extracted through part-of-speech tagging. The specific processing may follow the approach of step S13 above.
In step S94, address resolution is performed using the address resolution model, in which each word of the text of the address-related semantic slot is labeled with one semantic slot according to the address division scheme. The address resolution model may be the one generated according to the first embodiment of the disclosure, that is, an address resolution model generated according to the fine-grained address division protocol suited to task-oriented human-computer interaction systems.
Finally, the address resolution result is output.
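Once the model has assigned one semantic slot to each word, the per-word labels still have to be turned into an output. The text does not specify the form of the address resolution result, so the following Python sketch simply assumes a mapping from slot name to the words labeled with it, skipping "other". The function name `build_result` and the predicted labels in the example are hypothetical.

```python
def build_result(tokens, labels):
    """Group tokens by their predicted semantic slot, skipping "other"."""
    if len(tokens) != len(labels):
        raise ValueError("every token needs exactly one label")
    result = {}
    for token, label in zip(tokens, labels):
        if label == "other":
            continue
        result.setdefault(label, []).append(token)
    return result


# Assumed model output for "Haidian District Zhongguancun Street No. 9".
tokens = ["Haidian District", "Zhongguancun Street", "9", "No."]
labels = ["county", "street", "street_number", "other"]
resolved = build_result(tokens, labels)
```

A structured mapping of this kind is one way the result could be handed to downstream components that expect separate address fields.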
According to a third embodiment of the disclosure, an address resolution method is also provided. As shown in Fig. 10, the method includes steps S101, S102, S103, S104, and S105.
In step S101, natural language understanding is performed on the natural language text to obtain the vertical domain and the semantic slots of the natural language text.
If the natural language understanding result contains an address-related semantic slot, step S102 is carried out. In step S102, word segmentation is performed on the text of the address-related semantic slot. The specific operation may follow the approach of step S11 above.
In step S103, for each word obtained by segmentation, named entity features are extracted through named entity recognition and part-of-speech tag features are extracted through part-of-speech tagging. The specific processing may follow the approach of step S13 above.
In step S104, address resolution is performed using the address resolution model, in which each word of the text of the address-related semantic slot is labeled with one semantic slot according to the address division scheme. The address resolution model may be the one generated according to the first embodiment of the disclosure, that is, an address resolution model generated according to the fine-grained address division protocol suited to task-oriented human-computer dialogue systems.
In step S105, subsequent processing is carried out in the corresponding vertical domain according to the address resolution result.
The third embodiment is described in detail below with a specific example and with reference to Fig. 11.
Consider first the flow marked by solid arrows. The user says "navigate to No. 9 Zhongguancun Street, Haidian District", and speech recognition identifies the user's natural language text as "navigate to No. 9 Zhongguancun Street, Haidian District". Natural language understanding (NLU) is used to obtain the vertical domain of the text and its address-related semantic slot. Here the vertical domain is [navigation vertical domain] and the address-related semantic slot is [No. 9 Zhongguancun Street, Haidian District: location]. The natural language understanding result is input to the general address resolution module, where it undergoes the word segmentation, named entity recognition, and part-of-speech tagging described above, in that order, and the processed result is input to the general address resolution model. The general address resolution model may be a model generated according to the first embodiment. The general address resolution model outputs a resolution result, which is then passed to the corresponding vertical domain, for example a navigation app.
As shown by the dashed lines in Fig. 11, when the user says "find a restaurant at the intersection of Xin Zhongguan Street and Haidian Street", speech recognition identifies the user's natural language text as "find a restaurant at the intersection of Xin Zhongguan Street and Haidian Street". NLU determines that the vertical domain of the text is [restaurant vertical domain] and that the address-related semantic slot is [the intersection of Xin Zhongguan Street and Haidian Street: location]. The natural language understanding result is input to the general address resolution module, where it undergoes the word segmentation, named entity recognition, and part-of-speech tagging described above, in that order, and the processed result is input to the general address resolution model. The general address resolution model outputs a resolution result, which is then passed to the corresponding vertical domain, for example a restaurant app.
As Fig. 11 makes clear, the navigation vertical domain and the restaurant vertical domain use the same general address resolution module. Fig. 11 is merely an example; several or all vertical domains may use the same general address resolution module. This avoids maintaining a separate address resolution module for each vertical domain and therefore reduces the maintenance cost of the system.
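The reuse described above, with one address resolution module serving several vertical domains, can be sketched as a dispatcher that holds a single resolver and a table of domain handlers. The Python sketch below is an illustration under stated assumptions: `make_dispatcher`, `stub_resolve`, the handler table, and the returned strings are all hypothetical, and the stub stands in for the general address resolution model.

```python
from typing import Callable, Dict

Resolver = Callable[[str], Dict[str, str]]
Handler = Callable[[Dict[str, str]], str]


def make_dispatcher(resolve: Resolver, handlers: Dict[str, Handler]):
    """Build a dispatch function in which every domain shares one resolver."""
    def dispatch(domain: str, address_text: str) -> str:
        parsed = resolve(address_text)   # the single shared resolution module
        return handlers[domain](parsed)  # domain-specific follow-up processing
    return dispatch


calls = []


def stub_resolve(address_text: str) -> Dict[str, str]:
    """Stand-in for the general address resolution model."""
    calls.append(address_text)
    return {"raw": address_text}


dispatch = make_dispatcher(
    stub_resolve,
    {
        "navigation": lambda parsed: "route to " + parsed["raw"],
        "restaurant": lambda parsed: "restaurants near " + parsed["raw"],
    },
)
```

Adding another vertical domain then only requires a new entry in the handler table; the resolver itself is untouched.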
According to a fourth embodiment of the disclosure, an address resolution device is provided. As shown in Fig. 12, the address resolution model generating device 120 may include a word segmentation module 121, an annotation module 122, an extraction module 123, and a training module 124.
The word segmentation module 121 performs word segmentation on the acquired corpus to obtain the words that form the segmented corpus. The annotation module 122 labels each word with one semantic slot according to the address division scheme. The extraction module 123 performs feature extraction on each word, including extracting named entity features through named entity recognition and extracting part-of-speech tag features through part-of-speech tagging. The training module 124 trains on the processed segmented corpus to obtain the address resolution model.
According to an optional embodiment of the disclosure, the device may also include a format conversion module that converts the data output by the extraction module 123 to meet the format requirements of the training module 124.
The specific operations performed in each of the above modules may be the same as in the method described in the first embodiment. For brevity, they are not repeated here.
According to a fifth embodiment of the disclosure, an address resolution device is provided. As shown in Fig. 13, the address resolution module 130 may include:
an understanding module 131, which performs natural language understanding on the natural language text;
a word segmentation module 132, which performs word segmentation on the address-related semantic slot when the natural language understanding result contains an address-related semantic slot;
an extraction module 133, which, for each word obtained by segmentation, extracts named entity features through named entity recognition and extracts part-of-speech tag features through part-of-speech tagging; and
a resolution module 134, which performs address resolution using the address resolution model, in which each word of the text of the address-related semantic slot is labeled with one semantic slot according to the address division scheme.
The specific operations performed in each of the above modules may be the same as in the method described in the second embodiment. For brevity, they are not repeated here.
According to a sixth embodiment of the disclosure, an address resolution device is provided. As shown in Fig. 14, the reusable address resolution device 140 may include:
an understanding module 141, which performs natural language understanding on the natural language text to obtain the vertical domain and the semantic slots of the natural language text;
a word segmentation module 142, which performs word segmentation on the address-related semantic slot when the natural language understanding result contains an address-related semantic slot;
an extraction module 143, which, for each word obtained by segmentation, extracts named entity features through named entity recognition and extracts part-of-speech tag features through part-of-speech tagging;
a resolution module 144, which performs address resolution using the address resolution model, in which each word of the text of the address-related semantic slot is labeled with one semantic slot according to the address division scheme; and
a processing module 145, which carries out subsequent processing in the corresponding vertical domain according to the address resolution result.
The specific operations performed in each of the above modules may be the same as in the method described in the third embodiment. For brevity, they are not repeated here.
In summary, the general address division protocol for dialogue systems according to the disclosure is compatible with mainstream APIs that take address parameters, and the reusable general address resolver based on CRFs (Conditional Random Fields) according to the disclosure avoids maintaining a separate address resolution module in each vertical domain, reducing the maintenance cost of the system.
It should also be noted that, besides the vertical domains mentioned as examples in the disclosure, such as restaurant, hotel, and navigation, the disclosure also applies to other vertical domains that involve place-related semantic slots, such as map search, cinema search, public transport lookup, flight tickets, and train tickets.
The disclosure also provides a computer device. As shown in Fig. 15, the device includes a communication interface 1000, a memory 2000, and a processor 3000. The communication interface 1000 communicates with external devices to exchange data. The memory 2000 stores a computer program that can run on the processor 3000. When the processor 3000 executes the computer program, it implements the methods of the embodiments above. There may be one or more memories 2000 and one or more processors 3000.
The memory 2000 may include high-speed RAM and may further include non-volatile memory, for example at least one magnetic disk storage device.
If the communication interface 1000, the memory 2000, and the processor 3000 are implemented independently, they may be connected to one another through a bus and communicate over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the figure shows only one thick line, but this does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, if the communication interface 1000, the memory 2000, and the processor 3000 are integrated on a single chip, they may communicate with one another through internal interfaces.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the disclosure includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the disclosure belong. The processor performs each of the methods and processes described above. For example, the method embodiments of the disclosure may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as a memory. In some embodiments, part or all of the computer software program may be loaded and/or installed via the memory and/or the communication interface. When the computer software program is loaded into the memory and executed by the processor, one or more steps of the methods described above may be performed. Alternatively, in other embodiments, the processor may be configured to perform one of the methods above in any other suitable manner (for example, by means of firmware).
The logic and/or steps represented in a flowchart, or otherwise described herein, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them).
For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable way if necessary, and can then be stored in a computer memory.
It should be understood that each part of the disclosure may be implemented in hardware, software, or a combination of the two. In the embodiments above, multiple steps or methods may be implemented in software that is stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the methods in the embodiments above may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination of them.
In addition, the functional units in the embodiments of the disclosure may be integrated in one processing module, may each exist physically on their own, or may be integrated two or more to a module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
In this specification, a description that refers to the terms "one embodiment/mode", "some embodiments/modes", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment/mode or example is included in at least one embodiment/mode or example of the present application. Schematic uses of these terms in this specification do not necessarily refer to the same embodiment/mode or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments/modes or examples. In addition, where no conflict arises, those skilled in the art may combine the different embodiments/modes or examples described in this specification, as well as their features.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, for example two or three, unless otherwise specifically defined.
Those skilled in the art will understand that the embodiments above serve only to illustrate the disclosure clearly and do not limit its scope. Those skilled in the art may make other changes or modifications on the basis of the disclosure above, and such changes or modifications remain within the scope of the disclosure.