WO2022160445A1 - Semantic understanding method, apparatus, device and storage medium - Google Patents

Semantic understanding method, apparatus, device and storage medium

Info

Publication number
WO2022160445A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
feature
language
understood
semantic understanding
Prior art date
Application number
PCT/CN2021/082961
Other languages
English (en)
French (fr)
Inventor
苏志铭
刘权
陈志刚
刘聪
胡国平
Original Assignee
科大讯飞股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 科大讯飞股份有限公司
Publication of WO2022160445A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Definitions

  • the present application relates to the technical field of natural language processing, and more specifically, to a semantic understanding method, apparatus, device and storage medium.
  • Semantic understanding technology usually consists of two parts: intent understanding, which determines the user's intent, and slot extraction, which extracts intent-related entities from user requests.
  • the present application is proposed to provide a semantic understanding method, apparatus, device and storage medium, so as to realize the semantic understanding of an input request.
  • the specific plans are as follows:
  • a semantic understanding method comprising:
  • the obtaining entity words matching the to-be-understood text are used as matching entity words, including:
  • An entity word in the entity library that matches the to-be-understood text is determined as a matching entity word.
  • the process of acquiring the language feature of the language to which the text to be understood belongs includes:
  • the language embedding feature matrix includes the respective embedding feature representations corresponding to each language
  • the embedded feature representation corresponding to the language to which the text to be understood belongs is searched in the language embedded feature matrix, as the language feature of the language to which the text to be understood belongs.
  • the determination of the fusion text based on the to-be-understood text and the matched entity word includes:
  • the to-be-understood text and the matched entity words are spliced to obtain a fusion text.
  • determining the semantic understanding result of the to-be-understood text based on the fusion text and the language feature includes:
  • the semantic understanding result of the text to be understood is determined based on at least the entity type feature, the text feature and the language feature of each constituent unit.
  • the process of acquiring entity type features of each constituent unit in the fusion text includes:
  • entity type embedded feature matrix includes embedded feature representations corresponding to each entity type in the corresponding scene
  • the embedded feature representation corresponding to each constituent unit in the fusion text is searched in the entity type embedded feature matrix as the entity type feature of each constituent unit.
  • determining the semantic understanding result of the text to be understood based on the fusion text and the language feature further includes:
  • the semantic understanding result of the text to be understood is determined.
  • the process of acquiring the positional features of each constituent unit in the fused text includes:
  • the position embedding feature matrix includes embedding feature representations corresponding to each position number
  • the embedded feature representation corresponding to the position number of each constituent unit in the fusion text is searched in the position embedded feature matrix, as the position feature of each constituent unit.
  • determining the semantic understanding result of the to-be-understood text based on the fusion text and the language feature includes:
  • the fusion text and the language feature are processed by using a pre-trained semantic understanding model to obtain a semantic understanding result of the text to be understood output by the semantic understanding model.
  • the process of using a semantic understanding model to process the fusion text and the language features to obtain a semantic understanding result includes:
  • the embedded features of each constituent unit in the fusion text are obtained, the embedded features including at least the text features and the language features among the text features, language features, entity type features and position features;
  • the embedded features of each constituent unit are coded to obtain coding features
  • the encoding feature is processed to obtain the output intent
  • the encoding feature is processed to obtain the slot type marked by each component unit in the text to be understood.
  • the process of using a semantic understanding model to process the fusion text and the language features to obtain a semantic understanding result further includes:
  • the encoding feature and the preconfigured slot embedding feature matrix are used for attention calculation to obtain a new encoding feature fused with the slot embedding feature matrix;
  • the slot embedding feature matrix contains the embedded feature representation corresponding to each type of semantic slot in the scene to which the text to be understood belongs;
  • the new coding feature is processed to obtain the slot type marked by each component unit in the text to be understood.
  • parameter initialization is performed based on a pre-trained cross-language mask language model
  • the training text and the language features of the language to which it belongs are used as sample input, and the training is performed with the goal of predicting the occluded characters in the training text.
  • the semantic understanding model iteratively updates the language embedding feature matrix, the entity type embedding feature matrix, and the location embedding feature matrix;
  • the language embedding feature matrix includes embedding feature representations corresponding to each language
  • the entity type embedded feature matrix includes embedded feature representations corresponding to each entity type in the corresponding scenario
  • the position embedded feature matrix includes embedded feature representations corresponding to each position number respectively.
  • a semantic understanding device comprising:
  • a data acquisition unit configured to acquire an entity word that matches the text to be understood as a matching entity word, and a language feature of the language to which the text to be understood belongs, wherein the matching entity word is an entity word that matches the to-be-understood text in the language and the scene to which the to-be-understood text belongs;
  • a fusion unit configured to determine a fusion text based on the to-be-understood text and the matched entity word
  • a semantic understanding unit configured to determine a semantic understanding result of the text to be understood based on the fusion text and the language feature.
  • a semantic understanding device comprising: a memory and a processor;
  • the memory for storing programs
  • the processor is configured to execute the program to implement the various steps of the semantic understanding method as described above.
  • a storage medium is provided on which a computer program is stored, and when the computer program is executed by a processor, each step of the semantic understanding method as described above is implemented.
  • a computer program product which, when run on a terminal device, causes the terminal device to execute each step of the above semantic understanding method.
  • the present application obtains entity words matching the to-be-understood text from the various types of entity words in the scene to which the to-be-understood text belongs, and obtains the language feature of the language to which the to-be-understood text belongs. It then determines the fusion text based on the to-be-understood text and the matching entity words, and determines the semantic understanding result of the text to be understood based on the fusion text and the language feature.
  • the solution of the present application can perform semantic understanding on to-be-understood texts in different languages and different scenarios. Because the language of the to-be-understood text is taken into account in the process, the characteristics of different languages can be distinguished, which ensures the semantic understanding effect for to-be-understood texts in various languages.
  • this application introduces entity words that match the to-be-understood text in the language and scenario of the to-be-understood text, fuses the matched entity word with the to-be-understood text, and determines the semantic understanding result of the to-be-understood text based on the fused text.
  • the matching entity words in the language and scene of the text to be understood are introduced, so that the semantic understanding scheme of the present application can be applied to to-be-understood texts in different languages and scenarios, and can improve the semantic understanding accuracy for such texts.
  • FIG. 1 is a schematic flowchart of a semantic understanding method provided by an embodiment of the present application.
  • Figure 2a illustrates a schematic diagram of a language embedding feature matrix
  • Figure 2b illustrates a schematic diagram of an entity type embedding feature matrix
  • Figure 2c illustrates a schematic diagram of a starting position embedding feature matrix
  • Figure 2d illustrates a schematic diagram of an end position embedding feature matrix
  • Figure 3 illustrates a schematic diagram of the overall architecture of a semantic understanding model
  • Figure 4 illustrates a schematic diagram of a cross-language mask language model architecture
  • Figure 5 illustrates a schematic diagram of a slot embedding feature matrix
  • FIG. 6 is a schematic structural diagram of a semantic understanding apparatus disclosed in an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a semantic understanding device provided by an embodiment of the present application.
  • the present application provides a semantic understanding method capable of handling the task of semantic understanding.
  • the semantic understanding method of the present application supports semantic understanding of information in various languages and scenarios.
  • the solution of the present application can be implemented based on a terminal with data processing capability, and the terminal can be a mobile phone, a computer, a server, a cloud, or the like.
  • the semantic understanding method of the present application may include the following steps:
  • Step S100 acquiring an entity word matching the text to be understood as a matching entity word, and a language feature of the language to which the text to be understood belongs.
  • the text to be understood is the text that needs to be semantically understood.
  • the text to be understood may be the text input by the user, or the recognized text obtained by recognizing the speech input by the user.
  • the text corresponding to the user's request may be determined based on the request sent by the user to the system as the text to be understood.
  • the matching entity word is an entity word that matches the to-be-understood text in the language and the scene to which the to-be-understood text belongs.
  • the present application may configure a corresponding entity library for each scene in each language, and the entity library includes various types of entity words in the corresponding language and scene.
  • typical entity types are “singer”, “song”, “tag” and so on.
  • the corresponding entity words may include the names, nicknames, codes, etc. of various singers, such as “Andy Lau”, “Hua Tsai”, “Jacky Cheung” and so on.
  • the corresponding entity words may include the titles of various songs, such as “forget love”, “kiss goodbye” and so on.
  • the corresponding entity words may include various genres of songs, such as "pop", “rock”, “light music” and so on.
  • each entity word may be stored separately according to the entity word type.
  • for example, the entity library may include a "singer" entity library and a separate entity library for each of the other entity types.
  • various types of entity words in the entity library can also be mixed and stored together, which is not strictly limited in this application.
  • the language of the text to be understood may be input by the user, or may be determined by analyzing the language of the text to be understood.
  • the scene to which the text to be understood belongs may be input by the user, or the scene to which it belongs may be determined by performing text analysis on the text to be understood. This application does not strictly limit the acquisition process of the language and scene to which the text to be understood belongs.
  • After obtaining the language and scene to which the text to be understood belongs, the entity library corresponding to that language and scene is looked up. Further, in the found entity library, an entity word matching the text to be understood is determined as a matching entity word.
  • the process of determining the matching entity word may be to match the to-be-understood text with each entity word in the entity library according to the method of string matching, so as to determine the entity word in the entity library that matches the to-be-understood text as the matching entity word.
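  • The lookup and string-matching steps above can be sketched as follows. This is a minimal illustration rather than the application's implementation: the `ENTITY_LIBRARY` layout, its entries, the function name `match_entities`, and the case-insensitive substring test are all assumptions made for the example.

```python
# Hypothetical entity library keyed by (language, scene). Each scene maps
# entity types to their entity words. All entries are illustrative only.
ENTITY_LIBRARY = {
    ("en", "music"): {
        "singer": ["Zhao Lei", "Andy Lau"],
        "song": ["Southern girl", "Kiss goodbye"],
        "tag": ["pop", "rock"],
    },
}


def match_entities(text, language, scene):
    """Return (entity_word, entity_type) pairs whose word occurs in `text`."""
    # Step 1: find the entity library for this language and scene.
    library = ENTITY_LIBRARY.get((language, scene), {})
    lowered = text.lower()
    matches = []
    # Step 2: plain string matching against every entity word.
    for entity_type, words in library.items():
        for word in words:
            # Case-insensitive comparison is an assumption of this sketch.
            if word.lower() in lowered:
                matches.append((word, entity_type))
    return matches


matched = match_entities(
    "I want to listen to Zhao Lei's southern girl", "en", "music"
)
```

  • A production system would likely replace the nested loop with a trie or an Aho-Corasick automaton once the entity library grows large; the substring test is kept here only because it mirrors the string matching described above.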
  • the language feature of the language to which the text to be understood belongs identifies that language.
  • the present application may pre-configure a language embedding feature matrix, and the language embedding feature matrix includes embedding feature representations corresponding to each language. Then, for the language to which the text to be understood belongs, the embedded feature matrix of the language to which it belongs can be queried to find the embedded feature representation corresponding to the language to which the text to be understood belongs, as the language feature.
  • FIG. 2a illustrates a schematic diagram of a language embedding feature matrix.
  • each row represents a language embedding feature representation corresponding to a language
  • the labels on the right are language identifiers corresponding to different languages.
  • zh can be used to represent Chinese
  • en can be used to represent English.
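  • As a rough illustration of the lookup described above, the language embedding feature matrix can be treated as a table with one row per language identifier. The dimension, the numeric values and the names `LANGUAGE_IDS`, `LANGUAGE_EMBEDDINGS` and `language_feature` are assumptions for this sketch; in the described method the rows would be learned together with the model.

```python
# Row index of each language identifier in the embedding matrix.
LANGUAGE_IDS = {"zh": 0, "en": 1}

# Illustrative language embedding feature matrix: one row per language.
# The values and the dimension (4) are made up for this sketch.
LANGUAGE_EMBEDDINGS = [
    [0.1, 0.2, 0.3, 0.4],  # row for "zh"
    [0.5, 0.6, 0.7, 0.8],  # row for "en"
]


def language_feature(language):
    """Look up the embedded feature representation of one language."""
    return LANGUAGE_EMBEDDINGS[LANGUAGE_IDS[language]]


feature_en = language_feature("en")
```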
  • Step S110 Determine a fusion text based on the to-be-understood text and the matched entity word.
  • this step determines the fusion text of the text to be understood and the matching entity words, so that the subsequent semantic understanding can be performed based on the fusion text.
  • the process of merging the to-be-understood text and the matching entity words may be to directly perform text splicing of the to-be-understood text and the matching entity words, and the obtained spliced text is used as the fused text.
  • the text to be understood is "I want to listen to Zhao Lei's southern girl”
  • the matching entity words include: "Zhao Lei", “Southern” and “Southern girl”.
  • the fusion text obtained after splicing can be "I want to listen to Zhao Lei's southern girl Zhao Lei Southern Southern girl".
  • the fusion text obtained in this step includes both the information of the text to be understood and the matching entity word information in the language and scene to which it belongs. That is, the fusion text contains richer information that is related to the language and scene of the text to be understood, so that it can adapt to the semantic expression in that language and scene.
  • performing semantic understanding based on the fusion text can therefore improve the semantic understanding effect.
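  • The splicing step can be sketched in a few lines. The helper name `splice` and the use of a single space as separator are assumptions for this example; the application only states that the to-be-understood text and the matching entity words are spliced.

```python
def splice(text, entity_words, sep=" "):
    """Append the matching entity words to the text to be understood."""
    # The separator is an assumption; for languages written without spaces
    # an empty separator or a special marker could be used instead.
    return sep.join([text, *entity_words])


fusion_text = splice(
    "I want to listen to Zhao Lei's southern girl",
    ["Zhao Lei", "Southern", "Southern girl"],
)
```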
  • Step S120 Determine a semantic understanding result of the text to be understood based on the fusion text and the language feature.
  • a semantic understanding result of the to-be-understood text is determined based on the fused text and the language features.
  • the semantic understanding result can include any one or both of the following two items:
  • the intent understanding result is the intent category corresponding to the text to be understood. Taking the intent in the Chinese music scene as an example, it may include various types of intents such as “play music” and “cut songs”.
  • the result of slot extraction can be regarded as a sequence labeling process, that is, each constituent unit in the text to be understood is marked with the corresponding slot category.
  • different types of entity words can be used as different slot categories. For example, "Zhao Lei" and "Southern girl" above belong to different slot categories.
  • the semantic understanding method provided by the embodiments of the present application can perform semantic understanding on the text to be understood in different languages and in different scenarios. Because the language to which the text to be understood belongs is considered in the process, the characteristics of different languages can be distinguished, which ensures the semantic understanding effect for texts in different languages.
  • this application introduces entity words that match the to-be-understood text in the language and scenario of the to-be-understood text, fuses the matched entity word with the to-be-understood text, and determines the semantic understanding result of the to-be-understood text based on the fused text.
  • the matching entity words in the language and scene of the text to be understood are introduced, so that the semantic understanding scheme of the present application can be applied to to-be-understood texts in different languages and scenarios, and can improve the semantic understanding accuracy for such texts.
  • the process of determining the semantic understanding result of the text to be understood based on the fusion text and the language feature may include:
  • the text feature is a feature representing the text-level meaning of the constituent unit, which may be the word embedding vector of the constituent unit.
  • the word embedding vector of each component unit can be determined as a text feature by querying a dictionary.
  • An entity type feature is a feature that characterizes the entity type of a constituent unit. Since the fusion text is the result of fusing the text to be understood with the matching entity words, a matching entity word, as a constituent unit, can take the feature representation of its own entity type as its entity type feature, while the constituent units of the to-be-understood text can all be represented by one and the same entity type feature.
  • the present application may preconfigure entity type embedded feature matrices corresponding to different scenarios, and the entity type embedded feature matrix includes embedded feature representations corresponding to each entity type in the corresponding scenario. Then, for each component unit in the fusion text, the embedded feature representation corresponding to each component unit can be found in the entity type embedded feature matrix corresponding to the scene to which the text to be understood belongs, as the entity type feature of each component unit.
  • Each row in Figure 2b represents the entity type embedded feature representation corresponding to one entity type
  • the labels on the right side are the entity type identifiers corresponding to different entity types.
  • O can be used to represent the entity type corresponding to each constituent unit in the text to be understood
  • use song to represent the entity type corresponding to "song” and so on.
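  • The per-unit entity type lookup can be sketched as below, assuming each character of the text to be understood is one constituent unit tagged "O" and each matching entity word is one unit tagged with its own type. The identifier "artist", the two-dimensional vectors and the function name `entity_type_features` are illustrative assumptions.

```python
# Row index of each entity type identifier in the embedding matrix.
ENTITY_TYPE_IDS = {"O": 0, "artist": 1, "song": 2}

# Illustrative entity type embedding feature matrix for one scene.
ENTITY_TYPE_EMBEDDINGS = [
    [0.0, 0.0],  # "O": shared by every unit of the text to be understood
    [1.0, 0.0],  # "artist"
    [0.0, 1.0],  # "song"
]


def entity_type_features(text, matched):
    """Return entity type labels and features for every constituent unit.

    `matched` is a list of (entity_word, entity_type) pairs.
    """
    labels = ["O"] * len(text) + [entity_type for _, entity_type in matched]
    features = [ENTITY_TYPE_EMBEDDINGS[ENTITY_TYPE_IDS[label]] for label in labels]
    return labels, features


labels, features = entity_type_features("abc", [("x", "song")])
```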
  • the entity type feature of each constituent unit in the fusion text is further obtained, and based on the entity type feature, text feature and language feature, the semantic understanding result of the text to be understood is determined.
  • the entity type features of each constituent unit in the fusion text are additionally considered, that is, the reference data for semantic understanding is richer, and the text to be understood can be semantically understood more accurately based on the entity type features.
  • the method of the present application may further include:
  • the position feature represents the position of the constituent unit in the fused text.
  • the position feature of the text to be understood as a constituent unit may be the feature of the absolute position of the constituent unit in the text to be understood.
  • the location feature of the matching entity word can be the feature of the absolute position of the matching entity word in the text to be understood.
  • the position feature may include a start position feature and an end position feature.
  • the fusion text consists of the to-be-understood text "I want to listen to Zhao Lei's southern girl", and the matching entity words "Zhao Lei", “Southern”, and “Southern girl” in sequence. Then the position number of each constituent unit in the fusion text can be expressed as:
  • the above-mentioned different position numbers may correspond to different position features.
  • the present application may pre-configure a position embedded feature matrix, where the position embedded feature matrix includes embedded feature representations corresponding to each position number respectively. Then, for each constituent unit in the fused text, the embedded feature representation corresponding to the position number of each constituent unit in the fused text can be found in the position embedding feature matrix as the position feature of each constituent unit.
  • FIG. 2c and Fig. 2d illustrate a schematic diagram of a start position embedded feature matrix and an end position embedded feature matrix, respectively.
  • each row represents a starting position embedded feature representation corresponding to a starting position number
  • the labels on the right are the position numbers corresponding to different starting positions.
  • the starting position numbers can be sorted sequentially from 0.
  • each row represents an end position embedded feature representation corresponding to an end position number
  • the labels on the right are the position numbers corresponding to different end positions.
  • the end position numbers can be sorted from 0 in sequence.
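  • One way to derive start and end position numbers is sketched below. It assumes that each character of the text to be understood uses its own index as both start and end position, that a matching entity word uses the indices of its first and last character within that text, and that only the first occurrence is considered. These choices, and the name `position_numbers`, are assumptions for the example.

```python
def position_numbers(text, entity_words):
    """Return (starts, ends) position numbers for all constituent units."""
    # Characters of the text: start and end are the character's own index.
    starts = list(range(len(text)))
    ends = list(range(len(text)))
    lowered = text.lower()
    for word in entity_words:
        # Absolute position of the entity word inside the original text.
        begin = lowered.find(word.lower())
        if begin < 0:
            raise ValueError(f"entity word {word!r} does not occur in the text")
        starts.append(begin)
        ends.append(begin + len(word) - 1)
    return starts, ends


starts, ends = position_numbers("play southern girl", ["southern girl"])
```

  • The resulting numbers would then be used as row indices into the start position and end position embedding feature matrices.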
  • step S2 may include:
  • the semantic understanding result of the text to be understood is determined.
  • the positional feature of each constituent unit in the fusion text is further obtained, and based on the positional feature, entity type feature, text feature and language feature, the semantic understanding result of the text to be understood is determined.
  • the location features of each component in the fusion text are additionally considered, that is, the reference data for semantic understanding is more abundant, and the semantic understanding of the text to be understood can be more accurately performed based on the location features.
  • semantic understanding solutions generally customize the semantic understanding model for a single language and a single scene, that is, a separate semantic understanding model needs to be deployed for each language and each scene, and each such model can only semantically understand user requests of one scene in one language.
  • the process of determining the semantic understanding result of the to-be-understood text based on the fusion text and the language features can be determined by a pre-trained semantic understanding model.
  • the implementation is to use the pre-trained semantic understanding model to process the fusion text and the language feature, so as to obtain the semantic understanding result of the text to be understood output by the semantic understanding model.
  • the semantic understanding model of the present application can be applied to semantic understanding in all scenarios in all languages, and can be trained using data in all scenarios in all languages.
  • the language features of the language to which the fusion text and the text to be understood belong are used as the input of the semantic understanding model.
  • different languages are distinguished by introducing the language features of the language to which the text to be understood belongs, so as to ensure that the model can learn the characteristics of different languages during training.
  • the fusion text fuses the to-be-understood text and the matching entity word
  • the matching entity word is the entity word that matches the to-be-understood text in the language and the scene to which the to-be-understood text belongs.
  • the semantic understanding model provided in this embodiment is a unified model, and the semantic understanding model can realize cross-language and cross-scene semantic understanding.
  • computing resources can be greatly reduced.
  • the semantic understanding model of the present application can mix training data in different languages and different scenarios during training, which makes full use of the semantic commonality of multiple languages, reduces the amount of data labeling needed for each language and each scene, and uses the data volume advantage of major languages to improve the effect for minor languages.
  • the process of using a semantic understanding model to process fused text and language features to obtain a semantic understanding result may refer to the following introduction.
  • a semantic understanding model can include an embedding layer, an encoding layer, an intent understanding layer, and a slot extraction layer.
  • the intent understanding layer and the slot extraction layer can be retained or discarded according to the needs of the task.
  • when the task only needs to perform intent understanding, the intent understanding layer can be retained and the slot extraction layer can be discarded; when the task only needs to perform slot extraction, the slot extraction layer can be retained and the intent understanding layer discarded.
  • when the task needs both intent understanding and slot extraction, the two structural layers are retained at the same time.
  • the above-mentioned embedding layer can obtain the embedded features of each constituent unit in the fused text.
  • the fused text fuses the to-be-understood text and the matching entity words at the same time
  • each constituent unit in the fused text may include the constituent units of the to-be-understood text and each matching entity word; for example, each character in the to-be-understood text and each matching entity word can be used as a constituent unit of the fused text.
  • the embedded features may include at least text features and language features. In addition, it may further include entity type features, location features, and the like.
  • every constituent unit in the fusion text is in the same language, namely the language to which the text to be understood belongs. Therefore, the language feature of each constituent unit in the fusion text is the same as the language feature of the language to which the text to be understood belongs.
  • the determination of the language feature, entity type feature, and location feature uses the preconfigured language embedding feature matrix, entity type embedding feature matrix, and location embedding feature matrix, respectively.
  • these three embedding feature matrices can be iteratively updated during the training of the semantic understanding model. After the training of the semantic understanding model, the three embedding feature matrices are fixed.
  • the language feature of the language to which the text to be understood belongs can be obtained by querying the language embedding feature matrix; the entity type feature of each constituent unit in the fused text can be obtained by querying the entity type embedding feature matrix; and the position feature of each constituent unit in the fused text can be obtained by querying the position embedding feature matrix.
  • each embedded feature may be added, and the result of the addition is used as the final embedded feature of each constituent unit.
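  • The addition of the individual embedded features can be sketched as an element-wise sum. The function name `combine_embeddings` is hypothetical, and the sketch assumes all feature vectors of a constituent unit share the same dimension.

```python
def combine_embeddings(*features):
    """Element-wise sum of the feature vectors of one constituent unit."""
    # All vectors (text, language, entity type, position, ...) must have the
    # same dimension for the sum to be defined.
    if len({len(feature) for feature in features}) != 1:
        raise ValueError("all feature vectors must have the same dimension")
    return [sum(values) for values in zip(*features)]


# Toy vectors standing in for text, language and entity type features.
final_embedding = combine_embeddings([1, 2], [10, 20], [100, 200])
```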
  • the encoding layer of the semantic understanding model encodes the embedded feature of each constituent unit to obtain the encoded feature.
  • the encoding layer can adopt the Transformer Encoder model structure, or other optional neural network structure.
  • an intent understanding layer and a slot extraction layer are respectively set to realize the tasks of intent understanding and slot extraction respectively.
  • the intent understanding layer processes the encoded features to obtain the output intent.
  • the slot extraction layer processes the encoded features to obtain the slot types marked by each constituent unit in the text to be understood.
  • slot extraction is performed for the text to be understood. Therefore, the present application can extract the coding features corresponding to the text to be understood from the coding features corresponding to the fusion text and send them to the slot extraction layer for processing, to get the slot type marked by each constituent unit in the text to be understood.
  • the encoding features obtained by the encoding layer for the fusion text are expressed as (h_1, h_2, ..., h_m), where the first n encoding features are the encoded feature representations of the n characters contained in the text to be understood, and the last m-n are the encoded feature representations of the matching entity words.
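  • Selecting the encoded features that belong to the text to be understood amounts to slicing off the first n entries of the encoder output. The helper name `split_encoded` and the toy one-dimensional vectors below are assumptions for this sketch.

```python
def split_encoded(encoded, n):
    """Split encoder outputs into text features and entity word features.

    `encoded` holds one feature vector per constituent unit of the fusion
    text; the first `n` belong to the characters of the text to be understood.
    """
    if not 0 <= n <= len(encoded):
        raise ValueError("n must lie between 0 and the number of encoded units")
    return encoded[:n], encoded[n:]


# Toy example with five constituent units, three of them text characters.
encoded = [[0.1], [0.2], [0.3], [0.4], [0.5]]
text_features, entity_features = split_encoded(encoded, 3)
```

  • Only `text_features` would be passed on to the slot extraction layer, since slot labels are assigned to the units of the text to be understood.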
  • the fused text is obtained by splicing the to-be-understood text and the matching entity words, as shown in the Token Embedding layer in Figure 3.
  • the text features of each constituent unit in the fused text are obtained through the character embedding layer.
  • the language feature of each constituent unit in the fusion text is obtained through the language embedding layer (Language Embedding).
  • the corresponding language embedding feature representation can be queried in the language embedding feature matrix shown in Figure 2a.
  • the semantic understanding model can also include an Entity Type Embedding layer, which is used to obtain the entity type features of each constituent unit in the fused text.
  • “O” is used to collectively represent the entity type of each constituent unit in the text to be understood
  • “artist” is used to represent the singer entity type
  • “song” is used to represent the song entity type.
  • the corresponding entity type embedded feature representation can be queried in the entity type embedded feature matrix illustrated in Figure 2b.
  • the semantic understanding model may further include a position embedding layer
  • the position embedding layer may include a start position embedding layer (Start Position Embedding) and an end position embedding layer (End Position Embedding).
  • Two position embedding layers obtain the position features of each component unit in the fused text respectively.
  • Arabic numerals are used to represent the position numbers of the constituent units.
  • the corresponding position embedding feature representation can be queried in the start position embedding feature matrix and the end position embedding feature matrix of the examples in Figure 2c and Figure 2d.
  • the encoding layer can choose Transformer Encoder or other neural network structure.
  • the encoded features output by the encoding layer are denoted (h_1, h_2, ..., h_m), where the first n are the encoded feature representations of the n characters contained in the text to be understood, and the remaining m − n are the encoded feature representations of the matching entity words.
  • an intent understanding task processing layer and a slot extraction task processing layer are respectively set.
  • the intent understanding task layer can encode the encoded features (h_1, h_2, ..., h_m) output by the encoding layer into a single vector through a self-attention module, and then feed that vector to a binary classification neural network to determine the intent, that is, to obtain the intent understanding result. As shown in Figure 3, the obtained intent understanding result is "play music: play_music".
  • the slot extraction task processing layer can be implemented by the conditional random field module CRF.
  • the encoded features corresponding to the text to be understood can be extracted from the encoded features (h_1, h_2, ..., h_m) and sent to the CRF layer, which yields the slot type labeled for each constituent unit in the text to be understood.
  • the corresponding output of the text to be understood is "O O O B-artist I-artist O B-song I-song I-song E-song".
  • parameter initialization may be performed based on a pre-trained cross-language mask language model.
  • the application can collect a large-scale unsupervised multilingual corpus in advance and use it to train a cross-language masked language model (Masked Language Model), as shown in Figure 4; the structure of the model can be a Transformer or another neural network structure.
  • the trained cross-language mask language model is used to initialize the parameters of the semantic understanding model of the present application.
  • because the training corpus of the cross-language masked language model can be unsupervised data, it can be obtained in large quantities, so that the model can learn from corpora in more languages. Furthermore, initializing the parameters of the semantic understanding model from the cross-language masked language model allows the semantic understanding model to generalize well even with a limited supervised corpus.
  • the training text and the language features of the language to which it belongs are used as sample input, and the training is carried out with the goal of predicting the occluded characters in the training text. That is, during the training process, a word is randomly replaced with the [mask] character, and the training goal is to predict the original word at that position.
  • the process in which the semantic understanding model processes the fused text and the language feature to obtain a semantic understanding result is further introduced.
  • the semantic understanding model may further add a slot attention layer.
  • the slot attention layer performs attention calculation on the encoding feature output by the encoding layer and the preconfigured slot embedding feature matrix, and obtains a new encoding feature fused with the slot embedding feature matrix.
  • the slot embedding feature matrix includes embedding feature representations corresponding to each type of semantic slot in the scene to which the text to be understood belongs.
  • the slot embedding feature matrix can be iteratively updated along with the training of the semantic understanding model. After the training of the semantic understanding model is completed, the slot embedding feature matrix is fixed.
  • the slot attention layer can perform attention calculation based on the slot embedding feature matrix and the coding features output by the coding layer, and obtain a new coding feature fused with the slot embedding feature matrix.
  • Each row in the slot embedding feature matrix corresponds to the embedding feature representation of a slot, such as B-artist, I-artist, E-artist, B-song, I-song, etc.
  • FIG. 5 is equivalent to further subdividing each entity type in the entity type embedding feature matrix of the FIG. 2b example into three slot types: B, I and E.
  • the slot attention layer performs attention calculation on the encoded features output by the encoding layer and the preconfigured slot embedding feature matrix to obtain new encoded features fused with the slot embedding feature matrix. Because the new encoded features incorporate the information of the slot embedding feature matrix, they distinguish different slot types better; therefore, when the slot extraction layer of the semantic understanding model processes the new encoded features, the slot type labeled for each constituent unit in the text to be understood is more accurate.
  • the attention calculation process of the slot attention layer can be realized by referring to the following formula:
  • h_t represents the t-th encoded feature among the encoded features (h_1, h_2, ..., h_n) output by the encoding layer for the text to be understood
  • slot_j represents the j-th slot embedding feature representation in the slot embedding feature matrix
  • α_tj represents the attention weight of the t-th encoded feature on the j-th slot embedding feature representation
  • g_t represents the t-th new encoded feature obtained.
  • the new encoded features finally obtained, fused with the slot embedding feature matrix, can be expressed as (g_1, g_2, ..., g_n).
  • FIG. 6 is a schematic structural diagram of a semantic understanding apparatus disclosed in an embodiment of the present application.
  • the apparatus may include:
  • the data acquisition unit 11 is used to acquire entity words matching the text to be understood as matching entity words, and the language feature of the language to which the text to be understood belongs, wherein the matching entity words are the entity words that match the text to be understood in the language and scenario to which the text belongs;
  • a fusion unit 12 configured to determine a fusion text based on the to-be-understood text and the matched entity word;
  • the semantic understanding unit 13 is configured to determine the semantic understanding result of the text to be understood based on the fusion text and the language feature.
  • the process in which the above-mentioned data acquisition unit acquires the entity words that match the text to be understood as the matching entity words may include:
  • An entity word in the entity library that matches the to-be-understood text is determined as a matching entity word.
  • the process in which the above-mentioned data acquisition unit acquires the language feature of the language to which the text to be understood belongs can include:
  • the language embedding feature matrix includes the respective embedding feature representations corresponding to each language
  • the embedded feature representation corresponding to the language to which the text to be understood belongs is searched in the language embedded feature matrix, as the language feature of the language to which the text to be understood belongs.
  • the process in which the above-mentioned fusion unit determines the fused text based on the text to be understood and the matching entity words may include:
  • the to-be-understood text and the matched entity words are spliced to obtain a fusion text.
  • the process of determining the semantic understanding result of the text to be understood by the semantic understanding unit based on the fusion text and the language feature may include:
  • the semantic understanding result of the text to be understood is determined based on at least the entity type feature, the text feature and the language feature of each constituent unit.
  • the process in which the semantic understanding unit obtains the entity type features of each constituent unit in the fusion text may include:
  • entity type embedded feature matrix includes embedded feature representations corresponding to each entity type in the corresponding scene
  • the embedded feature representation corresponding to each constituent unit in the fusion text is searched in the entity type embedded feature matrix as the entity type feature of each constituent unit.
  • the process of determining the semantic understanding result of the text to be understood by the semantic understanding unit based on the fusion text and the language feature may further include:
  • the semantic understanding unit determines the semantic understanding result of the text to be understood based on the entity type feature, text feature, location feature and the language feature of each constituent unit.
  • the process in which the above-mentioned semantic understanding unit obtains the positional features of each constituent unit in the fusion text may include:
  • the position embedding feature matrix includes embedding feature representations corresponding to each position number
  • the embedded feature representation corresponding to the position number of each constituent unit in the fusion text is searched in the position embedded feature matrix, as the position feature of each constituent unit.
  • the process of determining the semantic understanding result of the text to be understood by the above-mentioned semantic understanding unit based on the fusion text and the language features can be implemented by a semantic understanding model, and specifically, a pre-trained semantic understanding model can be used for processing.
  • the fusion text and the language feature are used to obtain a semantic understanding result of the text to be understood output by the semantic understanding model.
  • the process of using a semantic understanding model to process the fusion text and the language features to obtain a semantic understanding result may include:
  • the embedding features of each constituent unit in the fused text are obtained; among the text feature, language feature, entity type feature and position feature, the embedding features include at least the text feature and the language feature;
  • the embedded features of each constituent unit are coded to obtain coding features
  • the encoding feature is processed to obtain the output intent
  • the encoding feature is processed to obtain the slot type marked by each component unit in the text to be understood.
  • the process of using a semantic understanding model to process the fusion text and the language features to obtain a semantic understanding result may further include:
  • the encoding feature and the preconfigured slot embedding feature matrix are used for attention calculation to obtain a new encoding feature fused with the slot embedding feature matrix;
  • the slot embedding feature matrix contains the embedding feature representation corresponding to each type of semantic slot in the scenario to which the text to be understood belongs;
  • the new coding feature is processed to obtain the slot type marked by each component unit in the text to be understood.
  • the apparatus of the present application may further include: a semantic understanding model training unit for training the semantic understanding model, and the semantic understanding model training process is to initialize parameters based on the pre-trained cross-language mask language model;
  • the training text and the language features of the language to which it belongs are used as sample input, and the training is performed with the goal of predicting the occluded characters in the training text.
  • the semantic understanding model training unit iteratively updates the language embedding feature matrix, the entity type embedding feature matrix, and the location embedding feature matrix;
  • the language embedding feature matrix includes embedding feature representations corresponding to each language
  • the entity type embedded feature matrix includes embedded feature representations corresponding to each entity type in the corresponding scenario
  • the position embedded feature matrix includes embedded feature representations corresponding to each position number respectively.
  • FIG. 7 shows a block diagram of the hardware structure of the semantic understanding device.
  • the hardware structure of the semantic understanding device may include: at least one processor 1, at least one communication interface 2, at least one memory 3, and at least one communication bus 4;
  • there is at least one each of the processor 1, the communication interface 2, the memory 3 and the communication bus 4, and the processor 1, the communication interface 2 and the memory 3 communicate with each other through the communication bus 4;
  • the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention, etc.;
  • the memory 3 may include high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), etc., such as at least one disk memory;
  • the memory stores a program
  • the processor can call the program stored in the memory, and the program is used for:
  • refinement function and extension function of the program may refer to the above description.
  • An embodiment of the present application further provides a storage medium, where the storage medium can store a program suitable for the processor to execute, and the program is used for:
  • refinement function and extension function of the program may refer to the above description.
  • an embodiment of the present application also provides a computer program product, which, when running on a terminal device, enables the terminal device to execute any one of the above semantic understanding methods.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

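A semantic understanding method, apparatus, device and storage medium. For a text to be understood, entity words matching the text are obtained from the entity words of each type in the scenario to which the text belongs, and the language feature of the language to which the text belongs is obtained (S100); a fused text is determined by splicing the text to be understood with the matching entity words (S110); and the semantic understanding result of the text to be understood is determined based on the fused text and the language feature (S120). The method, apparatus, device and storage medium can semantically understand texts in different languages and scenarios. Because the language of the text is taken into account, the characteristics of different languages can be distinguished, which safeguards the semantic understanding effect for texts in each language. In addition, introducing the matching entity words for the language and scenario of the text makes the scheme applicable to texts in different languages and scenarios and improves the accuracy of semantic understanding.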

Description

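Semantic understanding method, apparatus, device and storage medium
This application claims priority to the patent application No. 202110117912.7, entitled "Semantic understanding method, apparatus, device and storage medium", filed with the China National Intellectual Property Administration on January 28, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of natural language processing, and more specifically to a semantic understanding method, apparatus, device and storage medium.
Background
The core technology of human-computer interaction is semantic understanding. Semantic understanding usually consists of two parts: intent understanding, which determines the user's intent, and slot extraction, which extracts the intent-related entities from the user's request.
The accuracy of semantic understanding directly affects the actual experience of human-computer interaction; only when the user's request is understood accurately can the human-computer interaction system give correct feedback. Providing a scheme that accurately understands input requests has therefore become a popular research direction in the industry.
Summary
In view of the above problems, the present application is proposed to provide a semantic understanding method, apparatus, device and storage medium, so as to realize semantic understanding of an input request. The specific solutions are as follows: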
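In a first aspect of the present application, a semantic understanding method is provided, including:
obtaining entity words matching a text to be understood as matching entity words, and obtaining the language feature of the language to which the text to be understood belongs, wherein the matching entity words are the entity words that match the text to be understood in the language and scenario to which the text belongs;
determining a fused text based on the text to be understood and the matching entity words;
determining a semantic understanding result of the text to be understood based on the fused text and the language feature.
Preferably, obtaining the entity words matching the text to be understood as matching entity words includes:
obtaining an entity library matching the language and scenario to which the text to be understood belongs, the entity library containing the entity words of each type for the corresponding language and scenario;
determining the entity words in the entity library that match the text to be understood as the matching entity words.
Preferably, the process of obtaining the language feature of the language to which the text to be understood belongs includes:
obtaining a preconfigured language embedding feature matrix, which contains the embedding feature representation corresponding to each language;
looking up, in the language embedding feature matrix, the embedding feature representation corresponding to the language to which the text to be understood belongs, as the language feature of that language.
Preferably, determining the fused text based on the text to be understood and the matching entity words includes:
splicing the text to be understood with the matching entity words to obtain the fused text.
Preferably, determining the semantic understanding result of the text to be understood based on the fused text and the language feature includes:
obtaining the entity type feature and the text feature of each constituent unit in the fused text;
determining the semantic understanding result of the text to be understood based at least on the entity type feature and text feature of each constituent unit and on the language feature.
Preferably, the process of obtaining the entity type feature of each constituent unit in the fused text includes:
obtaining a preconfigured entity type embedding feature matrix corresponding to the scenario to which the text to be understood belongs, which contains the embedding feature representation corresponding to each entity type in that scenario;
looking up, in the entity type embedding feature matrix, the embedding feature representation corresponding to each constituent unit in the fused text, as the entity type feature of that constituent unit.
Preferably, determining the semantic understanding result of the text to be understood based on the fused text and the language feature further includes:
obtaining the position feature of each constituent unit in the fused text, the position feature representing the position of the constituent unit in the fused text;
in which case determining the semantic understanding result based at least on the entity type feature and text feature of each constituent unit and on the language feature includes:
determining the semantic understanding result of the text to be understood based on the entity type feature, text feature and position feature of each constituent unit and on the language feature.
Preferably, the process of obtaining the position feature of each constituent unit in the fused text includes:
obtaining a preconfigured position embedding feature matrix, which contains the embedding feature representation corresponding to each position number;
looking up, in the position embedding feature matrix, the embedding feature representation corresponding to the position number of each constituent unit in the fused text, as the position feature of that constituent unit.
Preferably, determining the semantic understanding result of the text to be understood based on the fused text and the language feature includes:
processing the fused text and the language feature with a pre-trained semantic understanding model to obtain the semantic understanding result of the text to be understood output by the model.
Preferably, the process of processing the fused text and the language feature with the semantic understanding model to obtain the semantic understanding result includes:
obtaining, by the embedding layer of the semantic understanding model, the embedding features of each constituent unit in the fused text; among the text feature, language feature, entity type feature and position feature, the embedding features include at least the text feature and the language feature;
encoding, by the encoding layer of the semantic understanding model, the embedding features of each constituent unit to obtain encoded features;
processing, by the intent understanding layer of the semantic understanding model, the encoded features to obtain the output intent;
processing, by the slot extraction layer of the semantic understanding model, the encoded features to obtain the slot type labeled for each constituent unit in the text to be understood.
Preferably, the process of processing the fused text and the language feature with the semantic understanding model to obtain the semantic understanding result further includes:
performing, by the slot attention layer of the semantic understanding model, attention calculation on the encoded features and a preconfigured slot embedding feature matrix to obtain new encoded features fused with the slot embedding feature matrix, the slot embedding feature matrix containing the embedding feature representation corresponding to each type of semantic slot in the scenario to which the text to be understood belongs;
processing, by the slot extraction layer of the semantic understanding model, the new encoded features to obtain the slot type labeled for each constituent unit in the text to be understood.
Preferably, in the training process of the semantic understanding model, the parameters are initialized from a pre-trained cross-language masked language model;
when the cross-language masked language model is trained, the training text and the language feature of the language to which it belongs are used as the sample input, and the training objective is to predict the masked characters in the training text.
Preferably, the training process of the semantic understanding model iteratively updates the language embedding feature matrix, the entity type embedding feature matrix and the position embedding feature matrix;
wherein
the language embedding feature matrix contains the embedding feature representation corresponding to each language;
the entity type embedding feature matrix contains the embedding feature representation corresponding to each entity type in the corresponding scenario;
the position embedding feature matrix contains the embedding feature representation corresponding to each position number.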
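In a second aspect of the present application, a semantic understanding apparatus is provided, including:
a data acquisition unit, configured to obtain entity words matching a text to be understood as matching entity words, and the language feature of the language to which the text to be understood belongs, wherein the matching entity words are the entity words that match the text to be understood in the language and scenario to which the text belongs;
a fusion unit, configured to determine a fused text based on the text to be understood and the matching entity words;
a semantic understanding unit, configured to determine a semantic understanding result of the text to be understood based on the fused text and the language feature.
In a third aspect of the present application, a semantic understanding device is provided, including a memory and a processor;
the memory is configured to store a program;
the processor is configured to execute the program to implement the steps of the semantic understanding method described above.
In a fourth aspect of the present application, a storage medium is provided on which a computer program is stored; when the computer program is executed by a processor, the steps of the semantic understanding method described above are implemented.
In a fifth aspect of the present application, a computer program product is provided which, when run on a terminal device, causes the terminal device to perform the steps of the semantic understanding method described above.
With the above technical solution, for a text to be understood, the present application obtains the entity words matching the text from the entity words of each type in the scenario to which the text belongs, obtains the language feature of the language to which the text belongs, determines a fused text based on the text to be understood and the matching entity words, and determines the semantic understanding result of the text based on the fused text and the language feature. The solution can therefore semantically understand texts in different languages and scenarios; because the language of the text is taken into account, the characteristics of different languages can be distinguished, which safeguards the semantic understanding effect for texts in each language. In addition, the present application introduces the entity words that match the text in its language and scenario, fuses them with the text, and determines the semantic understanding result from the fused text. Introducing these matching entity words makes the scheme applicable to texts in different languages and scenarios and improves the accuracy of semantic understanding for them.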
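Brief Description of the Drawings
Various other advantages and benefits will become apparent to a person of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present application. The same reference signs denote the same components throughout the drawings. In the drawings:
FIG. 1 is a schematic flowchart of the semantic understanding method provided by an embodiment of the present application;
FIG. 2a illustrates a language embedding feature matrix;
FIG. 2b illustrates an entity type embedding feature matrix;
FIG. 2c illustrates a start position embedding feature matrix;
FIG. 2d illustrates an end position embedding feature matrix;
FIG. 3 illustrates the overall architecture of a semantic understanding model;
FIG. 4 illustrates the architecture of a cross-language masked language model;
FIG. 5 illustrates a slot embedding feature matrix;
FIG. 6 is a schematic structural diagram of a semantic understanding apparatus disclosed in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of the semantic understanding device provided by an embodiment of the present application.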
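Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present application.
The present application provides a semantic understanding method capable of handling semantic understanding tasks. The method supports semantic understanding of information in multiple languages and scenarios.
The solution can be implemented on a terminal with data processing capability, such as a mobile phone, a computer, a server or a cloud platform.
With reference to FIG. 1, the semantic understanding method of the present application may include the following steps:
Step S100: obtain entity words matching the text to be understood as matching entity words, and obtain the language feature of the language to which the text to be understood belongs.
The text to be understood is the text that requires semantic understanding. It may be text entered by the user, or recognized text obtained by performing speech recognition on the user's voice input.
In a human-computer interaction system, the text corresponding to a request that the user sends to the system may be determined and used as the text to be understood.
The matching entity words are the entity words that match the text to be understood in the language and scenario to which the text belongs.
In an optional implementation, an entity library may be configured for each scenario in each language; the entity library contains the entity words of each type for the corresponding language and scenario.
For example, in the Chinese music scenario, typical entity types are "singer", "song" and "tag". For the "singer" type, the entity words may include singers' names, aliases and nicknames, such as "刘德华", "华仔" and "张学友". For the "song" type, the entity words may include song titles, such as "忘情水" and "吻别". For the "tag" type, the entity words may include song genres, such as "流行" (pop), "摇滚" (rock) and "轻音乐" (light music).
Optionally, within the entity library for a given language and scenario, the entity words may be stored separately by type; in the example above, the entity library may include a "singer" library, a "song" library and a "tag" library. Alternatively, entity words of all types may be stored together; the present application imposes no strict limitation on this.
The language and scenario to which the text to be understood belongs are then determined. The language may be supplied by the user or determined by language analysis of the text. The scenario may likewise be supplied by the user or determined by text analysis. The present application imposes no strict limitation on how the language and scenario are obtained.
Once the language and scenario are known, the entity library corresponding to them is looked up, and the entity words in that library that match the text to be understood are determined as the matching entity words.
Specifically, the matching entity words may be determined by string matching: the text to be understood is matched against each entity word in the entity library, and the entity words that match are taken as the matching entity words.
For example, suppose the text to be understood is "我想听赵雷的南方姑娘" ("I want to listen to 南方姑娘 by 赵雷"). Matching this text against the entity words in the entity library of the Chinese music scenario yields the "singer" entity word "赵雷" and the "song" entity words "南方" and "南方姑娘".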
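The string-matching step can be sketched as follows. This is only an illustration under stated assumptions: the function name `match_entities`, the dictionary layout of the entity library and the returned tuple format are hypothetical, and the source does not prescribe a particular matching algorithm beyond "string matching".

```python
def match_entities(text, entity_library):
    """Return (entity_type, word, start, end) for each library word found in text.

    entity_library maps an entity type to a list of entity words (assumed layout).
    start and end are 0-based character positions in text, end inclusive.
    """
    matches = []
    for entity_type, words in entity_library.items():
        for word in words:
            start = text.find(word)
            while start != -1:
                matches.append((entity_type, word, start, start + len(word) - 1))
                start = text.find(word, start + 1)
    # Order by position in the text; on a tie the shorter match comes first.
    matches.sort(key=lambda m: (m[2], m[3]))
    return matches


# Toy entity library for the Chinese music scenario (contents are illustrative).
library = {
    "artist": ["刘德华", "赵雷"],
    "song": ["忘情水", "南方", "南方姑娘"],
}
matches = match_entities("我想听赵雷的南方姑娘", library)
```

With this toy library, `matches` contains the same three entity words as the example in the text: "赵雷", "南方" and "南方姑娘".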
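Further, the language feature of the language to which the text to be understood belongs identifies that language.
In some optional implementations, a language embedding feature matrix may be preconfigured; it contains the embedding feature representation corresponding to each language. The language feature is then obtained by looking up, in the language embedding feature matrix, the embedding feature representation corresponding to the language of the text to be understood.
FIG. 2a illustrates a language embedding feature matrix.
Each row in FIG. 2a is the language embedding feature representation of one language, and the label on the right is the corresponding language identifier; for example, zh may denote Chinese and en English.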
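Step S110: determine a fused text based on the text to be understood and the matching entity words.
Specifically, introducing the matching entity words for the language and scenario of the text makes it possible to understand texts in different languages and scenarios. To adapt to different languages and scenarios, this step determines a fused text from the text to be understood and the matching entity words, so that semantic understanding can subsequently be performed on the fused text.
In an optional implementation, the text to be understood and the matching entity words are fused by direct concatenation, and the concatenated text is used as the fused text. Continuing the example above, the text to be understood is "我想听赵雷的南方姑娘" and the matching entity words are "赵雷", "南方" and "南方姑娘"; the fused text obtained by concatenation may then be "我想听赵雷的南方姑娘赵雷南方南方姑娘".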
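The concatenation variant is simple enough to show directly. The helper name `fuse_by_concatenation` is hypothetical; the input and the resulting string are taken from the example in the text.

```python
def fuse_by_concatenation(text, matching_entity_words):
    """Append the matching entity words, in order, to the text to be understood."""
    return text + "".join(matching_entity_words)


fused_text = fuse_by_concatenation("我想听赵雷的南方姑娘", ["赵雷", "南方", "南方姑娘"])
```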
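Besides concatenation, the text to be understood and the matching entity words may be fused in other ways, for example by fusing their embedding vectors through an attention mechanism.
The fused text obtained in this step contains both the information of the text to be understood and the information of the matching entity words for its language and scenario. In other words, the fused text carries richer information that is tied to the language and scenario of the text, so it can adapt to how meaning is expressed in that language and scenario, which improves the semantic understanding performed on it in the subsequent steps.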
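Step S120: determine the semantic understanding result of the text to be understood based on the fused text and the language feature.
Specifically, once the fused text and the language feature of the language to which the text belongs have been obtained, the semantic understanding result of the text to be understood is determined from them.
The semantic understanding result may include either or both of the following:
an intent understanding result and a slot extraction result.
The intent understanding result is the intent category of the text to be understood. Taking the Chinese music scenario as an example, intents may include "play music", "switch song" and many other types.
Slot extraction can be regarded as a sequence labeling process in which each constituent unit of the text to be understood is labeled with its slot category. Different types of entity words may serve as different slot categories; for example, "赵雷" and "南方姑娘" above belong to different slot categories.
The semantic understanding method provided by the embodiments of the present application can semantically understand texts in different languages and scenarios; because the language of the text is taken into account, the characteristics of different languages can be distinguished, which safeguards the semantic understanding effect for texts in each language. In addition, the present application introduces the entity words that match the text in its language and scenario, fuses them with the text, and determines the semantic understanding result from the fused text. Introducing these matching entity words makes the scheme applicable to texts in different languages and scenarios and improves the accuracy of semantic understanding for them.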
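In some embodiments of the present application, the process in step S120 of determining the semantic understanding result of the text to be understood based on the fused text and the language feature may include:
S1: obtain the entity type feature and the text feature of each constituent unit in the fused text.
The text feature represents the text-level meaning of a constituent unit and may be its word embedding vector. Specifically, the word embedding vector of each constituent unit may be determined by dictionary lookup and used as the text feature.
The entity type feature represents the entity type of a constituent unit. Since the fused text is the result of fusing the text to be understood with the matching entity words, a constituent unit that comes from a matching entity word may take the feature representation of that entity word's type, while the constituent units that come from the text to be understood may all share one common feature representation.
In some optional implementations, an entity type embedding feature matrix may be preconfigured for each scenario; it contains the embedding feature representation corresponding to each entity type in that scenario. For each constituent unit in the fused text, the corresponding embedding feature representation is looked up in the entity type embedding feature matrix of the scenario to which the text belongs and used as the entity type feature of that unit.
FIG. 2b illustrates an entity type embedding feature matrix.
Each row in FIG. 2b is the entity type embedding feature representation of one entity type, and the label on the right is the corresponding entity type identifier; for example, O may denote the entity type of the constituent units of the text to be understood, artist the "singer" entity type, song the "song" entity type, and so on.
S2: determine the semantic understanding result of the text to be understood based at least on the entity type feature and text feature of each constituent unit and on the language feature.
In this embodiment, the entity type feature of each constituent unit in the fused text is additionally obtained, and the semantic understanding result is determined from the entity type feature, the text feature and the language feature. Compared with the preceding scheme, the entity type features provide richer reference data, so the text can be understood more accurately.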
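Still further, before step S2 the method may also include:
S3: obtain the position feature of each constituent unit in the fused text.
The position feature represents the position of the constituent unit in the fused text.
Since the fused text is the result of fusing the text to be understood with the matching entity words, the position feature of a constituent unit that comes from the text to be understood may be the feature of its absolute position in that text, and the position feature of a constituent unit that comes from a matching entity word may be the feature of the absolute position of that entity word in the text to be understood.
The position feature may include a start position feature and an end position feature.
For example, suppose the fused text is formed by concatenating, in order, the text to be understood "我想听赵雷的南方姑娘" and the matching entity words "赵雷", "南方" and "南方姑娘". The position numbers of the constituent units of the fused text can then be expressed as:
[Table 1, image PCTCN2021082961-appb-000001: position numbers of the constituent units; not reproduced in this text]
Different position numbers may correspond to different position features.
In some optional implementations, a position embedding feature matrix may be preconfigured; it contains the embedding feature representation corresponding to each position number. For each constituent unit in the fused text, the embedding feature representation corresponding to its position number is looked up in the position embedding feature matrix and used as the position feature of that unit.
FIG. 2c and FIG. 2d illustrate a start position embedding feature matrix and an end position embedding feature matrix, respectively.
Each row in FIG. 2c is the start position embedding feature representation of one start position number, and the label on the right is the corresponding position number; for example, start position numbers may be ordered sequentially from 0.
Each row in FIG. 2d is the end position embedding feature representation of one end position number, and the label on the right is the corresponding position number; for example, end position numbers may be ordered sequentially from 0.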
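Because Table 1 is not reproduced in this text, the following sketch only illustrates one numbering that is consistent with the description: positions counted from 0, each character of the text to be understood using its own index as both start and end position, and each matching entity word using the span it occupies in the text to be understood. The function name `build_units` and the dictionary fields are hypothetical.

```python
def build_units(text, matches):
    """Build the constituent units of the fused text with type and position labels.

    matches is a list of (entity_type, word, start, end) tuples, with start and
    end being 0-based inclusive positions in text (assumed format).
    """
    units = []
    # Characters of the text to be understood: shared type "O", start == end.
    for index, char in enumerate(text):
        units.append({"token": char, "type": "O", "start": index, "end": index})
    # Each matching entity word is one unit carrying its span in the text.
    for entity_type, word, start, end in matches:
        units.append({"token": word, "type": entity_type, "start": start, "end": end})
    return units


example_matches = [
    ("artist", "赵雷", 3, 4),
    ("song", "南方", 6, 7),
    ("song", "南方姑娘", 6, 9),
]
units = build_units("我想听赵雷的南方姑娘", example_matches)
```

Under these assumptions the fused text has 13 constituent units: 10 characters followed by 3 entity words.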
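On this basis, the specific implementation of step S2 may include:
determining the semantic understanding result of the text to be understood based on the entity type feature, text feature and position feature of each constituent unit and on the language feature.
In this embodiment, the position feature of each constituent unit in the fused text is additionally obtained, and the semantic understanding result is determined from the position feature, entity type feature, text feature and language feature. Compared with the preceding scheme, the position features provide richer reference data, so the text can be understood more accurately.
It should be noted that existing semantic understanding schemes generally customize a semantic understanding model for a single language and a single scenario: every language and every scenario needs its own model, and one model can only understand user requests for one scenario in one language.
The prior art has at least the following two shortcomings:
First, when semantic understanding models are integrated into a human-computer interaction system, a large number of models must be deployed. With N languages and M scenarios, at least N*M models are needed, which consumes a large amount of computing resources.
Second, prior-art schemes do not make full use of the semantic commonality across languages. The applicant found that, although languages differ, semantically similar data can be exploited through model sharing. This reduces the amount of data that must be labeled for each language and scenario and uses the data-volume advantage of widely used languages to improve results for low-resource languages.
Based on this, in the semantic understanding scheme of this embodiment, the process in step S120 of determining the semantic understanding result based on the fused text and the language feature may be implemented by a pre-trained semantic understanding model; that is, the pre-trained model processes the fused text and the language feature and outputs the semantic understanding result of the text to be understood.
The semantic understanding model of the present application is applicable to all scenarios in all languages and can be trained on data from all of them. The fused text and the language feature of the language to which the text belongs are used as the model input. The language feature distinguishes different languages and ensures that the model can learn the characteristics of each language during training.
Further, the fused text combines the text to be understood with the matching entity words, which are the entity words matching the text in its language and scenario. Because the entity words for the language and scenario are taken into account, the semantic understanding model applies to texts in different languages and scenarios and improves the accuracy of understanding them, while remaining a single unified model.
Compared with prior-art solutions, the semantic understanding model of this embodiment is one unified model that achieves cross-language and cross-scenario semantic understanding. Deploying it in a human-computer interaction system greatly reduces the computing resources required.
Moreover, the model can be trained on mixed data from different languages and scenarios, which makes full use of the semantic commonality across languages, reduces the labeling effort for each language and scenario, and uses the data-volume advantage of widely used languages to improve results for low-resource languages.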
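In some embodiments of the present application, the process in which the semantic understanding model processes the fused text and the language feature to obtain the semantic understanding result is as follows.
The semantic understanding model may include an embedding layer, an encoding layer, an intent understanding layer and a slot extraction layer.
The intent understanding layer and the slot extraction layer may be kept or dropped according to the task. If only intent understanding is needed, the intent understanding layer is kept and the slot extraction layer dropped; if only slot extraction is needed, the slot extraction layer is kept and the intent understanding layer dropped; if both are needed, both layers are kept.
The embedding layer obtains the embedding features of each constituent unit in the fused text.
Since the fused text combines the text to be understood with the matching entity words, its constituent units may include the constituent units of the text to be understood and the matching entity words; for example, each character of the text to be understood and each matching entity word may be a constituent unit of the fused text.
The embedding features include at least the text feature and the language feature, and may additionally include the entity type feature, the position feature and so on.
The meanings of the text feature, language feature, entity type feature and position feature are as described above and are not repeated here.
It should be noted that all constituent units of the fused text share the same language, namely the language of the text to be understood. The language feature of every constituent unit is therefore identical to the language feature of that language, so once the language feature of the text to be understood has been obtained, the language feature of each constituent unit is known directly.
It should further be noted that the language feature, entity type feature and position feature described in the preceding embodiments are determined using the preconfigured language embedding feature matrix, entity type embedding feature matrix and position embedding feature matrix, respectively. These three kinds of matrices may be updated iteratively as the semantic understanding model is trained and are fixed once training ends. When the trained model is used to understand a text, the language feature of the text is obtained by querying the language embedding feature matrix, the entity type feature of each constituent unit by querying the entity type embedding feature matrix, and the position feature of each constituent unit by querying the position embedding feature matrix.
Further, after the embedding layer has obtained the various embedding features of each constituent unit in the fused text, it may add them together; the sum is the final embedding feature of that constituent unit.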
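The lookup-and-sum behaviour of the embedding layer can be sketched in plain Python. Everything here is illustrative: the table layout, the function name `embed_unit` and the two-dimensional integer vectors are assumptions chosen to keep the arithmetic easy to follow; in a trained model the rows would be learned real-valued vectors.

```python
def embed_unit(unit, language, tables):
    """Look up each embedding of one constituent unit and add them element-wise."""
    vectors = [
        tables["token"][unit["token"]],    # text feature
        tables["language"][language],      # language feature, same for all units
        tables["type"][unit["type"]],      # entity type feature
        tables["start"][unit["start"]],    # start position feature
        tables["end"][unit["end"]],        # end position feature
    ]
    return [sum(components) for components in zip(*vectors)]


# Tiny hypothetical embedding matrices with one row each.
tables = {
    "token": {"赵": [1, 2]},
    "language": {"zh": [10, 0]},
    "type": {"O": [0, 5]},
    "start": {3: [3, 0]},
    "end": {3: [0, 3]},
}
unit = {"token": "赵", "type": "O", "start": 3, "end": 3}
final_embedding = embed_unit(unit, "zh", tables)
```

Adding the embeddings keeps the dimensionality fixed regardless of how many feature types are used, which is why optional features such as the entity type or position can be included or left out without changing the encoder input size.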
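After the embedding layer has obtained the embedding features of each constituent unit in the fused text, the encoding layer of the semantic understanding model encodes them to obtain the encoded features.
The encoding layer may use a Transformer Encoder structure or another suitable neural network structure.
On top of the encoding layer, an intent understanding layer and a slot extraction layer are provided to perform the intent understanding and slot extraction tasks, respectively.
The intent understanding layer processes the encoded features to obtain the output intent.
The slot extraction layer processes the encoded features to obtain the slot type labeled for each constituent unit in the text to be understood.
Optionally, since slot extraction is generally performed on the text to be understood, the encoded features corresponding to the text to be understood may be extracted from the encoded features of the fused text and sent to the slot extraction layer, which yields the slot type labeled for each constituent unit in the text to be understood. For example, the encoded features output by the encoding layer for the fused text are denoted (h_1, h_2, ..., h_m), where the first n are the encoded feature representations of the n characters contained in the text to be understood, and the remaining m − n are the encoded feature representations of the matching entity words.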
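Selecting the part of the encoder output that belongs to the text to be understood amounts to a slice. The sketch below uses placeholder strings instead of real vectors; the function name `split_encoded_features` is hypothetical.

```python
def split_encoded_features(encoded, n):
    """Split encoder outputs into text features (first n) and entity-word features."""
    return encoded[:n], encoded[n:]


# Placeholder encoder outputs h_1 .. h_13 for a fused text with 10 characters
# followed by 3 matching entity words.
encoded = [f"h_{i}" for i in range(1, 14)]
text_features, entity_features = split_encoded_features(encoded, n=10)
```

Only `text_features` would be passed on to the slot extraction layer, since slot labels are produced for the characters of the text to be understood and not for the appended entity words.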
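Next, the overall architecture of the semantic understanding model is described with reference to FIG. 3.
As shown in FIG. 3:
The text to be understood is "我想听赵雷的南方姑娘". Matching it against the entity library of the Chinese music scenario yields the matching entity words "赵雷", "南方" and "南方姑娘".
The text to be understood and the matching entity words are concatenated to obtain the fused text, as shown at the character embedding layer (Token Embedding) in FIG. 3. The character embedding layer obtains the text feature of each constituent unit in the fused text.
Further, the language embedding layer (Language Embedding) obtains the language feature of each constituent unit in the fused text; in FIG. 3 the identifier "zh" denotes Chinese. The corresponding language embedding feature representation can be looked up in the language embedding feature matrix illustrated in FIG. 2a.
In addition, the semantic understanding model may include an entity type embedding layer (Entity Type Embedding), which obtains the entity type feature of each constituent unit in the fused text. In FIG. 3, "O" denotes the entity type shared by all constituent units of the text to be understood, "artist" denotes the singer entity type, and "song" denotes the song entity type. The corresponding entity type embedding feature representation can be looked up in the entity type embedding feature matrix illustrated in FIG. 2b.
Still further, the semantic understanding model may include a position embedding layer, which may comprise a start position embedding layer (Start Position Embedding) and an end position embedding layer (End Position Embedding). The two position embedding layers each obtain a position feature for every constituent unit in the fused text. In FIG. 3, Arabic numerals denote the position numbers of the constituent units. The corresponding position embedding feature representations can be looked up in the start position and end position embedding feature matrices illustrated in FIG. 2c and FIG. 2d.
After the embedding layers have extracted the embedding features of the fused text, the features are added together to obtain the overall embedding feature, which is sent to the encoding layer (Encoder Layer) for encoding. The encoding layer may use a Transformer Encoder or another neural network structure.
The encoded features output by the encoding layer are denoted (h_1, h_2, ..., h_m), where the first n are the encoded feature representations of the n characters contained in the text to be understood, and the remaining m − n are the encoded feature representations of the matching entity words.
On top of the encoding layer, an intent understanding task processing layer and a slot extraction task processing layer are provided.
The intent understanding task layer may use a self-attention module to encode the encoded features (h_1, h_2, ..., h_m) output by the encoding layer into a single vector, and then feed that vector to a binary classification neural network to determine the intent, that is, to obtain the intent understanding result. As shown in FIG. 3, the intent understanding result obtained is "play music: play_music".
The slot extraction task processing layer may be implemented with a conditional random field (CRF) module. The encoded features corresponding to the text to be understood may be extracted from the encoded features (h_1, h_2, ..., h_m) and sent to the CRF layer, which yields the slot type labeled for each constituent unit in the text to be understood. As shown in FIG. 3, the output for the text to be understood "我想听赵雷的南方姑娘" is "O O O B-artist I-artist O B-song I-song I-song E-song".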
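To show how a downstream component might turn the tag sequence above into slot values, here is a small decoding sketch. It is not part of the described model; the function name `decode_slots` is hypothetical, and the decoder is written to accept spans that end on either an I- or an E- tag, because the example output closes "赵雷" with I-artist and "南方姑娘" with E-song.

```python
def decode_slots(chars, tags):
    """Group characters into (slot_type, value) pairs from B/I/E/O tags."""
    slots = []
    current_type, current_chars = None, []

    def flush():
        # Emit the span collected so far, if any.
        if current_type is not None:
            slots.append((current_type, "".join(current_chars)))

    for char, tag in zip(chars, tags):
        prefix, _, slot_type = tag.partition("-")
        if prefix == "B":
            flush()
            current_type, current_chars = slot_type, [char]
        elif prefix in ("I", "E") and slot_type == current_type:
            current_chars.append(char)
        else:
            # "O" or an inconsistent tag closes the current span.
            flush()
            current_type, current_chars = None, []
    flush()
    return slots


tags = "O O O B-artist I-artist O B-song I-song I-song E-song".split()
slots = decode_slots(list("我想听赵雷的南方姑娘"), tags)
```

For the example sequence this recovers the singer "赵雷" and the song "南方姑娘".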
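In some embodiments of the present application, the parameters of the semantic understanding model may be initialized from a pre-trained cross-language masked language model during training.
Specifically, a large-scale unsupervised multilingual corpus may be collected in advance and used to train a cross-language masked language model (Masked Language Model), as shown in FIG. 4; the structure of the model may be a Transformer or another neural network structure. The trained cross-language masked language model is used to initialize the parameters of the semantic understanding model of the present application.
Because the training corpus of the cross-language masked language model can be unsupervised data, it can be obtained in large quantities, so that the model can learn from corpora in more languages. Initializing the semantic understanding model from the cross-language masked language model then allows it to generalize well even with a limited supervised corpus.
During training of the cross-language masked language model, the training text and the language feature of the language to which it belongs are used as the sample input, and the training objective is to predict the masked characters in the training text. That is, during training a character is randomly replaced with the [mask] token, and the objective is to predict the original character at that position.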
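The construction of one masked training sample can be sketched as follows. The function name `make_masked_sample`, the dictionary fields and the use of a language identifier string as a stand-in for the language feature are assumptions made for illustration; the source only states that a character is randomly replaced with [mask] and that the original character is the prediction target.

```python
import random


def make_masked_sample(text, language, rng):
    """Replace one randomly chosen character with [mask] and record the target."""
    chars = list(text)
    position = rng.randrange(len(chars))
    target = chars[position]
    chars[position] = "[mask]"
    return {
        "tokens": chars,        # model input with one masked position
        "language": language,   # identifier used to look up the language feature
        "position": position,   # where the mask was placed
        "target": target,       # original character the model should predict
    }


sample = make_masked_sample("我想听赵雷的南方姑娘", "zh", random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; a real training pipeline would typically draw fresh masks on every pass over the corpus.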
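As shown in FIG. 4, in this embodiment, in addition to adding position encoding information to each character of the input sample when training the cross-language masked language model, language information is also added, namely the language embedding layer (Language Embedding) in FIG. 4, where "zh" denotes Chinese. On this basis, a cross-language masked language model can be trained.
In some embodiments of the present application, the process in which the semantic understanding model processes the fused text and the language feature to obtain the semantic understanding result is described further.
On the basis of the preceding embodiments, a slot attention layer may be added to the semantic understanding model.
The slot attention layer performs attention calculation on the encoded features output by the encoding layer and a preconfigured slot embedding feature matrix to obtain new encoded features fused with the slot embedding feature matrix.
The slot embedding feature matrix contains the embedding feature representation corresponding to each type of semantic slot in the scenario to which the text to be understood belongs.
The slot embedding feature matrix may be updated iteratively as the semantic understanding model is trained and is fixed once training ends. When the trained model is used to understand a text, the slot attention layer performs attention calculation on the slot embedding feature matrix and the encoded features output by the encoding layer to obtain the new encoded features fused with the slot embedding feature matrix.
FIG. 5 illustrates a slot embedding feature matrix.
Each row of the slot embedding feature matrix is the embedding feature representation of one slot, such as B-artist, I-artist, E-artist, B-song, I-song and so on.
Compared with FIG. 2b, the slot embedding feature matrix has three times as many rows as the entity type embedding feature matrix illustrated in FIG. 2b. In other words, FIG. 5 is equivalent to further subdividing each entity type in the entity type embedding feature matrix of FIG. 2b into three slot types: B, I and E.
In this embodiment, the slot attention layer performs attention calculation on the encoded features output by the encoding layer and the preconfigured slot embedding feature matrix to obtain new encoded features fused with the slot embedding feature matrix. The new encoded features incorporate the information of the slot embedding feature matrix and therefore distinguish different slot types better, so when the slot extraction layer processes them, the slot type labeled for each constituent unit in the text to be understood is more accurate.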
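The attention calculation of the slot attention layer may be implemented with reference to the following formulas:
[Equation image PCTCN2021082961-appb-000002: definition of the attention weight α_tj; not reproduced in this text]
$$a(h_t, \mathrm{slot}_j) = w^{\mathsf{T}}\,[\,h_t;\ \mathrm{slot}_j;\ h_t \cdot \mathrm{slot}_j\,]$$
$$g_t = \sum_{j=1}^{N} \alpha_{tj}\,\mathrm{slot}_j$$
Here h_t is the t-th encoded feature among the encoded features (h_1, h_2, ..., h_n) output by the encoding layer for the text to be understood, slot_j is the j-th slot embedding feature representation in the slot embedding feature matrix, α_tj is the attention weight of the t-th encoded feature on the j-th slot embedding feature representation, and g_t is the t-th new encoded feature obtained.
The new encoded features finally obtained, fused with the slot embedding feature matrix, can then be expressed as (g_1, g_2, ..., g_n).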
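The calculation can be sketched in plain Python as follows. Two points are assumptions rather than statements from the source: the attention weights are obtained by a softmax over the scores a(h_t, slot_j), since the image defining the weights is not reproduced here, and the product h·slot is taken element-wise. The function name `slot_attention` and the toy vectors are hypothetical.

```python
import math


def slot_attention(h, slots, w):
    """Compute attention weights over slot embeddings and the fused feature g.

    h:     encoded feature of one character, a list of d numbers
    slots: slot embedding feature matrix, a list of N rows of d numbers
    w:     scoring vector of length 3 * d
    """
    scores = []
    for slot in slots:
        # Score input [h; slot; h * slot], with an element-wise product (assumed).
        features = h + slot + [a * b for a, b in zip(h, slot)]
        scores.append(sum(wi * fi for wi, fi in zip(w, features)))
    # Softmax normalisation (assumed), shifted by the maximum for stability.
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    weights = [value / total for value in exps]
    # g is the attention-weighted sum of the slot embeddings.
    g = [sum(weight * slot[k] for weight, slot in zip(weights, slots))
         for k in range(len(h))]
    return weights, g


toy_slots = [[1.0, 0.0], [0.0, 1.0]]
weights, g = slot_attention([0.5, 0.5], toy_slots, w=[0.0] * 6)
```

With an all-zero scoring vector every slot receives the same score, so the weights are uniform and g is the mean of the slot embeddings; a trained w would instead concentrate the weight on the slots most compatible with h_t.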
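The semantic understanding apparatus provided by the embodiments of the present application is described below; the apparatus described below and the method described above may be referred to in correspondence with each other.
FIG. 6 is a schematic structural diagram of a semantic understanding apparatus disclosed in an embodiment of the present application.
As shown in FIG. 6, the apparatus may include:
a data acquisition unit 11, configured to obtain entity words matching the text to be understood as matching entity words, and the language feature of the language to which the text to be understood belongs, wherein the matching entity words are the entity words that match the text to be understood in the language and scenario to which the text belongs;
a fusion unit 12, configured to determine a fused text based on the text to be understood and the matching entity words;
a semantic understanding unit 13, configured to determine the semantic understanding result of the text to be understood based on the fused text and the language feature.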
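Optionally, the process in which the data acquisition unit obtains the entity words matching the text to be understood as matching entity words may include:
obtaining an entity library matching the language and scenario to which the text to be understood belongs, the entity library containing the entity words of each type for the corresponding language and scenario;
determining the entity words in the entity library that match the text to be understood as the matching entity words.
Optionally, the process in which the data acquisition unit obtains the language feature of the language to which the text to be understood belongs may include:
obtaining a preconfigured language embedding feature matrix, which contains the embedding feature representation corresponding to each language;
looking up, in the language embedding feature matrix, the embedding feature representation corresponding to the language to which the text to be understood belongs, as the language feature of that language.
Optionally, the process in which the fusion unit determines the fused text based on the text to be understood and the matching entity words may include:
splicing the text to be understood with the matching entity words to obtain the fused text.
Optionally, the process in which the semantic understanding unit determines the semantic understanding result of the text to be understood based on the fused text and the language feature may include:
obtaining the entity type feature and the text feature of each constituent unit in the fused text;
determining the semantic understanding result of the text to be understood based at least on the entity type feature and text feature of each constituent unit and on the language feature.
Optionally, the process in which the semantic understanding unit obtains the entity type feature of each constituent unit in the fused text may include:
obtaining a preconfigured entity type embedding feature matrix corresponding to the scenario to which the text to be understood belongs, which contains the embedding feature representation corresponding to each entity type in that scenario;
looking up, in the entity type embedding feature matrix, the embedding feature representation corresponding to each constituent unit in the fused text, as the entity type feature of that constituent unit.
Optionally, the process in which the semantic understanding unit determines the semantic understanding result of the text to be understood based on the fused text and the language feature may further include:
obtaining the position feature of each constituent unit in the fused text, the position feature representing the position of the constituent unit in the fused text. On this basis, the semantic understanding unit determines the semantic understanding result of the text to be understood based on the entity type feature, text feature and position feature of each constituent unit and on the language feature.
Optionally, the process in which the semantic understanding unit obtains the position feature of each constituent unit in the fused text may include:
obtaining a preconfigured position embedding feature matrix, which contains the embedding feature representation corresponding to each position number;
looking up, in the position embedding feature matrix, the embedding feature representation corresponding to the position number of each constituent unit in the fused text, as the position feature of that constituent unit.
Optionally, the process in which the semantic understanding unit determines the semantic understanding result of the text to be understood based on the fused text and the language feature may be implemented by a semantic understanding model; specifically, a pre-trained semantic understanding model may process the fused text and the language feature to obtain the semantic understanding result of the text to be understood output by the model.
Optionally, the process of processing the fused text and the language feature with the semantic understanding model to obtain the semantic understanding result may include:
obtaining, by the embedding layer of the semantic understanding model, the embedding features of each constituent unit in the fused text; among the text feature, language feature, entity type feature and position feature, the embedding features include at least the text feature and the language feature;
encoding, by the encoding layer of the semantic understanding model, the embedding features of each constituent unit to obtain encoded features;
processing, by the intent understanding layer of the semantic understanding model, the encoded features to obtain the output intent;
processing, by the slot extraction layer of the semantic understanding model, the encoded features to obtain the slot type labeled for each constituent unit in the text to be understood.
Optionally, the process of processing the fused text and the language feature with the semantic understanding model to obtain the semantic understanding result may further include:
performing, by the slot attention layer of the semantic understanding model, attention calculation on the encoded features and a preconfigured slot embedding feature matrix to obtain new encoded features fused with the slot embedding feature matrix, the slot embedding feature matrix containing the embedding feature representation corresponding to each type of semantic slot in the scenario to which the text to be understood belongs;
processing, by the slot extraction layer of the semantic understanding model, the new encoded features to obtain the slot type labeled for each constituent unit in the text to be understood.
Optionally, the apparatus may further include a semantic understanding model training unit, configured to train the semantic understanding model; in the training process, the parameters are initialized from a pre-trained cross-language masked language model;
when the cross-language masked language model is trained, the training text and the language feature of the language to which it belongs are used as the sample input, and the training objective is to predict the masked characters in the training text.
Optionally, in the process of training the semantic understanding model, the semantic understanding model training unit iteratively updates the language embedding feature matrix, the entity type embedding feature matrix and the position embedding feature matrix;
wherein
the language embedding feature matrix contains the embedding feature representation corresponding to each language;
the entity type embedding feature matrix contains the embedding feature representation corresponding to each entity type in the corresponding scenario;
the position embedding feature matrix contains the embedding feature representation corresponding to each position number.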
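The semantic understanding apparatus provided by the embodiments of the present application can be applied to a semantic understanding device, such as a terminal (a mobile phone, a computer, etc.). Optionally, FIG. 7 shows a block diagram of the hardware structure of the semantic understanding device. Referring to FIG. 7, the hardware structure may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiments of the present application, there is at least one each of the processor 1, the communication interface 2, the memory 3 and the communication bus 4, and the processor 1, the communication interface 2 and the memory 3 communicate with each other through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention;
the memory 3 may include high-speed RAM and may also include non-volatile memory, for example at least one disk memory;
the memory stores a program that the processor can call, the program being used for:
obtaining entity words matching the text to be understood as matching entity words, and the language feature of the language to which the text to be understood belongs, wherein the matching entity words are the entity words that match the text to be understood in the language and scenario to which the text belongs;
determining a fused text based on the text to be understood and the matching entity words;
determining the semantic understanding result of the text to be understood based on the fused text and the language feature.
Optionally, for the refined and extended functions of the program, refer to the description above.
An embodiment of the present application further provides a storage medium that can store a program suitable for execution by a processor, the program being used for:
obtaining entity words matching the text to be understood as matching entity words, and the language feature of the language to which the text to be understood belongs, wherein the matching entity words are the entity words that match the text to be understood in the language and scenario to which the text belongs;
determining a fused text based on the text to be understood and the matching entity words;
determining the semantic understanding result of the text to be understood based on the fused text and the language feature.
Optionally, for the refined and extended functions of the program, refer to the description above.
Further, an embodiment of the present application also provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the semantic understanding method described above.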
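Finally, it should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. The terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes it.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, the embodiments may be combined as needed, and identical or similar parts may be cross-referenced.
The above description of the disclosed embodiments enables a person skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. The present application is therefore not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.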

Claims (19)

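  1. A semantic understanding method, characterized by comprising:
    acquiring entity words that match a text to be understood as matching entity words, and a language feature of the language to which the text to be understood belongs, wherein the matching entity words are entity words that match the text to be understood under the language and scenario to which the text to be understood belongs;
    determining a fused text based on the text to be understood and the matching entity words;
    determining a semantic understanding result of the text to be understood based on the fused text and the language feature.
  2. The method according to claim 1, characterized in that the acquiring entity words that match a text to be understood as matching entity words comprises:
    acquiring an entity library that matches the language and scenario to which the text to be understood belongs, the entity library containing entity words of each type under the corresponding language and scenario;
    determining the entity words in the entity library that match the text to be understood as the matching entity words.
  3. The method according to claim 1, characterized in that the process of acquiring the language feature of the language to which the text to be understood belongs comprises:
    acquiring a preconfigured language embedding feature matrix, the language embedding feature matrix containing an embedded feature representation corresponding to each language;
    looking up, in the language embedding feature matrix, the embedded feature representation corresponding to the language to which the text to be understood belongs, as the language feature of the language to which the text to be understood belongs.
  4. The method according to claim 1, characterized in that the determining a fused text based on the text to be understood and the matching entity words comprises:
    concatenating the text to be understood with the matching entity words to obtain the fused text.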
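  5. The method according to claim 1, characterized in that the determining a semantic understanding result of the text to be understood based on the fused text and the language feature comprises:
    acquiring an entity type feature of each constituent unit in the fused text, and a text feature of each constituent unit;
    determining the semantic understanding result of the text to be understood based at least on the entity type feature and the text feature of each constituent unit and the language feature.
  6. The method according to claim 5, characterized in that the process of acquiring the entity type feature of each constituent unit in the fused text comprises:
    acquiring a preconfigured entity type embedding feature matrix corresponding to the scenario to which the text to be understood belongs, the entity type embedding feature matrix containing an embedded feature representation corresponding to each entity type under the corresponding scenario;
    looking up, in the entity type embedding feature matrix, the embedded feature representation corresponding to each constituent unit in the fused text, as the entity type feature of each constituent unit.
  7. The method according to claim 5, characterized in that the determining a semantic understanding result of the text to be understood based on the fused text and the language feature further comprises:
    acquiring a position feature of each constituent unit in the fused text, the position feature characterizing the position of the constituent unit in the fused text;
    the determining the semantic understanding result of the text to be understood based at least on the entity type feature and the text feature of each constituent unit and the language feature comprises:
    determining the semantic understanding result of the text to be understood based on the entity type feature, the text feature, and the position feature of each constituent unit and the language feature.
  8. The method according to claim 7, characterized in that the process of acquiring the position feature of each constituent unit in the fused text comprises:
    acquiring a preconfigured position embedding feature matrix, the position embedding feature matrix containing an embedded feature representation corresponding to each position number;
    looking up, in the position embedding feature matrix, the embedded feature representation corresponding to the position number of each constituent unit in the fused text, as the position feature of each constituent unit.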
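  9. The method according to claim 1, characterized in that the determining a semantic understanding result of the text to be understood based on the fused text and the language feature comprises:
    processing the fused text and the language feature with a pre-trained semantic understanding model to obtain the semantic understanding result of the text to be understood output by the semantic understanding model.
  10. The method according to claim 9, characterized in that the process of processing the fused text and the language feature with the semantic understanding model to obtain the semantic understanding result comprises:
    acquiring, based on an embedding layer of the semantic understanding model, embedded features of each constituent unit in the fused text, the embedded features including at least the text feature and the language feature among the text feature, the language feature, the entity type feature, and the position feature;
    encoding, based on an encoding layer of the semantic understanding model, the embedded features of each constituent unit to obtain encoded features;
    processing, based on an intent understanding layer of the semantic understanding model, the encoded features to obtain an output intent;
    processing, based on a slot extraction layer of the semantic understanding model, the encoded features to obtain the slot type labeled for each constituent unit of the text to be understood.
  11. The method according to claim 10, characterized in that the process of processing the fused text and the language feature with the semantic understanding model to obtain the semantic understanding result further comprises:
    performing, based on a slot attention layer of the semantic understanding model, an attention computation between the encoded features and a preconfigured slot embedding feature matrix to obtain new encoded features fused with the slot embedding feature matrix, the slot embedding feature matrix containing an embedded feature representation corresponding to each type of semantic slot under the scenario to which the text to be understood belongs;
    processing, based on the slot extraction layer of the semantic understanding model, the new encoded features to obtain the slot type labeled for each constituent unit of the text to be understood.
  12. The method according to claim 9, characterized in that, in the training process of the semantic understanding model, parameters are initialized from a pre-trained cross-lingual masked language model;
    when the cross-lingual masked language model is trained, a training text and the language feature of the language to which it belongs are used as the sample input, and the training objective is to predict the masked characters in the training text.
  13. The method according to claim 12, characterized in that, in the training process of the semantic understanding model, the language embedding feature matrix, the entity type embedding feature matrix, and the position embedding feature matrix are iteratively updated;
    wherein,
    the language embedding feature matrix contains an embedded feature representation corresponding to each language;
    the entity type embedding feature matrix contains an embedded feature representation corresponding to each entity type under the corresponding scenario;
    the position embedding feature matrix contains an embedded feature representation corresponding to each position number.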
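  14. A semantic understanding apparatus, characterized by comprising:
    a data acquisition unit, configured to acquire entity words that match a text to be understood as matching entity words, and a language feature of the language to which the text to be understood belongs, wherein the matching entity words are entity words that match the text to be understood under the language and scenario to which the text to be understood belongs;
    a fusion unit, configured to determine a fused text based on the text to be understood and the matching entity words;
    a semantic understanding unit, configured to determine a semantic understanding result of the text to be understood based on the fused text and the language feature.
  15. The apparatus according to claim 14, characterized in that the process by which the data acquisition unit acquires entity words that match the text to be understood as matching entity words may comprise:
    acquiring an entity library that matches the language and scenario to which the text to be understood belongs, the entity library containing entity words of each type under the corresponding language and scenario; and determining the entity words in the entity library that match the text to be understood as the matching entity words.
  16. The apparatus according to claim 14, characterized in that the process by which the semantic understanding unit determines the semantic understanding result of the text to be understood based on the fused text and the language feature comprises:
    acquiring an entity type feature of each constituent unit in the fused text, and a text feature of each constituent unit;
    determining the semantic understanding result of the text to be understood based at least on the entity type feature and the text feature of each constituent unit and the language feature.
  17. The apparatus according to claim 16, characterized in that the process by which the semantic understanding unit determines the semantic understanding result of the text to be understood based on the fused text and the language feature further comprises:
    acquiring a position feature of each constituent unit in the fused text, the position feature characterizing the position of the constituent unit in the fused text;
    the process by which the semantic understanding unit determines the semantic understanding result of the text to be understood based at least on the entity type feature and the text feature of each constituent unit and the language feature then comprises:
    determining the semantic understanding result of the text to be understood based on the entity type feature, the text feature, and the position feature of each constituent unit and the language feature.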
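  18. A semantic understanding device, characterized by comprising: a memory and a processor;
    the memory being configured to store a program;
    the processor being configured to execute the program to implement the steps of the semantic understanding method according to any one of claims 1 to 13.
  19. A storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the semantic understanding method according to any one of claims 1 to 13 are implemented.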
PCT/CN2021/082961 2021-01-28 2021-03-25 Semantic understanding method, apparatus, device and storage medium WO2022160445A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110117912.7 2021-01-28
CN202110117912.7A CN112800775B (zh) 2021-01-28 2021-01-28 Semantic understanding method, apparatus, device and storage medium

Publications (1)

Publication Number Publication Date
WO2022160445A1 (zh) 2022-08-04

Family

ID=75812464

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/082961 WO2022160445A1 (zh) 2021-01-28 2021-03-25 Semantic understanding method, apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN112800775B (zh)
WO (1) WO2022160445A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
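CN113535896B (zh) * 2021-06-23 2024-04-19 北京达佳互联信息技术有限公司 Search method, apparatus, electronic device and storage medium
CN113656561A (zh) * 2021-10-20 2021-11-16 腾讯科技(深圳)有限公司 Entity word recognition method, apparatus, device, storage medium and program product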

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107315737A (zh) * 2017-07-04 2017-11-03 北京奇艺世纪科技有限公司 Semantic logic processing method and ***
CN110110061A (zh) * 2019-04-26 2019-08-09 同济大学 Low-resource language entity extraction method based on bilingual word vectors
US20190278800A1 (en) * 2016-05-24 2019-09-12 Koninklijke Philips N.V. System and method for imagery mnemonic creation
CN110941716A (zh) * 2019-11-05 2020-03-31 北京航空航天大学 Method for automatically constructing an information security knowledge graph based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106933809A (zh) * 2017-03-27 2017-07-07 三角兽(北京)科技有限公司 Information processing apparatus and information processing method

Also Published As

Publication number Publication date
CN112800775A (zh) 2021-05-14
CN112800775B (zh) 2024-05-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 21922044; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: pct application non-entry in european phase. Ref document number: 21922044; Country of ref document: EP; Kind code of ref document: A1