TWI832792B - Context-aware and user history based intent evaluation system and method thereof - Google Patents
- Publication number: TWI832792B
- Application number: TW112130727A
- Authority
- TW
- Taiwan
Description
The present invention relates to an intent selection system and method, and more particularly to an intent selection system and method based on context and usage history.
With the popularization of voice-controlled smart devices, devices such as smartphones, smart speakers, in-vehicle artificial intelligence platforms and voice-controlled smart watches can be found everywhere. If the user's intent can be determined accurately from the user's context and history, without collecting the necessary information through an interactive process to clarify the intent, the number of voice commands the user must issue is reduced and the user experience is improved, making the voice-controlled device simpler and more convenient to trigger.
There are two conventional techniques:
1. Interactively ask the user follow-up questions to collect the information needed to clarify the intent.
2. Set an intent similarity threshold: if any intent's similarity exceeds the threshold, respond with the highest-similarity intent; otherwise respond that the command cannot be understood.
The conventional techniques above not only require interactively questioning the user to collect the necessary information, which increases the number of voice commands, but may also leave the voice-controlled device unable to respond when intent similarity is low, resulting in a poor user experience.
The present invention provides an intent selection system and method based on context and usage history that performs accurate intent recognition and improves the user experience, making the voice-controlled device simpler and more convenient to trigger.
The intent selection system based on context and usage history of the present invention is applied in a voice-controlled device. After the voice-controlled device receives the user's voice command, it converts the command into text. The intent selection system includes a transceiver, a storage medium and a processor. The storage medium stores a plurality of modules. The processor is coupled to the storage medium and the transceiver, and accesses and executes a natural language processing module and a semantic selection module stored in the storage medium. The natural language processing module includes a word segmentation sub-module, a part-of-speech tagging sub-module and a named entity recognition sub-module. The word segmentation sub-module segments the text using segmentation delimiters, dividing the text into a plurality of smallest meaningful units. The part-of-speech tagging sub-module is electrically connected to the word segmentation sub-module and tags each unit with its part of speech. The named entity recognition sub-module is electrically connected to the part-of-speech tagging sub-module, identifies named entities in the part-of-speech-tagged units, and labels each unit with a named entity tag according to the identified named entity.
The semantic selection module is electrically connected to the natural language processing module and includes an intent database, a semantic parsing sub-module and a best intent output sub-module. The intent database stores intent data comprising intent numbers and, for each intent number, a corresponding intent, semantic model and key vocabulary. The semantic parsing sub-module is electrically connected to the intent database, obtains a cosine similarity for each intent from the named-entity-labeled units and the semantic model corresponding to the intent number, and determines whether each cosine similarity exceeds a similarity threshold. The best intent output sub-module is electrically connected to the semantic parsing sub-module and outputs the intent whose intent number corresponds to the largest cosine similarity among those exceeding the similarity threshold.
The intent selection method based on context and usage history of the present invention is applied in a voice-controlled device. After the voice-controlled device receives the user's voice command, it converts the command into text. The intent selection method includes: segmenting the text using segmentation delimiters to divide it into a plurality of smallest meaningful units; tagging each unit with its part of speech; identifying named entities in the part-of-speech-tagged units and labeling each unit with a named entity tag according to the identified named entity; obtaining a cosine similarity for each intent from the named-entity-labeled units and the semantic model corresponding to the intent number, and determining whether each cosine similarity exceeds a similarity threshold; and outputting the intent whose intent number corresponds to the largest cosine similarity among those exceeding the similarity threshold.
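The claimed method steps (segmentation, tagging, similarity scoring, best-intent output) can be sketched end to end as follows. This is an illustrative sketch only: the function and variable names are assumptions, and the segmentation and tagging stages are represented by their already-labeled output units rather than by a real Chinese NLP toolkit.

```python
import math

def best_intent(units, intent_models, threshold):
    """Pick the intent whose semantic model is most cosine-similar to the command.

    units: named-entity-labeled tokens of the converted voice command.
    intent_models: {intent_no: token list of that intent's semantic model}.
    Returns the best intent number, or None when no similarity exceeds the
    threshold (the case handed off to context/history parsing).
    """
    # Unified vocabulary across all semantic models, for a fixed vector dimension.
    vocab = sorted({tok for toks in intent_models.values() for tok in toks})

    def tf(tokens):  # term-frequency vector over the shared vocabulary
        return [tokens.count(v) for v in vocab]

    def cos(a, b):  # cosine similarity; 0 when either vector is all-zero
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    u = tf(units)
    sims = {k: cos(u, tf(toks)) for k, toks in intent_models.items()}
    above = {k: s for k, s in sims.items() if s > threshold}
    return max(above, key=above.get) if above else None
```

A command whose tokens share no vocabulary with any semantic model yields None, which in the described system is the case that triggers the context parsing and history sub-modules.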
Based on the above, the present invention provides an intent selection system and method based on context and usage history that performs accurate intent recognition from the context of the current conversation and the user's history records, without requiring interactive methods to clarify the user's intent. It reduces the number of voice commands the user must issue and improves the user experience, making the voice-controlled device simpler and more convenient to trigger.
To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.
Some embodiments of the present invention will now be described in detail with reference to the accompanying drawings. Reference numerals cited in the following description denote the same or similar elements when they appear in different drawings. These embodiments are only part of the present invention and do not disclose all possible implementations of the invention.
FIG. 1 is a schematic diagram of an intent selection system based on context and usage history according to an embodiment of the present invention.
Referring to FIG. 1, the intent selection system 10 based on context and usage history is applied in a voice-controlled device (not shown). After the voice-controlled device receives the user's voice command, it converts the command into text. The intent selection system 10 includes a transceiver 310, a storage medium 320 and a processor 330.
The transceiver 310 transmits and receives signals wirelessly or over wires. The transceiver 310 may also perform operations such as low-noise amplification, impedance matching, mixing, up- or down-conversion, filtering and amplification. The intent selection system 10 is communicatively or electrically connected to the voice-controlled device via the transceiver 310.
The storage medium 320 stores the software, data and program code required for the operation of the intent selection system 10. The storage medium 320 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid state drive (SSD), a similar element, or a combination of the above, and stores a plurality of modules or applications executable by the processor 330. In one embodiment, the storage medium 320 stores the natural language processing module 100 and the semantic selection module 200.
The processor 330 is, for example, a central processing unit (CPU), or another programmable general-purpose or special-purpose micro control unit (MCU), microprocessor, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), graphics processing unit (GPU), image signal processor (ISP), image processing unit (IPU), arithmetic logic unit (ALU), complex programmable logic device (CPLD), field-programmable gate array (FPGA), a similar element, or a combination of the above. The processor 330 is coupled to the storage medium 320 and the transceiver 310, and accesses and executes the applications stored in the storage medium 320 as well as the natural language processing module 100 and the semantic selection module 200 stored therein. The natural language processing module 100 includes a word segmentation sub-module 110, a part-of-speech tagging sub-module 120 and a named entity recognition sub-module 130.
The semantic selection module 200 is electrically connected to the natural language processing module 100 and includes an intent database 250, a semantic parsing sub-module 240, a best intent output sub-module 260, a context parsing sub-module 210, a personal usage history sub-module 220 and a user history sub-module 230.
The word segmentation sub-module 110 segments the text using segmentation delimiters, dividing the text into a plurality of smallest meaningful units. The segmentation delimiter may be represented by a comma.
The part-of-speech tagging sub-module 120 is electrically connected to the word segmentation sub-module 110 and tags each unit with its part of speech, such as noun, verb or proper noun. In this embodiment, the part-of-speech definitions follow Chinese Treebank 3.0, the Chinese part-of-speech tag set defined by the University of Pennsylvania. For example, PN denotes a pronoun, VV a verb, and NN a common noun.
The named entity recognition sub-module 130 is electrically connected to the part-of-speech tagging sub-module 120, identifies named entities in the part-of-speech-tagged units, and labels each unit with a named entity tag according to the identified named entity. In one embodiment, the named entity recognition sub-module 130 may augment the set of named entities through machine learning training, where named entities may include persons, events, times, places and objects. The present invention is not limited thereto.
The intent database 250 stores intent data, which includes intent numbers and, for each intent number, a corresponding intent, semantic model and key vocabulary. Each intent represents a specific behavior, and outputting an intent enables the intent selection system 10 to perform the corresponding behavior.
The semantic parsing sub-module 240 is electrically connected to the intent database 250. It obtains each cosine similarity from the named-entity-labeled units and the semantic model corresponding to each intent number according to Formula (2), and determines whether each cosine similarity exceeds the similarity threshold. Formula (2) is:

cos(u, I_k) = (u · I_k) / (‖u‖ × ‖I_k‖)    Formula (2)

where u is the user's voice command converted into a term frequency (TF) vector according to the semantic models in the intent database, ‖u‖ is the length of the vector u, I_k is the semantic model of intent k converted into a TF vector according to the semantic models in the intent database, ‖I_k‖ is the length of the vector I_k, and u · I_k is the inner product of the two vectors.
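Formula (2) is the standard cosine similarity between term-frequency vectors. A minimal sketch (function and variable names assumed):

```python
import math

def cosine_similarity(u, ik):
    """Formula (2): cos(u, I_k) = (u . I_k) / (|u| * |I_k|) over TF vectors."""
    dot = sum(a * b for a, b in zip(u, ik))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_ik = math.sqrt(sum(b * b for b in ik))
    # Define similarity as 0 when either vector is all-zero (no shared vocabulary).
    return dot / (norm_u * norm_ik) if norm_u and norm_ik else 0.0
```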
The best intent output sub-module 260 is electrically connected to the semantic parsing sub-module 240 and outputs the intent whose intent number corresponds to the largest cosine similarity among the cosine similarities exceeding the similarity threshold. Outputting the intent causes the voice-controlled device to perform the behavior corresponding to the intent, in accordance with the user's voice command.
The context parsing sub-module 210 is electrically connected to the semantic parsing sub-module 240. When no cosine similarity exceeds the similarity threshold, it generates similar vocabulary from the key vocabulary using a word2vec model, obtains an intent weight for each intent number from the key vocabulary, the similar vocabulary and the context of the previous N utterances before the voice command according to Formula (1), determines whether each intent weight exceeds a weight threshold, and outputs the intent of the intent number whose intent weight exceeds the weight threshold. Formula (1) is:

Score_intent-k = Σ_{i=1}^{N} m_{k,i} × (1/i)    Formula (1)

where Score_intent-k is the score with which the voice command belongs to intent k, m_{k,i} is the number of occurrences, in the i-th most recent utterance preceding the voice command, of the key vocabulary K_k belonging to intent k and of the similar vocabulary S_k found from K_k through the word2vec model, and the utterance with i = 1 is the one immediately preceding the voice command.
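Formula (1), as described, weights key-vocabulary and similar-vocabulary matches in the previous N utterances by the reciprocal of each utterance's distance from the voice command. The sketch below is a plausible minimal implementation consistent with the worked example later in the text (Score_intent1 = 2 × (1/1)); all names are assumptions.

```python
def context_score(prior_utterances, keywords, similar_words):
    """Formula (1) sketch: prior_utterances is most-recent-first; the i-th most
    recent utterance contributes (match count) * (1 / i) to the intent score."""
    vocab = set(keywords) | set(similar_words)
    score = 0.0
    for i, utterance in enumerate(prior_utterances, start=1):
        matches = sum(1 for term in vocab if term in utterance)
        score += matches * (1.0 / i)
    return score
```

With one prior utterance containing two of an intent's similar words, this reproduces the score 2 × (1/1) = 2 from the second embodiment.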
The user history sub-module 230 is electrically connected to the semantic parsing sub-module 240 and stores user history records. In one embodiment, the semantic parsing sub-module 240 groups the user history records into intent clusters according to the users' personal information, obtaining intent cluster numbers, the cluster intent corresponding to each cluster number, and the average frequency across all users, and outputs the cluster intent corresponding to an intent cluster number according to the similar vocabulary, the key vocabulary, the all-user average frequency and the intent of the corresponding intent number.
The user history sub-module 230 addresses the case where the personal usage history lacks relevant information. The user history records stored in the user history sub-module 230 are those of all users of the voice-controlled device, grouped into intent clusters by personal information such as gender, age, occupation, usage time, usage frequency and usage date. For example, when user A issues the voice command "I want to listen to music" but has no relevant personal usage history, the user history sub-module 230 finds a suitable intent cluster among all users based on user A's personal information, then uses the key vocabulary from the context parsing sub-module 210 to find the singers and songs most frequently used for "I want to listen to music", and plays them for user A. If the context parsing sub-module 210 yields no key vocabulary, the cluster intent with the highest all-user average frequency within the cluster is played.
The personal usage history sub-module 220 is electrically connected to the semantic parsing sub-module 240 and stores the user's personal usage history records, each of which includes a time, an intent number, the key vocabulary used and a frequency. In one embodiment, when no cosine similarity exceeds the similarity threshold, the semantic parsing sub-module 240 compares the named-entity-labeled units against the key vocabulary used in the personal usage history records, and, based on frequency, obtains and outputs the intent of the intent number corresponding to the time of the voice command.
For voice commands with insufficient information, the present invention can accurately determine the user's intent without requiring multiple interactions with the user. The personal usage history sub-module 220 can automatically obtain the currently missing information from personal usage experience (the personal usage history records).
For example, for the command "I want to listen to music", the context parsing sub-module 210 determines that the intent is to play music, but the command does not specify whose songs to play. The personal usage history records are then consulted: the user may recently have often listened to 張惠妹's song 聽海, in which case the intent of the command is resolved using that personal usage history record, and 聽海 by 張惠妹 is played for the user. In addition, the personal usage history records are grouped by intent over three time intervals: day shift (08:01-16:00), evening shift (16:01-24:00) and night shift (00:01-08:00). The key vocabulary for the same intent differs between intervals, and the highest-frequency key vocabulary is used to assist semantic understanding.
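The time-interval grouping described above can be sketched as a simple lookup: pick, for the current command's time bucket, the highest-frequency key vocabulary from the personal usage history. The record layout and function names are assumptions; the bucket boundaries follow the text (the 16:01-24:00 evening bucket is approximated up to 23:59:59).

```python
from datetime import time

# Time buckets from the description: day 08:01-16:00, evening 16:01-24:00,
# night 00:01-08:00 (midnight itself handled by the final fallback).
BUCKETS = [
    ("day", time(8, 1), time(16, 0)),
    ("evening", time(16, 1), time(23, 59, 59)),
    ("night", time(0, 1), time(8, 0)),
]

def bucket_of(t):
    for name, start, end in BUCKETS:
        if start <= t <= end:
            return name
    return "night"  # 00:00 wraps into the night bucket

def most_frequent_keyword(history, now):
    """history: list of (bucket, intent_no, keyword, frequency) records."""
    records = [r for r in history if r[0] == bucket_of(now)]
    if not records:
        return None  # no personal data; fall back to the all-user history sub-module
    return max(records, key=lambda r: r[3])[2]
```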
The natural language processing module 100, semantic selection module 200, word segmentation sub-module 110, part-of-speech tagging sub-module 120, named entity recognition sub-module 130, intent database 250, semantic parsing sub-module 240, best intent output sub-module 260, context parsing sub-module 210, personal usage history sub-module 220 and user history sub-module 230 may be implemented in software, firmware, hardware circuitry, or any combination thereof, and the present disclosure places no limitation on how they are implemented.
Several embodiments are described below to illustrate how the intent selection system 10 based on context and usage history is applied to a voice-controlled device.
In a first embodiment, the semantic parsing sub-module 240 and the intent database 250 alone suffice to match the user's precise intent.
Take Mr. A, who lives in Hsinchu, as the user. Mr. A commutes every day from Hsinchu to Neihu, Taipei, and listens to music through the vehicle's voice-controlled device on his way to and from work. Mr. A's voice command to the device is [我要聽五月天的乾杯] ("I want to listen to Mayday's 乾杯"). The voice command is converted into text by the voice-controlled device and input to the intent selection system 10. The word segmentation sub-module 110 segments the text using segmentation delimiters such as commas, dividing it into the smallest meaningful units: [我,要,聽,五月天,的,乾杯]. The units are then passed to the part-of-speech tagging sub-module 120, which produces [我_PN,要_VV,聽_VV,五月天_NR,的_DER,乾杯_NR], where PN denotes a pronoun, VV a verb and NN a common noun. Finally, the part-of-speech-tagged result is passed to the named entity recognition sub-module 130 for named entity recognition, which produces [我_PN_o,要_VV_o,聽_VV_o,五月天_NR_singer/song,的_DER_o,乾杯_NR_song], where o denotes no specific named entity tag (person, event, time, place or object), story denotes that the word is itself a story, and song denotes a song (in this embodiment, 故事 is both a story and a song). After this parsing by the natural language processing module 100, the named-entity-recognized result is passed to the semantic parsing sub-module 240, which computes its cosine similarity against the intent database 250; the intent whose intent number has the highest cosine similarity above the similarity threshold is output as the best intent by the best intent output sub-module 260.
As shown in Table 1, there are M intents in total (with intent numbers I1, I2, I3, I4, ..., IM). Each cosine similarity is obtained from the named-entity-labeled units and the semantic model corresponding to each intent number using Formula (2) as follows:
First, the overall vocabulary set is established to obtain a unified vector dimension. The vocabulary set built from Table 1 (the semantic models corresponding to the intent numbers) is:
corpus = ['播放_VV_o', '張惠妹_NR_singer', '張學友_NR_singer', '音樂_NN_o', '五月天_NR_singer', '五月天_NR_song', '聽海_NR_song', '情書_NR_song', '故事_NR_song', '乾杯_NR_song', '聽_VV_o', '播放_VV_o', '故事_NN_o', '故事_NR_story', '聽_VV_o', '播放_VV_o', '廣播_NN_o', '中央警廣_NR_channel', '有_VV_o', '播放_VV_o', '行事曆_NN_o', '行程_NN_o', '聽_VV_o', '播放_VV_o', '蒸蒸日上_NR_idiom', '成語_NN_o', '三隻小豬_NR_story', '醜小鴨_NR_story']
Mr. A's command is then converted into a vector u over the corpus, where each element is the number of times the corresponding corpus term appears in the command:
u = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Vectors are then built for I1, I2, ..., IM in the same manner:
I1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
I2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
I3 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
I4 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
IM = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
The similarity between u and I1 is computed using cosine similarity Formula (2):
cos(u, I1) = (0×1 + 0×1 + 0×1 + 0×1 + 1×1 + 1×1 + 0×1 + 0×1 + 0×1 + 1×1 + 1×0 + 0×0 + ... + 0×0) / (4^0.5 × 10^0.5) = 3 / (2 × 10^0.5) ≈ 0.4743
Computed in the same way, the similarities for I2, I3, I4 and IM are 0.20, 0.25, 0 and 0.25 respectively. In this embodiment the similarity threshold is set to 0.45. Mr. A's command has the highest cosine similarity with I1; the maximum cosine similarity is 0.4743, which exceeds the threshold, so the intent of intent number I1 corresponding to the maximum cosine similarity is output, directly triggering playback of Mayday's 乾杯.
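The numbers above can be checked directly from the listed vectors. Note that the corpus listing contains repeated terms; removing duplicates in order of first appearance leaves the 22 unique terms that index the 22-dimensional vectors. A verification sketch (not the patent's code):

```python
import math

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Vectors as listed in the embodiment (22-dimensional TF vectors).
u  = [0,0,0,0,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0]
I1 = [1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0]
I2 = [1,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,1,1]
I3 = [1,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0]
I4 = [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0]
IM = [1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0]

sims = [round(cos(u, v), 4) for v in (I1, I2, I3, I4, IM)]
# Only the similarity with I1 exceeds the 0.45 threshold, so intent I1 is output.
```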
In a second embodiment, when the semantic parsing sub-module 240 cannot resolve the command, the precise intent of user Mr. A is matched through the context parsing sub-module 210 and the user history sub-module 230.
When Mr. A uses the smart voice-controlled device for the first time and issues the voice command [我要聽音樂] ("I want to listen to music"), the command is converted into text by the device and input to the intent selection system 10. The word segmentation sub-module 110 segments the text into the smallest meaningful units [我,要,聽,音樂]; the part-of-speech tagging sub-module 120 produces [我_PN,要_VV,聽_VV,音樂_NN]; and the named entity recognition sub-module 130 produces [我_PN_o,要_VV_o,聽_VV_o,音樂_NN_o]. After this parsing by the natural language processing module 100, the semantic parsing sub-module 240 computes cosine similarities against the intent database 250. Combining Table 1 and Formula (2), the cosine similarities corresponding to intent numbers I1, I2, I3, I4 and IM are 0.2236, 0.2887, 0.3536, 0 and 0.3536 respectively, none of which exceeds the similarity threshold of 0.45.
At this point the context parsing sub-module 210 is triggered. The key words of the intent database 250 in Table 2 are fed into a word2vec model trained on Wikipedia to produce corresponding similar words, shown in Table 3 (similar words produced by the word2vec model).
In one embodiment, the similar-word entry for intent number I3 in Table 3 is N.A., meaning no corresponding similar word could be obtained from the word2vec model. For example, before issuing the voice command [I want to listen to music], Mr. A had a conversation with the voice-control device that carried no specific intent, as shown in Mr. A's context information in Table 4:
The context parsing sub-module 210 uses the key words in Table 2, the similar words in Table 3, and formula (1) to compute the intent weight of each intent corresponding to an intent number. For example, the intent weight of I1 is computed as follows:
According to formula (1) and the context information in Table 4 ("Recently I've been trying to recall which divas were contemporaries of Elva Hsiao and Stefanie Sun"), two words match I1's key words and similar words, namely Elva Hsiao and Stefanie Sun. Since the context information contains only one sentence, the value of formula (1) is: Score_intent1 = 2 × (1/1)
The intent weight corresponding to intent number I1 is therefore 2. Similarly, the intent weights corresponding to intent numbers I2, I3, I4, and IM are computed as 0, 0, 0, and 0 respectively.
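Formula (1) as described, the number of keyword and similar-word hits across the preceding context sentences scaled by the reciprocal of the sentence count, can be sketched as follows. The vocabulary list is a hypothetical stand-in for the union of Table 2 key words and Table 3 similar words for I1.

```python
def intent_weight(context_sentences, vocab):
    """Formula (1) sketch: count keyword/similar-word hits in the last N
    context sentences, then scale by 1/N."""
    n = len(context_sentences)
    if n == 0:
        return 0.0
    hits = sum(1 for s in context_sentences for w in vocab if w in s)
    return hits * (1.0 / n)

context = ["最近在回憶蕭亞軒與孫燕姿同期的天后有誰"]
vocab_i1 = ["蕭亞軒", "孫燕姿", "張惠妹"]  # assumed keywords + similar words
w = intent_weight(context, vocab_i1)  # two hits in one sentence -> 2 * (1/1)
```

Running this on the single context sentence reproduces the weight of 2 computed above; with an empty context the weight is 0, matching the third embodiment where the context parsing sub-module 210 has no effect.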
The weight threshold is set to 1.8, and any intent whose weight exceeds the threshold triggers the intent corresponding to its intent number. In this embodiment, the intent with intent number I1 is output, i.e., play music.
Since this is Mr. A's first use of the smart voice-control device, the personal usage history sub-module 220 contains no data (personal usage history records) to consult.
The user history sub-module 230 stores all user history records of the smart voice-control device and groups intents by personal information such as gender, age, occupation, usage time, usage frequency, and usage date. In this embodiment, Mr. A is an office worker: male, 30 years old, employed in the service industry, with a usage time of 2023/5/19 12:10, a Friday noon time slot. Based on Mr. A's personal information, the user history sub-module 230 obtains the intent group G1I1, as shown in Table 5 (intent grouping by personal information):
Through the above, the context parsing sub-module 210 has determined the intent corresponding to intent number I1 (play music). The key words mapped back from similar words via the personal usage history sub-module 220 and the user history sub-module 230 are A-Mei (Zhang Huimei) and Jacky Cheung. Filtering by the average frequency across all users, the system outputs the group intent with intent group number G1I1 that has the highest all-user average frequency, namely playing A-Mei's "聽海" ("Listen to the Sea"). In other words, the context parsing sub-module 210 found songs by both A-Mei and Jacky Cheung among the key words and similar words, and it chooses to play the higher-frequency of the two, A-Mei's "聽海".
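The all-user frequency filter can be sketched as below. The Table 5 rows and the frequency values are invented for illustration; the patent does not publish the actual grouping table contents.

```python
# Hypothetical rows of the cohort grouping table (Table 5):
# (intent group number, artist keyword, song, all-user average frequency)
group_intents = [
    ("G1I1", "張惠妹", "聽海", 0.37),
    ("G1I1", "張學友", "吻別", 0.21),
]
matched = ["張惠妹", "張學友"]  # keywords recovered via similar words

# Keep rows whose keyword was matched, then take the highest-frequency one.
candidates = [row for row in group_intents if row[1] in matched]
best = max(candidates, key=lambda row: row[3])
```

With these assumed frequencies, `best` is the G1I1 row for A-Mei's "聽海", matching the outcome described above.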
In a third embodiment, when the semantic parsing sub-module 240 cannot resolve the command and no context is available, the user's precise intent is matched purely through the personal usage history sub-module 220.
In this embodiment, Mr. B uses the voice-control device during the morning shift (08:01–16:00) and utters only the command [story]. After analysis by the natural language processing module 100 (word segmentation, part-of-speech tagging, and named entity recognition as described above), the result is [story_NN_o, story_NR_story, story_NR_song]. The semantic parsing sub-module 240 then computes cosine similarities against the intent database 250 for each of [story_NN_o, story_NR_story, story_NR_song], as shown in Table 6 (cosine similarities of "story" against the intent database 250):
As Table 6 shows, none of the cosine similarities for intent numbers I1, I2, I3, I4, and IM exceeds the similarity threshold of 0.45, so the context parsing sub-module 210 is triggered. Since Mr. B has no context information for that day to consult, the context parsing sub-module 210 has no effect. The personal usage history records stored in the personal usage history sub-module 220 are consulted next; because Mr. B has used the device before, his personal usage history records are as shown in Table 7 (Mr. B's personal usage history):
Next, the personal usage history sub-module 220 compares each unit of the voice command against the key words used in its records and, by frequency, retrieves the intent (with its intent number and the key words used) corresponding to the time of the voice command; if several records share the same time and the same key words, one is picked at random. Mr. B's command was issued during the morning shift, so the highest-frequency intent in Table 7 whose key words include "story" is selected: intent I2, whose key words are "story" and "Three Little Pigs", and that intent is played.
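This time-slot and frequency matching, with its random tie-break, can be sketched as follows. The Table 7 rows are invented for illustration and the time slots are simplified to labels.

```python
import random

# Hypothetical personal-history rows (Table 7):
# (time slot, intent number, key words used, frequency)
history = [
    ("morning", "I2", "故事 三隻小豬", 5),
    ("morning", "I2", "故事 小紅帽", 3),
    ("evening", "I1", "音樂", 7),
]

def match_history(units, slot, rows):
    """Keep rows in the command's time slot whose key words overlap the
    command units, then take the highest frequency; ties break randomly."""
    cand = [r for r in rows
            if r[0] == slot and any(u in r[2] for u in units)]
    if not cand:
        return None
    top = max(r[3] for r in cand)
    return random.choice([r for r in cand if r[3] == top])

pick = match_history(["故事"], "morning", history)
```

With these rows, `pick` is the frequency-5 "Three Little Pigs" record, matching the outcome described for Mr. B; the evening music row is excluded by the time-slot filter.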
In the following, the method of the embodiments of the present invention is described with reference to the devices, components, and modules in FIG. 1. Each step of the method may be adjusted according to the implementation and is not limited to what is described here.
FIG. 2 is a flow chart of an intent evaluation method based on context and usage history according to an embodiment of the present invention.
Referring to FIG. 1 and FIG. 2, in step S201 the word segmentation sub-module 110 segments the text using word-boundary markers, dividing it into multiple smallest meaningful units.
In step S202, the part-of-speech tagging sub-module 120 tags each unit with its part of speech.
In step S203, the named entity recognition sub-module 130 performs named entity recognition on the part-of-speech-tagged units and labels each unit with a named entity tag.
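Steps S201 to S203 attach a part-of-speech tag and a named entity tag to each segmented unit, producing strings of the form unit_POS_NER as in the examples above. A minimal sketch, with toy lookup tables standing in for the trained taggers (an assumption; the patent does not specify the tagging models):

```python
# Toy lexicons standing in for trained POS and NER taggers (assumptions).
POS = {"我": "PN", "要": "VV", "聽": "VV", "音樂": "NN"}
NER = {}  # no unit in this command is a named entity, so the tag is "o"

def preprocess(units):
    """S201-S203 sketch: attach POS and NER tags as unit_POS_NER strings."""
    return [f"{u}_{POS.get(u, 'NN')}_{NER.get(u, 'o')}" for u in units]

tagged = preprocess(["我", "要", "聽", "音樂"])
# -> ["我_PN_o", "要_VV_o", "聽_VV_o", "音樂_NN_o"]
```

The output reproduces the tagged form [I_PN_o, want_VV_o, listen_VV_o, music_NN_o] used in the first embodiment.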
In step S204, the semantic parsing sub-module 240 obtains the cosine similarities from the entity-tagged units and the semantic model corresponding to each intent number, and determines whether each cosine similarity exceeds the similarity threshold.
In step S205, if a cosine similarity exceeds the similarity threshold, the best intent output sub-module 260 outputs the intent whose intent number corresponds to the maximum of the cosine similarities above the threshold.
In step S206, if no cosine similarity exceeds the similarity threshold, the context parsing sub-module 210 generates similar words from the word2vec model and the key words, obtains the intent weight of each intent number from formula (1), the key words, the similar words, and the context of the N dialogue turns preceding the voice command, and determines whether each intent weight exceeds the weight threshold, outputting the intent whose weight does.
In step S207, the semantic parsing sub-module 240 groups intents in the user history records stored by the user history sub-module 230 according to the user's personal information, obtaining the intent group numbers, the group intents corresponding to them, and the all-user average frequencies; it then outputs the group intent corresponding to an intent group number based on the similar words, the key words, the all-user average frequencies, and the intent of the corresponding intent number.
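The branching order implied by steps S204 to S207 (semantic match first, context weighting on failure, then cohort grouping) can be sketched as a dispatch function. The three matcher callables are placeholders for the sub-modules, not the patent's actual interfaces.

```python
def select_intent(units, context, user_profile,
                  semantic_match, context_match, cohort_match):
    """Fallback chain sketched from FIG. 2: each stage returns an intent
    identifier or None; the first non-None result wins."""
    intent = semantic_match(units)            # S204/S205: cosine vs. threshold
    if intent is not None:
        return intent
    intent = context_match(units, context)    # S206: formula (1) weights
    if intent is not None:
        return intent
    return cohort_match(units, user_profile)  # S207: cohort group intent
```

For example, wiring in stubs where the semantic stage fails but the cohort stage matches returns the cohort's group intent, reproducing the second embodiment's path through the flow chart.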
FIG. 3 is a flow chart of an intent evaluation method based on context and usage history according to another embodiment of the present invention.
Referring to FIG. 1 and FIG. 3, in step S301 the word segmentation sub-module 110 segments the text using word-boundary markers, dividing it into multiple smallest meaningful units.
In step S302, the part-of-speech tagging sub-module 120 tags each unit with its part of speech.
In step S303, the named entity recognition sub-module 130 performs named entity recognition on the part-of-speech-tagged units and labels each unit with a named entity tag.
In step S304, the semantic parsing sub-module 240 obtains the cosine similarities from the entity-tagged units and the semantic model corresponding to each intent number, and determines whether each cosine similarity exceeds the similarity threshold.
In step S305, if a cosine similarity exceeds the similarity threshold, the best intent output sub-module 260 outputs the intent whose intent number corresponds to the maximum of the cosine similarities above the threshold.
In step S306, if no cosine similarity exceeds the similarity threshold, the semantic parsing sub-module 240 compares the entity-tagged units against the key words used in the user's personal usage history records stored by the personal usage history sub-module 220, and by frequency obtains and outputs the intent whose intent number corresponds to the time of the voice command.
In summary, the present invention provides an intent evaluation system and method based on context and usage history. It identifies intent precisely from the context of the current conversation and the user's history records, without requiring interactive clarification of the user's intent; it also reduces the number of voice commands the user must issue and improves the user experience, making the voice-control device simpler and more convenient to trigger.
Although the present invention has been disclosed above by way of embodiments, they are not intended to limit it. Anyone with ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention; the scope of protection of the present invention is therefore defined by the appended claims.
10: intent evaluation system; 310: transceiver; 320: storage medium; 330: processor; 100: natural language processing module; 200: semantic evaluation module; 110: word segmentation sub-module; 120: part-of-speech tagging sub-module; 130: named entity recognition sub-module; 210: context parsing sub-module; 220: personal usage history sub-module; 230: user history sub-module; 240: semantic parsing sub-module; 250: intent database; 260: best intent output sub-module; S201–S207, S301–S306: steps
FIG. 1 is a schematic diagram of an intent evaluation system based on context and usage history according to an embodiment of the present invention.
FIG. 2 is a flow chart of an intent evaluation method based on context and usage history according to an embodiment of the present invention.
FIG. 3 is a flow chart of an intent evaluation method based on context and usage history according to another embodiment of the present invention.
Claims (12)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW112130727A TWI832792B (en) | 2023-08-16 | 2023-08-16 | Context-aware and user history based intent evaluation system and method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
TWI832792B true TWI832792B (en) | 2024-02-11 |
Family
ID=90824970
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9493130B2 (en) * | 2011-04-22 | 2016-11-15 | Angel A. Penilla | Methods and systems for communicating content to connected vehicle users based detected tone/mood in voice input |
TW202020692A (en) * | 2018-11-20 | 2020-06-01 | 財團法人資訊工業策進會 | Semantic analysis method, semantic analysis system, and non-transitory computer-readable medium |
CN112784574A (en) * | 2021-02-02 | 2021-05-11 | 网易(杭州)网络有限公司 | Text segmentation method and device, electronic equipment and medium |
CN114999482A (en) * | 2022-05-30 | 2022-09-02 | 东风汽车有限公司东风日产乘用车公司 | Line-of-sight-based voice recognition method, device, equipment and storage medium |
CN115273840A (en) * | 2022-06-27 | 2022-11-01 | 海信视像科技股份有限公司 | Voice interaction device and voice interaction method |
US20230135179A1 (en) * | 2021-10-21 | 2023-05-04 | Meta Platforms, Inc. | Systems and Methods for Implementing Smart Assistant Systems |
CN116417003A (en) * | 2021-12-31 | 2023-07-11 | 科大讯飞股份有限公司 | Voice interaction system, method, electronic device and storage medium |
CN116547746A (en) * | 2020-09-21 | 2023-08-04 | 亚马逊技术公司 | Dialog management for multiple users |