TWI832792B - Context-aware and user history based intent evaluation system and method thereof - Google Patents


Info

Publication number
TWI832792B
Authority: TW (Taiwan)
Prior art keywords: intention, module, intent, sub, semantic
Application number: TW112130727A
Other languages: Chinese (zh)
Inventor: 陳仲詠
Original Assignee: 中華電信股份有限公司 (Chunghwa Telecom Co., Ltd.)
Application filed by 中華電信股份有限公司
Priority to TW112130727A
Application granted
Publication of TWI832792B

Landscapes: Machine Translation (AREA)

Abstract

A context-aware and user-history-based intent evaluation system and method are applied to a voice control device. The method performs natural language processing using word segmentation, part-of-speech tagging, and named entity recognition; computes a cosine similarity between the tagged units and the semantic model corresponding to each intent number; determines whether each cosine similarity is greater than a similarity threshold; and outputs the intent whose intent number corresponds to the largest cosine similarity among those exceeding the threshold. If no similarity reaches the threshold, the context and usage history are used for intent recognition, so that the intent can be judged accurately without interacting with the user, improving the user experience.

Description

Intent evaluation system and method based on context and usage history

The present invention relates to an intent evaluation system and method, and in particular to an intent evaluation system and method based on context and usage history.

With the popularization of voice-controlled smart devices, such as smartphones, smart speakers, in-vehicle artificial intelligence platforms, and voice-controlled smartwatches, voice control can now be found everywhere. If a user's intent can be determined accurately from the user's context and usage history, without collecting the necessary information through an interactive clarification process, the number of voice commands the user must issue is reduced and the user experience improves, making voice-controlled devices simpler and more convenient to trigger.

There are two conventional techniques:

1. Interactively question the user to collect the information necessary to clarify the intent.

2. Set an intent similarity threshold: if the highest similarity exceeds the threshold, respond with the corresponding intent; otherwise, respond that the command cannot be understood.

These conventional techniques not only require interactively questioning the user to collect the necessary information, which increases the number of voice commands, but may also leave the voice-controlled device unable to respond when the intent similarity is low, resulting in a poor user experience.

The present invention provides an intent evaluation system and method based on context and usage history, which can identify intent precisely and improve the user experience, making voice-controlled devices simpler and more convenient to trigger.

The intent evaluation system based on context and usage history of the present invention is applied to a voice control device that, after receiving a user's voice command, converts the voice command into text. The intent evaluation system includes a transceiver, a storage medium, and a processor. The storage medium stores multiple modules. The processor is coupled to the storage medium and the transceiver, and accesses and executes the natural language processing module and the semantic evaluation module stored in the storage medium. The natural language processing module includes a word segmentation sub-module, a part-of-speech tagging sub-module, and a named entity recognition sub-module. The word segmentation sub-module segments the text using segmentation symbols, dividing the text into multiple smallest meaningful units. The part-of-speech tagging sub-module is electrically connected to the word segmentation sub-module and tags each unit with its part of speech. The named entity recognition sub-module is electrically connected to the part-of-speech tagging sub-module and identifies named entities among the part-of-speech-tagged units, labeling each unit with a named entity tag accordingly.
The semantic evaluation module is electrically connected to the natural language processing module and includes an intent database, a semantic parsing sub-module, and a best-intent output sub-module. The intent database stores intent data, which includes intent numbers and, for each intent number, a corresponding intent, semantic model, and key words. The semantic parsing sub-module is electrically connected to the intent database; it obtains each cosine similarity from the entity-tagged units and the semantic model corresponding to each intent number, and determines whether each cosine similarity is greater than a similarity threshold. The best-intent output sub-module is electrically connected to the semantic parsing sub-module and outputs the intent whose intent number corresponds to the largest cosine similarity among those greater than the similarity threshold.

The intent evaluation method based on context and usage history of the present invention is applied to a voice control device that, after receiving a user's voice command, converts the voice command into text. The intent evaluation method includes: segmenting the text using segmentation symbols to divide it into multiple smallest meaningful units; tagging each unit with its part of speech; identifying named entities among the part-of-speech-tagged units and labeling each unit with a named entity tag accordingly; obtaining each cosine similarity from the entity-tagged units and the semantic model corresponding to each intent number, and determining whether each cosine similarity is greater than a similarity threshold; and outputting the intent whose intent number corresponds to the largest cosine similarity among those greater than the similarity threshold.

Based on the above, the present invention provides an intent evaluation system and method based on context and usage history that can identify intent precisely from the current dialogue context and the user's history records, without requiring an interactive process to clarify the user's intent. This reduces the number of voice commands the user must issue and improves the user experience, making voice-controlled devices simpler and more convenient to trigger.

To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. Where the same reference numerals appear in different drawings, they denote the same or similar elements. These embodiments are only part of the present invention and do not disclose all possible implementations.

Figure 1 is a schematic diagram of an intent evaluation system based on context and usage history according to an embodiment of the present invention.

Referring to Figure 1, the intent evaluation system 10 based on context and usage history is applied to a voice control device (not shown). After the voice control device receives the user's voice command, it converts the voice command into text. The intent evaluation system 10 includes a transceiver 310, a storage medium 320, and a processor 330.

The transceiver 310 transmits and receives signals wirelessly or over a wire. The transceiver 310 may also perform operations such as low-noise amplification, impedance matching, mixing, up- or down-frequency conversion, filtering, and amplification. The intent evaluation system 10 is communicatively or electrically connected to the voice control device via the transceiver 310.

The storage medium 320 stores the software, data, and program code required for operating the intent evaluation system 10. The storage medium 320 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid state drive (SSD), a similar element, or a combination of the above, and stores multiple modules and applications executable by the processor 330. In one embodiment, the storage medium 320 stores the natural language processing module 100 and the semantic evaluation module 200.

The processor 330 is, for example, a central processing unit (CPU), or another programmable general-purpose or special-purpose micro control unit (MCU), microprocessor, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), graphics processing unit (GPU), image signal processor (ISP), image processing unit (IPU), arithmetic logic unit (ALU), complex programmable logic device (CPLD), field programmable gate array (FPGA), a similar element, or a combination of the above. The processor 330 is coupled to the storage medium 320 and the transceiver 310, and accesses and executes the various applications stored in the storage medium 320 as well as the natural language processing module 100 and the semantic evaluation module 200 stored there. The natural language processing module 100 includes a word segmentation sub-module 110, a part-of-speech tagging sub-module 120, and a named entity recognition sub-module 130.
The semantic evaluation module 200 is electrically connected to the natural language processing module 100 and includes an intent database 250, a semantic parsing sub-module 240, a best-intent output sub-module 260, a context parsing sub-module 210, a personal usage history sub-module 220, and a user history sub-module 230.

The word segmentation sub-module 110 segments the text using segmentation symbols, dividing the text into multiple smallest meaningful units. The segmentation symbol may be represented by a comma.

The part-of-speech tagging sub-module 120 is electrically connected to the word segmentation sub-module 110 and tags each unit with its part of speech (noun, verb, proper noun, and so on). In this embodiment, parts of speech are defined using Chinese Treebank 3.0, the Chinese part-of-speech set defined by the University of Pennsylvania. For example, PN is a pronoun, VV is a verb, and NN is a common noun.

The named entity recognition sub-module 130 is electrically connected to the part-of-speech tagging sub-module 120 and identifies named entities among the part-of-speech-tagged units, labeling each unit with a named entity tag accordingly. In one embodiment, the named entity recognition sub-module 130 can augment the named entities through machine-learning training, where named entities may include persons, events, times, places, and objects. The present invention is not limited thereto.
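As an illustration of the three-stage pipeline above, the following minimal sketch produces units in the 詞_POS_entity format used throughout the embodiments. The tag dictionaries are hypothetical lookups standing in for the trained taggers; they are not the patent's actual models.

```python
# Minimal sketch of the NLP pipeline stages (segmentation -> POS -> NER).
# The tag inventories below are illustrative assumptions, not trained models.
POS_TAGS = {"我": "PN", "要": "VV", "聽": "VV", "五月天": "NR", "的": "DER", "乾杯": "NR"}
NER_TAGS = {"五月天": "singer/song", "乾杯": "song"}

def annotate(units):
    """Attach POS and named-entity labels, e.g. 五月天 -> 五月天_NR_singer/song."""
    out = []
    for u in units:
        pos = POS_TAGS.get(u, "NN")   # default to common noun when unknown
        ner = NER_TAGS.get(u, "o")    # "o" = no specific named-entity tag
        out.append(f"{u}_{pos}_{ner}")
    return out

print(annotate(["我", "要", "聽", "五月天", "的", "乾杯"]))
```

Running the sketch on the first embodiment's command reproduces the tagged units listed later in the text.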

The intent database 250 stores intent data, which includes intent numbers and, for each intent number, a corresponding intent, semantic model, and key words. Each intent represents a specific behavior, and outputting an intent enables the intent evaluation system 10 to perform the corresponding behavior.

The semantic parsing sub-module 240 is electrically connected to the intent database 250; it obtains each cosine similarity from the entity-tagged units, the semantic model corresponding to each intent number, and formula (2), and determines whether each cosine similarity is greater than the similarity threshold, where formula (2) is:

cos(u, I_k) = (u · I_k) / (|u| × |I_k|)    (2)

where u is the user's voice command converted into a term-frequency (TF) vector according to the semantic models in the intent database, |u| is the length of the vector u, I_k is the semantic model of intent k converted into a TF vector according to the semantic models in the intent database, |I_k| is the length of the vector I_k, and u · I_k is the inner product of the two vectors.
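Formula (2) is the standard cosine similarity between the command's TF vector and an intent's semantic-model TF vector. A minimal sketch:

```python
import math

def cosine_similarity(u, v):
    """Formula (2): inner product of u and v divided by the product of their lengths.

    Returns 0.0 when either vector is all-zero to avoid division by zero
    (a defensive choice, not something specified in the source text).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

A similarity near 1 means the command and the intent share the same term profile; 0 means they share no terms at all.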

The best-intent output sub-module 260 is electrically connected to the semantic parsing sub-module 240 and outputs the intent whose intent number corresponds to the largest cosine similarity among those greater than the similarity threshold. Outputting the intent enables the voice control device to perform the corresponding behavior in accordance with the user's voice command.

The context parsing sub-module 210 is electrically connected to the semantic parsing sub-module 240. When no cosine similarity is greater than the similarity threshold, it generates similar words from the key words using a word2vec model, obtains an intent weight for each intent number from formula (1), the key words, the similar words, and the context of the previous N dialogue turns preceding the voice command, determines whether each intent weight is greater than a weight threshold, and outputs the intent whose intent number corresponds to an intent weight greater than the weight threshold, where formula (1) computes a score Score_k as follows:

where Score_k is the score indicating that the voice command belongs to intent k, K_k denotes the key words belonging to intent k, S_k denotes the similar words found from the key words of intent k through the word2vec model, and c is the sentence preceding the voice command.
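The exact form of formula (1) is not reproduced in the source text, so the sketch below is only an assumption: it scores intent k by the fraction of tokens, in the command plus the preceding dialogue turn(s), that match intent k's key words K_k or their word2vec-derived similar words S_k. The function name and signature are hypothetical.

```python
# Hypothetical stand-in for formula (1); the patent's exact scoring is not
# reproduced in the source text. Scores intent k by the fraction of tokens
# in the command plus preceding context that match K_k union S_k.
def intent_weight(command_tokens, context_tokens, keywords, similar_words):
    vocab = set(keywords) | set(similar_words)  # K_k union S_k
    tokens = command_tokens + context_tokens    # u plus previous turn(s) c
    hits = sum(1 for t in tokens if t in vocab)
    return hits / len(tokens) if tokens else 0.0
```

The resulting weight would then be compared against the weight threshold, mirroring how the cosine similarity is compared against the similarity threshold.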

The user history sub-module 230 is electrically connected to the semantic parsing sub-module 240 and stores user history records. In one embodiment, the semantic parsing sub-module 240 clusters the user history records into intent groups according to the users' personal information, obtaining intent group numbers, the group intent corresponding to each intent group number, and the average frequency over all users, and outputs the group intent corresponding to an intent group number based on the similar words, the key words, the average frequency over all users, and the intents corresponding to the intent numbers.

The user history sub-module 230 can be used when the personal usage history lacks relevant information. The user history records stored in the user history sub-module 230 are the history records of all users of the voice control device, clustered into intent groups according to the users' personal information, such as gender, age, occupation, usage time, usage frequency, and usage date. For example, when user A issues the voice command "I want to listen to music" but has no relevant personal usage history, the user history sub-module 230 finds a suitable intent group among all users according to user A's personal information, then uses the key words from the context parsing sub-module 210 to find the singers and songs most frequently used for "I want to listen to music", and plays them for user A. If the context parsing sub-module 210 provides no key words, the group intent with the highest average frequency over all users in the intent group is played.
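The cold-start fallback described above can be sketched as follows. The group record layout (`intent`, `keyword`, `avg_freq`) is an assumption for illustration, not a structure defined by the patent.

```python
# Sketch of the cold-start fallback: with no personal history, select from the
# user's demographic intent group; prefer entries matching the context key
# words, otherwise take the group's highest average-frequency intent.
# The record layout ("intent", "keyword", "avg_freq") is illustrative only.
def resolve_group_intent(group, context_keywords):
    if context_keywords:
        matches = [e for e in group if e["keyword"] in context_keywords]
        if matches:
            return max(matches, key=lambda e: e["avg_freq"])
    return max(group, key=lambda e: e["avg_freq"])
```

With context key words present, the most frequent matching entry wins; without them, the group-wide frequency ranking decides, as in the "I want to listen to music" example.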

The personal usage history sub-module 220 is electrically connected to the semantic parsing sub-module 240 and stores the user's personal usage history records, where each personal usage history record includes a time, an intent number, the key words used, and a frequency. In one embodiment, when no cosine similarity is greater than the similarity threshold, the semantic parsing sub-module 240 compares the entity-tagged units with the key words used in the personal usage history records and, based on the frequency, obtains and outputs the intent whose intent number corresponds to the time of the voice command.

For voice commands with insufficient information, the present invention can determine the user's intent precisely without multiple interactions with the user. The personal usage history sub-module 220 can automatically obtain the currently missing information from personal usage experience (the personal usage history records).

For example, for the command "I want to listen to music", the context parsing sub-module 210 determines that the intent is to play music, but the command does not specify whose songs to play. The personal usage history records are then consulted: if the user recently often listened to 張惠妹's 聽海, the intent of the command is resolved to that record, and 張惠妹's 聽海 is played for the user. In addition, the personal usage history records are clustered into intent groups by three time intervals, the morning shift (08:01-16:00), the evening shift (16:01-24:00), and the night shift (00:01-08:00); under the same intent, the key words differ between intervals, and the highest-frequency key words are used to assist semantic understanding.
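The three time intervals above can be sketched as a simple lookup. The English interval names are translations added here for readability; the source gives only the Chinese shift names and their ranges.

```python
def shift_of(hour, minute=0):
    """Map a clock time to the three intent-grouping intervals described above."""
    t = hour * 60 + minute  # minutes past midnight
    if 8 * 60 + 1 <= t <= 16 * 60:
        return "morning"    # 早班 08:01-16:00
    if t >= 16 * 60 + 1:
        return "evening"    # 小夜 16:01-24:00
    return "night"          # 大夜 00:01-08:00
```

The personal usage history would then be bucketed by `shift_of(...)` so that, for example, a "play music" intent can carry different high-frequency key words in the morning than at night.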

The natural language processing module 100, the semantic evaluation module 200, the word segmentation sub-module 110, the part-of-speech tagging sub-module 120, the named entity recognition sub-module 130, the intent database 250, the semantic parsing sub-module 240, the best-intent output sub-module 260, the context parsing sub-module 210, the personal usage history sub-module 220, and the user history sub-module 230 may each be implemented in software, firmware, hardware circuits, or any combination thereof, and this disclosure places no restriction on how they are implemented.

The following embodiments illustrate how the intent evaluation system 10 based on context and usage history is applied to a voice control device.

In a first embodiment, the user's precise intent can be matched using only the semantic parsing sub-module 240 and the intent database 250.

Take as an example a user, Mr. A, who lives in Hsinchu and commutes every day to work in Neihu, Taipei. On his way to and from work, Mr. A uses the voice control device in his vehicle to listen to music. Mr. A issues the voice command [我要聽五月天的乾杯] ("I want to listen to 五月天's 乾杯"). The voice command is converted into text by the voice control device and the text is input to the intent evaluation system 10. The word segmentation sub-module 110 segments the text using segmentation symbols, such as commas, dividing it into multiple smallest meaningful units: [我, 要, 聽, 五月天, 的, 乾杯]. Each unit is then passed to the part-of-speech tagging sub-module 120 for part-of-speech tagging, giving [我_PN, 要_VV, 聽_VV, 五月天_NR, 的_DER, 乾杯_NR], where the segmentation symbol is a comma, PN is a pronoun, VV is a verb, and NN is a common noun. Finally, the part-of-speech-tagged result is passed to the named entity recognition sub-module 130 for named entity recognition, giving [我_PN_o, 要_VV_o, 聽_VV_o, 五月天_NR_singer/song, 的_DER_o, 乾杯_NR_song], where o indicates no specific named-entity tag (person, event, time, place, or object), the story tag indicates that the word is itself a story, and the song tag indicates a song; in this embodiment, 故事 ("story") is both a story and a song. After this parsing by the natural language processing module 100, the named-entity-recognized result is passed to the semantic parsing sub-module 240, which computes cosine similarities against the intent database 250; the intent whose intent number has the highest cosine similarity above the similarity threshold is output as the best intent by the best-intent output sub-module 260.

| Intent number | Intent | Semantic model |
| --- | --- | --- |
| I1 | Play music | 播放_VV_o, 張惠妹_NR_singer, 張學友_NR_singer, 音樂_NN_o, 五月天_NR_singer, 五月天_NR_song, 聽海_NR_song, 情書_NR_song, 故事_NR_song, 乾杯_NR_song |
| I2 | Play a story | 聽_VV_o, 播放_VV_o, 故事_NN_o, 故事_NR_story, 三隻小豬_NR_story, 醜小鴨_NR_story |
| I3 | Play the radio | 聽_VV_o, 播放_VV_o, 廣播_NN_o, 中央警廣_NR_channel |
| I4 | Play the calendar | 有_VV_o, 播放_VV_o, 行事曆_NN_o, 行程_NN_o |
| IM | Play idioms | 聽_VV_o, 播放_VV_o, 蒸蒸日上_NR_idiom, 成語_NN_o |

Table 1

As shown in Table 1, there are M intents in total (with intent numbers I1, I2, I3, I4, ..., IM). Each cosine similarity is obtained from the entity-tagged units, the semantic model corresponding to each intent number, and formula (2) as follows:

First, the overall vocabulary set is established to determine a unified vector dimension. The vocabulary set built from Table 1 (the semantic models corresponding to the intent numbers) is:

corpus = ['播放_VV_o', '張惠妹_NR_singer', '張學友_NR_singer', '音樂_NN_o', '五月天_NR_singer', '五月天_NR_song', '聽海_NR_song', '情書_NR_song', '故事_NR_song', '乾杯_NR_song', '聽_VV_o', '播放_VV_o', '故事_NN_o', '故事_NR_story', '聽_VV_o', '播放_VV_o', '廣播_NN_o', '中央警廣_NR_channel', '有_VV_o', '播放_VV_o', '行事曆_NN_o', '行程_NN_o', '聽_VV_o', '播放_VV_o', '蒸蒸日上_NR_idiom', '成語_NN_o', '三隻小豬_NR_story', '醜小鴨_NR_story']

Mr. A's command is then built into a vector u over the corpus, whose entries count how many times each corpus term appears in the command:

u = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Vectors for I1, I2, ..., IM are then built in the same way:

I1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

I2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]

I3 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]

I4 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

IM = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]

The cosine similarity formula (2) is used to calculate the similarity between u and I1:

cos(u, I1) = (0*1+0*1+0*1+0*1+1*1+1*1+0*1+0*1+0*1+1*1+1*0+0*0+0*0+0*0+0*0+0*0+0*0+0*0+0*0+0*0+0*0+0*0) / ((4^0.5)*(10^0.5)) = 3/(2*(10^0.5)) = 0.4743

By the same computation, the similarities for I_2, I_3, I_4, and I_M are 0.20, 0.25, 0, and 0.25, respectively. In this embodiment the similarity threshold is set to 0.45. Mr. A's command has its highest cosine similarity with I_1; that maximum, 0.4743, exceeds the threshold, so the intent with intent number I_1 is selected and output, directly triggering playback of Mayday's song "Cheers".
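The term-frequency vectors and formula (2) above can be sketched in a few lines of Python (an illustration of the worked example, not the patented implementation; the vectors are copied from the example):

```python
import math

def cosine(u, v):
    # Formula (2): inner product divided by the product of the vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Term-frequency vectors over the 22 distinct corpus tokens, from the example above.
u   = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
I_1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(round(cosine(u, I_1), 4))  # 0.4743, which clears the 0.45 threshold
```

Selecting the intent is then just a matter of taking the intent number whose similarity is both the largest and above the threshold.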

In a second embodiment, when the semantic parsing sub-module 240 cannot resolve the command, the precise intent of the user, Mr. A, is matched through the context parsing sub-module 210 and the user history sub-module 230.

When Mr. A uses the smart voice-control device for the first time and issues the voice command [I want to listen to music], the command is converted to text by the device and fed into the intent evaluation system 10. The word segmentation sub-module 110 segments the text into the smallest meaningful units [I, want, listen, music]; the part-of-speech tagging sub-module 120 then tags each unit, yielding [I_PN, want_VV, listen_VV, music_NN]; the named entity recognition sub-module 130 then performs named entity recognition, yielding [I_PN_o, want_VV_o, listen_VV_o, music_NN_o]. After this processing by the natural language processing module 100, the semantic parsing sub-module 240 computes cosine similarities between the labeled result and the intent database 250. Combining Table 1 and formula (2), the cosine similarities for intent numbers I_1, I_2, I_3, I_4, and I_M are 0.2236, 0.2887, 0.3536, 0, and 0.3536, respectively, none of which exceeds the similarity threshold of 0.45.
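The output format of the tagging pipeline can be sketched as follows (the tagger inputs are stand-ins for sub-modules 110, 120, and 130, whose internals the text does not specify):

```python
def tag_units(units, pos_tags, ner_tags):
    # Join each segmented unit with its POS tag and named-entity label,
    # producing the word_POS_entity triples consumed by the semantic parser.
    return [f"{u}_{p}_{n}" for u, p, n in zip(units, pos_tags, ner_tags)]

units = ["I", "want", "listen", "music"]   # output of word segmentation (110)
pos   = ["PN", "VV", "VV", "NN"]           # output of POS tagging (120)
ner   = ["o", "o", "o", "o"]               # output of NER (130); "o" = no entity

print(tag_units(units, pos, ner))  # ['I_PN_o', 'want_VV_o', 'listen_VV_o', 'music_NN_o']
```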

At this point the context parsing sub-module 210 is triggered. Using the key words of the intent database 250 listed in Table 2, each key word is fed into a word2vec model trained on Wikipedia to generate corresponding similar words, shown in Table 3.

Table 2 (key words of the intent database 250):

| Intent number | Intent | Key words |
| --- | --- | --- |
| I_1 | Play music | Zhang Huimei, Jacky Cheung, music, Mayday, "Tinghai", "Love Letter", "Story", "Cheers" |
| I_2 | Play a story | story, "Three Little Pigs", "Ugly Duckling" |
| I_3 | Play radio | radio, Central Police Radio |
| I_4 | Play calendar | calendar, itinerary |
| I_M | Play an idiom | "flourishing", idiom |

Table 3 (similar words generated by the word2vec model; the originating key word is in parentheses):

| Intent number | Intent | Similar words (key words) |
| --- | --- | --- |
| I_1 | Play music | Elva Hsiao (Zhang Huimei), Leslie Cheung (Jacky Cheung), rock (music), Stefanie Sun (Mayday), Zhongqike ("Tinghai"), confession ("Love Letter"), plot ("Story"), guilty ("Cheers") |
| I_2 | Play a story | plot (story), N.A. ("Three Little Pigs"), kiss ("Ugly Duckling") |
| I_3 | Play radio | broadcast (radio), N.A. (Central Police Radio) |
| I_4 | Play calendar | N.A. (calendar), "hours" (itinerary) |
| I_M | Play an idiom | "make the economy" ("flourishing"), allusion (idiom) |

In one embodiment, an N.A. entry among the similar words in Table 3 (for example, under intent number I_3) means that no corresponding similar word could be obtained through the word2vec model. As an example, before issuing the voice command [I want to listen to music], Mr. A first spoke to the voice-control device without any specific intent, as shown in his context information in Table 4:

Table 4 (Mr. A's context information):

| Conversation time | Context information | Intent |
| --- | --- | --- |
| 2023/5/19 11:50 | Recently I was recalling which divas were contemporaries of Elva Hsiao and Stefanie Sun | None |

The context parsing sub-module 210 uses the key words in Table 2, the similar words in Table 3, and formula (1) to compute the intent weight of each intent number. For example, the intent weight of I_1 is computed as follows:

Applying formula (1) to the context information in Table 4 ("Recently I was recalling which divas were contemporaries of Elva Hsiao and Stefanie Sun"), two words match the key words and similar words of I_1: Elva Hsiao and Stefanie Sun. Since the context contains only one sentence, formula (1) gives: Score_intent1 = 2 × (1/1)

The intent weight for intent number I_1 is therefore 2. Computed in the same way, the intent weights for intent numbers I_2, I_3, I_4, and I_M are 0, 0, 0, and 0, respectively.

Since the weight threshold is set to 1.8, any intent whose weight exceeds the threshold is triggered; in this embodiment the intent with intent number I_1 is output, that is, play music.
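The context-scoring step can be sketched as follows; the per-sentence weight of 1/i for the i-th most recent sentence is an assumption inferred from the Score_intent1 = 2 × (1/1) worked example, since the typeset formula (1) is not reproduced in this text:

```python
def intent_weight(context_sentences, keywords, similar_words):
    # context_sentences[0] is the most recent sentence; the i-th previous
    # sentence is weighted 1/i (assumed from the 2 * (1/1) worked example).
    vocab = set(keywords) | set(similar_words)
    score = 0.0
    for i, sentence in enumerate(context_sentences, start=1):
        hits = sum(1 for word in vocab if word in sentence)
        score += hits * (1.0 / i)
    return score

context = ["Recently I was recalling which divas were contemporaries of "
           "Elva Hsiao and Stefanie Sun"]
keywords_I1 = ["Zhang Huimei", "Jacky Cheung", "music", "Mayday"]
similar_I1  = ["Elva Hsiao", "Leslie Cheung", "rock", "Stefanie Sun"]

print(intent_weight(context, keywords_I1, similar_I1))  # 2.0 > 1.8, so I_1 fires
```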

Since this is Mr. A's first use of the smart voice-control device, the personal usage history sub-module 220 contains no data (personal usage history records) to consult.

The user history sub-module 230 stores all user history records of the smart voice-control device and groups intents by personal information such as gender, age, occupation, usage time, usage frequency, and usage date. In this embodiment, Mr. A is an office worker: male, 30 years old, employed in the service industry, and using the device at 12:10 on Friday, 2023/5/19, i.e. the noon period. Based on Mr. A's personal information, the user history sub-module 230 obtains the intent group G_1I_1 shown in Table 5 (intent grouping by personal information):

| Intent group number | Group intent (singer and song played) | Average frequency across all users |
| --- | --- | --- |
| G_1I_1 | Mayday's "Cheers" | 31 |
| G_1I_1 | Zhang Huimei's "Tinghai" | 10 |
| G_1I_1 | Jacky Cheung's "Love Letter" | 9 |

Through the above, the context parsing sub-module 210 has determined the intent corresponding to intent number I_1 (play music). The key words mapped back from the similar words through the personal usage history sub-module 220 and the user history sub-module 230 are Zhang Huimei and Jacky Cheung. Filtering by the average frequency across all users, the system outputs the group intent of group number G_1I_1 with the highest of those average frequencies: playing Zhang Huimei's "Tinghai". In other words, the key words and similar words of the context parsing sub-module 210 matched songs by both Zhang Huimei and Jacky Cheung, and the sub-module chose the more frequent of the two, Zhang Huimei's "Tinghai".
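The frequency filter over the intent group can be sketched like this (the tuple layout mirroring Table 5 is hypothetical):

```python
# Hypothetical row layout mirroring Table 5: (group_id, artist, song, avg_frequency).
group_rows = [
    ("G1I1", "Mayday",       "Cheers",      31),
    ("G1I1", "Zhang Huimei", "Tinghai",     10),
    ("G1I1", "Jacky Cheung", "Love Letter",  9),
]

def pick_group_intent(matched_keywords, rows):
    # Keep only rows whose artist was mapped back from the context's similar
    # words, then return the row with the highest average frequency.
    candidates = [r for r in rows if r[1] in matched_keywords]
    return max(candidates, key=lambda r: r[3]) if candidates else None

best = pick_group_intent({"Zhang Huimei", "Jacky Cheung"}, group_rows)
print(best[1], best[2])  # Zhang Huimei Tinghai
```

Note that Mayday's "Cheers" has the highest overall frequency, but it is excluded because only Zhang Huimei and Jacky Cheung were mapped back from the context.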

In a third embodiment, when the semantic parsing sub-module 240 cannot resolve the command and no context is available, the user's precise intent is matched solely through the personal usage history sub-module 220.

In this embodiment, Mr. B uses the voice-control device during the morning shift (08:01-16:00) and speaks only the command [story]. After processing by the natural language processing module 100 (word segmentation, part-of-speech tagging, and named entity recognition, as described above), the result is [story_NN_o, story_NR_story, story_NR_song]. The semantic parsing sub-module 240 then computes cosine similarities against the intent database 250 for each of [story_NN_o, story_NR_story, story_NR_song], shown in Table 6 (cosine similarities of "story" against the intent database 250):

| Intent number | story_NN_o | story_NR_story | story_NR_song |
| --- | --- | --- | --- |
| I_1 | 0 | 0 | 0.32 |
| I_2 | 0.41 | 0.41 | 0 |
| I_3 | 0 | 0 | 0 |
| I_4 | 0 | 0 | 0 |
| I_M | 0 | 0 | 0 |

As Table 6 shows, none of the cosine similarities for intent numbers I_1, I_2, I_3, I_4, and I_M exceeds the similarity threshold of 0.45, which triggers the context parsing sub-module 210. Since Mr. B has no context information to consult that day, the context parsing sub-module 210 produces nothing. The system next consults the personal usage history records stored in the personal usage history sub-module 220. Because Mr. B has used the device before, his records are as shown in Table 7 (Mr. B's personal usage history):

| Time | Intent number | Key words used | Frequency |
| --- | --- | --- | --- |
| Morning shift | I_2 | story, Three Little Pigs | 49 |
| Morning shift | I_1 | Jacky Cheung, music | 10 |
| Morning shift | I_1 | Zhang Huimei, music | 5 |
| Morning shift | I_4 | idiom | 3 |

Next, the personal usage history sub-module 220 compares each unit of the voice command against the key words used in its records and, according to frequency, retrieves the intent of the intent number whose time matches the time of the voice command (together with the key words used); if several records share the same time and the same key words, one is picked at random. Mr. B's command was issued during the morning shift, so the highest-frequency morning-shift record in Table 7 whose key words include "story" is chosen: intent I_2, with key words "story" and "Three Little Pigs", which is then played.
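The history lookup can be sketched as follows (the record layout mirroring Table 7 is hypothetical):

```python
import random

# Hypothetical record layout for Table 7: (time_slot, intent_id, keywords, frequency).
history = [
    ("morning", "I2", {"story", "Three Little Pigs"}, 49),
    ("morning", "I1", {"Jacky Cheung", "music"},      10),
    ("morning", "I1", {"Zhang Huimei", "music"},       5),
    ("morning", "I4", {"idiom"},                       3),
]

def match_history(units, time_slot, records):
    # Keep records from the same time slot whose keywords overlap the command units.
    candidates = [r for r in records
                  if r[0] == time_slot and set(units) & r[2]]
    if not candidates:
        return None
    best_freq = max(r[3] for r in candidates)
    # Random tie-break among equally frequent matches, as the text describes.
    return random.choice([r for r in candidates if r[3] == best_freq])

print(match_history(["story"], "morning", history)[1])  # I2
```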

In the following, the method according to embodiments of the present invention is described with reference to the devices, components, and modules of FIG. 1. Each step of the method may be adjusted to the implementation at hand and is not limited to what is described here.

FIG. 2 is a flow chart of an intent evaluation method based on context and usage history according to an embodiment of the present invention.

Referring to FIG. 1 and FIG. 2 together, in step S201 the word segmentation sub-module 110 segments the text using word segmentation symbols to divide the text into a plurality of smallest meaningful units.

In step S202, the part-of-speech tagging sub-module 120 tags each unit with its part of speech.

In step S203, the named entity recognition sub-module 130 performs named entity recognition on the part-of-speech-tagged units and labels each unit with a named entity tag according to the recognized named entity.

In step S204, the semantic parsing sub-module 240 obtains the cosine similarities from the units labeled with named entity tags and the semantic model corresponding to each intent number, and determines whether each cosine similarity is greater than the similarity threshold.

In step S205, if a cosine similarity is greater than the similarity threshold, the best intent output sub-module 260 selects and outputs the intent of the intent number corresponding to the largest cosine similarity among those above the threshold.

In step S206, if none of the cosine similarities is greater than the similarity threshold, the context parsing sub-module 210 generates similar words from the word2vec model and the key words, obtains the intent weight of each intent number from formula (1), the key words, the similar words, and the context of the N conversations preceding the voice command, and determines whether each intent weight is greater than the weight threshold, so as to output the intent of the intent number whose weight exceeds the threshold.

In step S207, the semantic parsing sub-module 240 groups the user history records stored in the user history sub-module 230 by the user's personal information to obtain intent group numbers, the group intents corresponding to those numbers, and the average frequency across all users, and outputs the group intent corresponding to an intent group number based on the similar words, the key words, the average frequency across all users, and the intent corresponding to the intent number.

FIG. 3 is a flow chart of an intent evaluation method based on context and usage history according to another embodiment of the present invention.

Referring to FIG. 1 and FIG. 3 together, in step S301 the word segmentation sub-module 110 segments the text using word segmentation symbols to divide the text into a plurality of smallest meaningful units.

In step S302, the part-of-speech tagging sub-module 120 tags each unit with its part of speech.

In step S303, the named entity recognition sub-module 130 performs named entity recognition on the part-of-speech-tagged units and labels each unit with a named entity tag according to the recognized named entity.

In step S304, the semantic parsing sub-module 240 obtains the cosine similarities from the units labeled with named entity tags and the semantic model corresponding to each intent number, and determines whether each cosine similarity is greater than the similarity threshold.

In step S305, if a cosine similarity is greater than the similarity threshold, the best intent output sub-module 260 selects and outputs the intent of the intent number corresponding to the largest cosine similarity among those above the threshold.

In step S306, if none of the cosine similarities is greater than the similarity threshold, the semantic parsing sub-module 240 compares the units labeled with named entity tags against the key words used in the user's personal usage history records stored in the personal usage history sub-module 220, and, according to frequency, retrieves and outputs the intent of the intent number corresponding to the time of the voice command.
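The fallback order shared by FIG. 2 and FIG. 3 can be summarized as a small decision cascade (the inputs are placeholders for the three sub-modules' outputs, not the real modules):

```python
def select_intent(cos_sims, sim_threshold, context_weights, weight_threshold,
                  history_match):
    # Step 1: semantic similarity (steps S204/S205 and S304/S305).
    best = max(cos_sims, key=cos_sims.get)
    if cos_sims[best] > sim_threshold:
        return best
    # Step 2: context-based intent weights (step S206), if any context exists.
    if context_weights:
        best_w = max(context_weights, key=context_weights.get)
        if context_weights[best_w] > weight_threshold:
            return best_w
    # Step 3: usage-history match (steps S207/S306).
    return history_match

print(select_intent({"I1": 0.4743, "I2": 0.20}, 0.45, {}, 1.8, None))  # I1
```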

In summary, the present invention provides an intent evaluation system and method based on context and usage history that accurately identify intent from the current conversation context and the user's history records, without requiring an interactive dialog to clarify the user's intent. This reduces the number of voice commands the user must issue and improves the user experience, making the voice-control device simpler and more convenient to trigger.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit it. Anyone with ordinary knowledge in the relevant technical field may make slight modifications and refinements without departing from the spirit and scope of the present invention; the protection scope of the present invention is therefore defined by the appended claims.

10: intent evaluation system
310: transceiver
320: storage medium
330: processor
100: natural language processing module
200: semantic evaluation module
110: word segmentation sub-module
120: part-of-speech tagging sub-module
130: named entity recognition sub-module
210: context parsing sub-module
220: personal usage history sub-module
230: user history sub-module
240: semantic parsing sub-module
250: intent database
260: best intent output sub-module
S201, S202, S203, S204, S205, S206, S207, S301, S302, S303, S304, S305, S306: steps

FIG. 1 is a schematic diagram of an intent evaluation system based on context and usage history according to an embodiment of the present invention. FIG. 2 is a flow chart of an intent evaluation method based on context and usage history according to an embodiment of the present invention. FIG. 3 is a flow chart of an intent evaluation method based on context and usage history according to another embodiment of the present invention.

10: intent evaluation system

310: transceiver

320: storage medium

330: processor

100: natural language processing module

200: semantic evaluation module

110: word segmentation sub-module

120: part-of-speech tagging sub-module

130: named entity recognition sub-module

210: context parsing sub-module

220: personal usage history sub-module

230: user history sub-module

240: semantic parsing sub-module

250: intent database

260: best intent output sub-module

Claims (12)

一種基於上下文情境與使用歷程的意圖評選系統,應用於聲控裝置中,該聲控裝置接收使用者的語音指令後,將該語音指令轉換為文字,該意圖評選系統包括: 收發器; 儲存媒體,用以儲存多個模組;以及 處理器,耦接至該儲存媒體以及該收發器,該處理器存取並執行該儲存媒體所儲存的該些模組,該些模組包括: 自然語言處理模組,該自然語言處理模組包括: 斷詞子模組,用以使用斷詞符號對該文字進行分詞,以將該文字分為多個最小且有詞義的單位; 詞性標註子模組,與該斷詞子模組電性連接,用以對該些單位進行詞性標註,以對該些單位分別標註詞性; 命名實體識別子模組,與該詞性標註子模組電性連接,用以對經該詞性標註後的該些單位進行命名實體的識別,以依據該命名實體分別對該些單位標註命名實體標籤;以及 語意評選模組,與該自然語言處理模組電性連接,該語意評選模組包括: 意圖資料庫,用以儲存意圖資料,該意圖資料包括意圖編號、與該意圖編號對應的意圖、與該意圖編號對應的語意模型及與該意圖編號對應的關鍵詞彙; 語意解析子模組,與該意圖資料庫電性連接,用以依據經該命名實體標籤標註後的該些單位與對應於該意圖編號的該語意模型獲取各餘弦相似度,並且判斷各該餘弦相似度是否大於相似度門檻值;以及 最佳意圖輸出子模組,與該語意解析子模組電性連接,用以選擇輸出大於該相似度門檻值的該餘弦相似度中的最大餘弦相似度對應的該意圖編號的該意圖。 An intention selection system based on context and usage history is used in a voice control device. After receiving the user's voice command, the voice control device converts the voice command into text. The intention selection system includes: transceiver; Storage media to store multiple modules; and A processor is coupled to the storage medium and the transceiver. The processor accesses and executes the modules stored in the storage medium. The modules include: Natural language processing module, which includes: The word segmentation sub-module is used to segment the text using word segmentation symbols to divide the text into multiple smallest and meaningful units; The part-of-speech tagging sub-module is electrically connected to the word-segmenting sub-module, and is used to tag these units with part-of-speech tags, so as to mark these units with part-of-speech respectively; The named entity identification sub-module is electrically connected to the part-of-speech tagging sub-module, and is used to identify the named entities of the units that have been tagged with the part-of-speech, and to label the units with named entity tags based on the named entities; as well as The semantic selection module is electrically connected to the natural language processing module. The semantic selection module includes: An intent database is used to store intent data. 
The intent data includes an intent number, an intent corresponding to the intent number, a semantic model corresponding to the intent number, and a key vocabulary corresponding to the intent number; The semantic parsing sub-module is electrically connected to the intention database, and is used to obtain the cosine similarity based on the units marked by the named entity tag and the semantic model corresponding to the intention number, and determine the cosine similarity. Whether the similarity is greater than the similarity threshold; and The best intention output sub-module is electrically connected to the semantic parsing sub-module, and is used to select and output the intention number corresponding to the maximum cosine similarity among the cosine similarities that is greater than the similarity threshold. 如請求項1所述的意圖評選系統,其中該語意評選模組更包括: 上下文解析子模組,與該語意解析子模組電性連接,用以於各該餘弦相似度均非大於該相似度門檻值時,依據word2vec模型及該關鍵詞彙產生相似詞彙,依據公式1、該關鍵詞彙、該相似詞彙及該語音指令的前N筆對話的上下文獲取對應於該意圖編號的各意圖權重,判斷各該意圖權重是否大於權重門檻值,以輸出大於該權重門檻值的該意圖權重對應的該意圖編號的意圖, 其中公式1為: 公式(1) 其中 為該語音指令隸屬於intent-k的分數, 為代表隸屬於k語意的該關鍵詞彙, 為由k語意的該關鍵詞彙透過該word2vec模型找到的該相似詞彙, 為該語音指令的前一句話。 The intention selection system as described in claim 1, wherein the semantic selection module further includes: a context parsing sub-module, electrically connected to the semantic parsing sub-module, for each cosine similarity is not greater than the similarity. When the degree threshold is reached, similar words are generated based on the word2vec model and the key words. 
According to Formula 1, the key words, the similar words and the context of the first N conversations of the voice command, the intention weight corresponding to the intention number is obtained, and the judgment is made Whether the weight of each intention is greater than the weight threshold value is used to output the intention number corresponding to the intention weight that is greater than the weight threshold value, where Formula 1 is: Formula (1) where is the score of the voice command belonging to intent-k, represents the key vocabulary belonging to the semantic meaning of k, It is the similar word found through the word2vec model from the key words of k semantic meaning, The previous sentence of this voice command. 如請求項2所述的意圖評選系統,其中該語意評選模組更包括: 使用者歷程子模組,與該語意解析子模組電性連接,用以儲存使用者歷程記錄,該語意解析子模組依據該使用者的個人資訊對該使用者歷程記錄進行意圖分群,以獲取意圖分群編號、與該意圖分群編號對應的分群意圖及所有使用者平均頻率,並且依據該相似詞彙、該關鍵詞彙、該所有使用者平均頻率及對應該意圖編號的該意圖輸出對應於該意圖分群編號的分群意圖。 The intention selection system as described in claim 2, wherein the semantic selection module further includes: The user process sub-module is electrically connected to the semantic parsing sub-module to store user process records. The semantic parsing sub-module performs intent grouping on the user process records based on the user's personal information. Obtain the intent group number, the group intent corresponding to the intent group number, and the average frequency of all users, and output the intent based on the similar words, the key words, the average frequency of all users, and the intent number corresponding to the intent The grouping intention of the grouping number. 
如請求項1所述的意圖評選系統,其中該語意評選模組更包括: 個人使用歷程子模組,與該語意解析子模組電性連接,用以儲存該使用者的個人使用歷程記錄,其中該個人使用歷程記錄包括時間、該意圖編號、使用的關鍵詞彙及頻率,以於各該餘弦相似度均非大於該相似度門檻值時,該語意解析子模組將經該命名實體標籤標註後的該些單位與該個人使用歷程記錄中的該使用的關鍵詞彙進行比對,依據該頻率獲取並且輸出對應於該語音指令的時間的對應於該意圖編號的意圖。 The intention selection system as described in claim 1, wherein the semantic selection module further includes: The personal usage history sub-module is electrically connected to the semantic analysis sub-module and is used to store the user's personal usage history record, where the personal usage history record includes time, the intention number, the key words used and the frequency, When the cosine similarity is not greater than the similarity threshold, the semantic parsing sub-module compares the units marked by the named entity tag with the used key words in the personal usage history record. Yes, the intention corresponding to the intention number corresponding to the time of the voice instruction is obtained and output based on the frequency. 如請求項1所述的意圖評選系統,其中該語意解析子模組依據經該命名實體標籤標註後的該些單位與對應於該意圖編號的該語意模型獲取各該餘弦相似度的操作中更包括: 依據公式(2)獲取各該餘弦相似度,該公式(2)為: 公式(2) 其中,u為該使用者的該語音指令, 為依據該意圖資料庫的該語意模型將該使用者的該語音指令轉成詞頻(term frequency,TF)的向量, 為u向量的長度, 為意圖k, 為意圖k語意模型依據該意圖資料庫的該語意模型轉成TF的向量, 向量的長度, 為該 向量與該 向量的內積。 The intention selection system as described in claim 1, wherein the semantic parsing sub-module obtains updates in the operation of each cosine similarity based on the units marked by the named entity tag and the semantic model corresponding to the intention number. 
Including: Obtain the cosine similarity according to formula (2), which formula (2) is: Formula (2) where u is the user’s voice command, To convert the user's voice command into a vector of term frequency (TF) based on the semantic model of the intent database, is the length of u vector, For intention k, is the vector that the semantic model of intention k is converted into TF based on the semantic model of the intention database, for the length of the vector, for this vector with the Inner product of vectors. 如請求項1所述的意圖評選系統,其中該命名實體識別子模組對經該詞性標註後的該些單位進行該命名實體的識別,以依據該命名實體分別對該些單位標註該命名實體標籤的操作中更包括: 通過機器學習訓練來擴增該命名實體,其中該命名實體包括人、事、時、地及物。 The intent selection system as described in claim 1, wherein the named entity identification sub-module identifies the named entities for the units that have been tagged with the part-of-speech, so as to mark the named entity tags for the units respectively based on the named entities. The operations include: The named entity is augmented through machine learning training, where the named entity includes people, things, times, places and things. 一種基於上下文情境與使用歷程的意圖評選方法,應用於聲控裝置中,該聲控裝置接收使用者的語音指令後,將該語音指令轉換為文字,該意圖評選方法包括: 使用斷詞符號對該文字進行分詞,以將該文字分為多個最小且有詞義的單位; 對該些單位進行詞性標註,以對該些單位分別標註詞性; 對經該詞性標註後的該些單位進行命名實體的識別,以依據該命名實體分別對該些單位標註命名實體標籤; 依據經該命名實體標籤標註後的該些單位與對應於意圖編號的語意模型獲取各餘弦相似度,並且判斷各該餘弦相似度是否大於相似度門檻值;以及 選擇輸出大於該相似度門檻值的該餘弦相似度中的最大餘弦相似度對應的該意圖編號的該意圖。 An intention selection method based on context and usage history is applied to a voice control device. After receiving the user's voice command, the voice control device converts the voice command into text. 
The intention selection method includes: Use hyphenation symbols to segment the text to divide the text into multiple smallest and meaningful units; Perform part-of-speech tagging on these units to tag these units with part-of-speech respectively; Identify named entities for the units that have been tagged with the part-of-speech, so as to label the units with named entity tags based on the named entities; Obtain each cosine similarity based on the units marked with the named entity tag and the semantic model corresponding to the intent number, and determine whether each cosine similarity is greater than the similarity threshold; and Select the intent that outputs the intent number corresponding to the maximum cosine similarity among the cosine similarities that is greater than the similarity threshold. 如請求項7所述的意圖評選方法,其中該意圖評選方法更包括: 於各該餘弦相似度均非大於該相似度門檻值時,依據word2vec模型及關鍵詞彙產生相似詞彙,依據公式1、該關鍵詞彙、該相似詞彙及該語音指令的前N筆對話的上下文獲取對應於該意圖編號的各意圖權重,判斷各該意圖權重是否大於權重門檻值,以輸出大於該權重門檻值的該意圖權重對應的該意圖編號的該意圖, 其中公式1為: 公式(1) 其中 為該語音指令隸屬於intent-k的分數, 為代表隸屬於k語意的該關鍵詞彙, 為由k語意的該關鍵詞彙透過該word2vec模型找到的該相似詞彙, 為該語音指令的前一句話。 The intention selection method as described in claim 7, wherein the intention selection method further includes: when each of the cosine similarities is not greater than the similarity threshold, generating similar words based on the word2vec model and key words, according to Formula 1, the The context of the key words, the similar words and the first N conversations of the voice command obtains the weight of each intention corresponding to the intention number, determines whether the weight of each intention is greater than the weight threshold, and outputs the weight of the intention that is greater than the weight threshold. 
The intention corresponding to the intention number, where Formula 1 is: Formula (1) where is the score of the voice command belonging to intent-k, represents the key vocabulary belonging to the semantic meaning of k, It is the similar word found through the word2vec model from the key words of k semantic meaning, The previous sentence of this voice command. 如請求項8所述的意圖評選方法,其中該意圖評選方法更包括: 依據該使用者的個人資訊對儲存的使用者歷程記錄進行意圖分群,以獲取意圖分群編號、與該意圖分群編號對應的分群意圖及所有使用者平均頻率,並且依據該相似詞彙、該關鍵詞彙、該所有使用者平均頻率及對應該意圖編號的該意圖輸出對應於該意圖分群編號的分群意圖。 The intention selection method as described in claim 8, wherein the intention selection method further includes: Perform intent grouping on the stored user history records based on the user's personal information to obtain the intent group number, the group intent corresponding to the intent group number, and the average frequency of all users, and based on the similar words, the key words, The average frequency of all users and the intention output corresponding to the intention number correspond to the grouping intention of the intention grouping number. 如請求項7所述的意圖評選方法,其中該意圖評選方法更包括: 於各該餘弦相似度均非大於該相似度門檻值時,將經該命名實體標籤標註後的該些單位與儲存的該使用者的個人使用歷程記錄中的使用的關鍵詞彙進行比對,依據頻率獲取並且輸出對應於該語音指令的時間的對應於該意圖編號的意圖,其中該個人使用歷程記錄包括該時間、該意圖編號、該使用的關鍵詞彙及該頻率。 The intention selection method as described in claim 7, wherein the intention selection method further includes: When the cosine similarity is not greater than the similarity threshold, the units marked by the named entity tag are compared with the key words used in the stored personal usage history record of the user, based on The frequency acquires and outputs the intention corresponding to the intention number corresponding to the time of the voice instruction, wherein the personal usage history record includes the time, the intention number, the used key words and the frequency. 
The intention selection method as described in claim 7, wherein the step of obtaining each cosine similarity from the units labeled with the named entity tags and the semantic model corresponding to the intent number further includes obtaining each cosine similarity according to Formula (2):

cos(u, I_k) = (u · I_k) / (|u| × |I_k|)    Formula (2)

where u is the vector obtained by converting the user's voice command into term frequencies (TF) according to the semantic model of an intent database, |u| is the length of the u vector, I_k is the vector obtained by converting the semantic model of intent k into TF according to the same intent database, |I_k| is the length of the I_k vector, and u · I_k is the inner product of the two vectors.

The intention selection method as described in claim 7, wherein the step of performing named entity recognition on the part-of-speech-tagged units, so as to label the units with named entity tags according to the named entities, further includes: augmenting the named entities through machine-learning training, wherein the named entities include people, events, times, places, and objects.
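Formula (2) is the standard cosine similarity between term-frequency vectors, so the intent-matching step can be sketched directly. The toy vocabulary, intent models, and function names below are illustrative assumptions, not taken from the patent.

```python
# Sketch of Formula (2): the voice command and each intent's semantic
# model become term-frequency vectors over the intent database's
# vocabulary, then are compared by cosine similarity.
from collections import Counter
import math

def tf_vector(tokens, vocabulary):
    """Term-frequency vector of tokens over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|); 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy intent database: shared vocabulary plus one token list per intent.
vocabulary = ["play", "song", "weather", "today"]
intent_models = {
    "play_music": ["play", "song", "song"],
    "weather": ["weather", "today"],
}

command = ["play", "song"]
u = tf_vector(command, vocabulary)
similarities = {
    k: cosine_similarity(u, tf_vector(toks, vocabulary))
    for k, toks in intent_models.items()
}
best = max(similarities, key=similarities.get)
print(best)  # -> play_music
```

In the method above, `best` is then accepted only if its similarity also exceeds the similarity threshold; otherwise the word2vec and usage-history fallbacks of claims 8-10 take over.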
TW112130727A 2023-08-16 2023-08-16 Context-aware and user history based intent evaluation system and method thereof TWI832792B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW112130727A TWI832792B (en) 2023-08-16 2023-08-16 Context-aware and user history based intent evaluation system and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW112130727A TWI832792B (en) 2023-08-16 2023-08-16 Context-aware and user history based intent evaluation system and method thereof

Publications (1)

Publication Number Publication Date
TWI832792B true TWI832792B (en) 2024-02-11

Family

ID=90824970

Family Applications (1)

Application Number Title Priority Date Filing Date
TW112130727A TWI832792B (en) 2023-08-16 2023-08-16 Context-aware and user history based intent evaluation system and method thereof

Country Status (1)

Country Link
TW (1) TWI832792B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9493130B2 (en) * 2011-04-22 2016-11-15 Angel A. Penilla Methods and systems for communicating content to connected vehicle users based detected tone/mood in voice input
TW202020692A (en) * 2018-11-20 2020-06-01 財團法人資訊工業策進會 Semantic analysis method, semantic analysis system, and non-transitory computer-readable medium
CN112784574A (en) * 2021-02-02 2021-05-11 网易(杭州)网络有限公司 Text segmentation method and device, electronic equipment and medium
CN114999482A (en) * 2022-05-30 2022-09-02 东风汽车有限公司东风日产乘用车公司 Line-of-sight-based voice recognition method, device, equipment and storage medium
CN115273840A (en) * 2022-06-27 2022-11-01 海信视像科技股份有限公司 Voice interaction device and voice interaction method
US20230135179A1 (en) * 2021-10-21 2023-05-04 Meta Platforms, Inc. Systems and Methods for Implementing Smart Assistant Systems
CN116417003A (en) * 2021-12-31 2023-07-11 科大讯飞股份有限公司 Voice interaction system, method, electronic device and storage medium
CN116547746A (en) * 2020-09-21 2023-08-04 亚马逊技术公司 Dialog management for multiple users


Similar Documents

Publication Publication Date Title
US20240069860A1 (en) Search and knowledge base question answering for a voice user interface
US8972260B2 (en) Speech recognition using multiple language models
US10037758B2 (en) Device and method for understanding user intent
Adel et al. Recurrent neural network language modeling for code switching conversational speech
US7949531B2 (en) Conversation controller
US7949530B2 (en) Conversation controller
US11823678B2 (en) Proactive command framework
US7949532B2 (en) Conversation controller
WO2019158014A1 (en) Computer-implemented method for dialoguing with user and computer system
US9589563B2 (en) Speech recognition of partial proper names by natural language processing
US10224030B1 (en) Dynamic gazetteers for personalized entity recognition
KR20190082900A (en) A speech recognition method, an electronic device, and a computer storage medium
US10366690B1 (en) Speech recognition entity resolution
TW201203222A (en) Voice stream augmented note taking
JPWO2016067418A1 (en) Dialog control apparatus and dialog control method
CN109979450B (en) Information processing method and device and electronic equipment
WO2021179701A1 (en) Multilingual speech recognition method and apparatus, and electronic device
US11195522B1 (en) False invocation rejection for speech processing systems
CN112669842A (en) Man-machine conversation control method, device, computer equipment and storage medium
KR20170090127A (en) Apparatus for comprehending speech
US8401855B2 (en) System and method for generating data for complex statistical modeling for use in dialog systems
JP4354299B2 (en) Case search program, case search method, and case search device
US11817093B2 (en) Method and system for processing user spoken utterance
TWI832792B (en) Context-aware and user history based intent evaluation system and method thereof
JP4475628B2 (en) Conversation control device, conversation control method, and program thereof