TWI762103B - Method and system for machine reading comprehension
- Publication number: TWI762103B
- Application number: TW109145608A
- Authority: TW (Taiwan)
- Prior art keywords: text, knowledge, encoding, vectors, code
Classifications
- G06F16/3329—Natural language query formulation or dialogue systems
- G06F16/35—Clustering; Classification
- G06F18/2415—Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06N3/045—Combinations of networks
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08—Learning methods
Description
The present invention relates to a natural language processing method.
Machine Reading Comprehension (MRC) is a technology that enables computers to read documents and answer questions about them. In recent years, text data has been generated in large volumes across industries. Traditional manual approaches, such as maintaining FAQ lists, suffer from slow processing, high cost, and incomplete coverage of question-answer pairs, and can even become a bottleneck for enterprise growth. The demand for machine reading comprehension has therefore been rising steadily.
However, for the sake of brevity and style, authors typically omit common-sense knowledge when writing. Likewise, authors of specialized articles (for example, medical papers) often assume that readers already have the relevant background and therefore do not include much introductory background in the text. Consequently, when such articles are used as training data or as source material for answer extraction, the accuracy of answers produced by a machine reading comprehension system can be quite low.
In view of the above, the present invention provides a machine reading comprehension method and system.
A machine reading comprehension method according to an embodiment of the present invention includes: obtaining a question text and an article text associated with the question text; generating, according to a knowledge set, a first knowledge text corresponding to the question text and a second knowledge text corresponding to the article text; encoding the question text and the article text to produce an original target text encoding; encoding the first knowledge text and the second knowledge text to produce a knowledge text encoding; performing a fusion operation on the original target text encoding and the knowledge text encoding to inject part of the knowledge in the knowledge set into the original target text encoding and thereby produce an enhanced target text encoding; and obtaining an answer corresponding to the question text based on the enhanced target text encoding and outputting the answer.
A machine reading comprehension system according to an embodiment of the present invention includes an input/output interface, a knowledge text generator, a semantic encoder, an encoding fuser, and an answer extractor. The knowledge text generator is connected to the input/output interface; the semantic encoder is connected to the input/output interface and the knowledge text generator; the encoding fuser is connected to the semantic encoder; and the answer extractor is connected to the encoding fuser. The input/output interface obtains a question text and an article text associated with the question text. The knowledge text generator obtains, according to a knowledge set, a first knowledge text corresponding to the question text and a second knowledge text corresponding to the article text. The semantic encoder encodes the question text and the article text to produce an original target text encoding, and encodes the first and second knowledge texts to produce a knowledge text encoding. The encoding fuser performs a fusion operation on the original target text encoding and the knowledge text encoding to inject part of the knowledge in the knowledge set into the original target text encoding, producing an enhanced target text encoding. The answer extractor obtains an answer corresponding to the question text based on the enhanced target text encoding and outputs the answer.
With the above architecture, the machine reading comprehension method and system disclosed herein perform dedicated encoding and fusion operations that inject external knowledge while analyzing the question and the article. This mitigates the difficulty of extracting correct answers from overly terse articles and thereby improves the accuracy of the predicted answers.
The above description of the present disclosure and the following description of the embodiments are intended to demonstrate and explain the spirit and principles of the present invention, and to provide further explanation of the scope of the claims.
1: machine reading comprehension system
11: input/output interface
12: knowledge text generator
13: semantic encoder
14: encoding fuser
15: answer extractor
21: unstructured knowledge database
22: structured knowledge database
x1~x4: tokens
a1~a4: initial vectors
b1~b4, b1'~b4': encoding vectors
aq1~aq4, bq1~bq4: query vectors
ak1~ak4, bk1'~bk4': key vectors
av1~av4, bv1'~bv4': value vectors
α1,1~α1,4, β1,1'~β1,4': initial weights
α̂1,1~α̂1,4, β̂1,1'~β̂1,4': normalized weights
m1~m4: fusion vectors
c1: weighted sum vector
S1~S7: steps
S21~S25: steps
S61~S62: steps
S8~S11: steps
FIG. 1 is a functional block diagram of a machine reading comprehension system and external knowledge databases according to an embodiment of the present invention.
FIG. 2 is a flowchart of a machine reading comprehension method according to an embodiment of the present invention.
FIG. 3 is a flowchart of generating knowledge text in the machine reading comprehension method according to an embodiment of the present invention.
FIG. 4A to FIG. 4C are schematic diagrams of the encoding operation in the machine reading comprehension method according to an embodiment of the present invention.
FIG. 5A to FIG. 5C are schematic diagrams of the fusion operation in the machine reading comprehension method according to an embodiment of the present invention.
FIG. 6A and FIG. 6B are flowcharts of the answer extraction operation in the machine reading comprehension method according to an embodiment of the present invention.
FIG. 7 is a flowchart of optimizing the usage parameters in the machine reading comprehension method according to an embodiment of the present invention.
FIG. 8A compares experimental results of an existing machine reading comprehension system and a machine reading comprehension system according to an embodiment of the present invention, both trained on a first dataset.
FIG. 8B compares experimental results of an existing machine reading comprehension system and a machine reading comprehension system according to an embodiment of the present invention, both trained on a second dataset.
The detailed features and advantages of the present invention are described in the embodiments below in sufficient detail to enable any person skilled in the relevant art to understand and implement the technical content of the present invention. Based on the contents of this specification, the claims, and the drawings, any person skilled in the relevant art can readily understand the objects and advantages of the present invention. The following embodiments further illustrate aspects of the present invention but do not limit its scope in any respect.
Please refer to FIG. 1, a functional block diagram of a machine reading comprehension system and external knowledge databases according to an embodiment of the present invention. As shown in FIG. 1, the machine reading comprehension system 1 includes an input/output interface 11, a knowledge text generator 12, a semantic encoder 13, an encoding fuser 14, and an answer extractor 15. The knowledge text generator 12 is connected to the input/output interface 11 and may be connected to an unstructured knowledge database 21 or/and a structured knowledge database 22 outside the system; the semantic encoder 13 is connected to the input/output interface 11 and the knowledge text generator 12; the encoding fuser 14 is connected to the semantic encoder 13; and the answer extractor 15 is connected to the encoding fuser 14 and the input/output interface 11.
The input/output interface 11 obtains a question text and an article text associated with the question text, and may output the answer to the question text determined by the other devices of the system. The question text and the article text may be text files: the question text states the question to be answered, and the article text indicates where the answer may be found. For example, in an intelligent customer service application, product manuals or campaign rules may serve as the article text, and inquiries about product usage or promotional offers may serve as the question text. As another example, in a smart healthcare application, medical records or medical papers may serve as the article text, and inquiries about causes or treatments may serve as the question text. These are merely examples and are not intended to limit the present invention.
The input/output interface 11 may include input devices such as a keyboard, a mouse, or a touch screen for a user to enter or select the question text and the article text, and output devices such as a screen to output the answer produced by the answer extractor 15. Alternatively, the input/output interface 11 may be a wired or wireless port that connects to a device outside the system (for example, a mobile phone, tablet, or personal computer) to receive the question text and the article text, or instructions selecting particular ones, and to transmit the answer produced by the answer extractor 15 to that external device. Further, in addition to the above input/output devices or ports, the input/output interface 11 may include a processing module. The input/output interface 11 may receive the question text, or an instruction selecting a particular question text, through an input device or port, and the processing module may then look up the article text associated with the question text in a database inside or outside the system. More specifically, the processing module may determine the type of the question text, or the event it belongs to, from keywords in the question text or from tags attached to it, and look up article texts of the same type or event.
The knowledge text generator 12, the semantic encoder 13, the encoding fuser 14, the answer extractor 15, and the processing module that the input/output interface 11 may include can be implemented by a single processor or by multiple processors, where a processor is, for example, a central processing unit (CPU), a microcontroller, or a programmable logic controller (PLC).
The knowledge text generator 12 receives the question text and the article text from the input/output interface 11 and, according to a knowledge set, generates a first knowledge text corresponding to the question text and a second knowledge text corresponding to the article text. The knowledge set may be provided by one or both of the unstructured knowledge database 21 and the structured knowledge database 22, which may be public databases on the Internet or internal databases of an enterprise. The unstructured knowledge database 21 stores pieces of unstructured knowledge, each of which may be a text description of a particular term; for example, it may include Wikipedia, dictionaries, and so on. The structured knowledge database 22 stores pieces of structured knowledge, each of which may express the relation of a particular term to other terms, for example as an "entity - entity relation - entity" triple, and multiple triples may form a knowledge graph. In other words, the structured knowledge database 22 may contain knowledge graphs of multiple domains. In addition, the knowledge text generator 12 may output at least part of the knowledge set through the input/output interface 11: it may output knowledge stored in the unstructured knowledge database 21 or/and the structured knowledge database 22, or the knowledge texts it generates, for the user to inspect or adjust. Further implementations of how the knowledge text generator 12 produces knowledge texts from the knowledge set are described later.
The semantic encoder 13 receives the question text and the article text from the input/output interface 11 and encodes them to produce the original target text encoding; it also receives the first and second knowledge texts from the knowledge text generator 12 and encodes them to produce the knowledge text encoding. The semantic encoder 13 may encode in various ways, including non-contextualized and contextualized encoding; further implementations are described later.
The encoding fuser 14 performs a fusion operation on the original target text encoding and the knowledge text encoding produced by the semantic encoder 13, injecting part of the knowledge in the knowledge set into the original target text encoding to produce the enhanced target text encoding. The answer extractor 15 obtains an answer corresponding to the question text based on the enhanced target text encoding and outputs it through the input/output interface, for example on an output device such as a screen, or over a wired or wireless port to a device outside the system (for example, a mobile phone, tablet, or personal computer). Further implementations of the fusion operation performed by the encoding fuser 14 and the answer extraction operation performed by the answer extractor 15 are described later.
Please refer to FIG. 1 and FIG. 2, where FIG. 2 is a flowchart of a machine reading comprehension method according to an embodiment of the present invention. The method shown in FIG. 2 is applicable to, but not limited to, the machine reading comprehension system 1 shown in FIG. 1. As shown in FIG. 2, the machine reading comprehension method includes step S1: obtaining a question text and an article text associated with the question text; step S2: generating, according to a knowledge set, a first knowledge text corresponding to the question text and a second knowledge text corresponding to the article text; step S3: encoding the question text and the article text to produce an original target text encoding; step S4: encoding the first and second knowledge texts to produce a knowledge text encoding; step S5: performing a fusion operation on the original target text encoding and the knowledge text encoding to inject part of the knowledge in the knowledge set into the original target text encoding and produce an enhanced target text encoding; step S6: obtaining an answer corresponding to the question text based on the enhanced target text encoding; and step S7: outputting the answer. Various implementations of the machine reading comprehension method shown in FIG. 2 are further described below with reference to the devices of the machine reading comprehension system 1 shown in FIG. 1.
In step S1, the input/output interface 11 obtains the question text and the associated article text. More specifically, it may directly receive the files of the question text and the article text, or receive an instruction selecting particular ones; alternatively, it may receive the question text, or an instruction selecting a particular question text, and then look up the associated article text in a database inside or outside the system. The lookup may proceed by determining the type of the question text, or the event it belongs to, from its keywords or tags, and then searching for article texts of the same type or event. For example, when the input/output interface 11 determines that the question text is medical, it searches for medical article texts; when it determines that the question text concerns an anniversary sale, it searches for articles related to the anniversary sale. These are merely examples and are not intended to limit the present invention.
In step S2, the knowledge text generator 12 generates, according to the knowledge set, the first knowledge text corresponding to the question text and the second knowledge text corresponding to the article text. That is, it treats the question text and the article text in turn as the text to be processed and produces a corresponding knowledge text for each. The knowledge set contains the knowledge stored in one or both of the unstructured knowledge database 21 and the structured knowledge database 22; in other words, the knowledge text generator 12 may search either or both databases for the material from which the first and second knowledge texts are generated.
To further explain the generation of knowledge text, please refer to FIG. 1 and FIG. 3, where FIG. 3 is a flowchart of generating knowledge text in the machine reading comprehension method according to an embodiment of the present invention. As shown in FIG. 3, the flow may include step S21: dividing the text to be processed into a number of words; step S22: searching the knowledge set with those words to obtain at least one piece of related knowledge; step S23: determining whether the number of pieces of related knowledge is one or more than one; when it is one, step S24: generating a target knowledge text from that piece of related knowledge; and when it is more than one, step S25: combining the pieces of related knowledge according to the order of the words and a preset template to generate the target knowledge text. The target knowledge text generated with the question text as the text to be processed is the first knowledge text, and the one generated with the article text as the text to be processed is the second knowledge text.
In step S21, the knowledge text generator 12 may divide the text to be processed into words by natural language analysis. In step S22, it uses each of the divided words as a keyword to search the knowledge set, i.e. the unstructured knowledge database 21 or/and the structured knowledge database 22, for knowledge related to that keyword. Note that the number of keywords in the text to be processed does not necessarily match the number of pieces of related knowledge found: a keyword may correspond to zero, one, or multiple pieces. When no related knowledge is found, the knowledge text generator 12 stops or/and outputs an error signal; when one or more pieces are found, it operates as follows.
In steps S23 to S25, when there is exactly one piece of related knowledge, the knowledge text generator 12 generates the target knowledge text from it; when there is more than one, it combines the pieces according to the order of the divided words and a preset template (first preset template). For example, the first preset template may specify concatenating the text descriptions of all pieces of related knowledge, separated by a delimiter (for example, a period), in the same order as the words, but the invention is not limited to this. In another embodiment, the knowledge text generator 12 may pass the concatenated description through a text summarization system to produce a condensed version as the target knowledge text. In addition, when the number of pieces of related knowledge exceeds a preset processing limit, the knowledge text generator 12 may filter them according to the type or event of the text to be processed (for example, based on its tags) or according to the credibility of the knowledge sources (for example, journal papers over web articles), keeping no more pieces than the limit.
As mentioned above, the related knowledge retrieved by keyword may come from the unstructured knowledge database 21 or/and the structured knowledge database 22; that is, it may include unstructured or/and structured knowledge. Unstructured knowledge is already in the form of a text description, so the knowledge text generator 12 can use it directly to generate the target knowledge text. For structured knowledge, the knowledge text generator 12 first converts it into a text description according to another preset template (second preset template) before generating the target knowledge text. Taking structured knowledge in the form of an "entity (A) - entity relation (B) - entity (C)" triple as an example, the second preset template may be set to "the B of A is C", but is not limited to this.
Three examples with the question text as the text to be processed are given below: one in which all related knowledge is unstructured, one in which it is all structured, and one mixing the two. These examples are illustrative only and do not limit the invention.
In the first example, the question text is "What right does the plaintiff want to protect?", and the knowledge text generator 12 retrieves from the knowledge set the text descriptions of the keywords "plaintiff" and "right"; it can then generate the first knowledge text "(text description of plaintiff). (text description of right)". In the second example, the question text is "Can one bathe during postpartum confinement?", and the knowledge text generator 12 retrieves the triple "postpartum confinement - concept - postnatal care" for the keyword "postpartum confinement" and the triple "bathing - effect - removing dirt" for the keyword "bathing"; it first converts the two triples into the text descriptions "the concept of postpartum confinement is postnatal care" and "the effect of bathing is removing dirt", then concatenates the two descriptions in the order the keywords appear in the question text to generate the target knowledge text. In the third example, the question text is "What is the date of birth of the legitimate child?", and the knowledge text generator 12 retrieves the text description of the keyword "legitimate child" and a triple for the keyword "date"; it first converts the triple into a text description as above, then concatenates the descriptions in keyword order. These examples are illustrative only and do not limit the invention.
As described above, the machine reading comprehension system 1 can convert structured knowledge into text descriptions through the knowledge text generator 12, integrating unstructured and structured knowledge. The subsequent computation that analyzes the article to produce an answer can therefore have lower computational complexity than one that analyzes structured data directly.
Steps S3 and S4 of FIG. 2 are described next. Note that FIG. 2 illustratively shows step S4 executed after step S3; in other embodiments, step S4 may be executed before step S3 or concurrently with it. In steps S3 and S4, the semantic encoder 13 encodes the question text and the article text to produce the original target text encoding, and encodes the first and second knowledge texts to produce the knowledge text encoding. That is, in step S3 the execution object of the encoding operation is the combination of the question text and the article text, and in step S4 it is the combination of the first and second knowledge texts. The combination may be formed by directly concatenating the two texts, or by concatenating them and inserting delimiters at the head and tail of the string and between the two texts (for example, [CLS] at the head, [SEP] at the tail, and [SEP] between the texts), but is not limited to this.
The semantic encoder 13 may perform the encoding operation in a non-contextualized or a contextualized manner to produce the original target text encoding or the knowledge text encoding; the two encodings may be produced with the same or different methods. Non-contextualized encoding may include: dividing the execution object into tokens, obtaining an initial vector for each token, and combining the initial vectors into the original target text encoding or the knowledge text encoding. For an English execution object, the semantic encoder 13 may split on spaces, or split into subwords with the WordPiece algorithm, for example splitting "playing" into "play" and "##ing"; for a Chinese execution object, it may split into characters, or into words by natural language analysis. These are examples only, and the invention is not limited to them.
The initial vector may be a token embedding alone, or the combination of a token embedding, a segment embedding, and a position embedding of the same dimension, for example their sum. The token embedding is the representative vector of the token in the vector space and may be obtained with a Word2Vec or GloVe model. The segment embedding indicates whether the token belongs to the first or the second text of the execution object; taking the combination of question text and article text as an example, the first text is the question text with segment embedding encoded as 0, and the second text is the article text with segment embedding encoded as 1. The position embedding indicates the token's position among all tokens. The original target text encoding or the knowledge text encoding may be the matrix formed by the initial vectors.
Contextualized encoding may include: dividing the execution object into tokens; obtaining the initial vectors of those tokens; performing contextual encoding on the initial vectors to produce encoding vectors; and combining the encoding vectors into the original target text encoding or the knowledge text encoding. As before, the initial vector may be a token embedding alone or the sum of token, segment, and position embeddings of the same dimension, whose meanings are as described above and are not repeated here.
To describe one contextual encoding method further, please refer to FIG. 1 and FIGS. 4A to 4C, which are schematic diagrams of the encoding operation in the machine reading comprehension method according to an embodiment of the present invention. In FIG. 4A, the semantic encoder 13 divides the execution object into tokens x1~x4 and obtains their initial vectors a1~a4 as described above, then performs contextual encoding on a1~a4 to produce encoding vectors b1~b4. The contextual encoding of a1~a4 may be performed in parallel or in a particular order. FIGS. 4B and 4C illustrate the contextual encoding of the initial vector a1 to obtain the encoding vector b1; the other initial vectors a2~a4 yield the encoding vectors b2~b4 by the same computation and are not separately drawn. Note also that the number of tokens in FIGS. 4A to 4C is an example only, and the invention is not limited to it.
As shown in FIG. 4B, the semantic encoder 13 can generate query vectors aq1~aq4, key vectors ak1~ak4, and value vectors av1~av4 from the initial vectors a1~a4. These are given by:

aq_i = W_aq a_i
ak_i = W_ak a_i
av_i = W_av a_i

where W_aq, W_ak, and W_av are randomly initialized weight matrices whose optimal values can be determined by analyzing the performance of the machine reading comprehension system 1 over multiple runs; the optimization flow is described later.
Next, the semantic encoder 13 computes the inner product of the query vector aq1 with each of the key vectors ak1~ak4 to obtain initial weights α1,1~α1,4. Alternatively, each inner product may further be divided by the dimension of the query and key vectors:

α_{1,i} = (aq_1 · ak_i) / d

where d is the dimension to which the query vector aq1 and the key vectors ak1~ak4 belong.
The semantic encoder 13 then normalizes the initial weights α1,1~α1,4 to obtain normalized weights α̂1,1~α̂1,4. The normalization may be performed with the Softmax function, giving

α̂_{1,i} = exp(α_{1,i}) / Σ_j exp(α_{1,j})

although the normalization of the present invention may also be performed with any other function that makes the weights sum to one, and is not limited to the formula above.
Then, as shown in FIG. 4C, the semantic encoder 13 computes the weighted sum of the value vectors av1~av4 with the normalized weights α̂1,1~α̂1,4; this weighted-sum vector serves as the encoding vector b1:

b_1 = Σ_i α̂_{1,i} av_i
The encoding vectors b2~b4 are produced by the semantic encoder 13 in the same way. In another embodiment, the computation using the query, key, and value vectors may be repeated several times: the contextual encoding block in FIG. 4A may have multiple layers, with the initial vectors a1~a4 as the input of the first layer, each layer's output (the weighted-sum vectors) as the input of the next, and the output of the last layer taken as the encoding vectors b1~b4, where the weight matrices used to produce the query, key, and value vectors differ from layer to layer. This deepens the machine reading comprehension system 1's understanding of the text. When the execution object of the encoding operation is the combination of the question text and the article text, the matrix formed by the encoding vectors b1~b4 is the original target text encoding; when it is the combination of the first and second knowledge texts, the matrix is the knowledge text encoding.
Besides the contextual encoding shown in FIGS. 4A to 4C, the semantic encoder 13 may use the encoding methods of other contextual encoders, such as BERT, RoBERTa, XLNet, ALBERT, or ELMo, which is based on the Long Short-Term Memory (LSTM) model.
After the semantic encoder 13 produces the original target text encoding and the knowledge text encoding, the encoding fuser 14 performs a fusion operation on them, injecting part of the knowledge in the knowledge set into the original target text encoding to produce the enhanced target text encoding, i.e. step S5 of FIG. 2. Please refer to FIG. 1 and FIGS. 5A to 5C, schematic diagrams of the fusion operation in the machine reading comprehension method according to an embodiment of the present invention. In FIG. 5A, the encoding vectors b1~b4 belong to the original target text encoding and the encoding vectors b1'~b4' to the knowledge text encoding. The encoding fuser 14 performs the fusion operation on b1~b4 and b1'~b4' to produce fusion vectors m1~m4; the operations producing m1~m4 may run in parallel or in a particular order. FIGS. 5B and 5C illustrate fusing the encoding vector b1 with b1'~b4' to obtain the fusion vector m1; the other encoding vectors b2~b4 each undergo the same computation with b1'~b4' to obtain m2~m4 and are not separately drawn. Note also that FIGS. 5A to 5C only illustrate an example number of encoding vectors; in practice the original target text encoding and the knowledge text encoding need not contain the same number of encoding vectors.
As shown in FIG. 5B, the encoding fuser 14 generates query vectors bq1~bq4 from the encoding vectors b1~b4 of the original target text encoding, and key vectors bk1'~bk4' and value vectors bv1'~bv4' from the encoding vectors b1'~b4' of the knowledge text encoding:

bq_i = W_bq b_i
bk_i' = W_bk b_i'
bv_i' = W_bv b_i'

where W_bq, W_bk, and W_bv are randomly initialized weight matrices whose optimal values can be determined by analyzing the performance of the machine reading comprehension system 1 over multiple runs; the optimization flow is described later.
Next, the encoding fuser 14 computes the inner product of the query vector bq1 with each of the key vectors bk1'~bk4' to obtain initial weights β1,1'~β1,4'. Alternatively, each inner product may further be divided by the dimension of the query and key vectors:

β_{1,i}' = (bq_1 · bk_i') / d

where d is the dimension to which the query vector bq1 and the key vectors bk1'~bk4' belong. This computation can be viewed as judging the similarity between the encoding vector b1 of the original target text encoding and each of the encoding vectors b1'~b4' of the knowledge text encoding. In particular, the encoding fuser 14 may also implement this similarity judgment with other functions that measure similarity.
The encoding fuser 14 then normalizes the initial weights β1,1'~β1,4' to obtain normalized weights β̂1,1'~β̂1,4'. The normalization may be performed with the Softmax function, giving

β̂_{1,i}' = exp(β_{1,i}') / Σ_j exp(β_{1,j}')

although it may also be performed with any other function that makes the weights sum to one, and is not limited to the formula above.
Then, as shown in FIG. 5C, the encoding fuser 14 computes the weighted sum c1 of the value vectors bv1'~bv4' with the normalized weights:

c_1 = Σ_i β̂_{1,i}' bv_i'
The encoding fuser 14 may then add the weighted-sum vector c1 to the corresponding encoding vector b1 and take the result as the fusion vector m1. Alternatively, it may concatenate c1 with b1 and take the concatenation as m1, doubling the dimension: if c1 and b1 are each d-dimensional, the fusion vector m1 produced by concatenating them is 2d-dimensional. The fusion vectors m2~m4 are produced by the encoding fuser 14 in the same way. The encoding fuser 14 may combine the fusion vectors m1~m4 into a matrix and take this matrix as the enhanced target text encoding.
After the encoding fuser 14 performs the fusion operation to inject knowledge into the target text encoding and produce the enhanced target text encoding, the answer extractor 15 can obtain the answer to the question text based on the enhanced target text encoding and output it through the input/output interface 11, i.e. steps S6 and S7 of FIG. 2. More specifically, the answer extractor 15 extracts the answer from the enhanced target text encoding. Please refer to FIG. 1, FIG. 6A, and FIG. 6B, which are flowcharts of the answer extraction operation in the machine reading comprehension method according to two embodiments of the present invention.
As shown in FIG. 6A, the answer extraction operation performed by the answer extractor 15 may include step S61: performing matrix operations and a normalization on the portion of the enhanced target text encoding corresponding to the article text and a start classification vector to obtain a number of start probabilities; step S62: performing matrix operations and a normalization on that portion and an end classification vector to obtain a number of end probabilities; step S63: determining the start position of the answer in the partial encoding from the largest start probability; and step S64: determining the end position of the answer in the partial encoding from the largest end probability.
In steps S61 and S62, the answer extractor 15 performs matrix operations (specifically, inner products) and a normalization on the portion of the enhanced target text encoding corresponding to the article text, with the start classification vector and the end classification vector respectively, to obtain the start probabilities and the end probabilities. The partial encoding is the matrix formed by those fusion vectors, among the fusion vectors obtained by the encoding fuser 14, that correspond to initial vectors belonging to the article text; when the question and article texts are input, each position carries an indicator (for example, a 0/1 mask) marking whether it belongs to the article or the question. The operation of step S61 can be expressed as:

P_i^start = exp(S · T_i) / Σ_j exp(S · T_j)

where P_i^start is the i-th start probability in the start probability vector, each start probability being the probability that the corresponding fusion vector in the partial encoding is the start position of the answer, S is the start classification vector, and T_i is the i-th fusion vector in the partial encoding. Step S62 follows the same formula with P_i^start replaced by P_i^end, the i-th end probability (the probability that the corresponding fusion vector is the end position of the answer), and S replaced by E, the end classification vector. The start and end classification vectors are randomly initialized; their optimal values can be determined by analyzing the performance of the machine reading comprehension system 1 over multiple runs, and the optimization flow is described later.
In steps S63 and S64, the answer extractor 15 determines the fusion vector corresponding to the largest start probability as the start position (start index) of the answer, and the fusion vector corresponding to the largest end probability as the end position (end index). For example, if the start probabilities are 0.02, 0.90, 0.05, 0.01, and 0.02 in order, the answer extractor 15 decides that the start position of the answer corresponds to the second fusion vector of the article portion of the target text encoding. The end position is determined in the same manner, so no separate example is given here.
Note that step S63 is executed after step S61 and step S64 after step S62; however, the present invention does not restrict the order between steps S61 and S62, between steps S61 and S64, between steps S62 and S63, or between steps S63 and S64.
The
步驟S61’及S62’的進一步執行內容分別同於圖6A之步驟S61及S62,於此不再贅述。於步驟S63’及步驟S64’中,答案擷取器15先分別選擇前幾大的起始機率作為候選起始機率並選擇前幾大的結束機率作為候選結束機率,其中所選擇之候選起始/結束機率的數量例如為5,但不限於此。於步驟S65’中,答案擷取器15可以將候選起始機率與候選結束機率兩兩配對,並篩除掉候選起始機率所對應的位置位於候選結束機率所對應的位置之後的配對,以產生多個候選配對。換句話說,每一候選配對中的候選起始機率所對應的位置皆會先於候選結束機率所對應的位置。於步驟S66’及步驟S67’中,答案擷取器15將每一候選
配對中的候選起始機率之值與候選結束機率之值相加或相乘,並決定具有最大和值或乘積的候選配對中的候選起始機率所對應的融合向量為答案的起始位置,且此後選配對中的候選結束機率所對應的融合向量為答案的結束位置。
The further execution contents of steps S61' and S62' are respectively the same as those of steps S61 and S62 in FIG. 6A , and will not be repeated here. In step S63 ′ and step S64 ′, the
With the implementation shown in FIG. 6B, the answer extractor 15 avoids the situation where the start position comes after the end position, improving the accuracy of the answer. Note that step S63' is executed after step S61' and step S64' after step S62'; however, the present invention does not restrict the order between steps S61' and S62', between steps S61' and S64', between steps S62' and S63', or between steps S63' and S64'.
In addition, as mentioned above, the usage parameters of the encoding operation performed by the semantic encoder 13 (e.g., the weight matrices Waq, Wak and Wav), the usage parameters of the fusion operation performed by the encoding fuser 14 (the weight matrices Wbq, Wbk and Wbv), and the usage parameters of the answer extraction operation performed by the answer extractor 15 (the start classification vector and the end classification vector) may be set to optimized usage parameters through a parameter optimization procedure. In particular, steps S2–S6 of the machine reading comprehension method shown in FIG. 2 may be the answer prediction process performed by a machine reading comprehension system 1 that has already been trained, or may be part of the training process of the machine reading comprehension system 1, wherein the training process includes the parameter optimization procedure.
Please refer to FIG. 1, FIG. 2 and FIG. 7. FIG. 7 is a flowchart of the parameter optimization procedure in the machine reading comprehension method according to an embodiment of the present invention. As shown in FIG. 7, the parameter optimization procedure may include step S8: performing a first encoding operation, a second encoding operation, a fusion operation and an answer extraction operation on a plurality of pieces of first training data to generate a plurality of first training answers, and computing a first loss value according to the plurality of first training answers and a loss function; step S9: adjusting one or more of the usage parameters of the first encoding operation, the second encoding operation, the fusion operation and the answer extraction operation according to the first loss value; step S10: after the adjustment, performing the first encoding operation, the second encoding operation, the fusion operation and the answer extraction operation on a plurality of pieces of second training data to generate a plurality of second training answers, and computing a second loss value according to the plurality of second training answers and the loss function; and step S11: adjusting one or more of the usage parameters of the first encoding operation, the second encoding operation, the fusion operation and the answer extraction operation according to the second loss value. Each piece of first/second training data includes a question text and an article text. The first encoding operation includes the step, described in the foregoing embodiments, of encoding the question text and the article text to generate the original target text encoding. The second encoding operation includes the step of generating the first knowledge text and the second knowledge text according to the knowledge set and the step of encoding the first knowledge text and the second knowledge text to generate the knowledge text encoding. That is, step S8 of FIG. 7 may include performing steps S2–S6 of FIG. 2 on each piece of the first training data, and step S10 of FIG. 7 may include performing steps S2–S6 of FIG. 2 on each piece of the second training data.
Steps S8–S11 may be executed by a processing device disposed outside or inside the machine reading comprehension system 1. The processing device includes a central processing unit (CPU), a microcontroller, a programmable logic controller (PLC) or another processor, and is connected to the semantic encoder 13, the encoding fuser 14 and the answer extractor 15. The processing device may control these devices to operate on the plurality of pieces of first training data with the current usage parameters to generate the plurality of first training answers, compute the first loss value according to the first training answers and the loss function, adjust one or more of the usage parameters of these devices according to the first loss value, control these devices to operate again, after the parameter adjustment, on the plurality of pieces of second training data to generate the plurality of second training answers, compute the second loss value according to the second training answers and the loss function, and then adjust one or more of the usage parameters according to the second loss value. The loss function used to compute the first/second loss values may be expressed as the following mathematical formula:
$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\left(\mathbf{y}_{s}^{(i)}\cdot\log\mathbf{p}_{s}^{(i)} + \mathbf{y}_{e}^{(i)}\cdot\log\mathbf{p}_{e}^{(i)}\right)$$

where $\mathbf{y}_{s}$ is the vector representing the ground-truth start position, $\mathbf{p}_{s}$ is the start probability vector computed by the answer extractor 15, $\mathbf{y}_{e}$ is the vector representing the ground-truth end position, $\mathbf{p}_{e}$ is the end probability vector computed by the answer extractor 15, and N is the number of pieces of training data used to generate the training answers.
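A minimal sketch of such a cross-entropy loss over the predicted start/end distributions (plain Python with illustrative names; the patent's exact formulation of the loss may differ in detail):

```python
import math

def span_cross_entropy(start_probs, end_probs, start_labels, end_labels):
    # Average negative log-likelihood of the ground-truth start and end
    # positions over N training examples. Each label vector marks the
    # ground-truth position (one-hot), so only the log-probability at
    # that position contributes.
    n = len(start_probs)
    total = 0.0
    for p_s, p_e, y_s, y_e in zip(start_probs, end_probs, start_labels, end_labels):
        total -= sum(y * math.log(p) for y, p in zip(y_s, p_s) if y > 0)
        total -= sum(y * math.log(p) for y, p in zip(y_e, p_e) if y > 0)
    return total / n
```

The loss shrinks toward zero as the predicted probability mass at the ground-truth start and end positions approaches 1.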
After step S11, the processing device may execute step S10 again on other pieces of training data to compute a loss value, and execute step S11 again with that loss value, repeating these steps multiple times. That is, the processing device may perform training multiple times, and the loss value computed in each round of training may serve as the basis for adjusting the usage parameters before the next round. More specifically, the processing device may compute answers (first training answers) from one batch of training data (first training data) with the current usage parameters, compute a loss value (first loss value) from those answers, adjust the usage parameters according to this loss value, then compute answers (second training answers) and the corresponding loss value (second loss value) from another batch of training data (second training data) with the adjusted usage parameters, then adjust the usage parameters with this loss value, compute answers and the corresponding loss value from yet another batch with the adjusted parameters, and so on. For example, if the total number of pieces of training data is 2560 with a batch size of 32, the above parameter adjustment and subsequent answer and loss computation must be executed 80 times to complete one epoch of training. After completing an epoch, the processing device may further shuffle all the training data and then execute the next epoch. In particular, the number of epochs to execute is a hyperparameter, and its value can be chosen by holding out part of the training data set as a validation set and selecting according to the performance on this validation set (e.g., the loss value, or the EM or F1 score).
In theory, as the number of training iterations increases, the usage parameters fit the training data better; however, when the usage parameters overfit the training data, the prediction accuracy on new data (data to be predicted) may instead decrease. Therefore, holding out part of the training data set as a validation set, as described above, and making predictions on the validation set to obtain the corresponding prediction performance makes it possible to determine an appropriate number of training epochs. For example, after an epoch of training is completed, the processing device may determine whether the performance on the validation set is better than that of the previous epoch (e.g., a lower loss value or a higher EM/F1 score). If the validation performance of the current epoch is better than that of the previous epoch, training continues with the next epoch; if it is worse or changes little, training stops. The usage parameters obtained after the above training procedure can then serve as the optimized usage parameters.
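The stop-when-validation-performance-worsens rule described above can be sketched as follows (the callback-style interface and function names are assumptions for illustration; the patent does not prescribe this exact design):

```python
def train_with_early_stopping(train_one_epoch, evaluate, max_epochs=50):
    # `train_one_epoch` shuffles the training data and runs one epoch of
    # batched parameter adjustment (steps S8-S11); `evaluate` returns the
    # validation-set loss (lower is better). Training stops as soon as the
    # validation loss fails to improve over the previous epoch.
    best_loss = float("inf")
    for _ in range(max_epochs):
        train_one_epoch()
        val_loss = evaluate()
        if val_loss >= best_loss:  # worse or no improvement: stop training
            break
        best_loss = val_loss
    return best_loss
```

The same skeleton works with EM or F1 as the validation metric by flipping the comparison so that higher scores continue training.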
The source of the question texts and article texts used for training may be a target labeled data set, i.e., the data set on which the system is expected to be trained to perform prediction, and the source of the knowledge set used to generate the knowledge texts is a knowledge database corresponding to (e.g., of the same type as) the target labeled data set. In another embodiment, before training with the target labeled data set, the machine reading comprehension method may first train with an external labeled data set and a corresponding (e.g., same-type) knowledge database, that is, use the external labeled data set as the source of the question texts and article texts and the knowledge database corresponding to the external labeled data set as the knowledge set source, so as to determine the optimized usage parameters for the first time. For example, assuming the labeled data sets include DRCD, CMRC 2018 and CAIL 2019, when the target data set is DRCD, one or both of CMRC 2018 and CAIL 2019 may first be used as the training data set to determine the optimized usage parameters for the first time, and DRCD may then be used as the training data set to determine the optimized usage parameters again. This process of optimizing the usage parameters multiple times can avoid unsatisfactory training results caused by incomplete labeling of the target labeled data set.
Please refer to FIGS. 8A and 8B, which are comparisons of experimental data obtained by training an existing machine reading comprehension method and system (multi-Bert) and the above machine reading comprehension method and system of the present invention on two data sets, respectively. In the experiment of FIG. 8A, both the machine reading comprehension method and system of the present application and the existing method and system used the CAIL 2019 data set in the legal domain as the training data source, and the method and system of the present application further used the OpenBase knowledge base (unstructured knowledge) and the HowNet knowledge base (structured knowledge) as knowledge set sources. In the experiment of FIG. 8B, both used the DRCD data set in the encyclopedia domain as the training data source, and the method and system of the present application further used the HowNet knowledge base as the knowledge set source.
In the experimental data shown in FIGS. 8A and 8B, EM (Exact Match) denotes the percentage (unit: %) of predicted answers that exactly match the standard answers, while F1 is an accuracy score computed over the tokenized predicted and standard answers. More specifically, F1 can be expressed as the following mathematical formula:

$$F1 = \frac{2 \times precision \times recall}{precision + recall}$$

where precision denotes the proportion of the tokens in the predicted answer that appear in the standard answer, and recall denotes the proportion of the tokens in the standard answer that appear in the predicted answer.
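A small sketch of the token-level F1 computation from the definitions above (the function name and the multiset-overlap treatment of repeated tokens are illustrative assumptions):

```python
from collections import Counter

def token_f1(predicted_tokens, standard_tokens):
    # Count tokens shared between the predicted and standard answers,
    # treating repeated tokens as a multiset intersection.
    overlap = sum((Counter(predicted_tokens) & Counter(standard_tokens)).values())
    if overlap == 0:
        return 0.0
    # Fraction of predicted tokens that appear in the standard answer.
    precision = overlap / len(predicted_tokens)
    # Fraction of standard-answer tokens that appear in the prediction.
    recall = overlap / len(standard_tokens)
    return 2 * precision * recall / (precision + recall)
```

A prediction identical to the standard answer scores 1.0, a disjoint prediction scores 0.0, and partial overlaps fall in between.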
As shown in FIGS. 8A and 8B, the machine reading comprehension method and system of the present application achieve higher EM and F1 than the existing machine reading comprehension method and system, i.e., higher answer prediction accuracy. The method and system of the present application perform considerably well even when the amount of training data is small, meaning that in the early stage of system training they can assist annotators in speeding up data labeling; even with only 1k pieces of training data, the EM value can reach 80% of the human judgment level, so the system can substitute for manual work while maintaining considerable accuracy. In addition, the F1 score can approach the human level (F1 score: 92).
With the above architecture, the machine reading comprehension method and system disclosed in the present application can perform special encoding operations and fusion operations to import external knowledge into the process of analyzing the question and the article, thereby avoiding the problem that the article content is too concise to obtain the correct answer from it, and thus improving the accuracy of the predicted answer.
Although the present invention is disclosed in the foregoing embodiments, they are not intended to limit the present invention. Changes and modifications made without departing from the spirit and scope of the present invention all belong to the scope of patent protection of the present invention. For the scope of protection defined by the present invention, please refer to the appended claims.
S1–S7: Steps
Claims (21)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW109145608A TWI762103B (en) | 2020-12-23 | 2020-12-23 | Method and system for machine reading comprehension |
CN202011642613.7A CN114741484A (en) | 2020-12-23 | 2020-12-30 | Machine reading understanding method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
TWI762103B true TWI762103B (en) | 2022-04-21 |
TW202226000A TW202226000A (en) | 2022-07-01 |
Family
ID=82198927
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW109145608A TWI762103B (en) | 2020-12-23 | 2020-12-23 | Method and system for machine reading comprehension |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114741484A (en) |
TW (1) | TWI762103B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108959396A (en) * | 2018-06-04 | 2018-12-07 | 众安信息技术服务有限公司 | Machine reading model training method and device, answering method and device |
Non-Patent Citations (1)
Title |
---|
Sun, Haitian, et al. "Open domain question answering using early fusion of knowledge bases and text." arXiv preprint arXiv:1809.00782 (2018), pp. 1–3. * |
Also Published As
Publication number | Publication date |
---|---|
TW202226000A (en) | 2022-07-01 |
CN114741484A (en) | 2022-07-12 |