TWM607472U - Text section labeling system

Info

Publication number: TWM607472U
Application number: TW109212199U
Authority: TW
Inventors: 趙式隆; 林奕辰; 沈昇勳; 林子雋; 黃世丞; 劉穎立
Original assignee: 洽吧智能股份有限公司
Priority date: 2020-09-16
Filing date: 2020-09-16
Publication date: 2021-02-11

Abstract

A text section labeling system is connected to an input device. The input device accepts a document to be recognized, and the document includes a plurality of text images. The text section labeling system includes a text image recognition module, a language processing module, a text section relationship analysis module, a confidence conversion module, a label library, and a label output module. The text image recognition module is connected to the input device to receive the document to be recognized and identifies at least one text section in the document; the text section includes the text images, and the language processing module converts the text images in the text section into editable text. With the text section labeling system, each text section can be assigned its corresponding label.

Description

Text section labeling system

The present utility model relates to a labeling system, and in particular to a text section labeling system.

At present, to improve the efficiency of entering paper medical certificates or related documents, OCR (Optical Character Recognition) technology is used during data entry to automatically convert the text images in the certificate or document into editable text. However, after conversion, the editable text must still be entered manually into the corresponding fields of a database. For example, after "Medical Foundation XX Memorial Hospital" on a paper medical certificate is converted into editable characters, it must still be entered manually into the "Hospital Name" field of the database. As a result, a certain labor cost remains and the chance of error increases.

Therefore, how to automatically fill the editable characters produced by OCR into the corresponding fields of a database is a problem worth considering by those with ordinary knowledge in the field.

The purpose of the present utility model is to provide a text section labeling system that can automatically fill the editable characters produced by OCR into the corresponding fields of a database.

The text section labeling system of the present utility model is connected to an input device. The input device accepts a document to be recognized, and the document includes a plurality of text images. The text section labeling system includes a text image recognition module, a language processing module, a text section relationship analysis module, a confidence conversion module, a label library, and a label output module. The text image recognition module is connected to the input device to receive the document to be recognized and identifies at least one text section in the document; the text section includes at least one of the text images, and the language processing module converts the text images in the text section into editable text. The language processing module is connected to the text image recognition module; it measures at least one piece of first association information between the text section and the document to be recognized, and converts the editable text and the first association information into a first feature matrix. The text section relationship analysis module is connected to the language processing module; it measures second association information between each text section and the other text sections, and uses the second association information to convert the first feature matrix into a second feature matrix. The confidence conversion module is connected to the text section relationship analysis module and converts the second feature matrix into a third feature matrix representing confidence levels. The label library stores a plurality of labels. The label output module is connected to the confidence conversion module and the label library; it converts the third feature matrix into a one-dimensional matrix, each element of which represents the label code corresponding to one text section, looks up the label corresponding to each label code in the label library, and assigns each text section its corresponding label.

In the text section labeling system described above, the first association information includes at least one of the following: the proportion of the area of the document occupied by the text section; the aspect ratio of the text section; or the position of the text section.

In the text section labeling system described above, the text image recognition module, the language processing module, the text section relationship analysis module, the confidence conversion module, and the label output module each include at least one neural network model.

In the text section labeling system described above, the text section relationship analysis module uses a graph neural network model to measure the second association information between each text section and the other text sections.

To make the above features and advantages of the present utility model more apparent and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

S210~S290: flowchart step symbols

10: input device

20: database

100: text section labeling system

102: server side

110: text image recognition module

120: language processing module

130: text section relationship analysis module

140: confidence conversion module

150: label library

160: label output module

80: document to be recognized

81: text section

Various embodiments are described below with reference to the accompanying drawings, which are provided for illustration and are not intended to limit the scope in any way; like reference numerals denote like elements.

FIG. 1 shows an embodiment of the text section labeling system of the present utility model.

FIG. 2A to FIG. 2D show one embodiment of the document to be recognized and how it changes over the course of processing.

FIG. 3 shows an embodiment of the text section labeling method of the present utility model.

FIG. 4A is a schematic diagram of the first feature matrix.

FIG. 4B is a schematic diagram of the second feature matrix.

FIG. 4C is a schematic diagram of the third feature matrix.

FIG. 4D is a schematic diagram of the one-dimensional matrix.

The present utility model is best understood by reference to the detailed description and accompanying drawings set out herein. Various embodiments are discussed below with reference to the drawings. However, those skilled in the art will readily understand that the detailed description given here with respect to the drawings is for explanatory purposes only, because these methods and systems may extend beyond the described embodiments. For example, the teachings given and the needs of a particular application may yield multiple alternative and suitable ways to implement the functionality of any detail described herein. Therefore, any approach may extend beyond the particular implementation choices in the embodiments described and shown below.

Certain terms are used in the specification and the claims that follow to refer to specific elements. Those with ordinary knowledge in the field will understand that different manufacturers may use different names for the same element. This specification and the claims do not distinguish elements by differences in name, but by differences in function. "Comprising" and "including", as used throughout the specification and the claims, are open-ended terms and should be interpreted as "including but not limited to". In addition, the terms "coupled" and "connected" herein cover any direct or indirect means of electrical connection. Therefore, if a first device is described as coupled to a second device, the first device may be electrically connected to the second device directly, or indirectly through other devices or connection means.

Please refer to FIG. 1, which shows an embodiment of the text section labeling system of the present utility model. The text section labeling system 100 includes a text image recognition module 110, a language processing module 120, a text section relationship analysis module 130, a confidence conversion module 140, a label library 150, and a label output module 160. The text section labeling system 100 further includes an input device 10, which is, for example, a scanner, a digital camera, or a smartphone with a camera function. Through the input device 10, a document to be recognized (as shown in FIG. 2A) can be imported into the text section labeling system 100. In this embodiment, the text image recognition module 110, the language processing module 120, the text section relationship analysis module 130, the confidence conversion module 140, the label library 150, and the label output module 160 are located on a server side 102, which consists of, for example, one or more servers.

Please also refer to FIG. 2A, which shows one embodiment of the document to be recognized; in this embodiment the document is a medical expense receipt. As FIG. 2A shows, the document 80 to be recognized includes a plurality of characters. Once the image of the document 80 has been captured by the input device 10, the characters on the document 80 exist only as images; that is, the text on the document imported by the input device 10 into the text section labeling system 100 cannot be edited. These characters are hereinafter referred to as text images.

Please also refer to FIG. 3, which shows an embodiment of the text section labeling method of the present utility model. First, step S210 is performed to import the document to be recognized shown in FIG. 2A; the detailed process has been described above and is not repeated here. Next, step S220 is performed to identify the text sections 81 in the document 80 to be recognized. In FIG. 2B, each text section 81 is an area framed by a dashed line and is identified by, for example, the text image recognition module 110. As FIG. 2B clearly shows, the text sections 81 select the text images on the document 80, and in particular select text that is grouped together as a single section. Then, step S230 is performed, in which the text image recognition module 110 converts the text images in the text sections 81 into editable characters. That is, the text images on the image of the document originally imported by the input device 10 cannot be edited, but the text image recognition module 110 can convert these text images into editable text, for example by using OCR (Optical Character Recognition) technology. However, if OCR is used alone, recognition errors may occur when the character images on the document are blurred or soiled. In that case, the technique disclosed in, for example, Taiwan Patent Application No. 107145984 can be used to correct the recognition errors. Here, the text image recognition module 110 may include neural network models such as a recurrent neural network (RNN), a long short-term memory (LSTM) model, or a convolutional neural network (CNN).

Then, step S240 is performed, in which the language processing module 120 measures at least one piece of first association information between the text section 81 and the document 80 to be recognized. Specifically, the first association information refers to the relative relationship between the text section 81 and the document 80, for example: the proportion of the area of the document 80 occupied by the text section 81, the aspect ratio of the text section 81, and the position (for example, the coordinates) of the text section 81 in the document 80. Next, step S250 is performed to convert the editable text in the text section 81 and the first association information into a first feature matrix. Please also refer to FIG. 4A, which is a schematic diagram of the first feature matrix. As FIG. 4A shows, the first feature matrix is an N x F two-dimensional matrix, that is, a two-dimensional matrix with N rows and F columns. The number of rows N represents the number of text sections 81 in the document 80, and F represents the parameters corresponding to each text section 81. As FIG. 4A shows, the parameters represented by F consist of text information and the first association information; in this embodiment, the elements up to the n-th column represent the text information. The text information is converted from the editable text of the text section 81 and is, for example, a vector produced by a word embedding technique. In the first feature matrix, the first association information is expressed as numerical values appended after the text information; in this embodiment it is represented by the elements from the (n+1)-th column onward. For example, if the text section 81 occupies 10.53% of the area of the document 80, this can be expressed as 0.1053. If the aspect ratio of the text section 81 is 4:1, it can be expressed as 0.2. If the coordinates of the text section 81 are (20,31) and the size of the whole document is (1000,800), the normalized coordinates can be expressed as (0.02,0.03875). The first association information can then be expressed as [0.1053,0.2,0.02,0.03875].
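
The construction of the first feature matrix in steps S240 and S250 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `(x, y, width, height)` box format, the placeholder embedding values, and the encoding of the aspect ratio as height divided by width are all hypothetical. Only the normalization of the coordinates (20,31) on a (1000,800) page follows the figures given in the text.

```python
def first_association_info(box, page_size):
    """Return [area ratio, aspect value, normalized x, normalized y].

    box is (x, y, width, height) in pixels and page_size is
    (page_width, page_height). Both formats are assumptions.
    """
    x, y, width, height = box
    page_width, page_height = page_size
    area_ratio = (width * height) / (page_width * page_height)
    aspect = height / width  # assumed encoding of the aspect ratio
    return [area_ratio, aspect, x / page_width, y / page_height]


def build_first_feature_matrix(sections, page_size):
    """Stack one row per text section: text vector, then association info."""
    return [
        list(section["embedding"]) + first_association_info(section["box"], page_size)
        for section in sections
    ]


# Placeholder embeddings stand in for the output of a word embedding model.
sections = [
    {"embedding": [0.12, -0.40, 0.33], "box": (20, 31, 500, 100)},
    {"embedding": [0.05, 0.21, -0.17], "box": (600, 400, 200, 50)},
]
first_matrix = build_first_feature_matrix(sections, (1000, 800))

for row in first_matrix:
    print(row)
```

Each row has F = n + 4 entries here (n = 3 text values followed by four association values), and the number of rows equals the number of text sections N.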

Next, step S260 is performed, in which the text section relationship analysis module 130 measures second association information between each text section 81 and the other text sections 81. Please also refer to FIG. 2C. If a line segment is drawn between each text section 81 and every other text section 81 (FIG. 2C does not draw all of the line segments and is only schematic), the number of line segments is N² (where N is the number of text sections 81), and the resulting figure is what is mathematically called a complete graph. That is, if the second association information is represented in the figure by the line segments between text sections 81, it is clear that there are N² pieces of second association information. For example, if the document 80 contains 20 text sections 81, there are 20², that is, 400, pieces of second association information. In this embodiment, because the relationships between text sections 81 (that is, the second association information) can be represented by a complete graph, a graph neural network (GNN) model is used to measure the second association information. In other words, the text section relationship analysis module 130 may include a graph neural network model. Through the graph neural network model, important information can be exchanged between text sections 81, so that the relationships between them can be expressed numerically.
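
The count of pairwise relations can be checked with a short sketch. It assumes that the figure of N² counts ordered pairs of text sections, including each section paired with itself; for comparison it also counts the undirected edges of a complete graph on N vertices, which number N(N-1)/2. The function name is hypothetical.

```python
from itertools import combinations, product


def count_pairs(n):
    """Count ordered pairs (with self-pairs) and undirected edges for n sections."""
    ids = range(n)
    ordered = sum(1 for _ in product(ids, repeat=2))    # n * n
    undirected = sum(1 for _ in combinations(ids, 2))   # n * (n - 1) / 2
    return ordered, undirected


ordered, undirected = count_pairs(20)
print(ordered, undirected)  # 400 190
```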

For example, the second association information between the "health insurance" text section 81 and the "identity" text section 81 to its left may be represented by a value indicating a high degree of association. More specifically, the numerical vector of "identity" contributes more vector information to the "health insurance" text section 81: for example, the numerical vector of the "identity" text section 81 is appended after that of the "health insurance" text section 81, or it is multiplied by a larger weight and then appended. By contrast, the association between the "medical expense receipt" text section 81 and the "health insurance" text section 81 may be low, so the numerical vector of the "medical expense receipt" text section 81 is multiplied by a smaller weight before being appended after the "health insurance" text section 81. Thus, after step S260, the first feature matrix is converted into the second feature matrix shown in FIG. 4B, which is an N x F2 two-dimensional matrix. Here N represents the number of text sections 81 in the document 80, and F2 represents the parameters corresponding to each text section 81 after the second association information has been incorporated; the size of F2 is, for example, F*N. Note that the above is only an example: which associations between text sections 81 are higher and which are lower is determined by the trained graph neural network model or another neural network model. In this embodiment, a graph neural network model is used to measure the second association information. However, those with ordinary knowledge in the field may also use other neural network models, such as a convolutional neural network (CNN) or a recurrent neural network (RNN).
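
The weighted appending described above, which yields F2 = F*N, can be sketched as follows. This is only an illustration: in the embodiment the association weights come from a trained graph neural network, whereas here they are fixed, hypothetical numbers, and the function name and the choice of a weight of 1.0 for a section's own vector are assumptions.

```python
def second_feature_matrix(first_matrix, weights):
    """Build an N x (F*N) matrix by concatenating weighted section vectors.

    Row i is the concatenation, over every section j, of
    weights[i][j] * first_matrix[j].
    """
    n = len(first_matrix)
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            row.extend(weights[i][j] * value for value in first_matrix[j])
        result.append(row)
    return result


# Three sections with F = 2 features each (toy values).
first = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Hypothetical weights: sections 0 and 1 are strongly related, section 2 is not.
weights = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
second = second_feature_matrix(first, weights)
print(len(second), len(second[0]))  # 3 6
```

With N = 3 and F = 2, each row of the result has F*N = 6 entries, matching the N x F2 shape given in the text.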

Then, step S270 is performed, in which the confidence conversion module 140 converts the second feature matrix, for example by means of the Softmax function, into a third feature matrix representing confidence levels. The third feature matrix is an N x C two-dimensional matrix, as shown in FIG. 4C, where the number of rows N represents the number of text sections 81 in the document 80 and the number of columns C represents the total number of labels. The labels in the label library 150 are introduced below.
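
The Softmax step can be sketched as a row-wise operation. The sketch assumes the second feature matrix has already been mapped to one score per label (an N x C matrix of logits); that mapping, for example a learned linear layer, is an assumption, since the text does not describe how the width changes from F2 to C. The function names and the score values are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax of one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_matrix(logits):
    """Apply softmax to each row, giving an N x C matrix of confidence levels."""
    return [softmax(row) for row in logits]


# Two text sections scored against C = 3 labels (toy scores).
logits = [
    [2.0, 0.5, -1.0],
    [0.1, 0.1, 3.0],
]
third = confidence_matrix(logits)
for row in third:
    print([round(value, 4) for value in row])
```

Each row sums to 1, so an element can be read as the confidence that the corresponding text section carries the corresponding label.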

In this embodiment, the label library 150 stores a plurality of labels, which are used to indicate the type of each text section 81. For example, referring to FIG. 2D, the text section 81 "Medical Foundation XXX Memorial Hospital" is labeled as title information, the numbers in the middle area of the document 80 are labeled as fees, and the warning text at the far right of the document 80 is labeled as non-important information. The labels may also have hierarchical relationships with one another. For example, title information can be further classified into hospital name, receipt type, health insurance status, ID number, and so on; fees can be further classified into drug fee, nursing fee, examination fee, pharmacy service fee, and so on. Referring back to FIG. 4C, each element of the third feature matrix represents the confidence level for the corresponding label. For example, for the text section 81 "pharmacy service fee", the element representing the pharmacy service fee label may have the highest value, and the element representing the fee label may have the second-highest value.
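
One possible way to hold hierarchical labels is a table in which each label code records its parent. The sketch below is an assumption about the data structure only; the numeric codes, the dictionary layout, and the function name are hypothetical, while the label names come from the examples in the text.

```python
# Hypothetical label library: code -> label name and parent code.
LABEL_LIBRARY = {
    0: {"name": "title information", "parent": None},
    1: {"name": "hospital name", "parent": 0},
    2: {"name": "receipt type", "parent": 0},
    3: {"name": "fee", "parent": None},
    4: {"name": "drug fee", "parent": 3},
    5: {"name": "pharmacy service fee", "parent": 3},
    6: {"name": "non-important information", "parent": None},
}


def label_path(code, library=LABEL_LIBRARY):
    """Return the label names from the top-level category down to `code`."""
    path = []
    while code is not None:
        entry = library[code]
        path.append(entry["name"])
        code = entry["parent"]
    return list(reversed(path))


print(label_path(5))  # ['fee', 'pharmacy service fee']
```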

Next, step S280 is performed, in which the label output module 160 converts the third feature matrix into a one-dimensional matrix (as shown in FIG. 4D); each element of this one-dimensional matrix represents the label code corresponding to one text section. Then, step S290 is performed, in which the label output module 160 looks up the label corresponding to each label code in the label library 150 and assigns each text section 81 its corresponding label. The database processing software can then enter the correct data into the corresponding fields of the database 20 according to the labels of the text sections 81. Thus, with the text section labeling method of this embodiment, once the user has photographed the document to be recognized, entering the relevant data into the corresponding database fields can be left entirely to the computer.
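
Steps S280 and S290 can be sketched as follows. The text does not state how the third feature matrix is reduced to label codes; taking the index of the highest confidence in each row (argmax) is an assumption, as are the function names, the label codes, and the confidence values.

```python
# Hypothetical label library mapping label codes to labels.
LABELS = {
    0: "hospital name",
    1: "pharmacy service fee",
    2: "non-important information",
}


def label_codes(confidence):
    """Reduce an N x C confidence matrix to N label codes (assumed: argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in confidence]


def assign_labels(confidence, library):
    """Look up the label for each text section's label code."""
    return [library[code] for code in label_codes(confidence)]


third = [
    [0.80, 0.15, 0.05],
    [0.10, 0.85, 0.05],
    [0.20, 0.10, 0.70],
]
print(label_codes(third))            # [0, 1, 2]
print(assign_labels(third, LABELS))
```

The resulting list of labels is what downstream database software would use to decide which field each text section belongs to.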

In the above embodiment, the text image recognition module 110, the language processing module 120, the text section relationship analysis module 130, the confidence conversion module 140, and the label output module 160 each include a neural network model. When these neural network models are trained, the samples can be divided into a training set and a test set; the models are first trained on the training set and then tested on the test set. In one embodiment, the training set contains about three times as many samples as the test set.
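
A 3:1 split of samples into a training set and a test set can be sketched as follows. The shuffling, the fixed seed, and the function name are assumptions; the text only states the approximate ratio.

```python
import random


def split_samples(samples, train_parts=3, test_parts=1, seed=0):
    """Shuffle the samples and split them in a train_parts:test_parts ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


train_set, test_set = split_samples(range(100))
print(len(train_set), len(test_set))  # 75 25
```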

Although the present utility model has been disclosed above by way of preferred embodiments, they are not intended to limit it. Anyone with ordinary knowledge in the technical field may make minor changes and refinements without departing from its spirit and scope; the scope of protection shall therefore be defined by the appended claims.

Claims

A text section labeling system, comprising: an input device that accepts a document to be recognized, the document to be recognized including a plurality of text images; a text image recognition module connected to the input device to receive the document to be recognized, wherein the text image recognition module identifies at least one text section in the document to be recognized, the text section includes at least one of the text images, and the language processing module converts the text image in the text section into editable text; a language processing module connected to the text image recognition module, wherein the language processing module measures at least one piece of first association information between the text section and the document to be recognized and converts the editable text and the first association information into a first feature matrix; a text section relationship analysis module connected to the language processing module, wherein the text section relationship analysis module measures second association information between each text section and the other text sections and uses the second association information to convert the first feature matrix into a second feature matrix; a confidence conversion module connected to the text section relationship analysis module, wherein the confidence conversion module converts the second feature matrix into a third feature matrix representing confidence levels; a label library storing a plurality of labels; and a label output module connected to the confidence conversion module and the label library, wherein the label output module converts the third feature matrix into a one-dimensional matrix, each element of the one-dimensional matrix represents a label code corresponding to one text section, and the label output module looks up the label corresponding to the label code in the label library and assigns each text section its corresponding label.

The text section labeling system according to claim 1, wherein the first association information includes at least one of the following: the proportion of the area of the document occupied by the text section; the aspect ratio of the text section; or the position of the text section.

The text section labeling system according to claim 1, wherein the text image recognition module, the language processing module, the text section relationship analysis module, the confidence conversion module, and the label output module each include at least one neural network model.

The text section labeling system according to claim 1 or claim 3, wherein the text section relationship analysis module uses a graph neural network model to measure the second association information between each text section and the other text sections.

The text section labeling system according to claim 1, wherein some of the labels have a hierarchical relationship with one another.

The text section labeling system according to claim 1, wherein the confidence conversion module uses a Softmax function to convert the second feature matrix into the third feature matrix representing confidence levels.

The text section labeling system according to claim 1, wherein the first feature matrix, the second feature matrix, and the third feature matrix are all two-dimensional matrices.