TW202011232A

TW202011232A - Segmentation method, segmentation system and non-transitory computer-readable medium

Info

Publication number: TW202011232A
Application number: TW108104097A
Authority: TW
Inventors: 藍國誠; 詹詩涵
Original assignee: 台達電子工業股份有限公司
Priority date: 2018-09-07
Filing date: 2019-02-01
Publication date: 2020-03-16
Also published as: CN110891202B; TWI709905B; TW202011231A; TWI725375B; CN110895654A; SG10201906347QA; SG10201905236WA; SG10201905532QA; TWI699663B; TWI700597B; JP2020042771A; TW202011749A; TWI696386B; CN110895654B; JP6829740B2; JP2020042770A; CN110891202A; JP2020042777A; TW202011222A; CN110889034A

Abstract

The present disclosure relates to a segmentation method, a segmentation system and a non-transitory computer-readable medium. The segmentation method includes the following operations: receiving captioning information, wherein the captioning information includes a plurality of captioning sentences; selecting the captioning sentences according to a defalut value and dividing the selected captioning sentence into a first paragraph; performing a common segmentation vocabulary judgment for a first captioning sentence; wherein the first captioning sentence is one of the captioning sentences; and generating a second paragraph or merging the first captioning sentence into the first paragraph according to a judgment result of the common segmentation vocabulary judgment.

Description

分段方法、分段系統及非暫態電腦可讀取媒體Segmentation method, segmentation system and non-transitory computer readable media

本揭示內容關於一種分段方法、分段系統及非暫態電腦可讀取媒體，且特別是有關於一種針對字幕的分段方法、分段系統及非暫態電腦可讀取媒體。This disclosure relates to a segmentation method, a segmentation system, and non-transitory computer readable media, and in particular to a subtitle segmentation method, a segmentation system, and non-transitory computer readable media.

線上學習平台是指一種將眾多學習資料儲存於伺服器中，讓使用者能透過網際網路連線至伺服器，以隨時瀏覽學習資料的網路服務。在現行的各類線上學習平台中，提供的學習資料類型包含影片、音訊、簡報、文件或論壇。The online learning platform refers to a network service that stores many learning materials in a server and allows users to connect to the server through the Internet to browse the learning materials at any time. In the current various online learning platforms, the types of learning materials provided include videos, audios, presentations, documents or forums.

由於線上學習平台中儲存的學習資料數量龐大，為了能夠方便使用者的使用，需要針對學習資料的文字進行自動分段以及建立段落關鍵字。因此，如何根據學習影片的內容之間的差異性進行處理，達到將學習影片中類似的主題進行分段並標註關鍵字的功能是本領域待解決的問題。Due to the huge amount of learning materials stored in the online learning platform, in order to facilitate the use of users, it is necessary to automatically segment the text of the learning materials and create paragraph keywords. Therefore, it is a problem to be solved in the art how to deal with the differences between the contents of the learning movie and achieve the function of segmenting similar topics in the learning movie and labeling keywords.

本揭示內容之第一態樣是在提供一種分段方法。分段方法包含下列步驟：接收字幕資訊；其中，字幕資訊包含複數個字幕句；根據設定值選取字幕句，並將被選取的字幕句分為第一段落；針對第一字幕句進行常見分段詞彙判斷；其中，第一字幕句是字幕句的其中之一；以及根據常見分段詞彙判斷的判斷結果產生第二段落或將第一字幕句併入第一段落。The first aspect of this disclosure is to provide a segmentation method. The segmentation method includes the following steps: receiving subtitle information; wherein, the subtitle information includes a plurality of subtitle sentences; selecting the subtitle sentences according to the setting values, and dividing the selected subtitle sentences into the first paragraph; performing common segmentation vocabulary for the first subtitle sentence Judgment; wherein, the first subtitle sentence is one of the subtitle sentences; and the second paragraph is generated or the first subtitle sentence is merged into the first paragraph according to the judgment result of the common segmented vocabulary judgment.

本揭示內容之第二態樣是在提供一種分段系統，其包含儲存單元以及處理器。儲存單元用以儲存字幕資訊、分段結果、第一段落對應的註解以及第二段落對應的註解。處理器與儲存單元電性連接，用以接收字幕資訊；其中，字幕資訊包含複數個字幕句，處理器包含：分段單元、常見詞偵測單元、以及段落產生單元。分段單元用以利用設定值根據特定順序選取字幕句，並將被選取的字幕句分為第一段落。常見詞偵測單元與分段單元電性連接，用以針對第一字幕句進行常見分段詞彙判斷；其中，第一字幕句是該些字幕句的其中之一。段落產生單元與常見詞偵測單元電性連接，用以根據常見分段詞彙判斷的判斷結果產生第二段落或將第一字幕句併入第一段落。The second aspect of the present disclosure is to provide a segmentation system including a storage unit and a processor. The storage unit is used to store subtitle information, segmentation results, notes corresponding to the first paragraph and notes corresponding to the second paragraph. The processor and the storage unit are electrically connected to receive subtitle information; wherein the subtitle information includes a plurality of subtitle sentences, and the processor includes: a segmentation unit, a common word detection unit, and a paragraph generation unit. The segmentation unit is used to select caption sentences according to a specific order using the set values, and divide the selected caption sentences into the first paragraph. The common word detection unit and the segmentation unit are electrically connected to determine the common segmentation vocabulary for the first subtitle sentence; wherein, the first subtitle sentence is one of the subtitle sentences. The paragraph generation unit and the common word detection unit are electrically connected to generate a second paragraph or merge the first subtitle sentence into the first paragraph according to the judgment result of the common segment vocabulary judgment.

本案之第三態樣是在提供一種非暫態電腦可讀取媒體包含至少一指令程序，由處理器執行至少一指令程序以實行分段方法，其包含以下步驟：接收字幕資訊；其中，字幕資訊包含複數個字幕句；根據設定值選取字幕句，並將被選取的字幕句分為第一段落；針對第一字幕句進行常見分段詞彙判斷；其中，第一字幕句是字幕句的其中之一；以及根據常見分段詞彙判斷的判斷結果產生第二段落或將第一字幕句併入第一段落。The third aspect of the case is to provide a non-transitory computer-readable medium that includes at least one instruction program, and the processor executes at least one instruction program to implement the segmentation method, which includes the following steps: receiving subtitle information; The information includes a plurality of subtitle sentences; the subtitle sentences are selected according to the setting values, and the selected subtitle sentences are divided into the first paragraph; the common segmentation vocabulary judgment is made for the first subtitle sentence; wherein, the first subtitle sentence is one of the subtitle sentences One; and according to the judgment result of common segmented vocabulary judgment, generate a second paragraph or merge the first subtitle sentence into the first paragraph.

本揭露之分段方法、分段系統及非暫態電腦可讀取媒體，其主要係改進以往係利用工方式進行影片段落標記，耗費大量人力以及時間的問題。先計算每一字幕句對應的關鍵字，在針對字幕句進行常見分段詞彙判斷，根據該常見分段詞彙判斷的判斷結果產生第二段落或將第一字幕句併入第一段落，以產生分段結果，達到將學習影片中類似的主題進行分段並標註關鍵字的功能。The segmentation method, the segmentation system and the non-transitory computer-readable media disclosed in the present disclosure are mainly to improve the problem of using previous work methods to mark the video paragraphs, which consumes a lot of manpower and time. Calculate the keywords corresponding to each subtitle sentence first, and then perform common segmentation vocabulary judgment on the subtitle sentence, generate a second paragraph according to the judgment result of the common segmentation vocabulary judgment or merge the first subtitle sentence into the first paragraph, to generate a score The segment result achieves the function of segmenting and tagging keywords on similar topics in the movie.

以下將以圖式揭露本案之複數個實施方式，為明確說明起見，許多實務上的細節將在以下敘述中一併說明。然而，應瞭解到，這些實務上的細節不應用以限制本案。也就是說，在本揭示內容部分實施方式中，這些實務上的細節是非必要的。此外，為簡化圖式起見，一些習知慣用的結構與元件在圖式中將以簡單示意的方式繪示之。In the following, a plurality of embodiments of the case will be disclosed in a diagram. For the sake of clarity, many practical details will be described together in the following description. However, it should be understood that these practical details should not be used to limit the case. That is to say, in some embodiments of the present disclosure, these practical details are unnecessary. In addition, in order to simplify the drawings, some conventional structures and elements will be shown in a simple schematic manner in the drawings.

於本文中，當一元件被稱為「連接」或「耦接」時，可指「電性連接」或「電性耦接」。「連接」或「耦接」亦可用以表示二或多個元件間相互搭配操作或互動。此外，雖然本文中使用「第一」、「第二」、…等用語描述不同元件，該用語僅是用以區別以相同技術用語描述的元件或操作。除非上下文清楚指明，否則該用語並非特別指稱或暗示次序或順位，亦非用以限定本發明。In this article, when an element is referred to as "connected" or "coupled", it can be referred to as "electrically connected" or "electrically coupled." "Connected" or "coupled" can also be used to indicate that two or more components interact or interact with each other. In addition, although terms such as "first", "second", etc. are used herein to describe different elements, the terms are only used to distinguish elements or operations described in the same technical terms. Unless the context clearly dictates, the term does not specifically refer to or imply order or order, nor is it intended to limit the present invention.

請參閱第1圖。第1圖係根據本案之一些實施例所繪示之分段系統100的示意圖。如第1圖所繪示，分段系統100包含儲存單元110以及處理器130。儲存單元110電性連接至處理器130，儲存單元110用以儲存字幕資訊、分段結果、常見分段詞彙資料庫DB1、課程資料庫DB2、第一段落對應的註解以及第二段落對應的註解。Please refer to Figure 1. Figure 1 is a schematic diagram of a segmentation system 100 according to some embodiments of the present case. As shown in FIG. 1, the segmentation system 100 includes a storage unit 110 and a processor 130. The storage unit 110 is electrically connected to the processor 130, and the storage unit 110 is used to store subtitle information, segmentation results, common segmentation vocabulary database DB1, course database DB2, annotations corresponding to the first paragraph and annotations corresponding to the second paragraph.

承上述，處理器130包含關鍵字擷取單元131、分段單元132、常見詞偵測單元133、段落產生單元134以及註解產生單元135。分段單元132與關鍵字擷取單元131以及常見詞偵測單元133電性連接，段落產生單元134與常見詞偵測單元133以及註解產生單元135電性連接，常見詞偵測單元133與註解產生單元135電性連接。As described above, the processor 130 includes a keyword extraction unit 131, a segmentation unit 132, a common word detection unit 133, a paragraph generation unit 134, and a comment generation unit 135. The segmentation unit 132 is electrically connected to the keyword extraction unit 131 and the common word detection unit 133, the paragraph generation unit 134 is electrically connected to the common word detection unit 133 and the annotation generation unit 135, the common word detection unit 133 and the annotation The generating unit 135 is electrically connected.

於本發明各實施例中，儲存裝置110可以實施為記憶體、硬碟、隨身碟、記憶卡等。處理器130可以實施為積體電路如微控制單元(microcontroller)、微處理器(microprocessor)、數位訊號處理器(digital signal processor)、特殊應用積體電路(application specific integrated circuit，ASIC)、邏輯電路或其他類似元件或上述元件的組合。In various embodiments of the present invention, the storage device 110 may be implemented as a memory, a hard disk, a flash drive, a memory card, or the like. The processor 130 can be implemented as an integrated circuit such as a microcontroller, a microprocessor, a digital signal processor, an application specific integrated circuit (ASIC), or a logic circuit Or other similar elements or a combination of the above elements.

請參閱第2圖。第2圖係根據本案之一些實施例所繪示之分段方法200的流程圖。於一實施例中，第2圖所示之分段方法200可以應用於第1圖的分段系統100上，處理器130用以根據下列分段方法200所描述之步驟，針對字幕資訊進行分段以產生一分段結果以及每一段落對應的註解。如第2圖所示，分段方法200首先執行步驟S210接收字幕資訊。於一實施例中，字幕資訊包含複數個字幕句。舉例而言，字幕資訊為影片的字幕檔案，影片的字幕檔案已經根據影片撥放時間將影片內容分為複數個字幕句，字幕句也會根據影片播放時間排序。Please refer to figure 2. FIG. 2 is a flowchart of the segmentation method 200 according to some embodiments of the present case. In one embodiment, the segmentation method 200 shown in FIG. 2 can be applied to the segmentation system 100 of FIG. 1, and the processor 130 is used to segment the subtitle information according to the steps described in the following segmentation method 200 Paragraphs to produce a segmentation result and a comment corresponding to each paragraph. As shown in FIG. 2, the segmentation method 200 first performs step S210 to receive caption information. In one embodiment, the subtitle information includes a plurality of subtitle sentences. For example, the subtitle information is the subtitle file of the video. The subtitle file of the video has divided the content of the video into a plurality of subtitle sentences according to the playback time of the video, and the subtitle sentences will also be sorted according to the playback time of the video.

接著，分段方法200執行步驟S220根據設定值選取字幕句，並將被選取的字幕句分為當前段落。於一實施例中，設定值可以是任意的正整數，在此設定值以3為例，因此在此步驟中會根據影片播放的時間選擇3句字幕句組成當前段落。舉例而言，如果總共有N句字幕句，可以選擇第1字幕句~第3字幕句組成當前段落。Next, the segmentation method 200 executes step S220 to select a caption sentence according to the set value, and divide the selected caption sentence into the current paragraph. In an embodiment, the setting value may be any positive integer. Here, the setting value is 3 as an example. Therefore, in this step, 3 subtitle sentences are selected according to the playing time of the video to form the current paragraph. For example, if there are a total of N subtitle sentences, you can select the first subtitle sentence to the third subtitle sentence to form the current paragraph.

接著，分段方法200執行步驟S230針對當前字幕句進行常見分段詞彙判斷。於一實施例中，常見分段詞彙係儲存於常見分段詞彙資料庫DB1，常見詞偵測單元133會偵測是否出現常見分段詞彙。常見分段詞彙可以分為常見開頭詞彙以及常見結尾詞彙。舉例而言，常見開頭詞彙可以為「接下來」、「開始說明」等，常見結尾詞彙可以為「以上說明到此」、「今天到這裡告一段落」等。在此步驟中，會偵測是否出現常見分段詞彙以及出現的常見分段詞彙類型(常見開頭詞彙或常見結尾詞彙)。Next, the segmentation method 200 performs step S230 to perform common segmentation vocabulary judgment on the current subtitle sentence. In one embodiment, the common segmented vocabulary is stored in the common segmented vocabulary database DB1, and the common word detection unit 133 detects whether the common segmented vocabulary appears. Common segmented words can be divided into common beginning words and common ending words. For example, the common beginning vocabulary can be "next", "start description", etc., and the common ending vocabulary can be "the above description ends here", "today is here to come to an end", etc. In this step, it will detect whether there is a common segmented vocabulary and the type of common segmented vocabulary that appears (common beginning vocabulary or common ending vocabulary).

接著，分段方法200執行步驟S240根據常見分段詞彙判斷的判斷結果產生下一段落或將當前字幕句併入當前段落。於一實施例中，根據前述常見詞偵測單元133的偵測結果，可以決定是要產生新的段落或是將當前執行字幕句併入當前段落。舉例而言，當前段落是由第1字幕句~第3字幕句組成，當前執行字幕句可以是第4字幕句，根據判斷結果可以將第4字幕句併入當前段落或是將第4字幕句作為新的段落的開始。Next, the segmentation method 200 executes step S240 to generate the next paragraph or merge the current subtitle sentence into the current paragraph according to the judgment result of the common segmented vocabulary judgment. In an embodiment, according to the detection result of the aforementioned common word detection unit 133, it may be determined whether a new paragraph should be generated or the currently executed subtitle sentence is incorporated into the current paragraph. For example, the current paragraph is composed of the first subtitle sentence to the third subtitle sentence. The currently executed subtitle sentence can be the fourth subtitle sentence. According to the judgment result, the fourth subtitle sentence can be merged into the current paragraph or the fourth subtitle sentence As the beginning of a new paragraph.

承上述，步驟S240執行將當前字幕句併入當前段落後，會接著執行下一字幕句的常見分段詞彙判斷，因此會重行執行步驟S230的判斷。舉例而言，如果第4字幕句併入當前段落後，會接著執行第5字幕句的常見分段詞彙判斷。如果步驟S240執行產生下一段落後，會接著執行利用設定值根據特定順序選取字幕句，將被選取的字幕句分為下一段落，因此會重行執行步驟S220的操作。舉例而言，如果第4字幕句被分類為下一段落後，會重新選擇第5字幕句、第6字幕句以及第7字幕句加入下一段落。因此，會重複執行分段的動作，直到字幕句被分段完畢，最後產生分段結果。According to the above, after step S240 is performed to merge the current subtitle sentence into the current paragraph, the common segmented vocabulary judgment of the next subtitle sentence is then performed, so the judgment of step S230 is repeated. For example, if the fourth subtitle sentence is merged into the current paragraph, the common segmented vocabulary judgment of the fifth subtitle sentence will be performed next. If step S240 is executed to generate the next paragraph backward, it will then execute the selection of subtitle sentences according to the specific order according to the set value, and divide the selected subtitle sentences into the next paragraph, so the operation of step S220 will be repeated. For example, if the 4th subtitle sentence is classified as lagging behind, the 5th subtitle sentence, 6th subtitle sentence and 7th subtitle sentence will be re-selected and added to the next paragraph. Therefore, the segmentation action will be repeated until the subtitle sentence is segmented, and finally the segmentation result is generated.

接著，步驟S240更包含步驟S241~S242，請一併參考第3圖，第3圖係根據本案之一些實施例所繪示之步驟S240的流程圖。如第3圖所示，分段方法200進一步執行步驟S241如果當前字幕句與常見分段詞彙相關聯，進行分段處理產生下一段落，並利用設定值根據特定順序選取字幕句，將被選取的字幕句加入下一段落。其中，步驟S241更包含步驟S2411~S2413，請進一步參考4圖，第4圖係根據本案之一些實施例所繪示之步驟S241的流程圖。如第4圖所示，分段方法200進一步執行步驟S2411根據判斷結果決定當前字幕句是否與開頭分段詞彙以及結尾分段詞彙的其中之一相關聯。接續上方實施例，根據步驟S230的判斷結果，可以決定當前字幕句是否與開頭分段詞彙或結尾分段詞彙相關聯。Next, step S240 further includes steps S241 to S242. Please refer to FIG. 3 together. FIG. 3 is a flowchart of step S240 according to some embodiments of the present case. As shown in FIG. 3, the segmentation method 200 further performs step S241. If the current subtitle sentence is associated with a common segmented vocabulary, segmentation processing is performed to generate the next paragraph, and the subtitle sentence is selected according to a specific order using the set value, and the selected Add the subtitle sentence to the next paragraph. Among them, step S241 further includes steps S2411 to S2413. Please further refer to FIG. 4, which is a flowchart of step S241 according to some embodiments of the present invention. As shown in FIG. 4, the segmentation method 200 further executes step S2411 to determine whether the current subtitle sentence is associated with one of the beginning segment vocabulary and the ending segment vocabulary according to the judgment result. Following the above embodiment, according to the judgment result of step S230, it can be determined whether the current subtitle sentence is associated with the opening segment vocabulary or the ending segment vocabulary.

承上述，分段方法200進一步執行步驟S2412如果當前字幕句與開頭分段詞彙相關聯，以當前字幕句作為下一段落的起始句。舉例而言，如果在前述的判斷結果中偵測到第4字幕句中具有「接下來」的詞彙，即將第4字幕作為下一段落的起始句。According to the above, the segmentation method 200 further performs step S2412. If the current subtitle sentence is associated with the beginning segmented vocabulary, the current subtitle sentence is used as the starting sentence of the next paragraph. For example, if the word "Next" is detected in the fourth subtitle sentence in the foregoing judgment result, the fourth subtitle sentence is used as the starting sentence of the next paragraph.

承上述，分段方法200進一步執行步驟S2413如果當前字幕句與結尾分段詞彙相關聯，以當前字幕句作為當前段落的結尾句。舉例而言，如果在前述的判斷結果中偵測到第4字幕句中具有「以上說明到此」的詞彙，即將第4字幕作為當前段落的結尾句。執行完步驟S241的操作後會接著執行利用設定值根據特定順序選取字幕句，將被選取的字幕句分為下一段落，因此會重行執行步驟S220的操作，在此不再贅述。According to the above, the segmentation method 200 further performs step S2413 if the current subtitle sentence is associated with the ending segment vocabulary, the current subtitle sentence is used as the ending sentence of the current paragraph. For example, if it is detected in the fourth subtitle sentence that there is a vocabulary of "the above description" in the foregoing judgment result, the fourth subtitle sentence is taken as the ending sentence of the current paragraph. After the operation in step S241 is performed, the caption sentences are selected according to the specific order according to the specified value, and the selected caption sentences are divided into the next paragraph. Therefore, the operation in step S220 is repeated, which will not be repeated here.

接著，分段方法200進一步執行步驟S242如果當前字幕句不與常見分段詞彙相關聯，當前字幕句與當前段落進行相似值計算，如果相似，將第一字幕句併入當前段落。其中，步驟S242更包含步驟S2421~ S2423，請進一步參考5圖，第5圖係根據本案之一些實施例所繪示之步驟S242的流程圖。如第5圖所示，分段方法200進一步執行步驟S2421比較當前字幕句對應的至少一特徵與當前段落對應的至少一特徵的差異值是否大於門檻值。Next, the segmentation method 200 further performs step S242. If the current subtitle sentence is not associated with a common segmented vocabulary, the current subtitle sentence and the current paragraph are similarly calculated, and if similar, the first subtitle sentence is incorporated into the current paragraph. Among them, step S242 further includes steps S2421 to S2423, please refer to FIG. 5 further, FIG. 5 is a flowchart of step S242 according to some embodiments of the present case. As shown in FIG. 5, the segmentation method 200 further executes step S2421 to compare whether the difference between at least one feature corresponding to the current subtitle sentence and the at least one feature corresponding to the current paragraph is greater than the threshold value.

承上述，於一實施例中，從字幕句中提取出複數個關鍵字，提取出的關鍵字即為當前字幕句對應的至少一特徵。利用TF-IDF統計方法(T ermF requency–InverseD ocumentF requency)計算字幕句對應的關鍵字。TF-IDF統計方法用來評估一字詞對於資料庫中的一份檔案的重要程度，字詞的重要性隨著它在檔案中出現的次數成正比增加，但同時也會隨著它在資料庫中出現的頻率成反比下降。於此實施例中，TF-IDF統計方法可以計算當前字幕句的關鍵字。接著，計算當前字幕句的至少一特徵(關鍵字)與當前段落的至少一特徵(關鍵字)的相似值，計算出的相似值越高即可判定為當前字幕句與當前段落的內容越接近。According to the above, in one embodiment, a plurality of keywords are extracted from the subtitle sentence, and the extracted keywords are at least one feature corresponding to the current subtitle sentence. Use the TF-IDF statistical method ( T erm F requency–Inverse D ocument F requency) to calculate the keywords corresponding to the subtitle sentences. The TF-IDF statistical method is used to evaluate the importance of a word to a file in the database. The importance of the word increases in proportion to the number of times it appears in the file, but it also increases with the word in the data. The frequency of occurrence in the library decreases inversely. In this embodiment, the TF-IDF statistical method can calculate the keywords of the current subtitle sentence. Next, calculate the similarity value between at least one feature (keyword) of the current subtitle sentence and at least one feature (keyword) of the current paragraph, and the higher the calculated similarity value, the closer the content of the current subtitle sentence to the current paragraph can be determined .

承上述，分段方法200進一步執行步驟S2422如果差異值小於門檻值，將當前字幕句併入當前段落。於一實施例中，利用門檻值對相似值進行篩選，當相似值不小於門檻值時，表示當前字幕句與當前段落的內容比較相似，因此可以將當前字幕句併入當前段落中。舉例而言，如果第4字幕句與當前段落的相似值不小於門檻值，表示第4字幕句與當前段落的內容比較相似，因此可以將第4字幕句加入當前段落。According to the above, the segmentation method 200 further executes step S2422 if the difference value is less than the threshold value, the current subtitle sentence is merged into the current paragraph. In an embodiment, the threshold value is used to filter similar values. When the similarity value is not less than the threshold value, it indicates that the content of the current subtitle sentence is similar to that of the current paragraph, so the current subtitle sentence can be incorporated into the current paragraph. For example, if the similarity value between the fourth subtitle sentence and the current paragraph is not less than the threshold value, it means that the content of the fourth subtitle sentence is similar to the current paragraph, so the fourth subtitle sentence can be added to the current paragraph.

承上述，分段方法200進一步執行步驟S2423如果差異值不小於門檻值，以當前字幕句作為下一段落的起始句，並利用設定值根據特定順序選取字幕句，將被選取的字幕句分為下一段落。舉例而言，當相似值小於門檻值時，表示當前字幕句與當前段落的內容具有差異，因此將當前字幕句判定為第二段落的起始句。舉例而言，如果第4字幕句與當前段落的相似值小於門檻值，表示第4字幕句與當前段落的內容具有差異，因此將第4字幕句作為下一段落的起始句。執行完步驟S252的操作後會接著執行利用設定值根據特定順序選取字幕句，將被選取的字幕句分為下一段落，因此會重行執行步驟S230的操作，在此不再贅述。According to the above, the segmentation method 200 further performs step S2423. If the difference value is not less than the threshold, the current subtitle sentence is used as the starting sentence of the next paragraph, and the subtitle sentence is selected according to a specific order using the set value, and the selected subtitle sentence is divided The next paragraph. For example, when the similarity value is less than the threshold value, it means that the content of the current subtitle sentence is different from the content of the current paragraph, so the current subtitle sentence is determined as the starting sentence of the second paragraph. For example, if the similarity value between the fourth subtitle sentence and the current paragraph is less than the threshold value, it means that the content of the fourth subtitle sentence and the current paragraph are different, so the fourth subtitle sentence is taken as the starting sentence of the next paragraph. After the operation of step S252 is performed, the caption sentences are selected according to a specific order using the set values, and the selected caption sentences are divided into the next paragraph. Therefore, the operation of step S230 is repeated, which is not repeated here.

由上述的分段操作可以得知，每次做完一句字幕句的分段計算後會接著執行下一句字幕句的分段計算，直到所有的字幕句執行完畢為止，如果有剩餘字幕句的數量少於設定值的設定時，可以不再針對剩餘字幕句進行分段計算，而是直接將剩餘字幕句併入當前段落，舉例而言，如果剩餘字幕句的數量為2，少於前述的設定值(前述將設定值設定為3)，因此剩下的2句字幕句即可併入當前段落。It can be known from the above-mentioned segmentation operation that after each segmentation calculation of a subtitle sentence is completed, the subsection calculation of the next sentence will be performed next, until all the subtitle sentences are executed, if there are the number of remaining subtitle sentences When the setting is less than the set value, you can no longer perform segment calculation for the remaining subtitle sentences, but directly merge the remaining subtitle sentences into the current paragraph. For example, if the number of remaining subtitle sentences is 2, it is less than the previous setting Value (previously set the value to 3), so the remaining 2 subtitle sentences can be incorporated into the current paragraph.

接著，執行完上述的分段步驟後，分段方法200執行步驟S250產生段落對應的註解。舉例而言，如果在執行完全部的字幕句後分為3個段落，會分別計算3個段落的註解，註解可以是根據段落中的字幕句對應的關鍵字產生。最後，將分好的段落以及段落對應的註解儲存至儲存單元110的課程資料庫DB2中。舉例而言，如果差異值小於門檻值時，表示當前字幕句與當前段落較相似，因此可以利用字幕句的關鍵字作為當前段落對應的至少一特徵。如果差異值不小於門檻值時，表示當前字幕句與當前段落不相似，因此可以利用字幕句的關鍵字作為下一段落對應的至少一特徵。Next, after performing the above-mentioned segmentation step, the segmentation method 200 executes step S250 to generate the annotation corresponding to the paragraph. For example, if all subtitle sentences are executed and then divided into 3 paragraphs, the comments of the 3 paragraphs will be calculated separately. The comments may be generated based on the keywords corresponding to the subtitle sentences in the paragraph. Finally, the divided paragraphs and the annotations corresponding to the paragraphs are stored in the course database DB2 of the storage unit 110. For example, if the difference value is less than the threshold value, it means that the current subtitle sentence is similar to the current paragraph, so the keyword of the subtitle sentence can be used as at least one feature corresponding to the current paragraph. If the difference value is not less than the threshold value, it means that the current subtitle sentence is not similar to the current paragraph, so the keyword of the subtitle sentence can be used as at least one feature corresponding to the next paragraph.

由上述本案之實施方式可知，主要係改進以往係利用工方式進行影片段落標記，耗費大量人力以及時間的問題。先計算每每一字幕句對應的關鍵字，在針對字幕句進行常見分段詞彙判斷，根據該常見分段詞彙判斷的判斷結果產生下一段落或將第一字幕句併入當前段落，以產生分段結果，達到將學習影片中類似的主題進行分段並標註關鍵字的功能。It can be seen from the implementation of the above-mentioned case that the main problem is to improve the previous method of using the work method to mark the video paragraphs, which consumes a lot of manpower and time. Calculate the keywords corresponding to each subtitle sentence first, and then perform common segmentation vocabulary judgment on the subtitle sentence, generate the next paragraph according to the judgment result of the common segmentation vocabulary judgment or merge the first subtitle sentence into the current paragraph to generate a segment As a result, the function of segmenting similar topics in the learning movie and tagging keywords is achieved.

另外，上述例示包含依序的示範步驟，但該些步驟不必依所顯示的順序被執行。以不同順序執行該些步驟皆在本揭示內容的考量範圍內。在本揭示內容之實施例的精神與範圍內，可視情況增加、取代、變更順序及/或省略該些步驟。In addition, the above example includes exemplary steps in order, but the steps need not be performed in the order shown. Performing these steps in different orders is within the scope of this disclosure. Within the spirit and scope of the embodiments of the present disclosure, the order may be added, replaced, changed, and/or omitted as appropriate.

雖然本揭示內容已以實施方式揭露如上，然其並非用以限定本發明內容，任何熟習此技藝者，在不脫離本發明內容之精神和範圍內，當可作各種更動與潤飾，因此本發明內容之保護範圍當視後附之申請專利範圍所界定者為準。Although the present disclosure has been disclosed as above by way of implementation, it is not intended to limit the content of the present invention. Anyone who is familiar with this skill can make various changes and modifications within the spirit and scope of the present content, so the present invention The protection scope of the content shall be deemed as defined by the scope of the attached patent application.

100:分段系統110:儲存單元130:處理器DB1:常見分段詞彙資料庫DB2:課程資料庫131:關鍵字擷取單元132:分段單元133:常見詞偵測單元134:段落產生單元135:註解產生單元200:分段方法S210～S250、S241~S242、S2411~S2413、S2421~S2423:步驟100: Segmentation system 110: Storage unit 130: Processor DB1: Common segmentation vocabulary database DB2: Course database 131: Keyword extraction unit 132: Segmentation unit 133: Common word detection unit 134: Paragraph generation unit 135: Annotation generation unit 200: Segmentation methods S210-S250, S241-S242, S2411-S2413, S2421-S2423: Steps

為讓本發明之上述和其他目的、特徵、優點與實施例能更明顯易懂，所附圖式之說明如下：第1圖係根據本案之一些實施例所繪示之分段系統的示意圖；第2圖係根據本案之一些實施例所繪示之分段方法的流程圖；第3圖係根據本案之一些實施例所繪示之步驟S240的流程圖；第4圖係根據本案之一些實施例所繪示之步驟S241的流程圖；以及第5圖係根據本案之一些實施例所繪示之步驟S242的流程圖。In order to make the above and other objects, features, advantages and embodiments of the present invention more obvious and understandable, the drawings are described as follows: Figure 1 is a schematic diagram of a segmented system according to some embodiments of the present case; Figure 2 is a flowchart of the segmentation method according to some embodiments of this case; Figure 3 is a flowchart of step S240 according to some embodiments of this case; Figure 4 is some implementations of this case The flowchart of step S241 shown in the example; and FIG. 5 is a flowchart of step S242 shown according to some embodiments of the present case.

200:分段方法 200: Segmentation method

S210~S250:步驟 S210~S250: Steps

Claims

一種分段方法，包含：接收一字幕資訊；其中，該字幕資訊包含複數個字幕句；根據一設定值選取該些字幕句，並將被選取的字幕句分為一第一段落；針對一第一字幕句進行一常見分段詞彙判斷；其中，該第一字幕句是該些字幕句的其中之一；以及根據該常見分段詞彙判斷的一判斷結果產生一第二段落或將該第一字幕句併入該第一段落。A segmentation method includes: receiving a subtitle information; wherein the subtitle information includes a plurality of subtitle sentences; selecting the subtitle sentences according to a set value, and dividing the selected subtitle sentences into a first paragraph; for a first Performing a common segmented vocabulary judgment on the subtitle sentence; wherein, the first subtitle sentence is one of the subtitle sentences; and generating a second paragraph or the first subtitle according to a judgment result of the common segmented vocabulary judgment Sentence into the first paragraph.

如請求項1所述之分段方法，其中，當該第一字幕句併入該第一段落後，針對一第二字幕句進行該常見分段詞彙判斷；其中，該第二字幕句依照一特定順序在該第一字幕句之後。The segmentation method according to claim 1, wherein, after the first subtitle sentence is merged into the first paragraph, the common segmentation vocabulary judgment is performed for a second subtitle sentence; wherein, the second subtitle sentence is based on a specific The order is after the first subtitle sentence.

如請求項1所述之分段方法，其中，當產生該第二段落時，利用該設定值根據一特定順序選取該些字幕句，將被選取的字幕句加入該第二段落。The segmentation method according to claim 1, wherein when the second paragraph is generated, the subtitle sentences are selected according to a specific order using the set value, and the selected subtitle sentences are added to the second paragraph.

如請求項1所述之分段方法，其中，根據該常見分段詞彙判斷的該判斷結果產生該第二段落或將該第一字幕句併入該第一段落，更包含：如果該第一字幕句與該常見分段詞彙相關聯，進行一分段處理產生該第二段落，並利用該設定值根據一特定順序選取該些字幕句，將被選取的字幕句加入該第二段落；以及如果該第一字幕句不與該常見分段詞彙相關聯，該第一字幕句與該第一段落進行一相似值計算，如果相似，將該第一字幕句併入該第一段落。The segmentation method according to claim 1, wherein the second paragraph is generated according to the judgment result of the common segmented vocabulary or the first subtitle sentence is merged into the first paragraph, further including: if the first subtitle The sentence is associated with the common segmented vocabulary, a segmentation process is performed to generate the second paragraph, and the subtitle sentences are selected according to a specific order using the set value, and the selected subtitle sentence is added to the second paragraph; and if The first subtitle sentence is not associated with the common segmented vocabulary. The first subtitle sentence and the first paragraph are subjected to a similarity calculation. If similar, the first subtitle sentence is incorporated into the first paragraph.

如請求項4所述之分段方法，其中，該分段處理包含：根據該判斷結果決定該第一字幕句是否與一開頭分段詞彙以及一結尾分段詞彙的其中之一相關聯；如果該第一字幕句與該開頭分段詞彙相關聯，以該第一字幕句作為該第二段落的起始句；以及如果該第一字幕句與該結尾分段詞彙相關聯，以該第一字幕句作為該第一段落的結尾句。The segmentation method according to claim 4, wherein the segmentation process includes: determining whether the first subtitle sentence is associated with one of a beginning segment vocabulary and an ending segment vocabulary according to the judgment result; if The first subtitle sentence is associated with the opening segment vocabulary, and the first subtitle sentence is used as the beginning sentence of the second paragraph; and if the first subtitle sentence is associated with the ending segment vocabulary, the first The subtitle sentence serves as the ending sentence of the first paragraph.

如請求項4所述之分段方法，其中，該相似值計算包含：比較該第一字幕句對應的至少一特徵與該第一段落對應的至少一特徵的一差異值是否大於一門檻值；如果該差異值小於該門檻值，將該第一字幕句併入該第一段落；以及如果該差異值不小於該門檻值，以該第一字幕句作為該第二段落的起始句，並利用該設定值根據該特定順序選取該些字幕句，將被選取的字幕句分為該第二段落。The segmentation method according to claim 4, wherein the calculation of the similarity value includes: comparing whether a difference value of at least one feature corresponding to the first subtitle sentence and at least one feature corresponding to the first paragraph is greater than a threshold value; If the difference value is less than the threshold value, merge the first subtitle sentence into the first paragraph; and if the difference value is not less than the threshold value, use the first subtitle sentence as the starting sentence of the second paragraph, and use the The setting value selects the subtitle sentences according to the specific order, and divides the selected subtitle sentences into the second paragraph.

如請求項6所述之分段方法，其中，從該些字幕句中提取出複數個關鍵字，該些關鍵字為該第一字幕句對應的至少一特徵。The segmentation method according to claim 6, wherein a plurality of keywords are extracted from the subtitle sentences, and the keywords are at least one feature corresponding to the first subtitle sentence.

如請求項7所述之分段方法，其中，該第一段落對應的至少一特徵，由該第一段落中的該些字幕句提取出的該些關字產生。The segmentation method according to claim 7, wherein at least one feature corresponding to the first paragraph is generated from the keywords extracted from the subtitle sentences in the first paragraph.

一種分段系統，包含：一儲存單元，用以儲存一字幕資訊、一分段結果、一常見分段詞彙資料庫、一第一段落對應的註解以及一第二段落對應的註解；以及一處理器，與該儲存單元電性連接，用以接收該字幕資訊；其中，該字幕資訊包含複數個字幕句，該處理器包含：一分段單元，用以利用一設定值選取該些字幕句，並將被選取的字幕句分為一第一段落；一常見詞偵測單元，與該分段單元電性連接，用以針對一第一字幕句進行一常見分段詞彙判斷；其中，該第一字幕句是該些字幕句的其中之一；以及一段落產生單元，與該常見詞偵測單元電性連接，用以根據該常見分段詞彙判斷的一判斷結果產生一第二段落或將該第一字幕句併入該第一段落。A segmentation system includes: a storage unit for storing a subtitle information, a segmentation result, a common segmentation vocabulary database, a comment corresponding to a first paragraph and a comment corresponding to a second paragraph; and a processor , Electrically connected to the storage unit, for receiving the subtitle information; wherein, the subtitle information includes a plurality of subtitle sentences, the processor includes: a segmentation unit for selecting the subtitle sentences using a set value, and Divide the selected subtitle sentence into a first paragraph; a common word detection unit, electrically connected to the segmentation unit, for judging a common segmented vocabulary for a first subtitle sentence; wherein, the first subtitle sentence The sentence is one of the subtitle sentences; and a paragraph generation unit electrically connected to the common word detection unit for generating a second paragraph or the first paragraph according to a judgment result of the common segmented vocabulary judgment The subtitle sentence is incorporated into the first paragraph.

如請求項9所述之分段系統，其中，當該第一字幕句併入該第一段落後，該常見詞偵測單元更用以針對一第二字幕句進行該常見分段詞彙判斷；其中，該第二字幕句依照一特定順序在該第一字幕句之後。The segmentation system according to claim 9, wherein, after the first subtitle sentence is merged into the first paragraph, the common word detection unit is further used to determine the common segmented vocabulary for a second subtitle sentence; , The second subtitle sentence follows the first subtitle sentence in a specific order.

如請求項9所述之分段系統，其中當產生該第二段落後，該分段單元更用以利用該設定值根據一特定順序選取該些字幕句，將被選取的字幕句加入該第二段落。The segmentation system according to claim 9, wherein after generating the second paragraph, the segmentation unit is further used to select the subtitle sentences according to a specific order using the set value, and add the selected subtitle sentences to the first paragraph Two paragraphs.

如請求項9所述之分段系統，其中，該段落產生單元更用以根據該判斷結果執行以下步驟：如果該第一字幕句與該常見分段詞彙相關聯，進行一分段處理產生一第二段落，並利用該設定值根據一特定順序選取該些字幕句，將被選取的字幕句加入該第二段落；以及如果該第一字幕句不與該常見分段詞彙相關聯，該第一字幕句與該第一段落進行一相似值計算，如果相似，將該第一字幕句併入該第一段落。The segmentation system according to claim 9, wherein the paragraph generation unit is further used to perform the following steps based on the judgment result: If the first subtitle sentence is associated with the common segmented vocabulary, perform a segmentation process to generate a The second paragraph, and use the set value to select the subtitle sentences according to a specific order, add the selected subtitle sentence to the second paragraph; and if the first subtitle sentence is not associated with the common segmented vocabulary, the first A subtitle sentence is calculated with a similarity value to the first paragraph, and if similar, the first subtitle sentence is incorporated into the first paragraph.

如請求項12所述之分段系統，其中，該分段處理包含：根據該判斷結果決定該第一字幕句是否與一開頭分段詞彙以及一結尾分段詞彙的其中之一相關聯；如果該第一字幕句與該開頭分段詞彙相關聯，以該第一字幕句作為該第二段落的起始句；以及如果該第一字幕句與該結尾分段詞彙相關聯，以該第一字幕句作為該第一段落的結尾句。The segmentation system according to claim 12, wherein the segmentation processing includes: determining whether the first subtitle sentence is associated with one of a beginning segment vocabulary and an ending segment vocabulary according to the judgment result; if The first subtitle sentence is associated with the opening segment vocabulary, and the first subtitle sentence is used as the beginning sentence of the second paragraph; and if the first subtitle sentence is associated with the ending segment vocabulary, the first The subtitle sentence serves as the ending sentence of the first paragraph.

如請求項12所述之分段系統，其中，該相似值計算包含：比較該第一字幕句對應的至少一特徵與該第一段落對應的至少一特徵的一差異值是否大於一門檻值；如果該差異值小於該門檻值，將該第一字幕句併入該第一段落；以及如果該差異值不小於該門檻值，以該第一字幕句作為該第二段落的起始句，並利用該設定值根據該特定順序選取該些字幕句，將被選取的字幕句分為該第二段落。The segmentation system according to claim 12, wherein the calculation of the similarity value includes: comparing whether a difference value of at least one feature corresponding to the first subtitle sentence and at least one feature corresponding to the first paragraph is greater than a threshold value; If the difference value is less than the threshold value, merge the first subtitle sentence into the first paragraph; and if the difference value is not less than the threshold value, use the first subtitle sentence as the starting sentence of the second paragraph, and use the The setting value selects the subtitle sentences according to the specific order, and divides the selected subtitle sentences into the second paragraph.

如請求項14所述之分段系統，更包含：一關鍵字擷取單元，與該分段單元電性連接，用以從該些字幕句中提取出複數個關鍵字，該些關鍵字為該第一字幕句對應的至少一特徵。The segmentation system as described in claim 14 further includes: a keyword extraction unit electrically connected to the segmentation unit to extract a plurality of keywords from the subtitle sentences, the keywords being At least one feature corresponding to the first subtitle sentence.

如請求項15所述之分段系統，其中，該第一段落對應的至少一特徵，由該第一段落中的該些字幕句提取出的該些關字產生。The segmentation system according to claim 15, wherein at least one feature corresponding to the first paragraph is generated from the keywords extracted from the subtitle sentences in the first paragraph.

一種非暫態電腦可讀取媒體，包含至少一指令程序，由一處理器執行該至少一指令程序以實行一分段方法，其包含：接收一字幕資訊；其中，該字幕資訊包含複數個字幕句；根據一設定值選取該些字幕句，並將被選取的字幕句分為一第一段落；針對一第一字幕句進行一常見分段詞彙判斷；其中，該第一字幕句是該些字幕句的其中之一；以及根據該常見分段詞彙判斷的一判斷結果產生一第二段落或將該第一字幕句併入該第一段落。A non-transitory computer readable medium, including at least one instruction program, executed by a processor to execute a segmentation method, including: receiving a subtitle information; wherein the subtitle information includes a plurality of subtitles Sentences; select the subtitle sentences according to a set value, and divide the selected subtitle sentences into a first paragraph; perform a common segmented vocabulary judgment on a first subtitle sentence; wherein, the first subtitle sentence is the subtitle sentences One of the sentences; and generating a second paragraph based on a judgment result of the common segmented vocabulary or incorporating the first subtitle sentence into the first paragraph.