TWI825535B

TWI825535B - System, method and computer-readable medium for formulating potential hot word degree

Info

Publication number: TWI825535B
Application number: TW110148174A
Authority: TW
Inventors: 李函穎; 廖宜斌
Original assignee: 中華電信股份有限公司 (Chunghwa Telecom Co., Ltd.)
Priority date: 2021-12-22
Filing date: 2021-12-22
Publication date: 2023-12-11
Also published as: TW202326471A

Abstract

The invention discloses a system, method and computer-readable medium for formulating a potential hot word degree. An extraction module extracts important words from data. A potential hot word degree calculation module calculates a time series analysis indicator and a momentum evaluation indicator for each important word, formulates a first potential hot word degree weight and a second potential hot word degree weight according to the importance of the time series analysis indicator and the momentum evaluation indicator, respectively, and then combines the two indicators with the first and second weights, respectively, to calculate the potential hot word degree of the important word. A potential hot word prediction module obtains the potential hot word degree of each important word and predicts potential hot words from it.

Description

System, Method and Computer-Readable Medium for Formulating Potential Hot Word Degree

The present invention relates to technology for formulating a potential hot word degree or predicting potential hot words, and in particular to a system, method and computer-readable medium for formulating a potential hot word degree based on a time series analysis indicator and a momentum evaluation indicator.

Conventional potential hot word prediction systems judge and select potential hot words using only data such as click counts, repost counts, comment counts, topic attention, and even the influence of internet celebrities. However, such systems cannot fully analyze the characteristics of hot word topics, nor can they discover topics that may later become hot words (potential hot words).

Furthermore, such a system predicts potential hot words using only a specific (single) hot word evaluation indicator. That is, when predicting and evaluating potential hot words in practice, it measures them against that single indicator alone, without considering multiple evaluation factors such as the time series analysis indicator and the momentum evaluation indicator when deciding whether an important word is a potential hot word. It therefore cannot adapt to differing requirements, nor can it provide users with trend prediction or analysis services for potential hot words. Moreover, in complex situations, because it does not use indicators such as the time series analysis indicator and the momentum evaluation indicator of important words, it is difficult for the system to effectively and appropriately select the best potential hot words from a potential hot word platform.

Therefore, providing an innovative technique for formulating a potential hot word degree or predicting potential hot words that solves any of the above problems, or provides related functions (technologies/services), has become a major research topic for those skilled in the art.

The present invention provides an innovative system, method and computer-readable medium for formulating a potential hot word degree. The time series analysis indicator and the momentum evaluation indicator of an important word are combined with a first potential hot word degree weight and a second potential hot word degree weight, respectively, to generate the potential hot word degree of the important word. Potential hot words, or the best potential hot words, may then be predicted from the potential hot word degrees of the important words. A word segmentation model and a named entity model may be used to extract important words from the data based on part-of-speech tags and named entity annotations, and a Latent Dirichlet Allocation (LDA) topic model may be used to express each article in the data as a probability distribution in order to determine its topic.

The system for formulating a potential hot word degree of the present invention includes: an extraction module that extracts important words from data; a potential hot word degree calculation module that calculates a time series analysis indicator and a momentum evaluation indicator for the important words extracted by the extraction module, formulates a first potential hot word degree weight and a second potential hot word degree weight according to the importance of the time series analysis indicator and the momentum evaluation indicator, respectively, and then combines the two indicators with the first and second potential hot word degree weights, respectively, to calculate the potential hot word degree of each important word; and a potential hot word prediction module that obtains the potential hot word degrees calculated by the potential hot word degree calculation module and predicts potential hot words from them.

The method for formulating a potential hot word degree of the present invention includes: extracting, by an extraction module, important words from data; calculating, by a potential hot word degree calculation module, a time series analysis indicator and a momentum evaluation indicator for the important words extracted by the extraction module, formulating a first potential hot word degree weight and a second potential hot word degree weight according to the importance of the time series analysis indicator and the momentum evaluation indicator, respectively, and then combining the two indicators with the first and second potential hot word degree weights, respectively, to calculate the potential hot word degree of each important word; and obtaining, by a potential hot word prediction module, the potential hot word degrees calculated by the potential hot word degree calculation module and predicting potential hot words from them.

The computer-readable medium of the present invention is applied in a computing device or computer and stores instructions for executing the above method of formulating a potential hot word degree.

To make the above features and advantages of the present invention clearer, embodiments are described in detail below with reference to the accompanying drawings. Additional features and advantages of the invention are set forth in part in the following description, will in part be apparent from the description, or may be learned by practicing the invention. Both the foregoing general description and the following detailed description are exemplary and explanatory and are not intended to limit the claimed scope of the invention.

1: System for formulating potential hot word degree
10: Data collection module
20: Important word extraction module
30: Data clustering module
40: Potential hot word degree calculation module
50: Potential hot word prediction module
A: Word segmentation model
B: Named entity model
C: Latent Dirichlet Allocation (LDA) topic model
I₁: Time series analysis indicator
I₂: Momentum evaluation indicator
S1 to S8: Steps
W₁: First potential hot word degree weight
W₂: Second potential hot word degree weight

Figure 1 is a schematic diagram of the architecture of the system for formulating a potential hot word degree according to the present invention.

Figure 2 is a schematic flowchart of the method for formulating a potential hot word degree according to the present invention.

Figure 3 is a schematic diagram of the time series change values of an important word in the system and method for formulating a potential hot word degree according to the present invention.

The embodiments of the present invention are described below by way of specific examples. Those skilled in the art can understand other advantages and effects of the present invention from the content disclosed in this specification, and the invention may also be implemented or applied through other, different but equivalent embodiments.

Figure 1 is a schematic diagram of the architecture of the system 1 for formulating a potential hot word degree according to the present invention; the system 1 formulates the potential hot word degree based on a time series analysis indicator and a momentum evaluation indicator. As shown in the figure, the system 1 automatically generates or calculates the potential hot word degree of important words in data in order to predict potential hot words, and includes a data collection module 10, an important word extraction module 20, a data clustering module 30, a potential hot word degree calculation module 40 and a potential hot word prediction module 50, which are connected to or communicate with one another. In addition, "plurality" in the present invention means two or more (e.g., two, three, four, five, ten, or a hundred or more); "connection" or "communication" means a wireless or wired connection or communication; the time series analysis indicator I₁ and the momentum evaluation indicator I₂ may be expressed as numeric values; and the first potential hot word degree weight W₁ and the second potential hot word degree weight W₂ may be expressed as weight values.

In one embodiment, the data collection module 10 may be a data collector (chip/circuit) or data collection software (program); the important word extraction module 20 may be an important word extractor (chip/circuit) or important word extraction software (program); the data clustering module 30 may be a data clusterer (chip/circuit) or data clustering software (program); the potential hot word degree calculation module 40 may be potential hot word degree software (program), a potential hot word degree calculator (chip/circuit) or potential hot word degree calculation software (program); and the potential hot word prediction module 50 may be a potential hot word predictor (chip/circuit) or potential hot word prediction software (program). However, the present invention is not limited thereto.

The data collection module 10 collects data (e.g., a data set); the important word extraction module 20 extracts important words (e.g., a plurality of different important words) from the data; and the data clustering module 30 clusters the data according to those important words. The potential hot word degree calculation module 40 calculates indicators (multiple evaluation factors) for each important word, such as the time series analysis indicator I₁ (a numeric value) and the momentum evaluation indicator I₂ (a numeric value), and formulates the first potential hot word degree weight W₁ and the second potential hot word degree weight W₂ according to the importance of I₁ and I₂, respectively.

The potential hot word degree calculation module 40 combines the time series analysis indicator I₁ and the momentum evaluation indicator I₂ of an important word with the first potential hot word degree weight W₁ and the second potential hot word degree weight W₂, respectively, to generate or calculate the potential hot word degree of the important word. For example, module 40 calculates the potential hot word degree as the sum of I₁ multiplied by W₁ and I₂ multiplied by W₂, that is, I₁×W₁ + I₂×W₂. The potential hot word prediction module 50 then predicts potential hot words, or the best potential hot words, from the potential hot word degrees of the important words. When the data collection module 10 obtains new data (e.g., a new data set), the next round of potential hot word degree calculation and potential hot word prediction is performed.
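Written as a formula, the potential hot word degree of an important word w is the weighted linear sum below. The symbol H(w) is introduced here only for readability; it is not notation used in the original.

```latex
H(w) = W_1 \, I_1(w) + W_2 \, I_2(w)
```

If the weights are additionally normalized so that W₁ + W₂ = 1 (an assumption; the source does not require it), H(w) is a convex combination and always lies between I₁(w) and I₂(w).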

Specifically, the potential hot word degree calculation module 40 and the potential hot word prediction module 50 may be connected to or communicate with each other to transfer potential hot word degrees. When the prediction module 50 sends a request for potential hot word degrees to the calculation module 40, module 40 calculates the potential hot word degrees of the important words in response and returns the results to module 50.

For example, the potential hot word degree calculation module 40 may receive data collected from big data sources (e.g., a data set) and calculate indicators such as the time series analysis indicator I₁ and the momentum evaluation indicator I₂ of the important words in that data. Module 40 may formulate the first potential hot word degree weight W₁ and the second potential hot word degree weight W₂ according to the importance of I₁ and I₂ in an objective manner (e.g., by calculation or normalization), a preset manner (e.g., a system default or user setting), or a mixture of the two (e.g., calculation or normalization combined with a system default or user setting). Module 40 then combines I₁ and I₂ with W₁ and W₂, respectively, to generate or calculate the potential hot word degree of each important word, so that the potential hot word prediction module 50 can predict potential hot words, or the best potential hot words, from those degrees.

Accordingly, the potential hot word degree calculation module 40 of the present invention uses a potential hot word degree formula (or algorithm) that multiplies the time series analysis indicator I₁ and the momentum evaluation indicator I₂ of an important word by the first potential hot word degree weight W₁ and the second potential hot word degree weight W₂, respectively, and sums the products to obtain the potential hot word degree. The potential hot word prediction module 50 then predicts potential hot words, or the best potential hot words, from these degrees, which allows potential hot words to be selected more effectively and appropriately and helps provide better analysis services to users.

Furthermore, the present invention adopts a broader way of evaluating potential hot words: the potential hot word degree calculation module 40 first combines I₁ and I₂ with W₁ and W₂, respectively, to automatically and systematically generate the potential hot word degree of each important word, and the potential hot word prediction module 50 then predicts potential hot words, or the best potential hot words, from those degrees, so that they can be selected effectively and appropriately even in complex situations. Compared with the conventional approach of using only a specific (single) hot word evaluation indicator, the present invention relies on multiple evaluation factors, such as the time series analysis indicator I₁ and the momentum evaluation indicator I₂ of important words, and is therefore better able to select the best potential hot words.

Figure 2 is a schematic flowchart of the method for formulating a potential hot word degree according to the present invention, described with reference to Figure 1. The main technical content of the method is as follows; the remaining content is the same as the description of Figure 1 above and is not repeated here.

As shown in Figure 2, in step S1, the data collection module 10 collects data (e.g., a data set). That is, the data collection module 10 may collect various kinds of data (e.g., big data sets), such as public opinion news, public opinion articles, forum posts and/or social media data.

In step S2, the important word extraction module 20 extracts important words (e.g., a plurality of different important words) from the data (e.g., a data set). That is, module 20 may use a word segmentation model A, such as a Conditional Random Field (CRF) word segmentation model, to segment the data into words and tag each word's part of speech, and may use a named entity model B to annotate named entities among the words. Module 20 then selects representative parts of speech according to term frequency (TF), and extracts the important words from the data according to the words' parts of speech and named entity annotations. For example, a part of speech may be a noun, pronoun, verb, adjective or adverb (e.g., one of the eight major parts of speech), and a named entity may be a person's name, a place name, an organization, a time or a location, but the invention is not limited thereto.
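The filtering part of step S2 can be illustrated with a minimal sketch. It assumes segmentation and tagging have already been done, so each token arrives as a hypothetical `(word, part_of_speech, named_entity_or_None)` triple; the retained part-of-speech set `KEEP_POS`, the tag strings and `top_k` are illustrative assumptions, not values from the source, and the CRF and named entity models themselves are not shown.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical token format: (word, part_of_speech, named_entity_or_None),
# as might be produced by a CRF word segmentation model plus a named entity model.
Token = Tuple[str, str, Optional[str]]

# Assumed parts of speech worth keeping; the source does not fix this set.
KEEP_POS = {"noun", "verb"}


def extract_important_words(docs: Iterable[List[Token]], top_k: int = 10) -> List[str]:
    """Keep tokens with a selected part of speech or any named entity tag, ranked by term frequency."""
    tf: Counter = Counter()
    for tokens in docs:
        for word, pos, entity in tokens:
            if pos in KEEP_POS or entity is not None:
                tf[word] += 1
    # most_common sorts by count; equal counts keep first-encountered order.
    return [word for word, _ in tf.most_common(top_k)]


docs = [
    [("5G", "noun", None), ("launch", "verb", None), ("Taipei", "noun", "LOC"), ("quickly", "adverb", None)],
    [("5G", "noun", None), ("Taipei", "noun", "LOC"), ("very", "adverb", None)],
    [("5G", "noun", None), ("price", "noun", None)],
]
print(extract_important_words(docs, top_k=2))  # ['5G', 'Taipei']
```

Adverbs such as "quickly" are dropped because they are neither in the assumed part-of-speech set nor tagged as named entities.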

In step S3, the data clustering module 30 clusters the data according to the important words (e.g., a plurality of different important words) in the data. That is, module 30 may use the important words as feature vectors and determine the number of clusters in an objective manner (e.g., by calculation or normalization), a preset manner (e.g., a system default or user setting), or a mixture of the two. In addition, module 30 may use the Latent Dirichlet Allocation (LDA) topic model C to express each article in the data as a probability distribution over topics, in order to determine the topic of each article.
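Once an LDA model has produced a per-article topic distribution, one simple way to "determine the topic of each article" is to take its most probable topic. The sketch below assumes the document-topic matrix has already been computed by some LDA implementation; the example matrix is hypothetical, and the argmax assignment rule is an assumption rather than the source's stated procedure.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_topic(doc_topic: Sequence[Sequence[float]]) -> Dict[int, List[int]]:
    """Assign each article to the topic with the highest probability in its LDA distribution."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for doc_id, dist in enumerate(doc_topic):
        # Sanity check: each row should be a probability distribution over topics.
        if abs(sum(dist) - 1.0) > 1e-6:
            raise ValueError(f"row {doc_id} does not sum to 1")
        topic = max(range(len(dist)), key=dist.__getitem__)
        groups[topic].append(doc_id)
    return dict(groups)


# Hypothetical output of an LDA model with 3 topics for 4 articles.
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
print(group_by_topic(doc_topic))  # {0: [0, 2], 1: [1], 2: [3]}
```

Each resulting group corresponds to one of the per-cluster data sets that later steps use when computing I₁ and I₂.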

In step S4, the potential hot word degree calculation module 40 calculates indicators such as the time series analysis indicator I₁ and the momentum evaluation indicator I₂ of the important words. That is, module 40 may calculate I₁ and I₂ for the important words extracted by the important word extraction module 20, for use in the subsequent calculation of the potential hot word degree. Using multiple indicators such as I₁ and I₂ allows module 40 to cope with complex situations, helps the subsequent potential hot word prediction module 50 select potential hot words, or the best potential hot words, effectively and appropriately, and supports better analysis services for users.

In step S5, the potential hot word degree calculation module 40 formulates the first potential hot word degree weight W₁ and the second potential hot word degree weight W₂ according to the importance of the time series analysis indicator I₁ and the momentum evaluation indicator I₂, respectively. That is, module 40 may determine W₁ and W₂ in an objective manner, a preset manner, or a mixture of the two.
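The three weighting modes of step S5 can be sketched as follows, assuming that "objective" means normalizing two non-negative importance scores so they sum to 1, and that "mixed" means blending those with preset weights by a factor `alpha`. Where the importance scores come from, and the names `objective_weights`, `mixed_weights` and `alpha`, are hypothetical; the source does not specify them.

```python
from typing import Tuple


def objective_weights(importance_1: float, importance_2: float) -> Tuple[float, float]:
    """Objective mode: normalize two non-negative importance scores so the weights sum to 1."""
    total = importance_1 + importance_2
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    return importance_1 / total, importance_2 / total


def mixed_weights(
    objective: Tuple[float, float], preset: Tuple[float, float], alpha: float = 0.5
) -> Tuple[float, float]:
    """Mixed mode: blend objective and preset weights; alpha is the share given to the objective ones."""
    w1 = alpha * objective[0] + (1 - alpha) * preset[0]
    w2 = alpha * objective[1] + (1 - alpha) * preset[1]
    return w1, w2


obj = objective_weights(3, 1)  # hypothetical importance scores for I1 and I2
print(obj)  # (0.75, 0.25)
print(mixed_weights(obj, preset=(0.5, 0.5), alpha=0.5))  # (0.625, 0.375)
```

The preset mode is simply a fixed pair such as `(0.5, 0.5)`; if both inputs to `mixed_weights` sum to 1, so does its output.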

When calculating the time series analysis indicator I₁ of an important word, the potential hot word degree calculation module 40 may use the data obtained through the Latent Dirichlet Allocation (LDA) topic model C (e.g., the data set of each cluster) to evaluate how the important word changes over time, and may then apply at least one of a Dynamic Topic Model (DTM) and a Dynamic Influence Model (DIM) associated with I₁ to produce time series change values or time series analysis results for the important word.
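DTM and DIM are too involved for a short example, so the sketch below swaps in a much simpler stand-in to show what a numeric time series indicator can look like: the ordinary least-squares slope of a word's per-period frequency within one cluster. This substitution is an assumption for illustration only and is not the source's method.

```python
from typing import Sequence


def trend_slope(series: Sequence[float]) -> float:
    """Ordinary least-squares slope of a series against time steps 0..n-1 (stand-in for DTM/DIM)."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two time points")
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var


print(trend_slope([1, 2, 3, 4]))  # 1.0  (steadily rising word)
print(trend_slope([5, 5, 5]))  # 0.0  (flat word)
```

A positive slope marks a word whose frequency is rising over the observed periods, which is the kind of temporal change the indicator I₁ is meant to capture.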

When calculating the momentum evaluation indicator I₂ of an important word, the potential hot word degree calculation module 40 may use the data obtained through the LDA topic model C (e.g., the data set of each cluster) and treat each important word in that data like a physical object that tends to keep moving in its current direction. Module 40 applies a momentum algorithm associated with I₂ (e.g., a momentum formula) together with statistical analysis results to calculate the momentum value of the important word and produce its momentum analysis result. The statistical analysis results may be produced from the important word's term frequency (TF) over time or its term frequency-inverse document frequency (TF-IDF) over time.
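The source does not give its momentum formula, so the following is a minimal sketch under a hypothetical mapping of the physical analogy p = m·v: "mass" is the word's mean relative term frequency over the window, and "velocity" is the change in frequency between the last two periods. Both the mapping and the use of plain TF (rather than TF-IDF) are assumptions.

```python
from typing import List, Sequence


def term_frequency_series(periods: Sequence[List[str]], word: str) -> List[float]:
    """Relative term frequency of `word` in each time period's token list."""
    return [p.count(word) / len(p) if p else 0.0 for p in periods]


def momentum(series: Sequence[float]) -> float:
    """Physics-style momentum p = m * v for the latest period (hypothetical mapping).

    mass m: mean frequency over the window (how established the word is)
    velocity v: change in frequency between the last two periods
    """
    if len(series) < 2:
        raise ValueError("need at least two periods")
    mass = sum(series) / len(series)
    velocity = series[-1] - series[-2]
    return mass * velocity


periods = [["a", "b", "c", "d"], ["a", "a", "b", "c"], ["a", "a", "a", "b"]]
tf = term_frequency_series(periods, "a")
print(tf)  # [0.25, 0.5, 0.75]
print(momentum(tf))  # 0.125
```

Under this mapping, a word that is both frequent and still accelerating gets a large positive momentum, while a frequent word that is fading gets a negative one.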

In step S6, the potential hot word degree calculation module 40 combines the time series analysis indicator I₁ and the momentum evaluation indicator I₂ of each important word with the first potential hot word degree weight W₁ and the second potential hot word degree weight W₂, respectively, to generate or calculate its potential hot word degree. That is, module 40 may use a potential hot word degree formula (or algorithm) that multiplies I₁ by W₁ and I₂ by W₂ and sums the products (e.g., a linear sum).
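Step S6 amounts to the linear sum I₁×W₁ + I₂×W₂ applied to every important word. The minimal sketch below computes it and sorts words by the result; the indicator values and words are hypothetical, and it assumes I₁ and I₂ have already been scaled to comparable ranges (the source does not say how), since otherwise one indicator would dominate regardless of the weights.

```python
from typing import Dict, List, Tuple


def potential_hot_word_degree(i1: float, i2: float, w1: float, w2: float) -> float:
    """Degree = I1 * W1 + I2 * W2, the linear sum described in step S6."""
    return i1 * w1 + i2 * w2


def rank_words(indicators: Dict[str, Tuple[float, float]], w1: float, w2: float) -> List[Tuple[str, float]]:
    """Score every important word and sort from highest to lowest degree."""
    scored = [(word, potential_hot_word_degree(i1, i2, w1, w2)) for word, (i1, i2) in indicators.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical (I1, I2) values per important word, already normalized to [0, 1].
indicators = {"5G": (0.8, 0.4), "metaverse": (0.5, 0.9), "typhoon": (0.2, 0.1)}
print([word for word, _ in rank_words(indicators, w1=0.5, w2=0.5)])  # ['metaverse', '5G', 'typhoon']
```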

In step S7, the potential hot word prediction module 50 predicts potential hot words, or the best potential hot words, from the potential hot word degrees of the important words. That is, module 50 receives or obtains the potential hot word degrees generated or calculated by the potential hot word degree calculation module 40 and uses them to predict potential hot words or the best potential hot words (e.g., whether a given important word is a potential hot word or a best potential hot word), which helps detect emerging hot word trends in advance.
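The source does not state the decision rule used in step S7. One plausible rule, sketched here under that assumption, is to flag words whose degree meets a threshold and return at most `top_k` of them, highest first, treating the first entry as the best potential hot word. The `threshold` and `top_k` values are hypothetical tuning parameters.

```python
from typing import Dict, List


def predict_hot_words(degrees: Dict[str, float], threshold: float, top_k: int = 3) -> List[str]:
    """Return up to top_k words whose degree meets the threshold, highest degree first."""
    candidates = [word for word, degree in degrees.items() if degree >= threshold]
    candidates.sort(key=lambda word: degrees[word], reverse=True)
    return candidates[:top_k]


degrees = {"metaverse": 0.7, "5G": 0.6, "typhoon": 0.15}
print(predict_hot_words(degrees, threshold=0.5))  # ['metaverse', '5G']
print(predict_hot_words(degrees, threshold=0.5, top_k=1))  # ['metaverse'] (best potential hot word)
```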

In steps S4 to S7 above, the potential hot word degree calculation module 40 combines the time series analysis indicator I₁ and the momentum evaluation indicator I₂ with the first potential hot word degree weight W₁ and the second potential hot word degree weight W₂, respectively, to automatically and systematically generate the potential hot word degree of each important word, using multiple indicators to cope with complex situations. The potential hot word prediction module 50 then predicts potential hot words, or the best potential hot words, from those degrees, so that they can be selected effectively and appropriately.

In step S8, when the data collection module 10 obtains new data (for example, a new data set), the process returns to steps S1 through S7. This performs the next round of potential hot word degree calculation and potential hot word prediction.

An embodiment of the system 1 and method for formulating potential hot word degree according to the present invention is described below. This embodiment uses public opinion news and similar data (for example, a data set).

The important word extraction module 20 extracts the important words in the data (for example, multiple different important words) as feature vectors. The data grouping module 30 then uses the latent Dirichlet allocation (LDA) topic model C to express each article in the data as a probability distribution over topics, which determines the topic of each article.

Next, the potential hot word degree calculation module 40 determines the time series analysis index I₁ and the momentum evaluation index I₂ of each important word, according to the requirements of time series analysis and momentum evaluation. The goal is to improve potential hot word prediction and capture hot-word trends. The module multiplies I₁ by the first potential hot word degree weight W₁ and I₂ by the second potential hot word degree weight W₂, then sums the results (for example, linearly) to obtain the potential hot word degree of the important word. From this degree, the potential hot word prediction module 50 predicts potential hot words or the best potential hot words.

Consequently, the present invention can still predict suitable or best potential hot words, and effectively capture hot-word trends, under various complex combinations of conditions. Examples include a low time series analysis index I₁, a low momentum evaluation index I₂, or a high I₁ combined with a low I₂.

For example, the data collection module 10 may collect one month of public opinion news data. The user can then set the data range and time interval to be queried, online and in real time. The important word extraction module 20 extracts the important words in each record to form a sequence that serves as a feature vector.

The data grouping module 30 takes the number of levels at the second-highest level of a hierarchical LDA topic model (for example, 3) as a number of clusters for the data. It then uses the LDA topic model C to express each article in the data as a probability distribution and so determine the topic of each article.

The data grouping module 30 may use an objective method, a preset method, or a mixed method. The mixed method combines the objective and preset methods, for example by taking their average or mixing them in some other ratio. In this example, the mixed method sets the number of data clusters to 5. The number of topics is then the average of the second-highest hierarchical level count (3) and the number of data clusters (5), i.e., number of topics = (3+5)/2 = 4.

The data grouping module 30 therefore uses the LDA topic model C to divide the data into four topic groups:
- a first topic group Topic₁
- a second topic group Topic₂
- a third topic group Topic₃
- a fourth topic group Topic₄

The potential hot word degree calculation module 40 then calculates the time series analysis index I₁ and the momentum evaluation index I₂, and sets the first and second potential hot word degree weights W₁ and W₂. From these it calculates the potential hot word degree and predicts potential hot words or the best potential hot words.
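The topic-count arithmetic above can be sketched as follows. This is a minimal sketch under two assumptions. First, the mixed method is treated as a weighted average whose ratio defaults to 0.5, which gives the plain average used in the example. Second, fractional results are rounded half up. The helper names `weighted_mix` and `topic_count` are hypothetical.

```python
import math


def weighted_mix(a: float, b: float, ratio_a: float = 0.5) -> float:
    """Hypothetical helper: blend two values in the ratio ratio_a : (1 - ratio_a)."""
    return ratio_a * a + (1.0 - ratio_a) * b


def topic_count(hlda_levels: int, n_clusters: int) -> int:
    """Number of LDA topics as the average of the hLDA level count and the cluster count."""
    # Round half up (an assumption; the example itself yields an integer, 4.0).
    return math.floor(weighted_mix(hlda_levels, n_clusters) + 0.5)


print(topic_count(3, 5))  # (3 + 5) / 2 = 4
```

Changing `ratio_a` models the "different proportions" variant of the mixed method.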

Specifically, in step S1 of FIG. 2, the data collection module 10 collects data and continues to collect new data (for example, data sets). It uploads the data either in real time or periodically. In real-time mode, the user sets the data range and time interval to be queried online. In periodic mode, the user sets the data range, evaluation start and end times, and/or frequency required for system scheduling.

In step S2 of FIG. 2, the important word extraction module 20 extracts the important words (for example, multiple different important words) from the data (for example, a data set). It proceeds as follows:
1. It segments the data and tags the part of speech of each word using a word segmentation model A, such as a conditional random field (CRF) segmentation model.
2. It annotates named entities in the vocabulary of the data using a named entity model B.
3. It selects representative parts of speech according to term frequency (TF).
4. It extracts the important words in the data according to the part-of-speech tags and the named entity annotations.
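A minimal sketch of the selection logic in step S2 follows. It assumes models A and B have already produced (word, part of speech, entity label) triples. The selection rule is an illustrative assumption, not the patented procedure: keep the most frequent part-of-speech tags, also keep any word carrying a named-entity label, then rank by TF. The tag names and the function name are likewise assumptions.

```python
from collections import Counter
from typing import List, Optional, Tuple

# (word, part_of_speech, named_entity_label or None); assumed to come from
# a segmenter/POS tagger (model A) and an NER model (model B) run beforehand.
Token = Tuple[str, str, Optional[str]]


def extract_important_words(tokens: List[Token], top_pos: int = 2,
                            top_words: int = 20) -> List[str]:
    """Hypothetical important-word selection for step S2."""
    # 1. Term frequency of each part-of-speech tag; keep the most frequent tags.
    pos_tf = Counter(pos for _, pos, _ in tokens)
    keep_pos = {pos for pos, _ in pos_tf.most_common(top_pos)}

    # 2. Keep words whose tag was selected or that carry a named-entity label.
    word_tf = Counter(word for word, pos, ner in tokens
                      if pos in keep_pos or ner is not None)

    # 3. Rank the surviving words by term frequency.
    return [word for word, _ in word_tf.most_common(top_words)]


tokens = [
    ("epidemic", "NOUN", None), ("mask", "NOUN", None),
    ("epidemic", "NOUN", None), ("Moderna", "PROPN", "ORG"),
    ("the", "DET", None), ("very", "ADV", None),
]
print(extract_important_words(tokens, top_pos=1, top_words=3))
```

Here "Moderna" survives only because of its entity label, since PROPN is not among the selected tags.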

In step S3 of FIG. 2, the data grouping module 30 groups (clusters) the data according to the important words in the data (for example, multiple different important words). The important words serve as feature vectors. From them, the module determines the number of data clusters by one of three methods:
- an objective method (for example, by calculation or standardization)
- a preset method (for example, a system or user default)
- a mixed method (calculation or standardization combined with a system or user default)

The data grouping module 30 also uses the LDA topic model C to express each article in the data as a probability distribution over topics, which determines the topic of each article.
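The text says each article's topic is determined from its LDA probability distribution, but not exactly how. A common choice, assumed here, is to assign each article to its most probable topic (argmax). The distributions are taken as given input, for example from any fitted LDA model. The function names and the probabilities in the example are hypothetical.

```python
from typing import Dict, List, Sequence


def assign_topics(doc_topic: Sequence[Sequence[float]]) -> List[int]:
    """Return, per article, the 1-based index of its most probable topic (Topic_1, Topic_2, ...)."""
    return [max(range(len(probs)), key=lambda i: probs[i]) + 1 for probs in doc_topic]


def group_by_topic(doc_ids: Sequence[str], topics: Sequence[int]) -> Dict[int, List[str]]:
    """Collect article ids into one list per topic group."""
    groups: Dict[int, List[str]] = {}
    for doc_id, topic in zip(doc_ids, topics):
        groups.setdefault(topic, []).append(doc_id)
    return groups


# Hypothetical distributions over four topics for three articles.
doc_topic = [[0.1, 0.6, 0.2, 0.1], [0.7, 0.1, 0.1, 0.1], [0.05, 0.05, 0.1, 0.8]]
print(assign_topics(doc_topic))  # [2, 1, 4]
```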

In step S4 of FIG. 2, the potential hot word degree calculation module 40 calculates indicators of each important word, such as the time series analysis index I₁ and the momentum evaluation index I₂. These indicators let the invention handle complex situations. They also help the subsequent potential hot word prediction module 50 select potential hot words or the best potential hot words effectively and appropriately, and they support better analysis services for users.

When calculating the time series analysis index I₁ of an important word, the potential hot word degree calculation module 40 uses the data obtained from the LDA topic model C (for example, the data set of each cluster). From this data it evaluates how the important word changes over time. The module then applies at least one of a dynamic topic model (DTM) and a dynamic influence model (DIM), associated with I₁, to produce the temporal change values, or time series analysis results, of the important word.

The data grouping module 30 groups the data into clusters Topic_i, where i = 1, 2, ..., r and r is the number of clusters. Within each cluster, the important words in each record form a sequence that serves as a feature vector. Using an objective, preset, or mixed method, the data grouping module 30 then determines two quantities: the number of time slices TS, and the number TNN (Top n Number) of top-ranked important words to retain.
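The source fixes the number of time slices TS but does not say how the query interval is divided. A minimal sketch, assuming equal-width slices numbered 1 to TS, is shown below. The helper name `time_slice` is hypothetical.

```python
from datetime import datetime


def time_slice(ts: datetime, start: datetime, end: datetime, n_slices: int) -> int:
    """Map a timestamp in [start, end] to a 1-based equal-width time slice (hypothetical helper)."""
    if not start <= ts <= end:
        raise ValueError("timestamp outside the query interval")
    fraction = (ts - start) / (end - start)  # timedelta / timedelta -> float in [0, 1]
    # int() truncates to the slice index; the end point itself is put in the last slice.
    return min(int(fraction * n_slices) + 1, n_slices)


start, end = datetime(2021, 6, 1), datetime(2021, 7, 1)
print(time_slice(datetime(2021, 6, 16, 12), start, end, 10))  # 6
```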

The potential hot word degree calculation module 40 uses at least one of the DTM and the DIM associated with the time series analysis index I₁ to produce the temporal change values of an important word Word_j in each time slice. It expresses these values as a function of time, denoted here $f_j(t)$.

Based on the time series analysis results, the module records the sum of the products of the word's temporal change and time. That is, it records the sum of the products of the first-order derivative $f_j'(t)$ with respect to t and the time t. The module then normalizes this sum to obtain the time series analysis index I₁ of the important word, as shown in formula (1). Here $\mathrm{Norm}(\cdot)$ denotes a normalization that maps the sum into the range 0 to 1.

$$I_1 = \mathrm{Norm}\left(\sum_{k=1}^{TS} f_j'(t)\cdot t\right), \qquad k-1 < t < k \tag{1}$$

In formula (1), the symbols are as follows:
- j = 1, 2, ..., TNN, where TNN is the number of top-ranked important words.
- k = 1, 2, ..., TS, where TS is the number of time slices.
- t is time, with k-1 < t < k.
- $f_j(t)$ is the functional representation of the temporal change values of the important word.
- $f_j'(t)$ is the first-order derivative of $f_j(t)$ with respect to t.

The potential hot word degree calculation module 40 may use a normalization that includes a mapping function to map the temporal change of the important word into the range 0 to 1. The mapping function includes at least one of a Sigmoid function (S-shaped function) and min-max normalization.
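The sketch below shows one way formula (1) could be computed for a single word. It rests on several assumptions. The per-slice temporal change values from the DTM/DIM are taken as given input. The fitted function of formula (2) is not reproduced in the text, so its derivative on each interval (k-1, k) is approximated by the finite difference between consecutive slice values (taken as 0 for k = 1) and evaluated at the midpoint t = k - 0.5. The Sigmoid variant of Norm is used, with min-max shown as the alternative for normalizing across words. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic (S-shaped) function mapping to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def min_max(values: Sequence[float]) -> List[float]:
    """Min-max normalization to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def temporal_raw_score(values: Sequence[float]) -> float:
    """Sum over k of (approximate derivative on (k-1, k)) * t, with t = k - 0.5.

    values[k-1] is the word's temporal change value in slice k (k = 1..TS).
    """
    total = 0.0
    for k in range(1, len(values) + 1):
        slope = values[k - 1] - values[k - 2] if k > 1 else 0.0  # finite difference
        total += slope * (k - 0.5)
    return total


def time_series_index(values: Sequence[float]) -> float:
    """I1 for one word, normalized with the Sigmoid variant."""
    return sigmoid(temporal_raw_score(values))
```

A word whose value never changes scores 0 before normalization and 0.5 after it. Because the Sigmoid saturates for large raw scores, min-max across the TNN words may discriminate better in practice.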

For example, the data grouping module 30 groups the data into clusters Topic_i (where i = 1, 2, ..., 4). For each cluster, it forms a sequence of the important words in each record as a feature vector. Using an objective, preset, or mixed method, it sets the number of time slices TS (e.g., TS = 10) and the number of top-ranked important words TNN (e.g., TNN = 20). The potential hot word degree calculation module 40 then uses at least one of the DTM and the DIM associated with I₁ to produce the temporal change values, or time series analysis results, of the important words.

Figure 3 is a schematic diagram of the temporal change values of an important word (for example, "epidemic"). Take the top-ranked important word "epidemic" in the first topic group Topic₁ as an example. The potential hot word degree calculation module 40 obtains the word's temporal change value for each of the TS = 10 time slices, as shown in Figure 3. It then expresses these values as a function of time t, as shown in formula (2).

Next, the potential hot word degree calculation module 40 takes the first-order derivative of formula (2) with respect to time t, as shown in formula (3).

Assume the intermediate sums required by formula (1) have been calculated. The potential hot word degree calculation module 40 then normalizes the sum of the products of the word's temporal change and time. This gives the time series analysis index I₁ of an important word such as "epidemic". By formula (1), the index I₁ shown in formula (4) equals 0.47.

When calculating the momentum evaluation index I₂ of an important word, the potential hot word degree calculation module 40 uses the data obtained from the LDA topic model C (for example, the data set of each cluster). It treats each important word in the data as an object that tends to keep moving in its direction of motion.

The module applies a momentum algorithm (for example, the momentum formula) associated with I₂ to statistical analysis results. From this it calculates the momentum value of the important word and produces its momentum analysis results. The statistical analysis results may be the term frequency (TF) or the term frequency-inverse document frequency (TF-IDF) of the important word over time.

Because I₂ measures the momentum of an important word, basing the potential hot word degree on I₂ helps the potential hot word prediction module 50 select potential hot words or the best potential hot words more appropriately.

Specifically, the data grouping module 30 groups the data into clusters Topic_i, where i = 1, 2, ..., r and r is the number of clusters. Using an objective, preset, or mixed method, it determines the number of time slices TS and the number of top-ranked important words TNN.

The potential hot word degree calculation module 40 treats each important word in the data (Word_j, where j = 1, 2, ..., TNN) as an object that tends to keep moving in its direction of motion. It applies a momentum algorithm (for example, the momentum formula) associated with I₂ to the statistical analysis results. This yields the momentum value of the important word and its momentum analysis results. For example, the momentum algorithm is

$$p_j = m_j\, v_j$$

where:
- $p_j$ is the momentum of the important word, in kilogram-meters per second (kg·m/s).
- $m_j$ is the mass of the important word, in kilograms (kg).
- $v_j$ is the velocity of the important word, in meters per second (m/s).

Based on the statistical analysis results, the potential hot word degree calculation module 40 takes the term frequency $TF_j(t)$ of the important word in each time slice, where j = 1, 2, ..., TNN and t = 1, 2, ..., TS. From these frequencies it determines the mass $m_j$ and velocity $v_j$ of the important word:
- The mass is the sum of the word's term frequencies over all time slices.
- The velocity is the sum of the products of term frequency and time.

$$m_j = \sum_{t=1}^{TS} TF_j(t), \qquad v_j = \sum_{t=1}^{TS} TF_j(t)\cdot t$$

The module then normalizes the momentum value of the important word to obtain its momentum evaluation index I₂, as shown in formula (5):

$$I_2 = \mathrm{Norm}\left(m_j\, v_j\right) = \mathrm{Norm}\left(\Big(\sum_{t=1}^{TS} TF_j(t)\Big)\Big(\sum_{t=1}^{TS} TF_j(t)\cdot t\Big)\right) \tag{5}$$

In formula (5), the symbols are as follows:
- j = 1, 2, ..., TNN, where TNN is the number of top-ranked important words.
- t = 1, 2, ..., TS, where TS is the number of time slices.
- $TF_j(t)$ is the term frequency.
- $m_j$ is the mass of the important word.
- $v_j$ is the velocity of the important word.
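The following sketch implements formula (5) under stated assumptions. Documents are already assigned to 1-based time slices. TF is an exact token count. Norm is taken as min-max across the TNN candidate words; the source allows either Sigmoid or min-max and does not say which produced the example value. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def tf_per_slice(docs: Iterable[Tuple[int, Sequence[str]]], word: str,
                 n_slices: int) -> List[int]:
    """TF_j(t): occurrences of `word` in each 1-based time slice t."""
    tf = [0] * n_slices
    for t, tokens in docs:
        tf[t - 1] += sum(1 for tok in tokens if tok == word)
    return tf


def momentum(tf: Sequence[float]) -> float:
    """p_j = m_j * v_j, with m_j = sum of TF_j(t) and v_j = sum of TF_j(t) * t."""
    mass = sum(tf)
    velocity = sum(f * t for t, f in enumerate(tf, start=1))
    return mass * velocity


def momentum_index(tf_by_word: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """I2 per word: min-max-normalized momentum across the candidate words."""
    p = {w: momentum(tf) for w, tf in tf_by_word.items()}
    lo, hi = min(p.values()), max(p.values())
    span = hi - lo
    return {w: (v - lo) / span if span else 0.0 for w, v in p.items()}


print(momentum([1, 2, 3]))  # mass 6 * velocity 14 = 84
```

Weighting each frequency by its slice index t makes recent occurrences contribute more to the velocity, so words that are rising late in the window score higher.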

The data grouping module 30 may determine the number of time slices TS and the number of top-ranked important words TNN by an objective, preset, or mixed method. The potential hot word degree calculation module 40 may use a normalization that includes a mapping function to map the momentum value of the important word into the range 0 to 1. The mapping function includes at least one of a Sigmoid function (S-shaped function) and min-max normalization.

The data grouping module 30 groups the data into clusters Topic_i (where i = 1, 2, ..., 4). For each cluster, it forms a sequence of the important words in each record as a feature vector. Using an objective, preset, or mixed method, it sets the number of time slices TS (e.g., TS = 10) and the number of top-ranked important words TNN (e.g., TNN = 20).

Similarly, take the top-ranked important word "epidemic" in the first topic group Topic₁ as an example. The potential hot word degree calculation module 40 applies the momentum algorithm associated with I₂ to the statistical analysis results. From this it calculates the word's momentum value and produces its momentum analysis results. As before, the momentum algorithm is $p_j = m_j\, v_j$, where:
- $p_j$ is the momentum of the important word (kg·m/s).
- $m_j$ is its mass (kg).
- $v_j$ is its velocity (m/s).

Based on the statistical analysis results, the potential hot word degree calculation module 40 takes the term frequency $TF_j(t)$ of "epidemic" in each of the ten time slices (t = 1, 2, ..., 10). From these it determines the word's mass and velocity:
- The mass is the sum of the term frequencies, $m_j = \sum_{t=1}^{10} TF_j(t)$.
- The velocity is the sum of the products of term frequency and time, $v_j = \sum_{t=1}^{10} TF_j(t)\cdot t$.

Assume these quantities have been calculated, one of the computed values being 2,392,516. The potential hot word degree calculation module 40 then normalizes the momentum value of the important word. This gives the momentum evaluation index I₂ of an important word such as "epidemic". By formula (5), the index I₂ shown in formula (6) equals 0.73.

In step S5 of FIG. 2, the potential hot word degree calculation module 40 sets two weights according to the relative importance of the two indicators:
- the first potential hot word degree weight W₁, for the time series analysis index I₁
- the second potential hot word degree weight W₂, for the momentum evaluation index I₂

The module can receive data collected from big-data sources (for example, a data set). It then uses an objective, preset, or mixed method to set W₁ and W₂ according to the importance of the word's I₁ and I₂. Each of I₁, I₂, W₁, and W₂ may lie between 0 and 1, and W₁ and W₂ may sum to 1, although the invention is not limited to this.

Using multiple indicators such as I₁ and I₂ lets the invention handle complex situations. It also helps the subsequent potential hot word prediction module 50 select potential hot words or the best potential hot words effectively and appropriately, and supports better analysis services for users.

In other words, after calculating indicators such as I₁ and I₂, the potential hot word degree calculation module 40 sets W₁ for I₁ and W₂ for I₂ by an objective, preset, or mixed method. The result is a weight pair (W₁, W₂). Each of I₁, I₂, W₁, and W₂ may lie between 0 and 1, and W₁ + W₂ may equal 1. In the next step, I₁ is multiplied by W₁ and I₂ by W₂, and the products are summed (for example, linearly) to produce the potential hot word degree of the important word.

For example, searching Google separately for the keywords "time series analysis forecasting" and "Momentum" returned 98,000,000 and 219,000,000 results, respectively. Under the objective method, normalizing these counts yields the objective weight pair (W₁, W₂) = (0.31, 0.69). Specifically, 98,000,000/317,000,000 ≈ 0.31 and 219,000,000/317,000,000 ≈ 0.69. This pair can then be used to calculate the potential hot word degree of the important words.
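The normalization in the objective method can be sketched as dividing each count by the total and rounding. The rounding to two decimals mirrors the example and is an assumption. With other counts the rounded weights may not sum exactly to 1. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def objective_weights(counts: Sequence[float], ndigits: int = 2) -> Tuple[float, ...]:
    """Normalize raw counts (e.g., search-result counts) into weights that sum to about 1."""
    total = sum(counts)
    return tuple(round(c / total, ndigits) for c in counts)


print(objective_weights([98_000_000, 219_000_000]))  # (0.31, 0.69)
```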

Besides the objective method, the potential hot word degree calculation module 40 may use a preset method or a mixed method (objective combined with preset) to set a different weight pair:
- The objective method above gives the objective weights (W₁, W₂) = (0.31, 0.69).
- A preset method may give preset weights (W₁, W₂) = (0.38, 0.62).
- A mixed method combines the objective and preset weights.

In one embodiment, the potential hot word degree calculation module 40 combines the objective weights (0.31, 0.69) and the preset weights (0.38, 0.62) using the mixed method, in an objective-to-preset ratio of 0.6 : 0.4:

(W₁, W₂) = (0.31, 0.69) × 0.6 + (0.38, 0.62) × 0.4 = (0.338, 0.662) ≈ (0.34, 0.66)

The first potential hot word degree weight W₁ is therefore 0.34 and the second potential hot word degree weight W₂ is 0.66.
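The mixed weighting can be reproduced element-wise. This minimal sketch assumes the ratio applies to the objective pair, with the remainder applied to the preset pair, and that results are rounded to two decimals as in the example. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def mixed_weights(objective: Sequence[float], preset: Sequence[float],
                  objective_ratio: float = 0.6, ndigits: int = 2) -> Tuple[float, ...]:
    """Blend objective and preset weights in the ratio objective_ratio : (1 - objective_ratio)."""
    return tuple(round(objective_ratio * o + (1.0 - objective_ratio) * p, ndigits)
                 for o, p in zip(objective, preset))


print(mixed_weights((0.31, 0.69), (0.38, 0.62)))  # (0.34, 0.66)
```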

In step S6 of FIG. 2, the potential hot word degree calculation module 40 combines the important word's time series analysis index I₁ with W₁ and its momentum evaluation index I₂ with W₂. This yields the potential hot word degree of the important word. Using a potential hot word degree formula (or algorithm), the module multiplies I₁ by W₁ and I₂ by W₂ and sums the results (for example, linearly).

For example, the potential hot word degree formula of the potential hot word degree calculation module 40 is

$$H_j = \sum_{i=1}^{2} W_i\, I_i = W_1 I_1 + W_2 I_2$$

where:
- $H_j$ is the potential hot word degree of important word Word_j, and I₁ and I₂ are that word's indices.
- i = 1, 2.
- j = 1, 2, ..., TNN, where TNN is the number of top-ranked important words.
- I₁ is the time series analysis index and I₂ is the momentum evaluation index.
- W₁ and W₂ are the first and second potential hot word degree weights.

I₁, I₂, W₁, and W₂ each lie between 0 and 1, and W₁ + W₂ = 1. The potential hot word degree is therefore a weighted average of I₁ and I₂ and also lies between 0 and 1.

In one embodiment, (I₁, I₂) = (0.47, 0.73) and (W₁, W₂) = (0.34, 0.66). The module applies the potential hot word degree formula:

0.47 × 0.34 + 0.73 × 0.66 = 0.1598 + 0.4818 ≈ 0.64

The potential hot word degree of the important word is therefore 0.64, which likewise lies between 0 and 1.
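The weighted sum above can be checked in a few lines. This sketch uses the example's index and weight values; the function name is hypothetical.

```python
from typing import Sequence


def potential_hot_word_degree(indices: Sequence[float], weights: Sequence[float]) -> float:
    """H_j = sum over i of W_i * I_i; here the indices are (I1, I2) and the weights (W1, W2)."""
    if len(indices) != len(weights):
        raise ValueError("need exactly one weight per index")
    return sum(w * i for w, i in zip(weights, indices))


h = potential_hot_word_degree((0.47, 0.73), (0.34, 0.66))
print(round(h, 2))  # 0.64
```

Because W₁ + W₂ = 1, the result always lies between the smaller and larger of I₁ and I₂.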

The potential hot word degree of 0.64 is the top-ranked importance value among the important words of the first topic group Topic₁, for example the important word "epidemic". The important words of Topic₁ include, for example: epidemic, masks, Level 3 alert, virus, Moderna, Olympic epidemic prevention measures, economy class, breach, BNT, the elderly, and so on. In addition, the potential hot word degree calculation module 40 can sort the important words in descending order of importance as follows:

[Ranked list of important words and their potential hot word degrees; shown as images in the original and not recoverable.]
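Sorting important words by potential hot word degree and keeping the top TNN of them can be sketched as below. This is a hypothetical illustration: the function name is invented, and apart from the embodiment's values for "epidemic", the words `word_b` and `word_c` and their indicator values are made-up placeholders, not data from the patent.

```python
from typing import Dict, List, Tuple


def top_potential_hot_words(
    indicators: Dict[str, Tuple[float, float]],
    weights: Tuple[float, float],
    tnn: int,
) -> List[Tuple[str, float]]:
    """Score each word as W1*I1 + W2*I2 and return the TNN highest, best first."""
    w1, w2 = weights
    scored = [(word, w1 * i1 + w2 * i2) for word, (i1, i2) in indicators.items()]
    # Sort by potential hot word degree, highest first; ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:tnn]


# "epidemic" uses the embodiment's indicator values; word_b and word_c are
# hypothetical placeholders with made-up indicator values.
indicators = {
    "epidemic": (0.47, 0.73),
    "word_b": (0.40, 0.50),
    "word_c": (0.30, 0.20),
}
for word, degree in top_potential_hot_words(indicators, (0.34, 0.66), tnn=2):
    print(f"{word}: {degree:.2f}")
```

Python's sort is stable, so words with equal degrees stay in input order; a real system might instead break ties by word frequency.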

Therefore, the value of an important word's potential hot word degree can reflect the trend of future potential hot words. From this value, a user can immediately gauge how much potential an important word has and decide accordingly whether to supply it as a potential hot word to analysis services such as public opinion or social media analysis. The data analysis above also shows that the higher an important word's potential hot word degree, the greater its potential, which helps determine whether the word warrants continued monitoring and follow-up analysis services.

The present invention generates the potential hot word degree of an important word from key indicators such as its time series analysis indicator I₁ and momentum evaluation indicator I₂, so the potential hot word degree reflects both of these dimensions. Compared with the conventional practice of measuring potential hot words by a single specific hot word evaluation indicator, the present invention uses multiple evaluation factors such as I₁ and I₂. This allows it to handle complex situations, select potential hot words or the best potential hot words more effectively and appropriately, and provide the best analysis services to users.

In addition, the present invention provides a computer-readable medium for the method of formulating the potential hot word degree based on the time series analysis indicator and the momentum evaluation indicator. The medium is used in a computing device or computer having a processor and/or memory and stores instructions; when the computing device or computer executes the medium through the processor and/or memory, it performs the method described above. For example, the processor can be a microprocessor, a central processing unit (CPU) or a graphics processing unit (GPU), and the memory can be random access memory (RAM), a memory card, a hard disk (such as a cloud or network drive) or a database, but is not limited thereto.

In summary, the system, method and computer-readable medium for formulating the potential hot word degree of the present invention have at least the following features, advantages or technical effects.

1. The data collection module of the present invention can set the start time point (time interval) and the range of data to retrieve, which facilitates automated scheduling and allows the execution progress for each data item or data source to be controlled.

2. The important word extraction module of the present invention can use a word segmentation model, such as a conditional random field (CRF) word segmentation model, together with a named entity model, which helps extract important words from the data according to the parts of speech of words and their named entity annotations.

3. The data grouping module of the present invention can use a latent Dirichlet allocation (LDA) topic model to express each article in the data as a probability distribution over topics, which helps determine the topic of each article.

4. The potential hot word degree calculation module of the present invention can calculate both the time series analysis indicator and the momentum evaluation indicator of an important word and formulate the first and second potential hot word degree weights according to their respective importance, which facilitates automated or systematic generation of the potential hot word degree of important words.

5. The potential hot word prediction module of the present invention can continuously predict potential hot words based on the potential hot word degree of important words, which helps it respond to complex situations and effectively and appropriately select potential hot words or the best potential hot words.

6. The present invention adopts a broader potential hot word evaluation approach that pairs the time series analysis indicator and the momentum evaluation indicator of important words with the first and second potential hot word degree weights, respectively. This facilitates automated or systematic generation of the potential hot word degree of important words and helps the potential hot word prediction module predict potential hot words or the best potential hot words.

7. Compared with the conventional use of a single specific hot word evaluation indicator, the present invention can generate the potential hot word degree of important words from multiple evaluation factors, such as the time series analysis indicator and the momentum evaluation indicator, and can predict potential hot words or the best potential hot words from that degree. It can thereby select potential hot words or the best potential hot words effectively and appropriately and provide the best analysis services to users.

8. By generating the potential hot word degree of important words from key indicators such as the time series analysis indicator and the momentum evaluation indicator, the present invention overcomes a shortcoming of conventional potential hot word prediction systems, which select potential hot words only from data such as the numbers of clicks, reposts and comments, topic attention, or even influencer influence, and therefore cannot predict potential hot words in complex situations.
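Item 2 above describes extracting important words from part-of-speech tags and named entity annotations, with word frequency also playing a role according to the claims. A minimal sketch of that filtering step follows, assuming the tokens have already been segmented and tagged upstream (the CRF segmentation model and the named entity model themselves are not implemented here). The tag sets `KEEP_POS` and `KEEP_NER`, the `min_freq` threshold and the function name are all hypothetical choices for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Each token is (word, part-of-speech tag, named-entity label or None), as it might
# come out of an upstream word segmentation / POS model and a named entity model.
Token = Tuple[str, str, Optional[str]]

# Hypothetical filter sets: Universal Dependencies noun tags and a few entity labels.
KEEP_POS = {"NOUN", "PROPN"}
KEEP_NER = {"PERSON", "ORG", "GPE", "EVENT"}


def extract_important_words(tokens: Iterable[Token], min_freq: int = 2) -> List[str]:
    """Keep noun-like or entity-labelled words seen at least min_freq times, most frequent first."""
    counts: Counter = Counter()
    for word, pos, ner in tokens:
        if pos in KEEP_POS or ner in KEEP_NER:
            counts[word] += 1
    return [word for word, count in counts.most_common() if count >= min_freq]


tokens = [
    ("epidemic", "NOUN", None),
    ("masks", "NOUN", None),
    ("epidemic", "NOUN", None),
    ("Moderna", "PROPN", "ORG"),
    ("very", "ADV", None),
    ("Moderna", "PROPN", "ORG"),
    ("very", "ADV", None),
]
print(extract_important_words(tokens))
```

The adverb "very" is frequent but dropped by the part-of-speech filter, while "masks" passes the filter but falls below the frequency threshold; the combination of both criteria is what selects the important words.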

The above embodiments merely illustrate the principles, features and effects of the present invention and are not intended to limit its scope of implementation. Anyone skilled in the art can modify and change the above embodiments without departing from the spirit and scope of the present invention. Any equivalent changes and modifications made using the content disclosed by the present invention shall still be covered by the scope of the claims. Therefore, the scope of protection of the present invention shall be as set out in the claims.

10: Data collection module

20: Important word extraction module

30: Data grouping module

40: Potential hot word degree calculation module

50: Potential hot word prediction module

A: Word segmentation model

B: Named entity model

I₁: Time series analysis indicator

I₂: Momentum evaluation indicator

W₁: First potential hot word degree weight

W₂: Second potential hot word degree weight

Claims

A system for formulating a potential hot word degree, comprising: an extraction module, which selects parts of speech of words in data according to word frequency and uses a named entity model to annotate the words of the data with named entities, so that the extraction module extracts important words from the data according to the parts of speech of the words obtained according to the word frequency and the named entity annotations obtained by the named entity model; a potential hot word degree calculation module, which calculates a time series analysis indicator and a momentum evaluation indicator of the important word extracted by the extraction module from the data according to the parts of speech obtained according to the word frequency and the named entity annotations obtained by the named entity model, wherein, when the potential hot word degree calculation module calculates the time series analysis indicator of the important word, it generates a time series change value or time series analysis result of the important word through at least one of a dynamic topic model (DTM) and a dynamic influence model (DIM) associated with the time series analysis indicator of the important word, and wherein the potential hot word degree calculation module formulates a first potential hot word degree weight and a second potential hot word degree weight according to the importance of the time series analysis indicator of the important word associated with the dynamic topic model and the dynamic influence model and the importance of the momentum evaluation indicator of the important word, respectively, and then multiplies the time series analysis indicator of the important word associated with the dynamic topic model and the dynamic influence model and the momentum evaluation indicator of the important word by the first potential hot word degree weight and the second potential hot word degree weight, respectively, and sums the products to calculate the potential hot word degree of the important word; and a potential hot word prediction module, which obtains the potential hot word degree of the important word calculated by the potential hot word degree calculation module by multiplying the time series analysis indicator of the important word associated with the dynamic topic model and the dynamic influence model and the momentum evaluation indicator of the important word by the first potential hot word degree weight and the second potential hot word degree weight, respectively, and summing the products, so that the potential hot word prediction module predicts potential hot words according to the potential hot word degree of the important word so calculated.

The system of claim 1, further comprising a data collection module and a data grouping module, wherein the data collection module collects the data, so that the extraction module extracts a plurality of different important words from the data, and the data grouping module then groups the data according to the plurality of different important words.

The system of claim 1, further comprising a data grouping module that uses the important words in the data as feature vectors, so that the data grouping module determines the number of groups of the data according to the important words in the data and uses a latent Dirichlet allocation topic model to express each article in the data as a probability distribution so as to determine the topic of each article.
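To illustrate how an LDA topic model can express each article as a probability distribution over topics and then pick one topic per article, here is a compact collapsed Gibbs sampler. This is only a sketch: the patent names an LDA topic model but not its inference method, so Gibbs sampling, the hyperparameters `alpha` and `beta`, the iteration count, the function name and the toy corpus are all assumptions, and the toy documents are hypothetical placeholders rather than data from the patent.

```python
import random
from typing import List, Tuple


def lda_gibbs(
    docs: List[List[str]],
    k: int,
    alpha: float = 0.1,
    beta: float = 0.01,
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[List[List[float]], List[int]]:
    """Fit LDA by collapsed Gibbs sampling; return per-document topic
    distributions (theta) and the most probable topic of each document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    ndk = [[0] * k for _ in docs]       # tokens of topic j in document d
    nkw = [[0] * v for _ in range(k)]   # occurrences of word w under topic j
    nk = [0] * k                        # total tokens assigned to topic j
    z: List[List[int]] = []             # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][index[w]] += 1
            nk[t] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                wi, t = index[w], z[d][n]
                # Remove the token, then resample its topic from the full conditional
                # p(z = j | rest) ∝ (n_dj + alpha) * (n_jw + beta) / (n_j + V * beta).
                ndk[d][t] -= 1
                nkw[t][wi] -= 1
                nk[t] -= 1
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][n] = t
                ndk[d][t] += 1
                nkw[t][wi] += 1
                nk[t] += 1

    # Smoothed document-topic distribution; each row sums to 1.
    theta = [
        [(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
        for d, doc in enumerate(docs)
    ]
    topics = [max(range(k), key=lambda j: row[j]) for row in theta]
    return theta, topics


# Hypothetical toy corpus of three tokenized articles.
docs = [
    ["epidemic", "masks", "virus", "epidemic"],
    ["virus", "Moderna", "BNT", "epidemic"],
    ["stadium", "athlete", "medal", "stadium"],
]
theta, topics = lda_gibbs(docs, k=2)
print(topics)
```

Taking the arg-max of each row of `theta` is one simple way to "determine the topic of each article"; a system could equally keep the full distribution for soft grouping.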

The system of claim 1, wherein the extraction module further performs word segmentation and part-of-speech tagging on the data through a word segmentation model.

The system of claim 1, wherein the potential hot word degree calculation module uses a potential hot word degree calculation formula to multiply the time series analysis indicator and the momentum evaluation indicator by the first potential hot word degree weight and the second potential hot word degree weight, respectively, and sums the products to calculate the potential hot word degree of the important word.

The system of claim 1, wherein the potential hot word degree calculation module further uses data obtained from a latent Dirichlet allocation topic model to evaluate time series changes of the important words in the data.

The system of claim 1, wherein the potential hot word degree calculation module further applies a normalization that includes a mapping function to convert the time series change or momentum value of the important word into a value between 0 and 1, the mapping function including at least one of a Sigmoid function and min-max normalization.
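The two mapping functions named in this claim, the Sigmoid function and min-max normalization, can be sketched as follows to show how raw time series changes or momentum values are brought into the range [0, 1]. The function names, the handling of a constant series and the sample momentum values are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function: maps any real value into [0, 1]
    (strictly between 0 and 1 mathematically; floats saturate at the ends)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow of exp(-x) for large negative x
    return e / (1.0 + e)


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series carries no relative signal, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


momentum = [-3.0, 0.0, 5.0]  # hypothetical raw momentum values
print([round(sigmoid(m), 3) for m in momentum])
print(min_max(momentum))
```

The Sigmoid maps each value independently and never reaches 0 or 1 exactly in theory, whereas min-max depends on the whole batch and always sends the smallest value to 0 and the largest to 1.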

A method for formulating a potential hot word degree, comprising: selecting, by an extraction module, parts of speech of words in data according to word frequency and annotating, using a named entity model, the words of the data with named entities, so that the extraction module extracts important words from the data according to the parts of speech of the words obtained according to the word frequency and the named entity annotations obtained by the named entity model; calculating, by a potential hot word degree calculation module, a time series analysis indicator and a momentum evaluation indicator of the important word extracted by the extraction module from the data according to the parts of speech obtained according to the word frequency and the named entity annotations obtained by the named entity model, wherein, when the potential hot word degree calculation module calculates the time series analysis indicator of the important word, it generates a time series change value or time series analysis result of the important word through at least one of a dynamic topic model (DTM) and a dynamic influence model (DIM) associated with the time series analysis indicator of the important word, and wherein the potential hot word degree calculation module formulates a first potential hot word degree weight and a second potential hot word degree weight according to the importance of the time series analysis indicator of the important word associated with the dynamic topic model and the dynamic influence model and the importance of the momentum evaluation indicator of the important word, respectively, and then multiplies the time series analysis indicator of the important word associated with the dynamic topic model and the dynamic influence model and the momentum evaluation indicator of the important word by the first potential hot word degree weight and the second potential hot word degree weight, respectively, and sums the products to calculate the potential hot word degree of the important word; and obtaining, by a potential hot word prediction module, the potential hot word degree of the important word calculated by the potential hot word degree calculation module by multiplying the time series analysis indicator of the important word associated with the dynamic topic model and the dynamic influence model and the momentum evaluation indicator of the important word by the first potential hot word degree weight and the second potential hot word degree weight, respectively, and summing the products, so that the potential hot word prediction module predicts potential hot words according to the potential hot word degree of the important word so calculated.

The method of claim 8, further comprising collecting the data by a data collection module, so that the extraction module extracts a plurality of different important words from the data, and then grouping, by a data grouping module, the data according to the plurality of different important words.

The method of claim 8, further comprising using the important words in the data as feature vectors, so that a data grouping module determines the number of groups of the data according to the important words in the data and uses a latent Dirichlet allocation topic model to express each article in the data as a probability distribution so as to determine the topic of each article.

The method of claim 8, further comprising performing, by the extraction module, word segmentation and part-of-speech tagging on the data through a word segmentation model.

The method of claim 8, wherein the potential hot word degree calculation module uses a potential hot word degree calculation formula to multiply the time series analysis indicator and the momentum evaluation indicator by the first potential hot word degree weight and the second potential hot word degree weight, respectively, and sums the products to calculate the potential hot word degree of the important word.

The method of claim 8, further comprising using, by the potential hot word degree calculation module, data obtained from a latent Dirichlet allocation topic model to evaluate time series changes of the important words in the data.

The method of claim 8, further comprising applying, by the potential hot word degree calculation module, a normalization that includes a mapping function to convert the time series change or momentum value of the important word into a value between 0 and 1, the mapping function including at least one of a Sigmoid function and min-max normalization.

A computer-readable medium for use in a computing device or computer, storing instructions for executing the method of formulating a potential hot word degree according to any one of claims 8 to 14.