TWI825535B - System, method and computer-readable medium for formulating potential hot word degree - Google Patents
Publication number: TWI825535B
Application number: TW110148174A
Authority: TW (Taiwan)
Description
The present invention relates to a technique for formulating a potential hot word degree or predicting potential hot words, and in particular to a system, method and computer-readable medium that formulate the potential hot word degree based on a time-series analysis index and a momentum evaluation index.
Conventional potential hot word prediction systems judge only by data such as click counts, repost counts, comment counts, topic attention, or even the influence of internet celebrities, and select potential hot words on that basis. Such systems cannot comprehensively analyze the characteristics of hot word topics, nor can they discover topics that may become hot words (potential hot words).
Moreover, such a system predicts potential hot words using only a specific hot word evaluation index (a single hot word evaluation index). That is, when potential hot words are predicted and evaluated in practice, that single index is the sole basis for measuring them. Because multiple evaluation factors such as the time-series analysis index and the momentum evaluation index are not considered when predicting and evaluating an important word to decide whether it is a potential hot word, the system cannot adapt to varied requirements and cannot offer users trend prediction or analysis services for potential hot words. In complex situations, lacking key indices such as the time-series analysis index and the momentum evaluation index of important words, the system also has difficulty selecting the best potential hot words effectively and appropriately from a potential hot word platform.
Therefore, how to provide an innovative technique for formulating the potential hot word degree or predicting potential hot words, so as to solve any of the above problems or provide related functions (techniques or services), has become a major research topic for those skilled in the art.
The present invention provides an innovative system, method and computer-readable medium for formulating the potential hot word degree. The time-series analysis index and the momentum evaluation index of an important word are combined with a first potential hot word degree weight and a second potential hot word degree weight, respectively, to produce the potential hot word degree of the important word; or potential hot words or the best potential hot words are predicted from the potential hot word degree of the important words; or a word segmentation model and a named entity model are used to extract the important words from the data according to the parts of speech of the words and the named entity labels; or a Latent Dirichlet Allocation (LDA) topic model is used to output each article in the data as a probability distribution so as to determine the topic of each article.
The system for formulating the potential hot word degree of the present invention includes: an extraction module that extracts important words from data; a potential hot word degree calculation module that calculates the time-series analysis index and the momentum evaluation index of the important words extracted by the extraction module, formulates a first potential hot word degree weight and a second potential hot word degree weight according to the respective importance of the time-series analysis index and the momentum evaluation index, and combines the two indices with the first and second weights, respectively, to calculate the potential hot word degree of the important words; and a potential hot word prediction module that obtains the potential hot word degree calculated by the calculation module and predicts potential hot words according to it.
The method for formulating the potential hot word degree of the present invention includes: extracting, by an extraction module, important words from data; calculating, by a potential hot word degree calculation module, the time-series analysis index and the momentum evaluation index of the important words extracted by the extraction module, formulating a first potential hot word degree weight and a second potential hot word degree weight according to the respective importance of the two indices, and combining the two indices with the first and second weights, respectively, to calculate the potential hot word degree of the important words; and obtaining, by a potential hot word prediction module, the potential hot word degree calculated by the calculation module and predicting potential hot words according to it.
The computer-readable medium of the present invention is used in a computing device or computer and stores instructions for executing the above method for formulating the potential hot word degree.
To make the above features and advantages of the present invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings. Additional features and advantages of the invention are set forth in part in the description that follows, and in part will be apparent from the description or may be learned by practice of the invention. It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not intended to limit the claimed scope of the invention.
1: system for formulating the potential hot word degree
10: data collection module
20: important word extraction module
30: data grouping module
40: potential hot word degree calculation module
50: potential hot word prediction module
A: word segmentation model
B: named entity model
C: Latent Dirichlet Allocation (LDA) topic model
I1: time-series analysis index
I2: momentum evaluation index
S1 to S8: steps
W1: first potential hot word degree weight
W2: second potential hot word degree weight
FIG. 1 is a schematic diagram of the architecture of the system for formulating the potential hot word degree according to the present invention.
FIG. 2 is a schematic flowchart of the method for formulating the potential hot word degree according to the present invention.
FIG. 3 is a schematic diagram of the temporal change values of an important word in the system for formulating the potential hot word degree and the method thereof according to the present invention.
The embodiments of the present invention are described below by way of specific examples. Those skilled in the art can understand other advantages and effects of the present invention from the content disclosed in this specification, and can also implement or apply the invention through other, different but equivalent, embodiments.
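FIG. 1 is a schematic diagram of the architecture of the system 1 for formulating the potential hot word degree according to the present invention, in which the system 1 formulates the potential hot word degree based on the time-series analysis index and the momentum evaluation index. As shown in the figure, the system 1 automatically generates or calculates the potential hot word degree of the important words in the data in order to predict potential hot words, and includes a data collection module 10, an important word extraction module 20, a data grouping module 30, a potential hot word degree calculation module 40 and a potential hot word prediction module 50, which are connected to or communicate with one another. In addition, as used herein, "plural" means two or more (for example two, three, four, five, ten, or a hundred or more), and "connection" or "communication" means wireless or wired connection or communication. The time-series analysis index I1 or the momentum evaluation index I2 may be expressed as a "value", and the first potential hot word degree weight W1 or the second potential hot word degree weight W2 may be expressed as a "weight value".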
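In one embodiment, the data collection module 10 may be a data collector (chip/circuit), data collection software (program), or the like; the important word extraction module 20 may be an important word extractor (chip/circuit), important word extraction software (program), or the like; the data grouping module 30 may be a data grouper (chip/circuit), data grouping software (program), or the like; the potential hot word degree calculation module 40 may be potential hot word degree software (program), a potential hot word degree calculator (chip/circuit), potential hot word degree calculation software (program), or the like; and the potential hot word prediction module 50 may be a potential hot word predictor (chip/circuit), potential hot word prediction software (program), or the like. However, the present invention is not limited thereto.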
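The data collection module 10 collects data (such as a data set), the important word extraction module 20 extracts the important words (such as a plurality of different important words) from the data, and the data grouping module 30 then groups the data according to those important words. The potential hot word degree calculation module 40 calculates key indices (multiple evaluation factors) of the important words, such as the time-series analysis index I1 (a value) and the momentum evaluation index I2 (a value), and formulates the first potential hot word degree weight W1 and the second potential hot word degree weight W2 according to the respective importance of I1 and I2.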
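The potential hot word degree calculation module 40 combines the time-series analysis index I1 and the momentum evaluation index I2 of an important word with the first potential hot word degree weight W1 and the second potential hot word degree weight W2, respectively, to generate or calculate the potential hot word degree of the important word. For example, the module 40 calculates the potential hot word degree as the sum of "the time-series analysis index I1 multiplied by the first weight W1" and "the momentum evaluation index I2 multiplied by the second weight W2" (that is, I1*W1 + I2*W2). The potential hot word prediction module 50 then predicts potential hot words or the best potential hot words according to the potential hot word degree of the important words. Finally, once the data collection module 10 obtains new data (such as a data set), the next round of potential hot word degree calculation and potential hot word prediction is carried out.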
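In detail, the potential hot word degree calculation module 40 and the potential hot word prediction module 50 may be connected to or communicate with each other to pass the potential hot word degree. When the prediction module 50 sends a calculation request for the potential hot word degree to the calculation module 40, the calculation module 40 may calculate the potential hot word degree of the important words according to the request and then send the calculation result to the prediction module 50.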
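For example, the potential hot word degree calculation module 40 may receive data (such as a data set) collected from big data and calculate key indices of the important words in the data, such as the time-series analysis index I1 and the momentum evaluation index I2. The module 40 may also adopt an objective method (for example, calculated/standardized), a preset method (for example, a system default or a user default), or a mixed method combining the two (for example, calculated/standardized plus a system default or user default) to formulate the first potential hot word degree weight W1 and the second potential hot word degree weight W2 according to the respective importance of I1 and I2. The module 40 then combines I1 and I2 with W1 and W2, respectively, to generate or calculate the potential hot word degree of the important words, so that the potential hot word prediction module 50 can predict potential hot words or the best potential hot words according to that degree.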
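Therefore, the potential hot word degree calculation module 40 of the present invention uses a potential hot word degree calculation expression (such as a calculation formula or algorithm) that multiplies the time-series analysis index I1 and the momentum evaluation index I2 of an important word by the first potential hot word degree weight W1 and the second potential hot word degree weight W2, respectively, and sums the products to generate or calculate the potential hot word degree of the important word. The potential hot word prediction module 50 then predicts potential hot words or the best potential hot words according to that degree, which allows them to be selected more effectively and appropriately and helps provide the best analysis service to users.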
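Furthermore, the present invention adopts a more general way of evaluating potential hot words. The potential hot word degree calculation module 40 first combines the time-series analysis index I1 and the momentum evaluation index I2 of an important word with the weights W1 and W2, respectively, to generate (calculate) the potential hot word degree automatically or systematically, and the potential hot word prediction module 50 then predicts potential hot words or the best potential hot words according to that degree, so that they can be selected effectively and appropriately even in complex situations. Compared with the conventional use of only a specific hot word evaluation index (a single index), the present invention, which relies on key indices (multiple evaluation factors) such as I1 and I2, is therefore better able to select the best potential hot words.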
FIG. 2 is a schematic flowchart of the method for formulating the potential hot word degree according to the present invention and is described with reference to FIG. 1. The main technical content of the method is as follows; the remaining content is the same as the description of FIG. 1 above and is not repeated here.
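As shown in FIG. 2, in step S1 the data collection module 10 collects data (such as a data set). That is, the data collection module 10 may collect various data such as public opinion news, public opinion articles, forum data and/or social community data (for example, big-data data sets).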
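In step S2, the important word extraction module 20 extracts the important words (such as a plurality of different important words) from the data (such as a data set). That is, the module 20 may use a word segmentation model A, for example a Conditional Random Field (CRF) word segmentation model, to segment the data into words and tag each word's part of speech, and use a named entity model B to label the named entities among the words. The module 20 then selects the part of speech of each word (such as its representative part of speech) according to term frequency (TF), and extracts the important words from the data according to the parts of speech and the named entity labels. For example, the part of speech may be a word class such as noun, pronoun, verb, adjective or adverb (for example, the eight parts of speech), and a named entity may be a person name, place name, organization, time or location, but is not limited thereto.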
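As a non-limiting illustration, the selection logic of step S2 might be sketched in Python as follows. The sketch assumes that the word segmentation model A and the named entity model B have already produced (word, part of speech, entity label) triples; the tag names, the set of retained parts of speech and the function `extract_important_words` are hypothetical and are not specified in this description.

```python
from collections import Counter, defaultdict

def extract_important_words(tagged_tokens, keep_pos=("noun", "verb"), top_n=3):
    """Pick important words from (word, pos, entity_label_or_None) triples.

    Hypothetical sketch: a word is kept when its most frequent (representative)
    part of speech is in keep_pos, or when it was labelled as a named entity.
    """
    pos_counts = defaultdict(Counter)  # word -> how often each POS tag was seen
    freq = Counter()                   # word -> term frequency (TF)
    entity_words = set()               # words carrying a named entity label
    for word, pos, entity in tagged_tokens:
        pos_counts[word][pos] += 1
        freq[word] += 1
        if entity is not None:
            entity_words.add(word)

    important = []
    for word, _count in freq.most_common():  # highest TF first
        representative_pos = pos_counts[word].most_common(1)[0][0]
        if representative_pos in keep_pos or word in entity_words:
            important.append(word)
    return important[:top_n]

# Illustrative input only; real input would come from models A and B.
tokens = [
    ("epidemic", "noun", None), ("epidemic", "noun", None), ("epidemic", "verb", None),
    ("Taipei", "noun", "LOC"),
    ("quickly", "adverb", None), ("quickly", "adverb", None), ("quickly", "adverb", None),
]
important_words = extract_important_words(tokens)
```

Here "quickly" is frequent but is dropped because its representative part of speech is not retained, whereas "Taipei" is kept despite its low frequency.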
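In step S3, the data grouping module 30 groups the data according to the important words (such as a plurality of different important words) in the data. That is, the module 30 may use the important words in the data as feature vectors and, according to those important words, adopt an objective method (for example, calculated/standardized), a preset method (for example, a system default or a user default), or a mixed method combining the two to set the number of groups of the data. In addition, the module 30 may use a Latent Dirichlet Allocation (LDA) topic model C to output each article in the data as a probability distribution so as to determine the topic of each article.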
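A minimal sketch of the last part of step S3 is given below, assuming that the LDA topic model C has already produced a topic probability distribution for each article. Choosing the highest-probability topic is an assumption made here for illustration; the description only states that the distribution is used to determine the topic. The function name `assign_topics` is hypothetical.

```python
def assign_topics(doc_topic_probs):
    """Return, for each article, the index of its most probable topic.

    doc_topic_probs: list of per-article probability lists, one entry per topic.
    Taking the argmax is an illustrative assumption, not a stated requirement.
    """
    return [max(range(len(probs)), key=probs.__getitem__) for probs in doc_topic_probs]

# Two articles over three topics (illustrative numbers only).
topics = assign_topics([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]])
```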
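In step S4, the potential hot word degree calculation module 40 calculates key indices of the important words, such as the time-series analysis index I1 and the momentum evaluation index I2. That is, the module 40 may calculate I1 and I2 for the important words extracted from the data by the important word extraction module 20, for use in the subsequent calculation of the potential hot word degree. The module 40 can thus cope with complex situations by means of these key indices, which also helps the subsequent potential hot word prediction module 50 to select potential hot words or the best potential hot words effectively and appropriately and to provide the best analysis service to users.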
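In step S5, the potential hot word degree calculation module 40 formulates the first potential hot word degree weight W1 and the second potential hot word degree weight W2 according to the respective importance of the time-series analysis index I1 and the momentum evaluation index I2. That is, the module 40 may adopt the objective method, the preset method, or the mixed method combining the two to formulate W1 and W2 according to the importance of I1 and I2 of the important words.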
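When calculating the time-series analysis index I1 of an important word, the potential hot word degree calculation module 40 may use the data obtained from the Latent Dirichlet Allocation (LDA) topic model C (such as the data set of each group) to evaluate the temporal change of the important words in the data, and then produce the temporal change values or the time-series analysis result of the important words through at least one of the Dynamic Topic Models (DTM) and the Dynamic Influence Models (DIM) associated with the time-series analysis index I1.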
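When calculating the momentum evaluation index I2 of an important word, the potential hot word degree calculation module 40 may use the data obtained from the Latent Dirichlet Allocation (LDA) topic model C (such as the data set of each group) and treat an important word in the data as an object that tends to keep moving in its direction of motion. A momentum algorithm (such as the momentum formula) associated with the momentum evaluation index I2 is combined with a statistical analysis result to calculate the momentum value of the important word and produce its momentum analysis result. The module 40 may use the term frequency (TF) result or the term frequency-inverse document frequency (TF-IDF) result of the important word over time to produce this statistical analysis result.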
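In step S6, the potential hot word degree calculation module 40 combines the time-series analysis index I1 and the momentum evaluation index I2 of an important word with the first potential hot word degree weight W1 and the second potential hot word degree weight W2, respectively, to generate or calculate the potential hot word degree of the important word. That is, the module 40 may use the potential hot word degree calculation expression (such as a calculation formula or algorithm) to multiply I1 and I2 by W1 and W2, respectively, and sum the products (for example, a linear sum) to generate or calculate the potential hot word degree of the important word.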
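In step S7, the potential hot word prediction module 50 predicts potential hot words or the best potential hot words according to the potential hot word degree of the important words. That is, the module 50 may receive or obtain the potential hot word degree generated or calculated by the potential hot word degree calculation module 40 and predict potential hot words or the best potential hot words according to it (for example, predict whether a given important word is a potential hot word or the best potential hot word), which helps detect the trend of potential hot words in advance.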
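In steps S4 to S7 above, the potential hot word degree calculation module 40 may combine the time-series analysis index I1 and the momentum evaluation index I2 with the first potential hot word degree weight W1 and the second potential hot word degree weight W2, respectively, to generate or calculate the potential hot word degree of the important words automatically. This makes it possible to cope with complex situations by means of key indices such as I1 and I2 and to generate (calculate) the potential hot word degree automatically or systematically. The potential hot word prediction module 50 may then predict potential hot words or the best potential hot words according to that degree, so that they are selected effectively and appropriately.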
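In step S8, when the data collection module 10 obtains new data (such as a data set), the flow returns to steps S1 to S7 to carry out the next round of potential hot word degree calculation and potential hot word prediction.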
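An embodiment of the system 1 for formulating the potential hot word degree and the method thereof is described below. This embodiment uses data such as public opinion news (for example, a data set). The important word extraction module 20 extracts the important words (such as a plurality of different important words) from the data as feature vectors, and the data grouping module 30 uses the Latent Dirichlet Allocation (LDA) topic model C to output each article in the data as a probability distribution so as to determine the topic of each article. Next, the potential hot word degree calculation module 40 sets key indices of the important words, such as the time-series analysis index I1 and the momentum evaluation index I2, according to requirements such as time-series analysis and momentum evaluation, with the goal of improving the prediction of potential hot words and grasping their trend. It then combines I1 and I2 with the first potential hot word degree weight W1 and the second potential hot word degree weight W2, respectively, and sums them (for example, a linear sum) to obtain the potential hot word degree of the important words, so that the potential hot word prediction module 50 can predict potential hot words or the best potential hot words according to that degree.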
Therefore, under various complex combinations of conditions, such as a low value of the time-series analysis index I1, a low value of the momentum evaluation index I2, or a high value of I1 together with a low value of I2, the present invention can still predict more suitable or the best potential hot words and can effectively grasp the trend of potential hot words.
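For example, the data collection module 10 may collect or use one month of public opinion news data, and the user may set online, in real time, the data range and time interval to be queried. Next, the important word extraction module 20 may extract the important words in each data record and arrange them into a sequence serving as a feature vector. The data grouping module 30 then takes, according to a hierarchical Latent Dirichlet Allocation (Hierarchical LDA) topic model, the level number of the second-highest level (for example, a level number of 3) as the number of groups of the data, and uses the Latent Dirichlet Allocation (LDA) topic model C to output each article in the data as a probability distribution so as to determine the topic of each article.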
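The data grouping module 30 may use the objective method, the preset method, or the mixed method (that is, the objective method plus the preset method). For example, using the mixed method (such as taking the average of the objective and preset results, or combining them in different proportions), the number of groups of the data is set to 5, and the number of topics is made equal to the average of the level number of the second-highest level (3) and the number of groups of the data (5), that is, number of topics = (3+5)/2 = 4. The module 30 can therefore use the Latent Dirichlet Allocation (LDA) topic model C to divide the data into a first topic group Topic1, a second topic group Topic2, a third topic group Topic3 and a fourth topic group Topic4. The potential hot word degree calculation module 40 may then calculate the time-series analysis index I1 and the momentum evaluation index I2 and formulate the first potential hot word degree weight W1 and the second potential hot word degree weight W2, so as to calculate the potential hot word degree and predict potential hot words or the best potential hot words.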
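The topic-count arithmetic above can be sketched as follows. The function `mixed_topic_count` and its `objective_ratio` parameter are hypothetical names; the equal-weight default reproduces the average used in the example, and rounding to an integer is an assumption for cases where the mix is not a whole number.

```python
def mixed_topic_count(objective_count, preset_count, objective_ratio=0.5):
    """Blend an objectively derived count with a preset count.

    objective_ratio = 0.5 gives the plain average used in the example;
    other ratios correspond to mixing in "different proportions".
    Rounding to an integer is an assumption of this sketch.
    """
    mixed = objective_count * objective_ratio + preset_count * (1.0 - objective_ratio)
    return round(mixed)

# Level number 3 (objective) and 5 groups (preset) give (3 + 5) / 2 = 4 topics.
number_of_topics = mixed_topic_count(3, 5)
```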
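In detail, in step S1 of FIG. 2, the data collection module 10 may collect data, keep collecting new data (such as data sets), and upload the data in real time or periodically. That is, the user may set online, in real time, the data range and time interval to be queried, or may use the periodic mode to set the data range, the evaluation start and end times and/or the frequency required for the system schedule.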
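In step S2 of FIG. 2, the important word extraction module 20 extracts the important words (such as a plurality of different important words) from the data (such as a data set). That is, the module 20 may use a word segmentation model A, for example a Conditional Random Field (CRF) word segmentation model, to segment the data into words and tag each word's part of speech, and use a named entity model B to label the named entities among the words. The module 20 then selects the part of speech of each word (such as its representative part of speech) according to term frequency (TF), and extracts the important words from the data according to the parts of speech and the named entity labels.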
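In step S3 of FIG. 2, the data grouping module 30 groups the data according to the important words (such as a plurality of different important words) in the data. That is, the module 30 may use the important words in the data as feature vectors and, according to those important words, adopt the objective method (for example, calculated/standardized), the preset method (for example, a system default or a user default), or the mixed method combining the two to set the number of groups of the data. In addition, the module 30 may use the Latent Dirichlet Allocation (LDA) topic model C to output each article in the data as a probability distribution so as to determine the topic of each article.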
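In step S4 of FIG. 2, the potential hot word degree calculation module 40 calculates key indices of the important words, such as the time-series analysis index I1 and the momentum evaluation index I2. That is, the present invention can cope with complex situations by means of these key indices, which also helps the subsequent potential hot word prediction module 50 to select potential hot words or the best potential hot words effectively and appropriately and to provide the best analysis service to users.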
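When calculating the time-series analysis index I1 of an important word, the potential hot word degree calculation module 40 may use the data obtained from the Latent Dirichlet Allocation (LDA) topic model C (such as the data set of each group) to evaluate the temporal change of the important words in the data, and then produce the temporal change values or the time-series analysis result of the important words through at least one of the Dynamic Topic Models (DTM) and the Dynamic Influence Models (DIM) associated with the time-series analysis index I1.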
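The data grouping module 30 may group the data to obtain grouping results Topic_i (where i = 1, 2, ..., r, and r is the number of groups) and then, for each group, arrange the important words in each data record into a sequence serving as a feature vector. The module 30 adopts the objective method, the preset method, or the mixed method combining the two to set the number of time slices TS (Time Slices) and the number TNN (Top n Number) of top-ranked important words to extract.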
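The potential hot word degree calculation module 40 may produce the temporal change values of the important words through at least one of the Dynamic Topic Models (DTM) and the Dynamic Influence Models (DIM) associated with the time-series analysis index I1, and express the temporal change values of an important word as a function of time t. According to the time-series analysis result of the important word, the module 40 then records the sum of the products of the temporal change amount of the important word and time (that is, the sum of the products of the first derivative of the temporal change values with respect to time t and time). The module 40 may therefore normalize this sum to serve as the time-series analysis index I1 of the important word, as shown in formula (1).

[Formula (1): expression not recoverable from the source text]

In formula (1), j = 1, 2, ..., TNN; k = 1, 2, ..., TS; k-1 < t < k. Here t denotes time, TNN denotes the number of top-ranked important words, and TS denotes the number of time slices. The formula further uses a function of t that expresses the temporal change values of the important word, together with the temporal change values themselves. Moreover, the module 40 may adopt a normalization method that includes a mapping function method to convert the temporal change amount of the important word to a value between 0 and 1, the mapping function method including at least one of the Sigmoid function method (also called the S-shaped function method) and the min-max normalization method.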
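Because the expression of formula (1) is not available here, the following Python sketch gives only one possible reading of the surrounding description, under stated assumptions: the temporal change values are given once per time slice, the first derivative on the interval (k-1, k) is approximated by the difference between consecutive slices, the "time" factor is taken as the slice index k, and the Sigmoid function is used as the mapping function. The function names are hypothetical, and the sketch is not expected to reproduce the 0.47 of the worked example.

```python
import math

def sigmoid(x):
    """Sigmoid (S-shaped) mapping of a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def time_series_index(values):
    """Illustrative I1: normalized sum of (slope on slice interval) * (time).

    values[k - 1] is the word's temporal change value in time slice k, k = 1..TS.
    The finite-difference slope and the use of k as the time factor are
    assumptions of this sketch, not the patent's formula (1).
    """
    total = 0.0
    for k in range(2, len(values) + 1):
        slope = values[k - 1] - values[k - 2]  # change across the interval (k-1, k)
        total += slope * k
    return sigmoid(total)

flat = time_series_index([0.1, 0.1, 0.1])     # no change over time
rising = time_series_index([0.1, 0.2, 0.4])   # growing, later growth weighted more
falling = time_series_index([0.4, 0.2, 0.1])  # shrinking
```

Under these assumptions a word whose values do not change maps to 0.5, a rising word maps above 0.5 and a falling word maps below 0.5.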
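For example, the data grouping module 30 may group the data to obtain grouping results Topic_i (where i = 1, 2, ..., 4) and then, for each group, arrange the important words in each data record into a sequence serving as a feature vector. The module 30 adopts the objective method, the preset method, or the mixed method combining the two to set the number of time slices TS (for example, TS = 10) and the number TNN of top-ranked important words to extract (for example, TNN = 20). The potential hot word degree calculation module 40 then produces the temporal change values or the time-series analysis result of the important words through at least one of the Dynamic Topic Models (DTM) and the Dynamic Influence Models (DIM) associated with the time-series analysis index I1.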
For example, FIG. 3 shows a schematic diagram of the temporal change values of an important word (for example, "epidemic"). Taking the first topic group Topic1 and "epidemic", the important word ranked first in importance, as an example, the potential hot word degree calculation module 40 obtains the temporal change values of the important word for the individual time slices [the listed values are not recoverable from the source text] and then expresses these values as a function as shown in formula (2), where t denotes time.

[Formula (2): expression not recoverable from the source text]
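Next, the potential hot word degree calculation module 40 may take the first derivative of formula (2) with respect to time t, as shown in formula (3).

[Formula (3): expression not recoverable from the source text]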
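Assuming that the required quantities have been calculated [the example values are not recoverable from the source text], the potential hot word degree calculation module 40 may normalize the sum of the products of the temporal change amount of the important word and time to serve as the time-series analysis index I1 of an important word such as "epidemic". That is, formula (4), evaluated according to formula (1) above, gives a time-series analysis index I1 equal to 0.47.

[Formula (4): expression not recoverable from the source text; result I1 = 0.47]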
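When calculating the momentum evaluation index I2 of an important word, the potential hot word degree calculation module 40 may use the data obtained from the Latent Dirichlet Allocation (LDA) topic model C (such as the data set of each group) and treat an important word in the data as an object that tends to keep moving in its direction of motion. A momentum algorithm (such as the momentum formula) associated with the momentum evaluation index I2 is combined with a statistical analysis result to calculate the momentum value of the important word and produce its momentum analysis result, and the module 40 may use the term frequency (TF) result or the term frequency-inverse document frequency (TF-IDF) result of the important word over time to produce this statistical analysis result. Because the momentum evaluation index I2 measures the momentum value of an important word, calculating the potential hot word degree according to I2 helps the potential hot word prediction module 50 to select potential hot words or the best potential hot words more appropriately.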
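In detail, the data grouping module 30 may group the data to obtain grouping results Topic_i (where i = 1, 2, ..., r, and r is the number of groups), and then adopt the objective method, the preset method, or the mixed method combining the two to set the number of time slices TS (Time Slices) and the number TNN (Top n Number) of top-ranked important words to extract. The potential hot word degree calculation module 40 may treat an important word in the data (such as Word_j, where j = 1, 2, ..., TNN) as an object that tends to keep moving in its direction of motion, and use a momentum algorithm (such as the momentum formula) associated with the momentum evaluation index I2, combined with the statistical analysis result, to calculate the momentum value of the important word and produce its momentum analysis result. For example, the momentum algorithm is $\vec{p} = m \times \vec{v}$, where $\vec{p}$ denotes the momentum of the important word in kilogram-meters per second (kg·m/s), $m$ denotes the mass of the important word in kilograms (kg), and $\vec{v}$ denotes the velocity of the important word in meters per second (m/s).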
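The potential hot word degree calculation module 40 may, according to the statistical analysis result, take the term frequency $TF_j(t)$ of the important word in each time slice (where j = 1, 2, ..., TNN and t = 1, 2, ..., TS) to determine the mass $m_j$ and the velocity $\vec{v}_j$ of the important word. The module 40 may take the mass of the important word as the sum of its term frequencies over the time slices, that is, $m_j = \sum_{t=1}^{TS} TF_j(t)$, and take the velocity of the important word as the sum of the products of its term frequency in each time slice and time, that is, $\vec{v}_j = \sum_{t=1}^{TS} \left( TF_j(t) \times t \right)$. The module 40 may therefore normalize the momentum value of the important word to serve as its momentum evaluation index I2, as shown in formula (5).

$$I_2 = \mathrm{Normalize}\left( m_j \times \vec{v}_j \right) = \mathrm{Normalize}\left( \sum_{t=1}^{TS} TF_j(t) \times \sum_{t=1}^{TS} \left( TF_j(t) \times t \right) \right) \quad (5)$$

In formula (5), j = 1, 2, ..., TNN and t = 1, 2, ..., TS. TNN denotes the number of top-ranked important words, TS denotes the number of time slices, $TF_j(t)$ denotes the term frequency, $m_j$ denotes the mass of the important word, and $\vec{v}_j$ denotes the velocity of the important word.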
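The data grouping module 30 may adopt the objective method, the preset method, or the mixed method combining the two to set the number of time slices TS and the number TNN of top-ranked important words to extract. Moreover, the potential hot word degree calculation module 40 may adopt a normalization method that includes a mapping function method to convert the momentum value of the important word to a value between 0 and 1, the mapping function method including at least one of the Sigmoid function method (also called the S-shaped function method) and the min-max normalization method.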
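The momentum calculation described above (mass as the sum of term frequencies, velocity as the sum of term frequency multiplied by time, momentum as mass multiplied by velocity, followed by normalization into the range 0 to 1) might be sketched as follows. Applying min-max normalization across the words of one topic group is an assumption of this sketch, as are the function names and the sample frequencies.

```python
def momentum(tf_by_slice):
    """Momentum of one word: mass * velocity, with time slices numbered 1..TS."""
    mass = sum(tf_by_slice)
    velocity = sum(tf * t for t, tf in enumerate(tf_by_slice, start=1))
    return mass * velocity

def min_max_normalize(values):
    """Map values linearly onto [0, 1]; all-equal input maps to 0.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]

def momentum_indices(tf_table):
    """Illustrative I2 per word. tf_table maps word -> list of TF per time slice."""
    words = list(tf_table)
    raw = [momentum(tf_table[word]) for word in words]
    return dict(zip(words, min_max_normalize(raw)))

# Made-up term frequencies over three time slices, for illustration only.
indices = momentum_indices({"a": [1, 2, 3], "b": [3, 2, 1], "c": [0, 0, 0]})
```

Words "a" and "b" have the same mass, but "a" receives the larger index because its occurrences fall in later time slices, which is the effect the time-weighted velocity is meant to capture.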
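The data grouping module 30 may group the data to obtain grouping results Topic_i (where i = 1, 2, ..., 4) and then, for each group, arrange the important words in each data record into a sequence serving as a feature vector. The module 30 adopts the objective method, the preset method, or the mixed method combining the two to set the number of time slices TS (for example, TS = 10) and the number TNN of top-ranked important words to extract (for example, TNN = 20).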
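Similarly, taking the first topic group Topic1 and the important word ranked first in importance, such as "epidemic", as an example, the potential hot word degree calculation module 40 may use the momentum algorithm (such as the momentum formula) associated with the momentum evaluation index I2, combined with the statistical analysis result, to calculate the momentum value of the important word and produce its momentum analysis result. For example, the momentum algorithm is $\vec{p} = m \times \vec{v}$, where $\vec{p}$ denotes the momentum of the important word in kilogram-meters per second (kg·m/s), $m$ denotes the mass of the important word in kilograms (kg), and $\vec{v}$ denotes the velocity of the important word in meters per second (m/s).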
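The potential hot word degree calculation module 40 may, according to the statistical analysis result, take the term frequency of the important word in each time slice to determine the mass and the velocity of the important word [the ten example term frequency values are not recoverable from the source text]. The module 40 may take the mass of the important word as the sum of its term frequencies over the time slices, and take the velocity of the important word as the sum of the products of its term frequency in each time slice and time [the example sums are not recoverable from the source text].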
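Assuming that the required quantities have been calculated (of the example values, only the fragment "...t)) = 2392516" survives in the source text), the potential hot word degree calculation module 40 may normalize the momentum value of the important word to serve as the momentum evaluation index I2 of the important word (such as "epidemic"). For example, formula (6), evaluated according to formula (5) above, gives a momentum evaluation index I2 equal to 0.73.

[Formula (6): expression not recoverable from the source text; result I2 = 0.73]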
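In step S5 of FIG. 2, the potential hot word degree calculation module 40 formulates the first potential hot word degree weight W1 and the second potential hot word degree weight W2 according to the respective importance of the time-series analysis index I1 and the momentum evaluation index I2. That is, the module 40 may receive data (such as a data set) collected from big data and adopt the objective method, the preset method, or the mixed method combining the two to formulate W1 and W2 according to the importance of I1 and I2 of the important words. The time-series analysis index I1, the momentum evaluation index I2, the first weight W1 or the second weight W2 may lie between 0 and 1, and the sum of W1 and W2 may be 1, but the invention is not limited thereto. The present invention can thus cope with complex situations by means of key indices such as I1 and I2, which also helps the subsequent potential hot word prediction module 50 to select potential hot words or the best potential hot words effectively and appropriately and to provide the best analysis service to users.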
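In other words, the potential hot word degree calculation module 40 may calculate key indices of the important words, such as the time-series analysis index I1 and the momentum evaluation index I2, formulate the first potential hot word degree weight W1 and the second potential hot word degree weight W2 according to I1 and I2, and adopt the objective method, the preset method, or the mixed method combining the two to formulate a set of weight values (W1, W2). The index I1, the index I2, the weight W1 or the weight W2 (weight value) may lie between 0 and 1, and the sum of W1 and W2 (weight values) may be 1, so that I1 and I2 can subsequently be combined with W1 and W2, respectively, and summed (for example, a linear sum) to generate or calculate the potential hot word degree of the important words.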
For example, searching the Google website with the two keywords "time series analysis forecasting" and "Momentum" yields search counts of 98,000,000 and 219,000,000, respectively. Using the objective method, a set of objective weight values (W1, W2) = (0.31, 0.69) is formulated after calculation/standardization; these values are consistent with dividing each count by the sum of the two counts (98/317 ≈ 0.31 and 219/317 ≈ 0.69). The potential hot word degree of the important words can subsequently be calculated from this combination of objective weight values.
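The objective weights in this example can be reproduced by dividing each search count by the total of the two counts, which is sketched below. The proportional rule is inferred from the numbers (the text only says "calculation/standardization"), and the function name `weights_from_counts` is hypothetical.

```python
def weights_from_counts(count_1, count_2):
    """Turn two raw counts into two weights that sum to 1 (rounded to 2 places).

    Proportional scaling is an assumption inferred from the example figures.
    """
    total = count_1 + count_2
    return round(count_1 / total, 2), round(count_2 / total, 2)

# Search counts for "time series analysis forecasting" and "Momentum".
objective_weights = weights_from_counts(98_000_000, 219_000_000)
print(objective_weights)  # (0.31, 0.69)
```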
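Besides adopting the objective method to formulate a set of weight values, the potential hot word degree calculation module 40 may adopt the preset method, or the mixed method (that is, the objective method plus the preset method), to formulate a different set of weight values. For example, the objective method above yields a set of objective weight values (W1, W2) = (0.31, 0.69); the preset method yields a set of preset weight values (W1, W2) = (0.38, 0.62); and the mixed method combines the objective weight values with the preset weight values to formulate a set of weight values.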
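In one embodiment, the potential hot word degree calculation module 40 adopts the mixed method, combining the objective weight values (0.31, 0.69) of the objective method with the preset weight values (0.38, 0.62) of the preset method, with the objective and preset methods applied in the ratio 0.6 : 0.4 (different proportions). Weighting the objective and preset weight values accordingly gives (first potential hot word degree weight W1, second potential hot word degree weight W2) = (0.31, 0.69) × 0.6 + (0.38, 0.62) × 0.4 = (0.34, 0.66); that is, W1 and W2 (weight values) are 0.34 and 0.66, respectively.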
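The mixed weighting in this embodiment is a component-wise blend of the two weight pairs, which the following sketch reproduces. The function name `blend_weights` and the rounding to two decimal places are choices made for this illustration.

```python
def blend_weights(objective, preset, objective_ratio):
    """Blend two (W1, W2) pairs: objective * ratio + preset * (1 - ratio)."""
    preset_ratio = 1.0 - objective_ratio
    return tuple(
        round(o * objective_ratio + p * preset_ratio, 2)
        for o, p in zip(objective, preset)
    )

# Objective (0.31, 0.69) and preset (0.38, 0.62) mixed in the ratio 0.6 : 0.4.
w1, w2 = blend_weights((0.31, 0.69), (0.38, 0.62), objective_ratio=0.6)
print(w1, w2)  # 0.34 0.66
```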
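In step S6 of FIG. 2, the potential hot word degree calculation module 40 combines the time-series analysis index I1 and the momentum evaluation index I2 of an important word with the first potential hot word degree weight W1 and the second potential hot word degree weight W2, respectively, to generate or calculate the potential hot word degree of the important word. That is, the module 40 may use the potential hot word degree calculation expression (such as a calculation formula or algorithm) to multiply I1 and I2 by W1 and W2, respectively, and sum the products (for example, a linear sum) to generate or calculate the potential hot word degree of the important word.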
For example, the potential hot word degree calculation expression (such as a calculation formula or algorithm) of the potential hot word degree calculation module 40 is: potential hot word degree of Word_j $= \sum_{i=1}^{2} I_i \times W_i = I_1 \times W_1 + I_2 \times W_2$, where i = 1, 2; j = 1, 2, ..., TNN; TNN denotes the number of top-ranked important words; and I1, I2, W1 and W2 denote the time-series analysis index, the momentum evaluation index, the first potential hot word degree weight and the second potential hot word degree weight, respectively. The index I1, the index I2, the weight W1 or the weight W2 may each lie between 0 and 1, and the sum of W1 and W2 may be 1, so it follows that the potential hot word degree also lies between 0 and 1.
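In one embodiment, the combination of the time-series analysis index I1 and the momentum evaluation index I2 is (I1, I2) = (0.47, 0.73), and the combination of the first potential hot word degree weight W1 and the second potential hot word degree weight W2 is (W1, W2) = (0.34, 0.66). The potential hot word degree calculation module 40 may use the potential hot word degree calculation expression to add "the value 0.47 of the time-series analysis index I1 multiplied by the value 0.34 of the first weight W1" to "the value 0.73 of the momentum evaluation index I2 multiplied by the value 0.66 of the second weight W2", giving a potential hot word degree of 0.64 for the important word (0.47 × 0.34 + 0.73 × 0.66 = 0.6416 ≈ 0.64). This value also lies between 0 and 1.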
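The weighted sum of this embodiment can be checked with a short sketch. The function name `potential_hot_word_degree` is hypothetical; the four input values are those given in the text.

```python
def potential_hot_word_degree(i1, i2, w1, w2):
    """Linear weighted sum I1*W1 + I2*W2 described in step S6."""
    return i1 * w1 + i2 * w2

# (I1, I2) = (0.47, 0.73) and (W1, W2) = (0.34, 0.66) from the embodiment.
degree = potential_hot_word_degree(0.47, 0.73, 0.34, 0.66)
print(round(degree, 2))  # 0.64
```

When both indices lie in the range 0 to 1 and the two weights sum to 1, the result is a weighted average and therefore also lies in that range.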
The potential hot word degree of 0.64 is the figure for "epidemic", the important word ranked first in importance in the first topic group Topic1. The important words of the first topic group Topic1 include, for example: epidemic, masks, level 3 alert, virus, Moderna, Olympic epidemic prevention measures, economy class, breach, BNT, the elderly, and so on. In addition, the potential hot word degree calculation module 40 may sort the important words by importance in descending order together with their potential hot word degrees [the listed values are not recoverable from the source text].
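The descending ordering mentioned above amounts to a sort on the calculated degrees, sketched below. Only the 0.64 for "epidemic" comes from the text; the other values are placeholders chosen for illustration, and the function name `rank_by_degree` is hypothetical.

```python
def rank_by_degree(degrees, top_k=None):
    """Sort (word, degree) pairs from the highest to the lowest degree."""
    ranked = sorted(degrees.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]

# Placeholder degrees, except "epidemic" (0.64), which is taken from the text.
example_degrees = {"masks": 0.41, "epidemic": 0.64, "virus": 0.52}
top_two = rank_by_degree(example_degrees, top_k=2)
```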
Therefore, the value or index of the potential hot word degree of an important word can reflect the future trend of potential hot words. From the value (data) of the potential hot word degree, users can immediately grasp how much potential an important word has and decide accordingly whether to supply the potential hot word to analysis services related to, for example, public opinion or social media. The above data analysis also shows that the higher the potential hot word degree of an important word, the greater its potential, which helps determine whether the important word needs continued attention and follow-up analysis services.
The present invention generates or calculates the potential hot word degree of an important word according to key indices such as its time-series analysis index I1 and momentum evaluation index I2, and this degree reflects both of these dimensions. Compared with the conventional use of only a specific hot word evaluation index (a single index) to measure potential hot words, the present invention therefore uses key indices (multiple evaluation factors) such as I1 and I2, so that it can cope with complex situations, select potential hot words or the best potential hot words more effectively and appropriately, and provide the best analysis service to users.
In addition, the present invention provides a computer-readable medium for the method of formulating the potential hot word degree based on the time-series analysis index and the momentum evaluation index. The medium is used in a computing device or computer having a processor and/or a memory and stores instructions; the computing device or computer can execute the medium through the processor and/or memory so as to carry out the content described above. For example, the processor may be a microprocessor, a central processing unit (CPU), a graphics processing unit (GPU), or the like, and the memory may be a random access memory (RAM), a memory card, a hard disk (such as a cloud or network drive), a database, or the like, but is not limited thereto.
In summary, the system, method and computer-readable medium for formulating a potential hot word degree of the present invention have at least the following features, advantages or technical effects.
1. The data collection module of the present invention can set the point in time (time interval) at which it starts and the range of data to retrieve, which facilitates automated scheduling and also allows the execution progress of the data or data sources to be controlled.
2. The important word extraction module of the present invention can use a word segmentation model, for example a conditional random field (CRF) word segmentation model, together with a named entity model, which helps extract the important words in the data according to the part-of-speech tags of the words and the named entity annotations.
3. The data grouping module of the present invention can use a latent Dirichlet allocation (LDA) topic model to output each article in the data in the form of a probability distribution, which helps determine the topic of each article.
4. The potential hot word degree calculation module of the present invention can calculate both the time-series analysis index and the momentum evaluation index of an important word and set the first potential hot word degree weight and the second potential hot word degree weight according to the importance of the two indices, which helps generate the potential hot word degree of the important word automatically or systematically.
5. The potential hot word prediction module of the present invention can continuously predict potential hot words according to the potential hot word degree of the important words, so as to cope with complex situations and to select potential hot words or the best potential hot word effectively and appropriately.
6. The present invention adopts a more general way of evaluating potential hot words: it pairs the time-series analysis index and the momentum evaluation index of an important word with the first potential hot word degree weight and the second potential hot word degree weight respectively, which helps generate the potential hot word degree automatically or systematically and also helps the potential hot word prediction module predict potential hot words or the best potential hot word.
7. Whereas conventional approaches use only one specific hot word evaluation indicator (a single hot word evaluation indicator), the present invention can generate the potential hot word degree of an important word from multiple evaluation factors, such as the time-series analysis index and the momentum evaluation index, and can predict potential hot words or the best potential hot word from that degree. It thereby selects potential hot words or the best potential hot word effectively and appropriately and can provide the best analysis service to users.
8. By generating the potential hot word degree of an important word from key indicators such as the time-series analysis index and the momentum evaluation index, the present invention overcomes a shortcoming of conventional potential hot word prediction systems, which select potential hot words only from data such as click counts, repost counts, comment counts, topic attention, or even influencer reach, and therefore cannot predict potential hot words in complex situations.
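Item 3 of the list above says that the LDA topic model outputs each article as a probability distribution from which the article's topic is determined. The sketch below shows one simple way to do that: take the most probable topic. The argmax rule, the function name `dominant_topic`, and the example distributions are assumptions made for illustration; the text does not state which decision rule is used, and in practice the distributions would come from a trained LDA model.

```python
def dominant_topic(topic_distribution):
    """Return (topic_index, probability) for the most probable topic of one article."""
    if not topic_distribution:
        raise ValueError("empty topic distribution")
    index = max(range(len(topic_distribution)), key=topic_distribution.__getitem__)
    return index, topic_distribution[index]


# Hypothetical per-article distributions over three topics (each sums to 1).
article_topics = {
    "article_a": [0.7, 0.2, 0.1],
    "article_b": [0.1, 0.3, 0.6],
}

# Assign each article to its most probable topic.
assigned = {name: dominant_topic(dist)[0] for name, dist in article_topics.items()}
print(assigned)  # {'article_a': 0, 'article_b': 2}
```

A threshold on the returned probability could be added so that articles without a clearly dominant topic stay unassigned; that refinement is likewise an assumption, not something the description specifies.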
The above embodiments merely illustrate the principles, features and effects of the present invention and are not intended to limit its practicable scope. Anyone skilled in the art may modify and change the above embodiments without departing from the spirit and scope of the present invention. Any equivalent changes and modifications made using the content disclosed by the present invention shall still be covered by the claims. The scope of protection of the present invention shall therefore be as set out in the claims.
1: System for formulating potential hot word degree
10: Data collection module
20: Important word extraction module
30: Data grouping module
40: Potential hot word degree calculation module
50: Potential hot word prediction module
A: Word segmentation model
B: Named entity model
C: Latent Dirichlet allocation (LDA) topic model
I1: Time-series analysis index
I2: Momentum evaluation index
W1: First potential hot word degree weight
W2: Second potential hot word degree weight
Claims (15)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW110148174A TWI825535B (en) | 2021-12-22 | 2021-12-22 | System, method and computer-readable medium for formulating potential hot word degree |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW110148174A TWI825535B (en) | 2021-12-22 | 2021-12-22 | System, method and computer-readable medium for formulating potential hot word degree |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202326471A TW202326471A (en) | 2023-07-01 |
TWI825535B true TWI825535B (en) | 2023-12-11 |
Family
ID=88147644
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW110148174A TWI825535B (en) | 2021-12-22 | 2021-12-22 | System, method and computer-readable medium for formulating potential hot word degree |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI825535B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107085573A (en) * | 2016-02-14 | 2017-08-22 | 北京国双科技有限公司 | The acquisition methods and device of hot information |
CN107832418A (en) * | 2017-11-08 | 2018-03-23 | 郑州云海信息技术有限公司 | A kind of much-talked-about topic finds method, system and a kind of much-talked-about topic discovering device |
CN107895053A (en) * | 2017-12-13 | 2018-04-10 | 福州大学 | Emerging much-talked-about topic detecting system and method based on topic cluster momentum model |
CN111368072A (en) * | 2019-08-20 | 2020-07-03 | 河北工程大学 | Microblog hot topic discovery algorithm based on linear fusion of BTM and GloVe similarity |
CN111694930A (en) * | 2020-06-11 | 2020-09-22 | 中国农业科学院农业信息研究所 | Dynamic knowledge hotspot evolution and trend analysis method |
CN112559936A (en) * | 2020-12-16 | 2021-03-26 | 北京百度网讯科技有限公司 | Community content processing method and device, electronic equipment and storage medium |
CN113343118A (en) * | 2021-04-23 | 2021-09-03 | 东南大学 | Hot event discovery method under mixed new media |
- 2021-12-22: TW application TW110148174A, patent TWI825535B, status active
Also Published As
Publication number | Publication date |
---|---|
TW202326471A (en) | 2023-07-01 |