CN117034904A

CN117034904A - Method for obtaining hot words with stable heat, electronic equipment and storage medium

Info

Publication number: CN117034904A
Application number: CN202311296040.0A
Authority: CN
Inventors: 石江枫; 靳雯; 王全修; 赵洲洋; 于伟
Original assignee: Rizhao Ruian Information Technology Co ltd; Beijing Rich Information Technology Co ltd
Current assignee: Rizhao Ruian Information Technology Co ltd; Beijing Rich Information Technology Co ltd
Priority date: 2023-10-09
Filing date: 2023-10-09
Publication date: 2023-11-10
Anticipated expiration: 2043-10-09
Also published as: CN117034904B

Abstract

The invention relates to the technical field of hot word processing and provides a method for obtaining hot words with stable heat, an electronic device and a storage medium. The method comprises the following steps: obtaining a second feature similarity; obtaining a preset time period list according to the second feature similarity; obtaining a historical time period list; obtaining a first priority corresponding to a target keyword; obtaining a second priority corresponding to the target keyword; obtaining a third priority corresponding to the target keyword; and comparing the third priority with a preset priority threshold corresponding to a preset time period to determine the hot words with stable heat. The third priority corresponding to the target keyword is obtained according to the number of times the target keyword appears in the preset time period, the number of times it appears in the historical time period, and the preset weight corresponding to the preset time period; comparing the third priority then determines the hot words with stable heat, which improves the accuracy of obtaining them.

Description

Method for obtaining hot words with stable heat, electronic equipment and storage medium

Technical Field

The present invention relates to the field of hot word processing technologies, and in particular, to a method for obtaining hot words with stable heat, an electronic device, and a storage medium.

Background

Analyzing hot words helps a user comprehensively understand the essence, details or trend of the development of an event. Analyzing the hot words within a certain time period, and in particular the hot words with stable heat, helps the user comprehensively understand the essence of the event. In the prior art, hot words with stable heat are obtained as follows: event texts are analyzed and the key feature words in the texts are extracted; the frequency values of the key feature words in the texts and within a preset time period are analyzed to obtain the heat value corresponding to each key feature word; and the heat value of each key feature word is compared with its historical heat value to determine whether the key feature word is a hot word with stable heat.

However, the above method also has the following technical problems:

The above method obtains the heat value of a key feature word within a preset time period and analyzes that heat value to determine whether the key feature word is a hot word with stable heat. It can therefore only obtain hot words whose heat is stable over a short time period. It cannot analyze the number of times the key feature word appears in historical time periods or the importance of the preset time period, and so cannot obtain hot words whose heat is stable over a long time period. Compared with hot words whose heat is stable over a long time period, hot words whose heat is stable over a short time period reflect the essence of an event in a limited and less comprehensive way; they are also numerous, and some of them have the same meaning.

Disclosure of Invention

Aiming at the technical problems, the invention adopts the following technical scheme:

according to a first aspect of the present invention, there is provided a method for obtaining a hot word with stable heat, comprising the steps of:

S100, according to the key feature word list $C=\{C_1, C_2, \ldots, C_j, \ldots, C_n\}$ corresponding to the target text and the preset heat stability feature word list $A_2=\{A_{2(1)}, A_{2(2)}, \ldots, A_{2(i1)}, \ldots, A_{2(m1)}\}$, obtain the second feature similarity $D_2$ between $C$ and $A_2$, where $C_j$ is the j-th key feature word, $j=1,2,\ldots,n$, $n$ is the number of key feature words, $A_{2(i1)}$ is the i1-th preset heat stability feature word, $i1=1,2,\ldots,m1$, and $m1$ is the number of preset heat stability feature words. $D_2$ satisfies:

$$D_2=\sum_{j=1}^{n}\Big(\sum_{i1=1}^{m1}E^{2j}_{(i1)}/m1\Big)/n$$

where $E^{2j}_{(i1)}$ is the second word similarity corresponding to $C_j$ and $A_{2(i1)}$, and the second word similarity is the similarity between a key feature word and a preset heat stability feature word.

S200, when $D_2\le\Delta D_2$, obtain the preset time period list $J=\{J_1, J_2, \ldots, J_b, \ldots, J_d\}$, where $J_b$ is the b-th preset time period, $b=1,2,\ldots,d$, $d$ is the number of preset time periods, and $\Delta D_2$ is the second similarity threshold.

S300, according to $J$, obtain the historical time period list $J^0=\{J^0_1, J^0_2, \ldots, J^0_b, \ldots, J^0_d\}$ corresponding to $J$, where $J^0_b=\{J^0_{b1}, J^0_{b2}, \ldots, J^0_{bf}, \ldots, J^0_{bz}\}$, $J^0_{bf}$ is the f-th historical time period in the historical time period list $J^0_b$ corresponding to $J_b$, $f=1,2,\ldots,z$, and $z$ is the number of historical time periods corresponding to the preset time period.

S400, according to $G_x$ and $J_b$, obtain the first priority $K^b_x$ corresponding to $G_x$ in $J_b$, where $G_x$ is the x-th target keyword in the target keyword list $G=\{G_1, G_2, \ldots, G_x, \ldots, G_p\}$, $x=1,2,\ldots,p$, and $p$ is the number of target keywords. $K^b_x$ satisfies:

$$K^b_x=\beta^b\times K^{b-1}_x+(1-\beta^b)\times P^b_x$$

where $K^{b-1}_x$ is the first priority corresponding to $G_x$ in $J_{b-1}$, $\beta^b$ is the preset weight corresponding to $J_b$, and $P^b_x$ is the number of times $G_x$ appears in the system in $J_b$; when $b=1$, $K^1_x=\beta^1+(1-\beta^1)\times P^1_x$. The preset weight represents the importance of a preset time period, and the target keywords are keywords stored in the system and used for obtaining hot words.

S500, according to $K^b_x$ and $J^0_{bf}$, obtain the second priority $K^{0b}_x$ corresponding to $G_x$ in $J_b$, where $K^{0b}_x$ satisfies:

$$K^{0b}_x=\log\big(P^b_x\times Q^b_x/K^{b-1}_x\big)$$

where $Q^b_x$ is the number of historical time periods $J^0_{bf}$ in $J^0_b$ that contain a time point at which $G_x$ appears in the system; when $b=1$, $K^{01}_x=\log(P^1_x\times Q^1_x)$.

S600, according to $K^{0b}_x$, obtain the third priority $K^{1b}_x$ corresponding to $G_x$ in $J_b$, where $K^{1b}_x$ satisfies:

$$K^{1b}_x=\frac{P^b_x}{P^b_x+\sum_{x=1}^{p}P^b_x/p}\times K^{0b}_x+\frac{\sum_{x=1}^{p}P^b_x/p}{P^b_x+\sum_{x=1}^{p}P^b_x/p}\times\Big(\sum_{x=1}^{p}K^{0b}_x/p\Big)$$

S700, when $K^{1b}_x\ge K^2_b$, take $G_x$ as a hot word with stable heat, where $K^2_b$ is the preset priority threshold corresponding to $J_b$.

According to a second aspect of the present invention there is provided a non-transitory computer readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or at least one program being loaded and executed by a processor to implement the method as described above.

According to a third aspect of the present invention there is provided an electronic device comprising a processor and the non-transitory computer readable storage medium as described above.

The invention has at least the following beneficial effects:

The invention provides a method for obtaining hot words with stable heat, which comprises the following steps: obtaining the second feature similarity between the key feature word list corresponding to the target text and the preset heat stability feature word list; when the second feature similarity is not greater than the second similarity threshold, obtaining the preset time period list; obtaining the historical time period list according to the preset time period list; obtaining the first priority corresponding to the target keyword in a preset time period according to the number of times the target keyword appears in the preset time period and the preset weight corresponding to the preset time period; obtaining the second priority corresponding to the target keyword in the preset time period according to the number of times the target keyword appears in the historical time period and the first priority; obtaining the third priority corresponding to the target keyword in the preset time period according to the second priority; and comparing the third priority with the preset priority threshold corresponding to the preset time period to determine the hot words with stable heat.

The invention thus obtains the second feature similarity and compares it with the threshold to determine whether hot words with stable heat need to be obtained. When they do, it obtains the third priority corresponding to the target keyword according to the number of times the target keyword appears in the preset time period, the number of times it appears in the historical time period, and the preset weight corresponding to the preset time period, and compares the third priority with the threshold to determine the hot words with stable heat. This helps improve the accuracy of obtaining hot words with stable heat and lets a user understand the essence of an event more comprehensively.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a method for obtaining hot words with stable heat according to an embodiment of the present invention;

FIG. 2 is a flowchart of a computer program executed by a data processing system for obtaining hot words with fluctuating heat levels according to a second embodiment of the present invention;

FIG. 3 is a flowchart of a computer program executed by a system for obtaining hot words with increasing heat according to a third embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

Example 1

The first embodiment provides a method for obtaining a hot word with stable heat, as shown in fig. 1, including the following steps:

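S100, according to the key feature word list $C=\{C_1, C_2, \ldots, C_j, \ldots, C_n\}$ corresponding to the target text and the preset heat stability feature word list $A_2=\{A_{2(1)}, A_{2(2)}, \ldots, A_{2(i1)}, \ldots, A_{2(m1)}\}$, obtain the second feature similarity $D_2$ between $C$ and $A_2$, where $C_j$ is the j-th key feature word, $j=1,2,\ldots,n$, $n$ is the number of key feature words, $A_{2(i1)}$ is the i1-th preset heat stability feature word, $i1=1,2,\ldots,m1$, and $m1$ is the number of preset heat stability feature words. $D_2$ satisfies:

$$D_2=\sum_{j=1}^{n}\Big(\sum_{i1=1}^{m1}E^{2j}_{(i1)}/m1\Big)/n$$

where $E^{2j}_{(i1)}$ is the second word similarity corresponding to $C_j$ and $A_{2(i1)}$, and the second word similarity is the similarity between a key feature word and a preset heat stability feature word. Those skilled in the art know that any prior-art method for obtaining the similarity of two words falls within the protection scope of the present invention and is not described in detail here, for example: cosine distance, bag-of-words model, TF-IDF, K-means clustering.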

Specifically, the smaller the second word similarity, the more similar the key feature word is to the preset heat stability feature word.

Further, the smaller the second feature similarity, the more similar the key feature word list is to the preset heat stability feature word list.
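The averaging in the formula for $D_2$ can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the character-set `toy_distance` are hypothetical stand-ins, since the text leaves the word-level similarity measure open. Consistent with the statement that smaller values mean more similar words, the stand-in behaves like a distance.

```python
from typing import Callable, Sequence


def feature_similarity(
    key_words: Sequence[str],
    preset_words: Sequence[str],
    word_similarity: Callable[[str, str], float],
) -> float:
    """Average word_similarity over every (key word, preset word) pair.

    Mirrors D = sum_j( sum_i E_ji / m ) / n from the text.
    """
    if not key_words or not preset_words:
        raise ValueError("both word lists must be non-empty")
    total = 0.0
    for c in key_words:
        # Inner average over the preset feature words for one key word.
        total += sum(word_similarity(c, a) for a in preset_words) / len(preset_words)
    # Outer average over the key feature words.
    return total / len(key_words)


def toy_distance(a: str, b: str) -> float:
    """Hypothetical stand-in: 1 - Jaccard overlap of character sets.

    Smaller values mean more similar words, matching the convention in the text.
    """
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)
```

The same function applies unchanged to the first and third feature similarities, because they differ only in which preset feature word list is passed in.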

Specifically, the target text is text which can express the user requirement and is input by the user in the system.

Further, a key feature word corresponding to the target text is a word extracted from the target text that expresses the text features of the target text. Those skilled in the art know that any prior-art method for extracting, from a text, words that express its text features falls within the protection scope of the present invention and is not described in detail here, for example: Word2Vec, association analysis, NLP model.

Specifically, a preset heat stability feature word is a preset word that characterizes the heat of a hot word as stable, for example: stable, unchanged, fluctuation-free. Those skilled in the art know that the preset heat stability feature words are set according to actual requirements, which is not described in detail here.

S200, when $D_2\le\Delta D_2$, obtain the preset time period list $J=\{J_1, J_2, \ldots, J_b, \ldots, J_d\}$, where $J_b$ is the b-th preset time period, $b=1,2,\ldots,d$, $d$ is the number of preset time periods, and $\Delta D_2$ is the second similarity threshold. The length of a preset time period is measured in days; those skilled in the art know that the preset time periods and their lengths are set according to actual requirements, which is not described in detail here.

Specifically, S200 includes the following steps for obtaining $\Delta D_2$:

S201, according to $C$ and the preset heat fluctuation feature word list $A_1=\{A_{11}, A_{12}, \ldots, A_{1i}, \ldots, A_{1m}\}$, obtain the first feature similarity $D_1$ between $C$ and $A_1$, where $A_{1i}$ is the i-th preset heat fluctuation feature word, $i=1,2,\ldots,m$, and $m$ is the number of preset heat fluctuation feature words. $D_1$ satisfies:

$$D_1=\sum_{j=1}^{n}\Big(\sum_{i=1}^{m}E^{1j}_{i}/m\Big)/n$$

where $E^{1j}_{i}$ is the first word similarity corresponding to $C_j$ and $A_{1i}$, and the first word similarity is the similarity between a key feature word and a preset heat fluctuation feature word. Those skilled in the art know that the first word similarity is obtained in the same way as the second word similarity, which is not described in detail here.

Specifically, the smaller the first word similarity, the more similar the key feature word is to the preset heat fluctuation feature word.

Further, the smaller the first feature similarity, the more similar the key feature word list is to the preset heat fluctuation feature word list.

Specifically, a preset heat fluctuation feature word is a preset word that characterizes the heat of a hot word as fluctuating slightly up and down. Those skilled in the art know that the preset heat fluctuation feature words are set according to actual requirements, which is not described in detail here.

S203, according to $C$ and the preset heat increment feature word list $A_3=\{A_{3(1)}, A_{3(2)}, \ldots, A_{3(i2)}, \ldots, A_{3(m2)}\}$, obtain the third feature similarity $D_3$ between $C$ and $A_3$, where $A_{3(i2)}$ is the i2-th preset heat increment feature word, $i2=1,2,\ldots,m2$, and $m2$ is the number of preset heat increment feature words. $D_3$ satisfies:

$$D_3=\sum_{j=1}^{n}\Big(\sum_{i2=1}^{m2}E^{3j}_{(i2)}/m2\Big)/n$$

where $E^{3j}_{(i2)}$ is the third word similarity corresponding to $C_j$ and $A_{3(i2)}$, and the third word similarity is the similarity between a key feature word and a preset heat increment feature word. Those skilled in the art know that the third word similarity is obtained in the same way as the second word similarity, which is not described in detail here.

Specifically, the smaller the third word similarity, the more similar the key feature word is to the preset heat increment feature word.

Further, the smaller the third feature similarity, the more similar the key feature word list is to the preset heat increment feature word list.

Specifically, a preset heat increment feature word is a preset word that characterizes the heat of a hot word as increasing rapidly, for example: rapidly increasing, gradually increasing. Those skilled in the art know that the preset heat increment feature words are set according to actual requirements, which is not described in detail here.

S205, according to $D_1$ and $D_3$, obtain $\Delta D_2$, where $\Delta D_2$ satisfies:

$$\Delta D_2=(D_1+D_3)/2$$

In the above steps, the second feature similarity between the key feature word list and the preset heat stability feature word list is obtained, and the first feature similarity between the key feature word list and the preset heat fluctuation feature word list and the third feature similarity between the key feature word list and the preset heat increment feature word list are obtained in the same way. When the second feature similarity is not greater than one half of the sum of the first feature similarity and the third feature similarity, the key feature word list is most similar to the preset heat stability feature word list, so it can be determined that the user's requirement is to obtain hot words with stable heat. In that case, the preset time period list is obtained, and the historical time period list is obtained according to the preset time period list. The third priority corresponding to the target keyword is obtained according to the number of times the target keyword appears in the preset time period, the number of times it appears in the historical time period, and the preset weight corresponding to the preset time period. The third priority is then compared with the threshold to determine the hot words with stable heat, which helps improve the accuracy of obtaining them and lets the user understand the essence of an event more comprehensively.
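The gating test of S200 to S205 reduces to one comparison. A minimal sketch, assuming the three feature similarities have already been computed; the function name is hypothetical:

```python
def needs_stable_hot_words(d1: float, d2: float, d3: float) -> bool:
    """Return True when D2 <= delta_D2, with delta_D2 = (D1 + D3) / 2.

    d1, d2, d3 are the first (fluctuation), second (stability) and third
    (increment) feature similarities; smaller means more similar.
    """
    delta_d2 = (d1 + d3) / 2
    return d2 <= delta_d2
```

For instance, with `d1=0.8`, `d2=0.3`, `d3=0.6` the threshold is 0.7 and the stable-heat branch is taken; with `d1=0.2`, `d2=0.5`, `d3=0.4` it is not.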

S300, according to $J$, obtain the historical time period list $J^0=\{J^0_1, J^0_2, \ldots, J^0_b, \ldots, J^0_d\}$ corresponding to $J$, where $J^0_b=\{J^0_{b1}, J^0_{b2}, \ldots, J^0_{bf}, \ldots, J^0_{bz}\}$, $J^0_{bf}$ is the f-th historical time period in the historical time period list $J^0_b$ corresponding to $J_b$, $f=1,2,\ldots,z$, and $z$ is the number of historical time periods corresponding to the preset time period.

Specifically, the length of the preset time period is the same as the length of its corresponding history time period.

Further, the sum of the lengths of all the history time periods corresponding to the same preset time period is one year.

Further, the ending time point of the last history time period corresponding to the preset time period is the starting time point of the preset time period.
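The three constraints above (equal length, a total span of one year, and the last historical period ending where the preset period starts) can be sketched as follows. The assumptions here are not from the text: one year is taken as 365 days, the historical periods are taken to be contiguous, any remainder that does not fill a whole period is dropped, and period ends are treated as exclusive. The function name is hypothetical.

```python
from datetime import date, timedelta


def historical_periods(preset_start: date, length_days: int, span_days: int = 365):
    """Build the historical periods for one preset period, oldest first.

    Each item is (start, end) with an exclusive end. The last period ends
    exactly at preset_start, and every period is length_days long.
    """
    if length_days <= 0:
        raise ValueError("length_days must be positive")
    count = span_days // length_days  # assumed: drop any partial period
    periods = []
    for f in range(count, 0, -1):
        start = preset_start - timedelta(days=f * length_days)
        periods.append((start, start + timedelta(days=length_days)))
    return periods
```

With a 73-day preset period, this yields five back-to-back historical periods covering exactly 365 days before the preset period starts.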

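S400, according to $G_x$ and $J_b$, obtain the first priority $K^b_x$ corresponding to $G_x$ in $J_b$, where $G_x$ is the x-th target keyword in the target keyword list $G=\{G_1, G_2, \ldots, G_x, \ldots, G_p\}$, $x=1,2,\ldots,p$, and $p$ is the number of target keywords. $K^b_x$ satisfies:

$$K^b_x=\beta^b\times K^{b-1}_x+(1-\beta^b)\times P^b_x$$

where $K^{b-1}_x$ is the first priority corresponding to $G_x$ in $J_{b-1}$, $\beta^b$ is the preset weight corresponding to $J_b$, and $P^b_x$ is the number of times $G_x$ appears in the system in $J_b$; when $b=1$, $K^1_x=\beta^1+(1-\beta^1)\times P^1_x$. The preset weight represents the importance of the preset time period; those skilled in the art know that the preset weight is set according to actual requirements, which is not described in detail here.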

Specifically, the target keyword is a keyword stored in the system for acquiring a hotword.

Specifically, the greater the preset weight, the higher the importance of the preset time period.
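The recurrence for the first priority has the shape of a weighted running average over successive preset time periods. A minimal sketch for a single target keyword, with a hypothetical function name; `counts[b]` plays the role of $P^b_x$ and `weights[b]` the role of $\beta^b$ (zero-based in the code):

```python
from typing import List, Sequence


def first_priorities(counts: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Compute K^b for b = 1..d for one keyword.

    K^1 = beta^1 + (1 - beta^1) * P^1
    K^b = beta^b * K^(b-1) + (1 - beta^b) * P^b   for b > 1
    """
    if len(counts) != len(weights):
        raise ValueError("counts and weights must have the same length")
    priorities: List[float] = []
    for b, (p, beta) in enumerate(zip(counts, weights)):
        if b == 0:
            k = beta + (1 - beta) * p  # base case stated in the text
        else:
            k = beta * priorities[-1] + (1 - beta) * p
        priorities.append(k)
    return priorities
```

With counts `[10, 20]` and weights `[0.5, 0.5]`, the first priority is 5.5 for the first period and 12.75 for the second.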

S500, according to $K^b_x$ and $J^0_{bf}$, obtain the second priority $K^{0b}_x$ corresponding to $G_x$ in $J_b$, where $K^{0b}_x$ satisfies:

$$K^{0b}_x=\log\big(P^b_x\times Q^b_x/K^{b-1}_x\big)$$

where $Q^b_x$ is the number of historical time periods $J^0_{bf}$ in $J^0_b$ that contain a time point at which $G_x$ appears in the system; when $b=1$, $K^{01}_x=\log(P^1_x\times Q^1_x)$.
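A sketch of the second priority under two stated assumptions: the logarithm is taken as the natural logarithm (the text does not give a base), and $Q^b_x$ is read as the count of historical periods in which the keyword appears at least once. Both function names are hypothetical, and time points are plain numbers here for brevity.

```python
import math
from typing import Optional, Sequence, Tuple


def count_active_periods(
    occurrences: Sequence[float], periods: Sequence[Tuple[float, float]]
) -> int:
    """Count periods (start, end), end exclusive, holding at least one occurrence."""
    return sum(
        any(start <= t < end for t in occurrences) for start, end in periods
    )


def second_priority(p_b: float, q_b: float, k_prev: Optional[float] = None) -> float:
    """K0 = log(P * Q / K_prev); for the first period (k_prev is None), log(P * Q)."""
    value = p_b * q_b if k_prev is None else p_b * q_b / k_prev
    if value <= 0:
        # The logarithm is undefined here; the text does not cover this case.
        raise ValueError("P * Q / K_prev must be positive")
    return math.log(value)
```

A keyword that never appears in the preset period or in any historical period makes the argument zero, so a real system would need a policy for that case; the sketch simply raises.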

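S600, according to $K^{0b}_x$, obtain the third priority $K^{1b}_x$ corresponding to $G_x$ in $J_b$, where $K^{1b}_x$ satisfies:

$$K^{1b}_x=\frac{P^b_x}{P^b_x+\sum_{x=1}^{p}P^b_x/p}\times K^{0b}_x+\frac{\sum_{x=1}^{p}P^b_x/p}{P^b_x+\sum_{x=1}^{p}P^b_x/p}\times\Big(\sum_{x=1}^{p}K^{0b}_x/p\Big)$$

S700, when $K^{1b}_x\ge K^2_b$, take $G_x$ as a hot word with stable heat, where $K^2_b$ is the preset priority threshold corresponding to $J_b$. Those skilled in the art know that the preset priority threshold is set according to actual requirements, which is not described in detail here.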
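The third-priority formula of S600 blends each keyword's second priority with the mean second priority over all keywords, weighted by how the keyword's count compares with the mean count; S700 then applies the threshold. A minimal sketch for one preset time period, with hypothetical function names:

```python
from typing import List, Sequence


def third_priorities(counts: Sequence[float], second: Sequence[float]) -> List[float]:
    """K1_x = P_x/(P_x + mean_P) * K0_x + mean_P/(P_x + mean_P) * mean_K0."""
    if not counts or len(counts) != len(second):
        raise ValueError("counts and second must be non-empty and equally long")
    p = len(counts)
    mean_count = sum(counts) / p
    mean_second = sum(second) / p
    result = []
    for c, k0 in zip(counts, second):
        denom = c + mean_count
        if denom == 0:
            raise ValueError("all counts are zero; the weights are undefined")
        result.append(c / denom * k0 + mean_count / denom * mean_second)
    return result


def stable_hot_words(
    keywords: Sequence[str], third: Sequence[float], threshold: float
) -> List[str]:
    """Keep keywords whose third priority reaches the preset priority threshold."""
    return [g for g, k in zip(keywords, third) if k >= threshold]
```

A keyword with few occurrences is pulled toward the average second priority, while a frequently occurring keyword keeps a value close to its own second priority.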

Embodiments of the present invention also provide a non-transitory computer readable storage medium that may be disposed in an electronic device to store at least one instruction or at least one program for implementing the method of the method embodiments, the at least one instruction or the at least one program being loaded and executed by a processor to implement the methods provided by the embodiments described above.

Embodiments of the present invention also provide an electronic device comprising a processor and the aforementioned non-transitory computer-readable storage medium.

Embodiments of the present invention also provide a computer program product comprising program code for causing an electronic device to carry out the steps of the method according to the various exemplary embodiments of the invention as described in the specification, when said program product is run on the electronic device.

Example 2

The second embodiment provides a data processing system for obtaining hot words with fluctuating heat, comprising: the key feature word list $C=\{C_1, C_2, \ldots, C_j, \ldots, C_n\}$ corresponding to the target text, the preset heat fluctuation feature word list $A_1=\{A_{11}, A_{12}, \ldots, A_{1i}, \ldots, A_{1m}\}$, a processor, and a memory storing a computer program which, when executed by the processor, performs the following steps, as shown in FIG. 2:

S1, according to $C$ and $A_1$, obtain $D_1$, where $D_1$ satisfies:

$$D_1=\sum_{j=1}^{n}\Big(\sum_{i=1}^{m}E^{1j}_{i}/m\Big)/n$$

where $E^{1j}_{i}$ is the first word similarity corresponding to $C_j$ and $A_{1i}$, and the first word similarity is the similarity between a key feature word and a preset heat fluctuation feature word.

S2, when $D_1\le\Delta D_1$, obtain the preset period list $T=\{T_1, T_2, \ldots, T_g, \ldots, T_h\}$, where $T_g=\{T_{g1}, T_{g2}, \ldots, T_{gr}, \ldots, T_{gs}\}$, $T_{gr}$ is the r-th preset period in the g-th preset period list $T_g$, $g=1,2,\ldots,h$, $h$ is the number of preset period lists, $r=1,2,\ldots,s$, and $s$ is the number of preset periods in a preset period list. $\Delta D_1$ is the first similarity threshold and satisfies:

$$\Delta D_1=(D_2+D_3)/2$$

The preset periods in the preset period lists are set by those skilled in the art according to actual requirements, which is not described in detail here.

Specifically, the measurement unit of the preset period is days.

Further, the lengths of any two preset periods in the same preset period list are the same, and the lengths of the preset periods in any two different preset period lists are different.

When the first feature similarity is not greater than one half of the sum of the second feature similarity and the third feature similarity, the key feature word list is most similar to the preset heat fluctuation feature word list, so it can be determined that the user's requirement is to obtain hot words with fluctuating heat. In that case, the preset period list is obtained; the first keywords are obtained according to the number of times the target keywords appear in the system in the preset periods and according to the preset keywords; the first character strings are obtained from the first keywords; the space characters in each first character string are deleted to obtain the second character string; and the length of the second character string is compared to determine the hot words with fluctuating heat. This improves the accuracy of obtaining hot words with fluctuating heat.

S3, according to the target keyword list $G$ and $T_{gr}$, obtain the first keyword list $H_{gr}$ corresponding to $T_{gr}$, where $H_{gr}$ comprises a plurality of first keywords, $G=\{G_1, G_2, \ldots, G_x, \ldots, G_p\}$, $G_x$ is the x-th target keyword, $x=1,2,\ldots,p$, and $p$ is the number of target keywords.

Specifically, S3 includes the following steps for obtaining $H_{gr}$:

S31, obtain the frequency value $L^x_{gr}$ corresponding to $G_x$ in $T_{gr}$, where the frequency value is the number of times the target keyword appears in the system in the preset period.

S32, when $L^x_{gr}/\big(\sum_{r=1}^{s}L^x_{gr}/s\big)\ge L^0$, obtain the key priority $M^x_{gr}$ corresponding to $G_x$ in $T_{gr}$, where $M^x_{gr}$ satisfies:

$$M^x_{gr}=\log\Big(\big(L^x_{gr}/(\sum_{r=1}^{s}L^x_{gr}/s)+e\big)\times\big(L^x_{gr}+e\big)\times\log 10\times\big(\sum_{r=1}^{s}L^x_{gr}+10\big)\Big)$$

where $e$ is the natural constant and $L^0$ is the preset frequency ratio. Those skilled in the art know that the preset frequency ratio is set according to actual requirements, which is not described in detail here.

S33, when $L^x_{gr}/\big(\sum_{r=1}^{s}L^x_{gr}/s\big)<L^0$, obtain $M^x_{gr}=0$.

S34, when $\sum_{r=1}^{s}L^x_{gr}\ge L^1$, all $L^x_{gr}\ge L^2$, and $M^x_{gr}>M^0$, take $G_x$ as a second keyword corresponding to $T_{gr}$ to obtain the second keyword list $N_{gr}=\{N^1_{gr}, N^2_{gr}, \ldots, N^y_{gr}, \ldots, N^q_{gr}\}$ corresponding to $T_{gr}$, where $N^y_{gr}$ is the y-th second keyword corresponding to $T_{gr}$, $y=1,2,\ldots,q$, $q$ is the number of second keywords corresponding to the preset period, $L^1$ is the first preset frequency value, $L^2$ is the second preset frequency value, and $M^0$ is the preset key priority threshold. Those skilled in the art know that the first preset frequency value, the second preset frequency value and the preset key priority threshold are set according to actual requirements, which is not described in detail here.
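Steps S31 to S34 can be sketched as below. The key priority follows a literal reading of the formula as printed, which is an assumption: the outer logarithm is taken as the natural logarithm and `log 10` is treated as the constant $\log 10$ multiplied into the product; the original grouping may differ. Returning 0 when a keyword never appears is also an assumption, and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def key_priority(freqs: Sequence[float], r: int, min_ratio: float) -> float:
    """Key priority M for period index r, given one keyword's per-period counts."""
    total = sum(freqs)
    mean = total / len(freqs)
    if mean == 0:
        return 0.0  # assumed: a keyword that never appears gets priority 0
    ratio = freqs[r] / mean
    if ratio < min_ratio:
        return 0.0  # S33
    # S32, literal reading of the printed formula.
    return math.log(
        (ratio + math.e) * (freqs[r] + math.e) * math.log(10) * (total + 10)
    )


def second_keywords(
    freq_table: Dict[str, Sequence[float]],
    r: int,
    min_ratio: float,
    min_total: float,
    min_each: float,
    min_priority: float,
) -> List[str]:
    """S34: keep keywords passing the total, per-period and priority checks."""
    selected = []
    for word, freqs in freq_table.items():
        if (
            sum(freqs) >= min_total
            and min(freqs) >= min_each
            and key_priority(freqs, r, min_ratio) > min_priority
        ):
            selected.append(word)
    return selected
```

The thresholds `min_ratio`, `min_total`, `min_each` and `min_priority` stand in for $L^0$, $L^1$, $L^2$ and $M^0$ respectively.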

S35, obtain the preset keyword list $U=\{U_1, U_2, \ldots, U_a, \ldots, U_c\}$, where $U_a$ is the a-th preset keyword, $a=1,2,\ldots,c$, and $c$ is the number of preset keywords. Those skilled in the art know that the preset keywords are set according to actual requirements, which is not described in detail here.

S36, obtain the key similarity $V^{ya}_{gr}$ between $N^y_{gr}$ and $U_a$, where the key similarity is the similarity between a second keyword and a preset keyword. Those skilled in the art know that the key similarity is obtained in the same way as the first word similarity, which is not described in detail here.

Specifically, the smaller the key similarity, the more similar the second keyword and the preset keyword are.

S37, when $V^{ya}_{gr}>V^0$, take $N^y_{gr}$ as a first keyword corresponding to $T_{gr}$ to obtain $H_{gr}$, where $V^0$ is the preset key similarity threshold. Those skilled in the art know that the specific value of the preset key similarity threshold is set according to actual requirements, which is not described in detail here.
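Steps S35 to S37 act as a blacklist filter. A minimal sketch, assuming a second keyword is kept only when its key similarity exceeds the threshold against every preset keyword (the text does not say how multiple preset keywords combine), and using a hypothetical character-set distance in place of the unspecified similarity measure:

```python
from typing import Callable, List, Sequence


def char_distance(a: str, b: str) -> float:
    """Hypothetical stand-in: 1 - Jaccard overlap of character sets.

    Smaller means more similar, matching the convention in the text.
    """
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)


def first_keywords(
    second_keywords: Sequence[str],
    preset_keywords: Sequence[str],
    key_similarity: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Keep second keywords that are dissimilar to all preset (blacklisted) words."""
    return [
        n
        for n in second_keywords
        if all(key_similarity(n, u) > threshold for u in preset_keywords)
    ]
```

For example, with the preset keyword `"the"` and threshold 0.5, the second keyword `"the"` is dropped (distance 0) while `"storm"` is kept.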

In the above steps, the number of times each target keyword appears in the system in each preset period is processed to obtain the key priority corresponding to the target keyword. The number of occurrences in the preset periods and the key priority are compared with their thresholds to determine the second keywords, which screens out target keywords that are obviously not hot words. The key similarity between each second keyword and each preset keyword is then obtained and compared with the threshold, which screens out second keywords that are the same as preset keywords and determines the first keywords. The preset keywords can be understood as words in a blacklist set by the user. This helps improve the accuracy of obtaining the first keywords; because the hot words with fluctuating heat are obtained from the first keywords, it also helps improve the accuracy of obtaining hot words with fluctuating heat.

S4, when $T^0_{gr}\in[T^1, T^2]$, take each first keyword in $H_{gr}$ as a first character string to obtain the first character string list $R=\{R_1, R_2, \ldots, R_k, \ldots, R_t\}$, where $R_k$ is the k-th first character string, $k=1,2,\ldots,t$, $t$ is the number of first character strings, $T^1$ is the first preset period length, and $T^2$ is the second preset period length. Those skilled in the art know that the first preset period length and the second preset period length are set according to actual requirements, which is not described in detail here.

S5, delete the space characters in $R_k$ to obtain the second character string $R^0_k$ corresponding to $R_k$. Those skilled in the art know that any prior-art method for deleting space characters in a character string falls within the protection scope of the present invention and is not described in detail here.

S6, when $R^1_k>0$, take $R^0_k$ as a hot word with fluctuating heat, where $R^1_k$ is the length of $R^0_k$. Those skilled in the art know that any prior-art method for obtaining the length of a character string falls within the protection scope of the present invention and is not described in detail here.
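Steps S4 to S6 can be sketched as follows. The symbol $T^0_{gr}$ is not defined in the text; the sketch assumes it is the length of the preset period $T_{gr}$ in days. The sketch also removes all whitespace rather than only the space character, and the function name is hypothetical.

```python
from typing import List, Sequence


def fluctuating_hot_words(
    first_keywords: Sequence[str],
    period_length: float,
    min_length: float,
    max_length: float,
) -> List[str]:
    """S4: check the period length; S5: strip whitespace; S6: keep non-empty strings."""
    if not (min_length <= period_length <= max_length):
        return []  # S4 condition not met, so no strings are produced
    hot_words = []
    for first_string in first_keywords:
        # str.split() with no argument splits on any run of whitespace.
        second_string = "".join(first_string.split())
        if len(second_string) > 0:
            hot_words.append(second_string)
    return hot_words
```

The length test in S6 therefore only discards keywords that consist entirely of whitespace.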

When the first feature similarity is not greater than one half of the sum of the second feature similarity and the third feature similarity, it is determined that hot words with fluctuating heat are to be obtained. In that case, the preset period list is obtained; the first keywords are obtained according to the number of times the target keywords appear in the system in the preset periods and according to the preset keywords; the first character strings are obtained from the first keywords; the space characters in each first character string are deleted to obtain the second character string; and the length of the second character string is compared to determine the hot words with fluctuating heat. Analyzing hot words whose heat fluctuates slightly lets the user understand the details of an event in greater depth. In the prior art, a keyword can only be determined to be a hot word when its heat value exceeds a threshold, and it cannot be determined whether the keyword is a hot word whose heat value fluctuates slightly. When the user needs to understand the details of an event, hot words whose heat value fluctuates slightly therefore cannot be obtained, so the prior-art method cannot meet the user's requirement. The present invention obtains hot words with fluctuating heat more accurately, better meets the user's requirement, and helps the user understand the details of an event.

Example 3

The third embodiment provides a system for obtaining hot words with increasing heat, comprising: the key feature word list $C=\{C_1, C_2, \ldots, C_j, \ldots, C_n\}$ corresponding to the target text, the preset heat increment feature word list $A_3=\{A_{3(1)}, A_{3(2)}, \ldots, A_{3(i2)}, \ldots, A_{3(m2)}\}$, a processor, and a memory storing a computer program which, when executed by the processor, performs the following steps, as shown in FIG. 3:

S10, according to $C$ and $A_3$, obtain $D_3$, where $D_3$ satisfies:

$$D_3=\sum_{j=1}^{n}\Big(\sum_{i2=1}^{m2}E^{3j}_{(i2)}/m2\Big)/n$$

where $E^{3j}_{(i2)}$ is the third word similarity corresponding to $C_j$ and $A_{3(i2)}$, and the third word similarity is the similarity between a key feature word and a preset heat increment feature word.

S20, when $D_3\le\Delta D_3$, obtain the preset intermediate period list $W=\{W_1, W_2, W_3, W_4, W_5\}$, where $W_1$, $W_2$, $W_3$, $W_4$ and $W_5$ are the first, second, third, fourth and fifth preset intermediate periods respectively. The second preset intermediate period is the period before the first preset intermediate period, the third is the period before the second, and the fourth is the period before the third. The fifth preset intermediate period precedes the first preset intermediate period and is separated from it by a time interval of one year. $\Delta D_3$ is the third similarity threshold and satisfies:

$$\Delta D_3=(D_1+D_2)/2$$

Those skilled in the art know that the first preset intermediate period and its length are set according to actual requirements, which is not described in detail here.

Specifically, the lengths of the first preset intermediate period, the second preset intermediate period, the third preset intermediate period, the fourth preset intermediate period and the fifth preset intermediate period are all the same, and the measurement units of the lengths are days.
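The layout of the five preset intermediate periods can be sketched as below. Assumptions not stated in the text: "the period before" is read as the immediately preceding period of the same length, the one-year interval is taken as 365 days measured between period starts, and period ends are exclusive. The function name is hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def intermediate_periods(w1_start: date, length_days: int) -> List[Tuple[date, date]]:
    """Return [W1, W2, W3, W4, W5] as (start, end) pairs with exclusive ends."""
    if length_days <= 0:
        raise ValueError("length_days must be positive")
    step = timedelta(days=length_days)
    # W1 is the current period; W2..W4 are the three periods preceding it.
    starts = [w1_start - k * step for k in range(4)]
    # W5 is the same-length period one year before W1 (assumed 365 days).
    starts.append(w1_start - timedelta(days=365))
    return [(s, s + step) for s in starts]
```

Comparing $W_1$ with $W_2$ to $W_4$ captures recent growth, while $W_5$ gives a same-season reference from a year earlier.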

When the third feature similarity is not greater than half of the sum of the first feature similarity and the second feature similarity, the keyword list is most similar to the preset heat increment feature word list, so it can be determined that the user requires hot words with increasing heat. In this case, the preset intermediate period list is obtained, and a growth rate list corresponding to each target keyword is obtained from the number of times the target keyword appears in the first, second, third, fourth and fifth preset intermediate periods. The growth rates in the list are analyzed to obtain the candidate weight corresponding to the target keyword and, from it, the intermediate priority corresponding to the target keyword. The intermediate priority is then compared, and a target keyword whose heat has risen gradually or suddenly in the recent period is taken as a hot word with increasing heat, which helps improve the accuracy of obtaining hot words with increasing heat.

S30, obtaining, according to $G_x$ and $W$, a growth rate list $W^0_x=\{W^0_{x1}, W^0_{x2}, W^0_{x3}, W^0_{x4}\}$ corresponding to $G_x$, where $W^0_{x1}$, $W^0_{x2}$, $W^0_{x3}$ and $W^0_{x4}$ are the first, second, third and fourth growth rates in $W^0_x$, respectively, and $G_x$ is the $x$-th target keyword in the target keyword list $G=\{G_1, G_2, \ldots, G_x, \ldots, G_p\}$, $x=1, 2, \ldots, p$, where $p$ is the number of target keywords.

Specifically, S30 includes the steps of:

S301, obtaining $W^0_{x1}$ according to $G_x$, $W_1$ and $W_2$, wherein $W^0_{x1}$ satisfies:

$W^0_{x1}=(G_{x1}-G_{x2})/G_{x2}\times 100\%$, where $G_{x1}$ is the number of times $G_x$ appears in the system during $W_1$, and $G_{x2}$ is the number of times $G_x$ appears in the system during $W_2$.

S303, obtaining $W^0_{x2}$ according to $G_x$, $G_{x1}$, $G_{x2}$ and $W_3$, wherein $W^0_{x2}$ satisfies:

$W^0_{x2}=(G_{x1}-(G_{x2}+G_{x3}))/(G_{x2}+G_{x3})\times 100\%$, where $G_{x3}$ is the number of times $G_x$ appears in the system during $W_3$.

S305, obtaining $W^0_{x3}$ according to $G_x$, $G_{x1}$, $G_{x2}$, $G_{x3}$ and $W_4$, wherein $W^0_{x3}$ satisfies:

$W^0_{x3}=(G_{x1}-(G_{x2}+G_{x3}+G_{x4}))/(G_{x2}+G_{x3}+G_{x4})\times 100\%$, where $G_{x4}$ is the number of times $G_x$ appears in the system during $W_4$.

S307, obtaining $W^0_{x4}$ according to $G_x$, $G_{x1}$ and $W_5$, wherein $W^0_{x4}$ satisfies:

$W^0_{x4}=(G_{x1}-G_{x5})/G_{x5}\times 100\%$, where $G_{x5}$ is the number of times $G_x$ appears in the system during $W_5$.
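Steps S301 to S307 can be sketched as follows. The helper names are hypothetical, and returning `None` when a baseline count is zero is an assumption, since the source does not say how a zero denominator is handled.

```python
from typing import List, Optional, Sequence


def growth_rate(current: float, baseline: float) -> Optional[float]:
    """Percentage change of current relative to baseline; None if baseline is 0."""
    if baseline == 0:
        return None  # assumption: undefined growth rate is reported as None
    return (current - baseline) / baseline * 100.0


def growth_rate_list(counts: Sequence[int]) -> List[Optional[float]]:
    """counts = (G_x1, ..., G_x5): occurrences of one keyword in W1..W5."""
    g1, g2, g3, g4, g5 = counts
    return [
        growth_rate(g1, g2),            # S301: W1 against W2
        growth_rate(g1, g2 + g3),       # S303: W1 against W2 + W3
        growth_rate(g1, g2 + g3 + g4),  # S305: W1 against W2 + W3 + W4
        growth_rate(g1, g5),            # S307: W1 against the year-earlier W5
    ]
```

The first three rates compare the latest period with progressively longer recent baselines, while the fourth compares it with the same period one year earlier.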

The growth rate list corresponding to the target keyword is thus obtained from the number of times the target keyword appears in the first, second, third, fourth and fifth preset intermediate periods. Analyzing the growth rates in this list yields the candidate weight and, in turn, the intermediate priority corresponding to the target keyword. Comparing the intermediate priority then allows a target keyword whose heat has risen gradually or suddenly to be taken as a hot word with increasing heat, which helps improve the accuracy of obtaining hot words with increasing heat.

S40, obtaining, according to $W^0_{x1}$, $W^0_{x2}$, $W^0_{x3}$ and $W^0_{x4}$, the candidate weight $G^0_x$ corresponding to $G_x$.

Specifically, S40 includes the steps of:

S401, when $W^0_{x1}<Y$ and $W^0_{x2}<Y$ and $W^0_{x3}<Y$ and $W^0_{x4}<Y$, or when $W^0_{x1}+W^0_{x2}+W^0_{x3}+W^0_{x4}<Y^0$, generating the first feedback identifier "1" corresponding to $G_x$; otherwise, generating the first feedback identifier "0" corresponding to $G_x$. $Y$ is a first preset growth rate and $Y^0$ is a second preset growth rate; both are set by those skilled in the art according to actual requirements and are not described further herein.

Specifically, the first feedback identifier indicates whether the growth rates are smaller than the preset growth rates.

Further, the identifier "1" indicates that the first, second, third and fourth growth rates are all smaller than the first preset growth rate, or that the sum of the first, second, third and fourth growth rates is smaller than the second preset growth rate.

Further, the identifier "0" indicates that at least one of the first, second, third and fourth growth rates is not smaller than the first preset growth rate, and that the sum of the first, second, third and fourth growth rates is not smaller than the second preset growth rate.

S403, when $G_{x1}<Y^1$, generating the second feedback identifier "-1" corresponding to $G_x$; otherwise, generating the second feedback identifier "-2" corresponding to $G_x$. $Y^1$ is a preset frequency value set by those skilled in the art according to actual requirements and is not described further herein.

Specifically, the second feedback identifier indicates whether the number of times the target keyword appears in the system during the first preset intermediate period is smaller than the preset frequency value.

Further, the identifier "-1" indicates that the number of times the target keyword appears in the system during the first preset intermediate period is smaller than the preset frequency value.

Further, the identifier "-2" indicates that the number of times the target keyword appears in the system during the first preset intermediate period is not smaller than the preset frequency value.

S405, when the first feedback identifier corresponding to $G_x$ is "1" and the second feedback identifier corresponding to $G_x$ is "-1", setting $G^0_x=0$; otherwise, setting $G^0_x=1$.
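The decision logic of S401 to S405 can be sketched as below. The function and parameter names are illustrative, and the thresholds passed in the usage are arbitrary example values rather than values from the source.

```python
from typing import Sequence


def first_feedback(rates: Sequence[float], y: float, y0: float) -> int:
    """S401: 1 if every rate is below Y, or the sum of rates is below Y0; else 0."""
    all_below = all(r < y for r in rates)
    sum_below = sum(rates) < y0
    return 1 if (all_below or sum_below) else 0


def second_feedback(g_x1: int, y1: int) -> int:
    """S403: -1 if the count in W1 is below the preset frequency value; else -2."""
    return -1 if g_x1 < y1 else -2


def candidate_weight(
    rates: Sequence[float], g_x1: int, y: float, y0: float, y1: int
) -> int:
    """S405: 0 only when growth is low AND the keyword is rare in W1; else 1."""
    if first_feedback(rates, y, y0) == 1 and second_feedback(g_x1, y1) == -1:
        return 0
    return 1
```

A keyword therefore receives candidate weight 0 only when both signals are weak: slow growth and few occurrences in the most recent period.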

In the above steps, the first feedback identifier corresponding to the target keyword is determined by comparing the growth rates in the growth rate list with the preset growth rates, and the second feedback identifier corresponding to the target keyword is determined by comparing the number of times the target keyword appears in the system during the first preset intermediate period with the preset frequency value. The candidate weight corresponding to the target keyword can be accurately determined from the first and second feedback identifiers, and the intermediate priority corresponding to the target keyword is obtained from it. The intermediate priority is then compared, and a target keyword whose heat has risen gradually or suddenly is taken as a hot word with increasing heat, which helps improve the accuracy of obtaining hot words with increasing heat.

S50, obtaining, according to $W^0_{x1}$, $W^0_{x2}$, $W^0_{x3}$, $W^0_{x4}$ and $G^0_x$, the intermediate priority $G^1_x$ corresponding to $G_x$, wherein $G^1_x$ satisfies:

$G^1_x=\log G_{x1}\times\left(W^0_{x1}+W^0_{x4}\times\eta^0_{x4}+\left((G_{x2}-G_{x3})/G_{x3}\times 100\%\right)\times\eta^1+\left((G_{x3}-G_{x4})/G_{x4}\times 100\%\right)\times\eta^2\right)\times(1+1.5\times\gamma)\times\left((1-G^0_x)\times 1+G^0_x\times\alpha\right)$, where $\eta^0_{x4}$ is the first intermediate weight corresponding to $W^0_{x4}$ and represents the importance of the fourth growth rate; $\eta^1$ is the second intermediate weight, $\eta^2$ is the third intermediate weight and $\gamma$ is the fourth intermediate weight, each used to adjust the intermediate priority; and $\alpha$ is the assigned weight corresponding to $G^0_x$, used to adjust the intermediate priority according to the value of the candidate weight. The first, second, third and fourth intermediate weights and the assigned weight are set by those skilled in the art according to actual requirements and are not described further herein.

S60, when $G^1_x\ge G^2_x$, taking $G_x$ as a hot word with increasing heat, where $G^2_x$ is a preset intermediate priority threshold whose value is set by those skilled in the art according to actual requirements and is not described further herein.
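A minimal sketch of S50 and S60 follows. It assumes a natural logarithm (the source writes only "log"), treats percentages as plain numbers such as 50.0 for 50%, and rejects non-positive counts where the formula would be undefined; the names and these choices are assumptions, not part of the source.

```python
import math
from typing import Sequence


def intermediate_priority(
    counts: Sequence[int],  # (G_x1, G_x2, G_x3, G_x4)
    w_x1: float,            # first growth rate, in percent
    w_x4: float,            # fourth growth rate, in percent
    g0: int,                # candidate weight G0_x, either 0 or 1
    eta_x4: float,
    eta1: float,
    eta2: float,
    gamma: float,
    alpha: float,
) -> float:
    g1, g2, g3, g4 = counts[:4]
    if g1 <= 0 or g3 <= 0 or g4 <= 0:
        raise ValueError("G_x1, G_x3 and G_x4 must be positive")
    step_23 = (g2 - g3) / g3 * 100.0  # growth from W3 to W2
    step_34 = (g3 - g4) / g4 * 100.0  # growth from W4 to W3
    trend = w_x1 + w_x4 * eta_x4 + step_23 * eta1 + step_34 * eta2
    scale = 1 + 1.5 * gamma
    weight = (1 - g0) * 1 + g0 * alpha  # 1 when G0_x = 0, alpha when G0_x = 1
    return math.log(g1) * trend * scale * weight  # assumption: natural log


def is_rising_hot_word(priority: float, threshold: float) -> bool:
    """S60: the keyword qualifies when its priority reaches the threshold."""
    return priority >= threshold
```

The last factor shows the role of the candidate weight: it switches the multiplier between 1 and the assigned weight $\alpha$.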

When the third feature similarity is not greater than half of the sum of the first feature similarity and the second feature similarity, it is determined that hot words with increasing heat are to be obtained. In this case, the preset intermediate period list is obtained, and the growth rate list corresponding to the target keyword is obtained from the number of times the target keyword appears in the first, second, third, fourth and fifth preset intermediate periods. The growth rates in the growth rate list are analyzed to determine the first feedback identifier corresponding to the target keyword, and the number of times the target keyword appears in the system during the first preset intermediate period is compared to determine the second feedback identifier corresponding to the target keyword. The candidate weight corresponding to the target keyword can be accurately determined from the first and second feedback identifiers, the intermediate priority corresponding to the target keyword is obtained from it, and the intermediate priority is compared, so that a target keyword whose heat has risen gradually or suddenly is taken as a hot word with increasing heat, which helps improve the accuracy of obtaining hot words with increasing heat.

In the prior art, methods for determining hot words with increasing heat monitor the heat of hot words in real time and determine a word to be a hot word with increasing heat whenever its heat rises in real time, so a word whose heat rises over a short time but does not rise over a longer time is still determined to be a hot word with increasing heat. Compared with the prior art, the present invention determines the growth rate list of the target keyword from the number of times the target keyword appears in each period of the preset intermediate period list, and determines hot words with increasing heat by comparing the growth rates in the growth rate list corresponding to the target keyword and the number of times the target keyword appears in the first preset intermediate period, rather than from real-time heat alone. This helps improve the accuracy of obtaining hot words with increasing heat and helps the user understand the development trend of events more accurately.

Specifically, the present invention also provides another embodiment, which differs from the above embodiment in that the target keywords are obtained by the following steps:

S1000, inputting a key text into a preset text word segmentation model to obtain a first keyword information list corresponding to the key text, wherein the first keyword information list comprises a plurality of pieces of first keyword information, each piece of first keyword information comprises a first keyword and the keyword part of speech corresponding to the first keyword, and the key text is a text that is input into the system by a user and from which hot words need to be extracted. The preset text word segmentation model is an NLP model, trained by those skilled in the art according to actual requirements, that can segment text and output word segmentation information, and is not described further herein.

Specifically, the keyword parts of speech include: nouns, verbs, adjectives, and the like.

S2000, inputting the first keyword information list into an entity recognition model to obtain the keyword type corresponding to each first keyword, wherein any entity recognition model in the prior art may be used and falls within the protection scope of the present invention, and is not described further herein.

Specifically, the keyword types include: name of person, place, organization, etc.

S3000, acquiring a preset regular expression list $AB=\{AB_{(1)}, AB_{(2)}, \ldots, AB_{(ai)}, \ldots, AB_{(am)}\}$, where $AB_{(ai)}$ is the $ai$-th preset regular expression, $ai=1, 2, \ldots, am$, and $am$ is the number of preset regular expressions. Each preset regular expression comprises a preset word type or a preset part of speech and is preset by those skilled in the art according to actual requirements, and is not described further herein.

S4000, obtaining, according to the keyword type and keyword part of speech corresponding to the first keyword and $AB_{(ai)}$, a candidate keyword list $AE=\{AE_{(1)}, AE_{(2)}, \ldots, AE_{(ae)}, \ldots, AE_{(af)}\}$, where $AE_{(ae)}$ is the $ae$-th candidate keyword, $ae=1, 2, \ldots, af$, and $af$ is the number of candidate keywords.

Specifically, in S4000, $AE$ is obtained by the following steps:

S4100, matching the keyword type and keyword part of speech corresponding to the first keyword against $AB_{(ai)}$ to obtain a second keyword list $AC_{(ai)}$ corresponding to $AB_{(ai)}$, wherein the second keyword list comprises a plurality of second keywords, and a second keyword is a first keyword whose keyword part of speech and keyword type conform to the filtering logic of $AB_{(ai)}$.

S4300, acquiring a key regular expression list $AD=\{AD_{(1)}, AD_{(2)}, \ldots, AD_{(aj)}, \ldots, AD_{(an)}\}$ input by a user, where $AD_{(aj)}$ is the $aj$-th key regular expression, $aj=1, 2, \ldots, an$, and $an$ is the number of key regular expressions.

S4500, when $AB_{(ai)}$ and $AD_{(aj)}$ are identical, determining the second keywords in $AC_{(ai)}$ to be first intermediate keywords, so as to obtain a first intermediate keyword list comprising a plurality of first intermediate keywords, wherein identical means that the characters of the preset regular expression and of the key regular expression are the same.

S4700, deleting space characters in the first intermediate keywords in the first intermediate keyword list to obtain a second intermediate keyword list.

S4900, performing duplicate removal processing on the second intermediate keyword list to obtain AE.
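Steps S4100 to S4900 can be sketched as a small filtering pipeline. The source does not say how a regular expression is applied to a part of speech and a word type, so this sketch assumes each keyword is tagged as the string `pos/kind` and that expressions are matched against that tag; the `KeywordInfo` type and the tag format are hypothetical.

```python
import re
from typing import List, NamedTuple, Sequence


class KeywordInfo(NamedTuple):
    word: str  # first keyword
    pos: str   # keyword part of speech, e.g. "noun"
    kind: str  # keyword type from entity recognition, e.g. "place"


def candidate_keywords(
    infos: Sequence[KeywordInfo],
    preset_patterns: Sequence[str],  # AB
    user_patterns: Sequence[str],    # AD
) -> List[str]:
    user_set = set(user_patterns)
    intermediate: List[str] = []
    for pattern in preset_patterns:
        # S4500: keep a preset expression only if a user expression is
        # character-for-character identical to it.
        if pattern not in user_set:
            continue
        regex = re.compile(pattern)
        # S4100: keywords whose "pos/kind" tag fully matches the expression.
        intermediate.extend(
            info.word for info in infos
            if regex.fullmatch(f"{info.pos}/{info.kind}")
        )
    # S4700: delete whitespace characters inside each keyword.
    stripped = ["".join(word.split()) for word in intermediate]
    # S4900: order-preserving duplicate removal.
    return list(dict.fromkeys(stripped))
```

Deduplicating after whitespace removal means that variants such as "New York" and "NewYork" collapse into a single candidate.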

In the above steps, the key text is processed to obtain the first keyword information and the keyword type corresponding to each first keyword; candidate keywords are obtained by matching the keyword part of speech and keyword type of the first keywords against the regular expressions; and the candidate keywords are then processed to obtain the target keywords. The target keywords can thus be determined accurately according to the text content and the user's requirements, which helps improve the accuracy of obtaining the target keywords.

S5000, obtaining the target keywords according to $AE_{(ae)}$.

Specifically, S5000 includes the steps of:

S5100, acquiring a preset word type list $AF=\{AF_{(1)}, AF_{(2)}, \ldots, AF_{(ar)}, \ldots, AF_{(as)}\}$, where $AF_{(ar)}$ is the $ar$-th preset word type, $ar=1, 2, \ldots, as$, and $as$ is the number of preset word types. The preset word types are preset by those skilled in the art according to actual requirements and are not described further herein.

S5200, obtaining the intermediate word type $AE^0_{(ae)}$ corresponding to $AE_{(ae)}$, wherein the intermediate word type is the keyword type of the first keyword that is identical to the first intermediate keyword corresponding to the candidate keyword.

S5300, obtaining the type similarity $AG^{(ae)}_{(ar)}$ between $AE^0_{(ae)}$ and $AF_{(ar)}$, wherein a word type can be understood as a label. Any prior-art method for obtaining the similarity between two labels, for example cosine similarity, may be used and falls within the protection scope of the present invention, and is not described further herein.

Specifically, when the type similarity is 1, the intermediate word type is the most similar to the preset word type.

S5400, acquiring a preset specified word list $AH=\{AH_{(1)}, AH_{(2)}, \ldots, AH_{(ax)}, \ldots, AH_{(ap)}\}$, where $AH_{(ax)}$ is the $ax$-th preset specified word, $ax=1, 2, \ldots, ap$, and $ap$ is the number of preset specified words. The preset specified words are preset by those skilled in the art according to actual requirements and are not described further herein.

S5500, obtaining the word similarity $AR^{(ae)}_{(ar)}$ between $AE_{(ae)}$ and $AH_{(ax)}$, wherein the word similarity is obtained in the same manner as the type similarity, which is not repeated herein.

Specifically, when the word similarity is 1, the candidate keyword is the most similar to the preset specified word.

S5600, acquiring a preset matching rule list $AS$, wherein the preset matching rule list comprises a plurality of preset matching rules that are preset by those skilled in the art according to actual requirements and are not described further herein. For example, one rule is that the keyword does not include an inaccurate digital suffix, where the inaccurate digital suffix is set by those skilled in the art according to actual requirements.

S5700, when $AE_{(ae)}$ conforms to all preset matching rules in $AS$, generating the first identifier "2" corresponding to $AE_{(ae)}$; otherwise, generating the first identifier "3" corresponding to $AE_{(ae)}$. Any prior-art method for determining whether a keyword conforms to a matching rule may be used and falls within the protection scope of the present invention, and is not described further herein.

Specifically, the first identifier is an identifier for indicating whether the candidate keywords conform to all preset matching rules.

Specifically, the identifier "2" indicates that the candidate keyword conforms to all preset matching rules.

Further, the identifier "3" indicates that the candidate keyword does not conform to all preset matching rules.

S5800, when any $AG^{(ae)}_{(ar)}$ corresponding to $AE_{(ae)}$ equals 1, any $AR^{(ae)}_{(ar)}$ equals 1, the first identifier is "2", and $\mathrm{length}_{(ae)}\ge\mathrm{length}^0$, taking $AE_{(ae)}$ as a target keyword, where $\mathrm{length}_{(ae)}$ is the length of the keyword $AE_{(ae)}$ and $\mathrm{length}^0$ is a preset keyword length set by those skilled in the art according to actual requirements and is not described further herein.
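The selection in S5700 and S5800 can be sketched as below. It reads "any ... equals 1" as "at least one", uses exact label equality as a stand-in for the unspecified similarity measure, and takes keyword length as the character count; these are assumptions, as are all function names.

```python
from typing import Callable, Sequence


def label_similarity(a: str, b: str) -> float:
    """Stand-in similarity (an assumption): 1.0 for identical labels, else 0.0."""
    return 1.0 if a == b else 0.0


def first_identifier(word: str, rules: Sequence[Callable[[str], bool]]) -> int:
    """S5700: 2 if the word satisfies every matching rule, otherwise 3."""
    return 2 if all(rule(word) for rule in rules) else 3


def is_target_keyword(
    word: str,
    word_type: str,
    preset_types: Sequence[str],     # AF
    specified_words: Sequence[str],  # AH
    rules: Sequence[Callable[[str], bool]],
    min_length: int,                 # preset keyword length
) -> bool:
    """S5800: all four conditions must hold for the candidate to be kept."""
    type_hit = any(label_similarity(word_type, t) == 1.0 for t in preset_types)
    word_hit = any(label_similarity(word, h) == 1.0 for h in specified_words)
    return (
        type_hit
        and word_hit
        and first_identifier(word, rules) == 2
        and len(word) >= min_length
    )
```

A rule is any predicate on the keyword; for instance, `lambda w: not w[-1:].isdigit()` is a hypothetical rule rejecting keywords that end in a digit.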

In the prior art, keywords are mostly obtained through TF-IDF, which cannot obtain keywords according to both the text content and the user's requirements. In the present invention, the key text is processed to obtain the first keyword information and the keyword type corresponding to each first keyword, candidate keywords are obtained by matching the keyword part of speech and keyword type of the first keywords against the regular expressions, and the candidate keywords are processed to obtain the target keywords. The target keywords can thus be determined accurately according to the text content and the user's requirements, which helps improve the accuracy of obtaining the target keywords.

While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims

1. A method for obtaining hot words with stable heat, the method comprising the following steps:

$D_2=\sum_{j=1}^{n}\left(\sum_{i1=1}^{m1}E^{2j}_{(i1)}/m1\right)/n$, where $E^{2j}_{(i1)}$ is the second word similarity between $C_j$ and $A_{2(i1)}$, that is, the similarity between a key feature word and a preset heat stability feature word;

S200, when $D_2\le\Delta D_2$, acquiring a preset time period list $J=\{J_1, J_2, \ldots, J_b, \ldots, J_d\}$, where $J_b$ is the $b$-th preset time period, $b=1, 2, \ldots, d$, $d$ is the number of preset time periods, and $\Delta D_2$ is a second similarity threshold;

S300, acquiring, according to $J$, a historical time period list $J^0=\{J^0_1, J^0_2, \ldots, J^0_b, \ldots, J^0_d\}$ corresponding to $J$, where $J^0_b=\{J^0_{b1}, J^0_{b2}, \ldots, J^0_{bf}, \ldots, J^0_{bz}\}$, $J^0_{bf}$ is the $f$-th historical time period in the historical time period list $J^0_b$ corresponding to $J_b$, $f=1, 2, \ldots, z$, and $z$ is the number of historical time periods corresponding to the preset time period;

S400, obtaining, according to $G_x$ and $J_b$, a first priority $K^b_x$ corresponding to $G_x$ in $J_b$, where $G_x$ is the $x$-th target keyword in the target keyword list $G=\{G_1, G_2, \ldots, G_x, \ldots, G_p\}$, $x=1, 2, \ldots, p$, $p$ is the number of target keywords, and $K^b_x$ satisfies:

$K^b_x=\beta^b\times K^{b-1}_x+(1-\beta^b)\times P^b_x$, where $K^{b-1}_x$ is the first priority corresponding to $G_x$ in $J_{b-1}$, $\beta^b$ is the preset weight corresponding to $J_b$, and $P^b_x$ is the number of times $G_x$ appears in the system during $J_b$; when $b=1$, $K^1_x=\beta+(1-\beta)\times P^1_x$; the preset weight represents the importance of the preset time period, and the target keywords are keywords stored in the system and used for obtaining hot words;

S500, obtaining a second priority $K^{0b}_x$ corresponding to $G_x$ in $J_b$, wherein $K^{0b}_x$ satisfies:

$K^{0b}_x=\log(P^b_x\times Q^b_x/K^{b-1}_x)$, where $Q^b_x$ is the number of historical time periods $J^0_{bf}$ in $J^0_b$ that contain a time point at which $G_x$ appears in the system, and when $b=1$, $K^{01}_x=\log(P^1_x\times Q^1_x)$;

S600, obtaining a third priority $K^{1b}_x$ corresponding to $G_x$ in $J_b$, wherein $K^{1b}_x$ satisfies:

$K^{1b}_x=\dfrac{P^b_x}{P^b_x+\sum_{x=1}^{p}P^b_x/p}\times K^{0b}_x+\dfrac{\sum_{x=1}^{p}P^b_x/p}{P^b_x+\sum_{x=1}^{p}P^b_x/p}\times\left(\sum_{x=1}^{p}K^{0b}_x/p\right)$;

2. The method for obtaining hot words with stable heat according to claim 1, wherein S200 includes obtaining $\Delta D_2$ as follows:

$D_1=\sum_{j=1}^{n}\left(\sum_{i=1}^{m}E^{1j}_{i}/m\right)/n$, where $E^{1j}_{i}$ is the first word similarity between $C_j$ and $A_{1i}$, that is, the similarity between a key feature word and a preset heat fluctuation feature word;

$D_3=\sum_{j=1}^{n}\left(\sum_{i2=1}^{m2}E^{3j}_{(i2)}/m2\right)/n$, where $E^{3j}_{(i2)}$ is the third word similarity between $C_j$ and $A_{3(i2)}$, that is, the similarity between a key feature word and a preset heat increment feature word;

$\Delta D_2=(D_1+D_3)/2$.

3. The method for obtaining hot words with stable heat according to claim 1, wherein the target text is a text that is input into the system by the user and that expresses the user's requirement.

4. The method for obtaining hot words with stable heat according to claim 3, wherein the key feature words corresponding to the target text are words extracted from the target text and capable of expressing text features of the target text.

5. The method for obtaining hot words with stable heat according to claim 1, wherein the preset heat stability feature word is a preset word capable of characterizing the heat stability of a hot word.

6. The method for obtaining hot words with stable heat according to claim 1, wherein the length of the preset time period is measured in days.

7. The method for obtaining hot words with stable heat according to claim 1, wherein the length of the preset time period is the same as the length of the corresponding history time period.

8. The method for obtaining hot words with stable heat according to claim 1, wherein the total length of all the historical time periods corresponding to the same preset time period is one year, and the ending time point of the last historical time period corresponding to the preset time period is the starting time point of the preset time period.

9. A non-transitory computer readable storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement the method of any one of claims 1-8.

10. An electronic device comprising a processor and the non-transitory computer readable storage medium of claim 9.
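The priority computations recited in steps S400 to S600 of claim 1 can be illustrated with a minimal sketch. It assumes a natural logarithm, lists ordered from the first preset time period onward, and strictly positive arguments to the logarithm; the function names are hypothetical and the claim text remains the authoritative definition.

```python
import math
from typing import List, Optional, Sequence


def first_priorities(counts: Sequence[float], betas: Sequence[float]) -> List[float]:
    """S400: K^b for one keyword over periods b = 1..d (smoothed counts)."""
    priorities: List[float] = []
    for b, (p_b, beta_b) in enumerate(zip(counts, betas)):
        if b == 0:
            k_b = beta_b + (1 - beta_b) * p_b  # base case for b = 1
        else:
            k_b = beta_b * priorities[-1] + (1 - beta_b) * p_b
        priorities.append(k_b)
    return priorities


def second_priority(p_b: float, q_b: float, prev_first: Optional[float] = None) -> float:
    """S500: log(P * Q / K^{b-1}); for b = 1 pass prev_first=None."""
    ratio = p_b * q_b if prev_first is None else p_b * q_b / prev_first
    if ratio <= 0:
        raise ValueError("logarithm argument must be positive")
    return math.log(ratio)  # assumption: natural logarithm


def third_priorities(counts: Sequence[float], second: Sequence[float]) -> List[float]:
    """S600: blend each keyword's second priority with the mean over keywords."""
    p = len(counts)
    mean_count = sum(counts) / p
    mean_second = sum(second) / p
    result = []
    for c, k in zip(counts, second):
        denom = c + mean_count
        if denom == 0:
            raise ValueError("all counts are zero in this period")
        result.append(c / denom * k + mean_count / denom * mean_second)
    return result
```

In `third_priorities`, a keyword with few occurrences is pulled toward the average second priority, while a frequent keyword keeps a value close to its own.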