CN107688596B - Burst topic detection method and burst topic detection equipment

Info

Publication number: CN107688596B
Application number: CN201710433359.1A
Authority: CN
Inventors: 王健宗; 黄章成; 吴天博; 肖京
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2017-06-09
Filing date: 2017-06-09
Publication date: 2020-02-21
Anticipated expiration: 2037-06-09
Also published as: CN107688596A; WO2018223718A1

Abstract

The invention provides a burst topic detection method and a burst topic detection device, applicable to the technical field of the Internet. The method comprises: continuously acquiring topic data from an information sharing platform; each time a piece of topic data is acquired, matching the topic data against the words in a preset lexicon to output a plurality of word segmentation results; outputting the participles contained in the word segmentation result with the highest matching degree as the keywords corresponding to the topic data; updating summary information associated with the topic data according to the keywords; and displaying the keywords and the summary information so that a user can learn the burst topics at the current moment. Because the keywords corresponding to the topic data are determined and the summary information is updated on the basis of those keywords, a user can quickly learn the burst topics on the information sharing platform from the displayed keywords and summary information.

Description

Burst topic detection method and burst topic detection equipment

Technical Field

The invention belongs to the technical field of internet, and particularly relates to a burst topic detection method and a burst topic detection device.

Background

On information sharing platforms such as microblogs, Twitter and forums, the openness of the platform lets users share and forward all kinds of information anytime and anywhere. If a large number of users share or forward the same information within a short time, the topic corresponding to that information becomes a burst topic with high popularity. If such a burst topic relates to a specific enterprise, it may have a large public-opinion impact on that enterprise. An enterprise that cannot find and track burst topic events related to itself in time may miss the best opportunity to counter negative public opinion, which weakens its soft power.

However, in the prior art, it is difficult to quickly know the burst topics on the information sharing platform through technical means, and it is also difficult to determine whether each burst topic is related to the enterprise itself.

Disclosure of Invention

In view of this, embodiments of the present invention provide a burst topic detection method and a burst topic detection device, so as to solve the problems in the prior art that it is difficult to quickly learn the burst topics on an information sharing platform through technical means and difficult to determine whether each burst topic is related to the enterprise itself.

A first aspect of the embodiments of the present invention provides a burst topic detection method, including:

continuously acquiring topic data in the information sharing platform;

each time a piece of topic data is acquired, matching the topic data against the words in a preset lexicon so as to output a plurality of word segmentation results;

outputting the participles contained in the word segmentation result with the highest matching degree as the keywords corresponding to the topic data;

updating summary information associated with the topic data according to the keywords;

and displaying the keywords and the summary information so that a user can learn the burst topics at the current moment.

A second aspect of the embodiments of the present invention provides a burst topic detection apparatus, which includes a memory, a processor, and a burst topic detection program that is stored in the memory and executable on the processor; when the processor executes the burst topic detection program, the following steps are implemented:

continuously acquiring topic data in the information sharing platform;

A third aspect of the embodiments of the present invention provides a computer-readable storage medium storing a burst topic detection program which, when executed by at least one processor, implements the steps of:

continuously acquiring topic data in the information sharing platform;

In the embodiments of the present invention, each time topic data is acquired from the information sharing platform, the keywords corresponding to that topic data are determined and the summary information is updated in real time on the basis of those keywords. A user can therefore learn, as early as possible, roughly what the burst topic on the information sharing platform is about from the displayed keywords and summary information, and can quickly determine from the summary information whether the burst topic is related to the enterprise itself. Burst topic events related to the enterprise can thus be found, tracked and handled effectively, which improves the soft power of the enterprise.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of an implementation of the burst topic detection method provided by an embodiment of the present invention;

Fig. 2 is a flowchart of a specific implementation of step S103 of the burst topic detection method provided by an embodiment of the present invention;

Fig. 3 is a flowchart of a specific implementation of step S104 of the burst topic detection method provided by an embodiment of the present invention;

Fig. 4 is a flowchart of a specific implementation of step S303 of the burst topic detection method provided by an embodiment of the present invention;

Fig. 5 is a flowchart of a specific implementation of step S305 of the burst topic detection method provided by an embodiment of the present invention;

Fig. 6 is a schematic diagram of a burst topic detection device provided by an embodiment of the present invention;

Fig. 7 is a schematic diagram of a burst topic detection apparatus provided by an embodiment of the present invention.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

Fig. 1 shows an implementation flow of the burst topic detection method provided by the embodiment of the present invention, where the method flow includes steps S101 to S105. The specific realization principle of each step is as follows:

S101: Topic data in the information sharing platform is continuously acquired.

In the embodiments of the present invention, the information sharing platform includes but is not limited to microblogs, Twitter, Facebook, large BBS forums and the like. Each piece of topic data is a piece of text published by a user and displayed on the information sharing platform, and it may be associated with one or more sudden events. The text includes, but is not limited to, original posts, reposts, and the user comments attached to original posts or reposts on the information sharing platform.

Topic data in the information sharing platform can be acquired in either of two ways. In the first, an application created in advance to interact with the application programming interface (API) of the information sharing platform uses a previously obtained account key to call the API provided by the platform and receives the topic data that the platform returns. In the second, a crawler program continuously crawls topic data from the information sharing platform.
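The continuous acquisition described above can be pictured as a polling loop. The following is only a minimal sketch: `acquire_topics`, `fetch_batch`, `max_rounds` and the `id`/`text` fields are hypothetical names introduced for illustration, and `fetch_batch` stands in for either the API call or the crawler pass, neither of which is specified in this text.

```python
import time
from typing import Callable, Iterable, Iterator


def acquire_topics(
    fetch_batch: Callable[[], Iterable[dict]],
    max_rounds: int,
    poll_interval_s: float = 0.0,
) -> Iterator[dict]:
    """Yield each new piece of topic data exactly once.

    fetch_batch is a hypothetical callable wrapping a platform API call or a
    crawler pass; it is assumed to return dicts with "id" and "text" keys.
    """
    seen_ids = set()
    for _ in range(max_rounds):
        for item in fetch_batch():
            if item["id"] in seen_ids:
                continue  # already delivered in an earlier round
            seen_ids.add(item["id"])
            yield item
        time.sleep(poll_interval_s)


# Stand-in source that returns overlapping batches, as a live feed might.
_batches = iter([
    [{"id": 1, "text": "first post"}, {"id": 2, "text": "second post"}],
    [{"id": 2, "text": "second post"}, {"id": 3, "text": "third post"}],
])
topics = list(acquire_topics(lambda: next(_batches), max_rounds=2))
```

A real deployment would loop indefinitely rather than for a fixed number of rounds; the bound is kept here only so that the sketch terminates.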

Because the topic data in the information sharing platform is continuously updated and keeps growing, the embodiments of the present invention acquire it in real time, that is, continuously. This ensures that the system always holds the latest topic data, so that burst topic detection can be performed accurately and promptly.

S102: Each time a piece of topic data is acquired, the topic data is matched against the words in a preset lexicon so as to output a plurality of word segmentation results.

When a new piece of topic data is received, the system performs word matching on it. Specifically, starting from the first character of the topic data, the system determines whether the topic data contains a word from the preset lexicon. When a run of consecutive characters in the topic data is identical to a word in the preset lexicon, that run is determined to be a participle, and word matching resumes from the first character after the participle. Once all participles in the topic data have been determined, one word matching pass is complete and the pass outputs one word segmentation result, which contains a plurality of participles. Each participle consists of at least two characters.
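One word matching pass of this kind can be sketched as a left-to-right dictionary scan. The sketch below is illustrative only: `forward_match` and `max_len` are hypothetical names, trying the longest candidate first is an assumption (the paragraph above does not fix the order in which candidates are tried), and Latin letters stand in for the Chinese characters of real topic data.

```python
def forward_match(text: str, lexicon: set[str], max_len: int = 8) -> list[str]:
    """Scan left to right, emitting a lexicon word wherever one starts."""
    participles = []
    i = 0
    while i < len(text):
        match = None
        # Assumption: try the longest candidate first; stop at two characters,
        # since each participle has at least two characters.
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                match = candidate
                break
        if match:
            participles.append(match)
            i += len(match)  # resume from the first character after the participle
        else:
            i += 1           # no lexicon word starts here; move on one character
    return participles


lexicon = {"data", "dataline", "line", "yield"}
result = forward_match("datalineyield", lexicon)
```

With this toy lexicon the pass yields the participles `dataline` and `yield`; a different matching rule could instead yield `data`, `line` and `yield` for the same text.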

In fact, a character in the topic data can form a participle either with one or more adjacent characters on its left or with one or more adjacent characters on its right, so the same topic data yields different word segmentation results under different word segmentation rules. In the embodiments of the present invention, one word segmentation result is output for each pre-stored word segmentation rule for a given piece of topic data. Different word segmentation results may have different matching degrees. The matching degree indicates how well a user can grasp the actual semantics of the topic data from the participles in the word segmentation result.
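To see how one text can admit several word segmentation results, the sketch below enumerates every way of covering a string with lexicon words. This is a stand-in, not the pre-stored rules themselves, which are not listed in this text; `all_segmentations` is a hypothetical helper and Latin letters again stand in for Chinese characters.

```python
def all_segmentations(text: str, lexicon: set[str]) -> list[list[str]]:
    """Return every way to split text entirely into lexicon words."""
    if not text:
        return [[]]  # exactly one way to segment the empty string
    results = []
    for end in range(2, len(text) + 1):  # participles have at least two characters
        head = text[:end]
        if head in lexicon:
            for tail in all_segmentations(text[end:], lexicon):
                results.append([head] + tail)
    return results


lexicon = {"data", "line", "dataline", "yield"}
candidates = all_segmentations("datalineyield", lexicon)
```

Here `candidates` holds two results, one with three short participles and one with two longer ones; the matching degree is what decides between such alternatives.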

S103: The participles contained in the word segmentation result with the highest matching degree are output as the keywords corresponding to the topic data.

In the embodiments of the present invention, the matching degree of each word segmentation result may be determined from the average number of characters per participle or from the variance of the participles' character counts, which is not limited herein.

Preferably, since a participle with more characters makes it easier for the user to determine the actual semantics of the topic data, the matching degree of each word segmentation result is measured on the longest-match principle. After the matching degrees of the word segmentation results are compared, the participles contained in the result with the highest matching degree are output as the keywords corresponding to the topic data.

For example, suppose the topic data consists only of the three Chinese characters for "data line". Both "data line" and "data" can form a participle, but the result containing "data line" has the higher matching degree, so "data line" is output as the keyword.

As an embodiment of the present invention, a calculation method of the matching degree of the segmentation result is further defined. As shown in fig. 2, the step S103 specifically includes:

S201: The average number of participle characters of each word segmentation result is calculated from the number of characters in each participle of the result and the total number of participles in the result.

Each word segmentation result comprises a plurality of participles, and each participle comprises at least two characters. In the embodiments of the present invention, the total number of participles is determined, as is the number of characters in each participle. The ratio of the sum of the participles' character counts to the total number of participles is output as the average number of participle characters.

For example, if word segmentation of a piece of topic data yields the result { skyway group / data line / yield }, the three participles are "skyway group", "data line" and "yield", whose character counts are 4, 3 and 3 respectively (counted in the original Chinese text). The total number of participles is 3, so the average number of participle characters is (4+3+3)/3 ≈ 3.33.
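The arithmetic of this example can be checked with a few lines of code. In this sketch `average_participle_length` is a hypothetical helper, and the placeholder strings only reproduce the character counts 4, 3 and 3 from the example; they are not the original participles.

```python
def average_participle_length(participles: list[str]) -> float:
    """Sum of the participles' character counts divided by the number of participles."""
    return sum(len(p) for p in participles) / len(participles)


# Placeholder strings with 4, 3 and 3 characters, matching the example above.
example_result = ["AAAA", "BBB", "CCC"]
average = average_participle_length(example_result)
```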

S202: The average number of participle characters and the total number of participles of each word segmentation result are weighted so as to output the matching degree of that result.

In the embodiments of the present invention, the average number of participle characters $A_1$ has a preset weighting coefficient $a_1$, the total number of participles $A_2$ has a preset weighting coefficient $a_2$, and $a_1 + a_2 = 1$. The matching degree of each word segmentation result is $C = A_1 \times a_1 + A_2 \times a_2$.

S203: The participles contained in the word segmentation result with the highest matching degree are output as the keywords corresponding to the topic data.

If word segmentation of the topic data yields $m$ word segmentation results whose matching degrees are $C_1, C_2, \ldots, C_m$, the largest value $C_i$ among $C_1, C_2, \ldots, C_m$ is selected, and the participles in the word segmentation result corresponding to $C_i$ are output as the keywords corresponding to the topic data. Here $m$ is an integer greater than 1 and $i \le m$.
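Steps S201 to S203 can be sketched together as a scoring function followed by a maximum. The weights 0.8 and 0.2 below are illustrative assumptions (the text only requires that the two coefficients sum to 1), and `matching_degree` and `pick_keywords` are hypothetical names.

```python
def matching_degree(participles: list[str], a1: float, a2: float) -> float:
    """C = A1 * a1 + A2 * a2, with A1 the average length and A2 the participle count."""
    avg_chars = sum(len(p) for p in participles) / len(participles)  # A1
    total = len(participles)                                         # A2
    return avg_chars * a1 + total * a2


def pick_keywords(results: list[list[str]], a1: float, a2: float) -> list[str]:
    """Return the participles of the word segmentation result with the largest C."""
    return max(results, key=lambda r: matching_degree(r, a1, a2))


# Two candidate results for the same text; weights chosen only for illustration.
results = [["data", "line", "yield"], ["dataline", "yield"]]
keywords = pick_keywords(results, a1=0.8, a2=0.2)
```

With these weights the second result scores higher because its longer participles outweigh its smaller participle count; a different choice of weights could reverse the ranking.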

In the embodiments of the present invention, the average number of participle characters and the total number of participles both strongly affect whether a user can determine the actual semantics of the topic data from a word segmentation result. Weighting these two factors and using the weighted value as the matching degree of the word segmentation result therefore improves the accuracy and effectiveness of keyword selection, so that the event content of the burst topic can be located accurately.

S104: The summary information associated with the topic data is updated according to the keywords.

At any moment the system has received a cumulative set of topic data. After determining the keywords of each piece of topic data, the system regenerates the summary information describing all topic data received so far, so that a user can clearly understand the rough content of the burst topic at the current moment from the summary information.

Keywords are the decisive features of topic data. To generate summary information associated with all topic data received so far, the cumulative word frequency of each keyword across the topic data may be counted, and the summary information is generated from the keywords whose cumulative word frequency exceeds a threshold. The summary information associated with the topic data and the keywords may be generated with the TextRank algorithm or with the summary generation tool of the word tool.
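The cumulative word frequency count can be sketched with a running counter. The helper names and the threshold value below are assumptions made for illustration; the summary generation itself (TextRank or another tool) is not shown.

```python
from collections import Counter


def update_cumulative_frequency(counts: Counter, keywords: list[str]) -> None:
    """Add the keywords of one newly received piece of topic data to the totals."""
    counts.update(keywords)


def frequent_keywords(counts: Counter, threshold: int) -> list[str]:
    """Keywords whose cumulative word frequency exceeds the threshold."""
    return [word for word, count in counts.items() if count > threshold]


counts = Counter()
for keywords in (["dataline", "yield"], ["dataline", "recall"], ["dataline", "yield"]):
    update_cumulative_frequency(counts, keywords)

summary_candidates = frequent_keywords(counts, threshold=1)
```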

Preferably, as an embodiment of the present invention, as shown in Fig. 3, step S104 specifically includes:

S301: The cumulative word frequency of each keyword is obtained and the growth acceleration of the cumulative word frequency is calculated, where the cumulative word frequency of a keyword is the cumulative number of times the keyword occurs in all topic data acquired up to the current moment.

In the embodiments of the present invention, the cumulative word frequency of a keyword is the number of times the keyword occurs in all topic data received so far. Because the system continuously acquires topic data, the cumulative word frequency of a given keyword keeps increasing. If the system detects that the cumulative word frequency of keyword A increases by $\Delta S$ within a time period $\Delta T$, the growth rate of the cumulative word frequency of keyword A is $V = \Delta S / \Delta T$, and the growth acceleration $a$ of the cumulative word frequency is the derivative of the growth rate $V$ with respect to time, i.e. $a = V'(T)$. The larger the growth acceleration, the faster the keyword's number of occurrences per unit time is rising, and the burstier the topic.
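With sampled data, the growth rate and growth acceleration can be approximated by finite differences. The sketch below assumes the cumulative word frequency of one keyword is sampled at equally spaced times; the sampling scheme and the function names are assumptions, not something this text specifies.

```python
def growth_rates(cumulative: list[float], dt: float) -> list[float]:
    """V = delta S / delta T between consecutive samples of the cumulative frequency."""
    return [(later - earlier) / dt for earlier, later in zip(cumulative, cumulative[1:])]


def growth_accelerations(cumulative: list[float], dt: float) -> list[float]:
    """Finite-difference approximation of a = V'(T)."""
    rates = growth_rates(cumulative, dt)
    return [(later - earlier) / dt for earlier, later in zip(rates, rates[1:])]


# Cumulative frequency of one keyword sampled once per unit of time.
samples = [0, 2, 6, 14]
rates = growth_rates(samples, dt=1.0)
accelerations = growth_accelerations(samples, dt=1.0)
```

In this toy series the keyword gains 2, then 4, then 8 occurrences per interval, so the rate itself keeps rising and the acceleration is positive throughout.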

S302: The growth acceleration corresponding to each keyword is added to a pre-generated matrix.

Each time new topic data is received, the system determines the keywords of the topic data and the growth acceleration of the cumulative word frequency of each keyword. If the topic data has K keywords, K growth accelerations are obtained. If the system has accumulated P growth accelerations in total (P ≥ K, P being an integer), the matrix is expanded to a P × P matrix and the K newly obtained growth accelerations are added to it. Besides the P growth accelerations, the P × P matrix also contains null values.

S303: The eigenvalue of the matrix at the current moment is calculated, and when the eigenvalue is greater than a first threshold, the growth accelerations greater than a second threshold are determined from the matrix.

The system monitors the growth accelerations in the matrix so as to detect the eigenvalue of the matrix in real time. As more topic data accumulates, the size of the matrix and the number of growth accelerations it contains keep changing, and the eigenvalue of the matrix increases accordingly. When the eigenvalue exceeds the preset first threshold, the system locates, among the growth accelerations in the matrix, the one or more whose values exceed the second threshold.
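Steps S302 and S303 can be illustrated under explicit assumptions. This text does not say where the growth accelerations sit in the P × P matrix or which eigenvalue is compared with the first threshold; the sketch below assumes they are placed on the diagonal, that zeros stand in for the null values, and that the largest eigenvalue is used. For a diagonal matrix the eigenvalues are exactly the diagonal entries, so no numerical solver is needed. All function names are hypothetical.

```python
def build_matrix(accelerations: list[float]) -> list[list[float]]:
    """Assumed layout: P accelerations on the diagonal of a P x P matrix, zeros elsewhere."""
    p = len(accelerations)
    return [[accelerations[i] if i == j else 0.0 for j in range(p)] for i in range(p)]


def largest_eigenvalue(diagonal_matrix: list[list[float]]) -> float:
    """The eigenvalues of a diagonal matrix are its diagonal entries."""
    return max(diagonal_matrix[i][i] for i in range(len(diagonal_matrix)))


def bursting_accelerations(
    matrix: list[list[float]], first_threshold: float, second_threshold: float
) -> list[float]:
    """Scan for large accelerations only once the eigenvalue passes the first threshold."""
    if largest_eigenvalue(matrix) <= first_threshold:
        return []
    diagonal = [matrix[i][i] for i in range(len(matrix))]
    return [a for a in diagonal if a > second_threshold]


matrix = build_matrix([0.5, 4.0, 2.5])
selected = bursting_accelerations(matrix, first_threshold=3.0, second_threshold=2.0)
```

Under a different matrix layout the eigenvalues would no longer coincide with individual accelerations, and a general eigenvalue routine would be required.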

As an embodiment of the present invention, as shown in fig. 4, the step S303 specifically includes:

S401: The growth accelerations in the matrix at the current moment are divided into N groups, and the growth accelerations of each group are mapped into a sub-matrix.

Because the matrix contains a large number of growth accelerations, dimension reduction is applied to the matrix in order to locate the growth accelerations greater than the second threshold more quickly.

Specifically, all growth accelerations in the matrix are divided into N groups according to a preset rule, so that each group contains fewer growth accelerations. The groups may contain the same or different numbers of growth accelerations. The growth accelerations in each group are mapped into one sub-matrix, so N groups yield N sub-matrices. As topic data accumulates, the growth accelerations obtained at each update are likewise mapped into the N sub-matrices.

S402: The eigenvalue of each sub-matrix is calculated, and when the eigenvalue of a sub-matrix is greater than a fourth threshold, the growth accelerations greater than the second threshold are screened out from that sub-matrix.

The eigenvalue of each sub-matrix is calculated. If the eigenvalues of one or more of the N sub-matrices exceed the preset fourth threshold, the growth accelerations greater than the second threshold are screened out from each of those sub-matrices.

In the embodiments of the present invention, each sub-matrix contains far fewer growth accelerations than the full matrix. By calculating the eigenvalue of each sub-matrix and searching only the sub-matrices whose eigenvalue exceeds the fourth threshold, the growth accelerations greater than the second threshold can be located quickly, which improves the efficiency of burst topic detection.
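The grouping in S401 and S402 can be sketched as follows. The preset grouping rule is not given in this text, so round-robin assignment is assumed here; each group is treated as a diagonal sub-matrix whose largest eigenvalue equals its largest entry, which is also an assumption. The function names are hypothetical.

```python
def split_into_groups(accelerations: list[float], n_groups: int) -> list[list[float]]:
    """Assumed rule: deal the accelerations round-robin into n_groups groups."""
    return [accelerations[k::n_groups] for k in range(n_groups)]


def screen_groups(
    groups: list[list[float]], fourth_threshold: float, second_threshold: float
) -> list[float]:
    """Search only the groups whose (assumed diagonal) sub-matrix eigenvalue is large."""
    selected = []
    for group in groups:
        # For a diagonal sub-matrix the largest eigenvalue is the largest entry.
        if group and max(group) > fourth_threshold:
            selected.extend(a for a in group if a > second_threshold)
    return selected


groups = split_into_groups([0.5, 4.0, 0.3, 0.1, 3.5, 0.2], n_groups=3)
selected = screen_groups(groups, fourth_threshold=3.0, second_threshold=2.0)
```

Only the middle group passes the fourth threshold in this example, so the other two groups are never scanned entry by entry.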

S304: According to the participle corresponding to each determined growth acceleration, the topic data containing that participle is screened out from all acquired topic data.

Each growth acceleration in the matrix or a sub-matrix corresponds to one keyword, and each keyword is one of the participles in the word segmentation result with the highest matching degree for some topic data. The system can therefore look up, in a pre-stored mapping table of growth accelerations and participles, the participle corresponding to each growth acceleration whose value exceeds the second threshold. If L growth accelerations exceed the second threshold, L participles are found.

The system then examines each piece of topic data acquired up to the current moment in turn and determines whether it contains the L participles. If a piece of topic data contains the L participles, the system selects it and performs step S305 on it.
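The screening in S304 amounts to a containment filter over the acquired texts. In the sketch below the `require_all` switch is an assumption added for illustration: the paragraph above reads as requiring all L participles, while the step heading could also be read as requiring any one of them.

```python
def screen_topics(
    topics: list[str], participles: list[str], require_all: bool = True
) -> list[str]:
    """Keep the topic texts that contain the located participles."""
    check = all if require_all else any
    return [text for text in topics if check(p in text for p in participles)]


topics = ["dataline yield drops", "dataline recall", "weather today"]
strict = screen_topics(topics, ["dataline", "yield"])
loose = screen_topics(topics, ["dataline", "yield"], require_all=False)
```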

S305: Word segmentation is performed again on the topic data containing the participles, and the word frequency characteristic value of each participle obtained from this word segmentation is calculated.

For each piece of screened topic data, the system performs word segmentation again. Any existing word segmentation algorithm may be used, including but not limited to algorithms based on string matching and algorithms based on statistics. This yields a new set of participles for the topic data. To distinguish them, the participles obtained in S102 are called first participles and those obtained in S305 are called second participles. The first and second participles may be the same or different. To further screen out the second participles that most influence the summary information, the word frequency characteristic value of each second participle is calculated from its word frequency features. These features include, but are not limited to, the word frequency (term frequency, TF) and the reverse file frequency (inverse document frequency, IDF).

As an embodiment of the present invention, as shown in Fig. 5, step S305 specifically includes:

S501: Word segmentation is performed again on the topic data containing the participles to obtain a plurality of participles.

S502: The statistical word frequency and the reverse file frequency of each participle obtained from the word segmentation are calculated over all topic data acquired up to the current moment.

In the embodiments of the present invention, the frequency with which each second participle appears in the screened topic data is counted; this is the statistical word frequency $F_{TF}$ of the second participle. If the total number of pieces of screened topic data is X, of which X' pieces contain a given second participle (X' ≤ X, both integers), the reverse file frequency $F_{IDF}$ of that second participle is calculated from X and X' (the formula is not reproduced in this text).

S503: The statistical word frequency and the reverse file frequency of each participle are weighted so as to output the word frequency characteristic value of the participle.

The statistical word frequency $F_{TF}$ has a preset weighting coefficient $a_3$, the reverse file frequency $F_{IDF}$ has a preset weighting coefficient $a_4$, and $a_3 + a_4 = 1$. The word frequency characteristic value of each second participle is $F = F_{TF} \times a_3 + F_{IDF} \times a_4$.
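A sketch of S502 and S503 follows. Because the formula for the reverse file frequency is not available in this text, the common definition log(X / X') is swapped in as an assumption; the statistical word frequency is likewise assumed to be a participle's share of all second participles, and the weights of 0.5 are illustrative. `word_frequency_features` is a hypothetical name.

```python
import math


def word_frequency_features(
    docs: list[list[str]], a3: float = 0.5, a4: float = 0.5
) -> dict[str, float]:
    """F = F_TF * a3 + F_IDF * a4 for every second participle in the screened topic data."""
    total_participles = sum(len(doc) for doc in docs)
    x = len(docs)  # X: number of screened pieces of topic data
    features = {}
    for word in {w for doc in docs for w in doc}:
        # Assumed F_TF: share of all second participles taken by this word.
        f_tf = sum(doc.count(word) for doc in docs) / total_participles
        x_prime = sum(1 for doc in docs if word in doc)  # X': pieces containing it
        # Assumed F_IDF: the common log(X / X') form, not confirmed by the source.
        f_idf = math.log(x / x_prime)
        features[word] = f_tf * a3 + f_idf * a4
    return features


docs = [["dataline", "yield", "dataline"], ["dataline", "recall"], ["yield", "drop"]]
features = word_frequency_features(docs)
```

Under these assumptions a participle that appears in every screened piece of topic data gets a reverse file frequency of zero, so its characteristic value comes from the word frequency term alone.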

In the embodiments of the present invention, the word frequency characteristic value of each second participle is calculated from its TF and IDF values with user-defined weighting coefficients, so that the importance of the second participles can be compared quantitatively across the screened topic data while taking both TF and IDF into account.

S306: The participles whose word frequency characteristic value is greater than a third threshold are output as high-frequency words, and the high-frequency words are connected by a preset algorithm to obtain summary information containing the high-frequency words.

Each second participle whose word frequency characteristic value F exceeds the preset third threshold is determined; these second participles are the high-frequency words appearing in the topic data. The high-frequency words are connected using the TextRank algorithm, the summary generation tool of the word tool, another user-defined algorithm or the like, to obtain the summary information associated with the topic data and the high-frequency words.
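The thresholding part of S306 can be sketched as below. The joining step here is a deliberately simple stand-in: the high-frequency words are concatenated in descending order of their characteristic value, which is not TextRank and not the connection algorithm of this embodiment. `build_summary` and the sample values are hypothetical.

```python
def build_summary(features: dict[str, float], third_threshold: float) -> str:
    """Select high-frequency words and join them, highest characteristic value first."""
    high_frequency = [word for word, f in features.items() if f > third_threshold]
    high_frequency.sort(key=lambda word: features[word], reverse=True)
    # Stand-in for the connection step: plain concatenation with spaces.
    return " ".join(high_frequency)


summary = build_summary(
    {"today": 0.1, "yield": 0.6, "dataline": 0.9}, third_threshold=0.5
)
```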

S105: The keywords and the summary information are displayed so that a user can learn the burst topics at the current moment.

The system displays the keywords acquired in real time and the updated summary information. In practice, the growth acceleration of a keyword's cumulative word frequency exceeds the threshold, and the summary information is updated, only when the topic data belongs to a burst topic. The text displayed by the system in real time therefore closely resembles the actual content of the burst topic event and has reference value.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

Fig. 6 shows a schematic diagram of the burst topic detection device provided in the embodiment of the present invention, corresponding to the burst topic detection method described in the above embodiment, and for convenience of description, only the relevant parts to the embodiment of the present invention are shown.

Referring to fig. 6, the apparatus includes:

an obtaining module 61, configured to continuously acquire topic data in the information sharing platform;

a matching module 62, configured to, each time a piece of topic data is acquired, match the topic data against the words in a preset lexicon so as to output a plurality of word segmentation results;

an output module 63, configured to output the participles contained in the word segmentation result with the highest matching degree as the keywords corresponding to the topic data;

an updating module 64, configured to update the summary information associated with the topic data according to the keywords;

and a display module 65, configured to display the keywords and the summary information so that a user can learn the burst topics at the current moment.

Optionally, the update module 64 includes:

a first calculation sub-module, configured to obtain the cumulative word frequency of each keyword and calculate the growth acceleration of the cumulative word frequency, where the cumulative word frequency of a keyword is the cumulative number of times the keyword occurs in all topic data acquired up to the current moment;

an adding sub-module, configured to add the growth acceleration corresponding to each keyword to a pre-generated matrix;

a determining sub-module, configured to calculate the eigenvalue of the matrix at the current moment and, when the eigenvalue is greater than a first threshold, determine from the matrix the growth accelerations greater than a second threshold;

a screening sub-module, configured to screen out, from all acquired topic data, the topic data containing the participle corresponding to each determined growth acceleration;

a word segmentation sub-module, configured to perform word segmentation again on the topic data containing the participles and calculate the word frequency characteristic value of each participle obtained from the word segmentation;

and a first output sub-module, configured to output the participles whose word frequency characteristic value is greater than a third threshold as high-frequency words, and to connect the high-frequency words by a preset algorithm to obtain summary information containing the high-frequency words.

Optionally, the determining sub-module is specifically configured to:

dividing each increasing acceleration in the matrix at the current moment into N groups, and mapping the increasing acceleration of each group into a sub-matrix;

calculating the characteristic value of each sub-matrix, and screening out the growth acceleration which is greater than a second threshold value from the sub-matrices when the characteristic value of the sub-matrix is greater than a fourth threshold value;

wherein N is an integer greater than 1.
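The grouping step can be sketched as follows. The text does not say how a group is mapped into a sub-matrix, so this sketch assumes the sub-matrix is the outer product of the group vector with itself; for that rank-one matrix the only non-zero eigenvalue is the sum of the squared entries, which is what `submatrix_eigenvalue` returns. All function names and threshold values are hypothetical.

```python
def split_into_groups(values, n_groups):
    """Split a list into n_groups contiguous groups of near-equal size."""
    if n_groups <= 1:
        raise ValueError("N must be an integer greater than 1")
    size, extra = divmod(len(values), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        end = start + size + (1 if i < extra else 0)
        groups.append(values[start:end])
        start = end
    return groups


def submatrix_eigenvalue(group):
    """Non-zero eigenvalue of the assumed sub-matrix g * g^T, i.e. sum(g_i^2)."""
    return sum(a * a for a in group)


def screen_accelerations(accelerations, n_groups, fourth_threshold, second_threshold):
    """Keep accelerations above second_threshold, looking only inside groups
    whose sub-matrix eigenvalue exceeds fourth_threshold."""
    selected = []
    for group in split_into_groups(accelerations, n_groups):
        if submatrix_eigenvalue(group) > fourth_threshold:
            selected.extend(a for a in group if a > second_threshold)
    return selected


# Hypothetical growth accelerations and thresholds.
selected = screen_accelerations(
    [0, 1, 12, 9, -1, 2], n_groups=3, fourth_threshold=50, second_threshold=5
)
```

The point of the grouping is that quiet groups are discarded after a single eigenvalue check, so the per-keyword comparison against the second threshold only runs inside groups that already look active.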

Optionally, the word segmentation sub-module is specifically configured to:

performing word segmentation processing on the topic data containing the word segmentation again to obtain a plurality of word segmentations;

respectively calculating, over all the topic data obtained at the current moment, the word frequency and the inverse document frequency corresponding to each participle obtained after the word segmentation processing;

and weighting the word frequency and the inverse document frequency of each participle to output a word frequency characteristic value of the participle.
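Weighting a word frequency against a document-frequency term is conventionally done as TF-IDF, so a minimal sketch under that assumption is given here. The product form, the smoothing inside the logarithm, the function name `tf_idf_scores`, the sample data, and the threshold are all illustrative choices rather than details taken from the text.

```python
import math
from collections import Counter


def tf_idf_scores(documents):
    """documents: list of token lists (the re-segmented topic data).

    Returns {token: score}, where score is the token's frequency over the
    whole collection multiplied by a smoothed inverse document frequency.
    """
    total_tokens = sum(len(doc) for doc in documents)
    term_counts = Counter(token for doc in documents for token in doc)
    doc_counts = Counter(token for doc in documents for token in set(doc))
    n_docs = len(documents)
    scores = {}
    for token, count in term_counts.items():
        tf = count / total_tokens
        # Assumed smoothing so that tokens present in every document keep a non-zero weight.
        idf = math.log((1 + n_docs) / (1 + doc_counts[token])) + 1
        scores[token] = tf * idf
    return scores


# Hypothetical re-segmented topic data.
documents = [
    ["server", "outage", "reported"],
    ["outage", "continues"],
    ["outage", "fixed", "server"],
]
scores = tf_idf_scores(documents)

third_threshold = 0.3  # assumed value
high_frequency_words = sorted(t for t, s in scores.items() if s > third_threshold)
```

With these inputs "outage" and "server" pass the assumed third threshold and would be the high-frequency words handed to the summary step.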

Optionally, the output module 63 includes:

and the second calculation sub-module is used for calculating the average number of the word segmentation characters of each word segmentation result according to the total number of the characters corresponding to each word segmentation in each word segmentation result and the total number of the words corresponding to each word segmentation result.

And the weighting submodule is used for weighting the word segmentation character average number and the word segmentation total number corresponding to each word segmentation result so as to output the matching degree of each word segmentation result.

And the second output sub-module is used for outputting a plurality of participles contained in the participle result with the highest matching degree as the keywords corresponding to the topic data.
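The three submodules above can be sketched as a single scoring function. The text only states that the average character count and the total word count are weighted, so the weights used here (a positive weight on the average length and a negative weight on the count, which favours fewer and longer words) are assumptions, as are the function names and sample candidates.

```python
def matching_degree(segments, w_avg=1.0, w_count=-0.5):
    """Weighted combination of average word length and word count.

    w_avg and w_count are hypothetical weights, not values from the text.
    """
    if not segments:
        raise ValueError("a word segmentation result must contain at least one word")
    total_chars = sum(len(word) for word in segments)
    count = len(segments)
    avg_chars = total_chars / count
    return w_avg * avg_chars + w_count * count


def best_segmentation(candidates):
    """Return the word segmentation result with the highest matching degree."""
    return max(candidates, key=matching_degree)


# Hypothetical candidate segmentations of the same text.
candidates = [
    ["no", "w", "here"],
    ["now", "here"],
    ["nowhere"],
]
keywords = best_segmentation(candidates)
```

Under these assumed weights the single-word result wins, and its words are output as the keywords of the topic data.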

Fig. 7 is a schematic diagram of a sudden topic detection device provided by an embodiment of the present invention. As shown in fig. 7, the sudden topic detection apparatus 7 of this embodiment includes: a processor 70, a memory 71 and a computer program 72, such as a sudden topic detection program, stored in the memory 71 and executable on the processor 70. The processor 70, when executing the computer program 72, implements the steps in the various embodiments of the burst topic detection method described above, such as the steps 101-105 shown in fig. 1. Alternatively, the processor 70, when executing the computer program 72, implements the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the modules 61 to 65 shown in fig. 6.

Illustratively, the computer program 72 may be partitioned into one or more modules/units that are stored in the memory 71 and executed by the processor 70 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution process of the computer program 72 in the sudden topic detection device 7. For example, the computer program 72 may be divided into an acquisition module, a matching module, an output module, an update module, and a presentation module, and the specific functions of each module are as follows:

the acquisition module is used for continuously acquiring topic data in the information sharing platform.

And the matching module is used for matching the topic data with each word in a preset word bank when each topic data is obtained so as to output various word segmentation results.

The output module is used for outputting a plurality of word segmentation included in the word segmentation result with the highest matching degree as the keyword corresponding to the topic data.

And the updating module is used for updating the summary information associated with the topic data according to the key words.

And the display module is used for displaying the key words and the abstract information so as to enable a user to know the burst topics at the current moment.
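As a hedged sketch of what the matching module might do, the function below enumerates every way of splitting a piece of topic data into words that all appear in a word bank. The recursive strategy, the name `segmentations`, and the sample word bank are assumptions; the text only requires that matching against the preset word bank yields several word segmentation results.

```python
def segmentations(text, lexicon):
    """Return every split of text whose words all appear in lexicon."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            # Segment the remainder and prepend the matched word.
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results


# Hypothetical preset word bank and topic text.
lexicon = {"no", "now", "here", "where", "nowhere"}
candidates = segmentations("nowhere", lexicon)
```

Plain recursion is exponential in the worst case, so a practical implementation would likely memoise on the remaining suffix or cap the number of results it returns.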

The sudden topic detection device 7 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. Those skilled in the art will appreciate that fig. 7 is merely an example of the sudden topic detection device 7 and does not constitute a limitation of it; the device may include more or fewer components than those shown, combine some components, or use different components. For example, the sudden topic detection device may also include an input-output device, a network access device, a bus, etc.

The Processor 70 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 71 may be an internal storage unit of the sudden topic detection device 7, such as a hard disk or a memory of the sudden topic detection device 7. The memory 71 may also be an external storage device of the sudden topic detection device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, with which the sudden topic detection device 7 is equipped. Further, the memory 71 may include both an internal storage unit and an external storage device of the sudden topic detection device 7. The memory 71 is used to store the computer program and other programs and data required by the sudden topic detection device. The memory 71 may also be used to temporarily store data that has been output or is to be output.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative: the division of the modules or units is only one logical division, and there may be other divisions in an actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be suitably increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. A method for detecting a burst topic is characterized by comprising the following steps:

continuously acquiring topic data in the information sharing platform;

displaying the key words and the abstract information so that a user can know the burst topic at the current moment;

the updating the summary information associated with the topic data according to the keyword comprises:

respectively acquiring the accumulated word frequency of each keyword, and calculating the growth acceleration of the accumulated word frequency, wherein the accumulated word frequency of the keyword represents the accumulated times of occurrence of the keyword in all the topic data acquired at the current moment;

adding the growth acceleration corresponding to each keyword into a pre-generated matrix;

calculating a characteristic value of the matrix at the current moment, and determining a growth acceleration which is greater than a second threshold value from the matrix when the characteristic value is greater than a first threshold value; the first threshold is a threshold set for an eigenvalue of the matrix;

screening topic data containing the participle from all the obtained topic data according to the determined participle corresponding to each growth acceleration;

performing word segmentation processing on the topic data containing the word segmentation again, and calculating the word frequency characteristic value of each word segmentation obtained after the word segmentation processing;

and outputting the participles with the word frequency characteristic value larger than a third threshold value as high-frequency words, and performing connection processing on the high-frequency words through a budget algorithm to obtain the abstract information containing the high-frequency words.

2. The method for detecting the burst topic according to claim 1, wherein the calculating the eigenvalue of the matrix at the current moment, and when the eigenvalue is greater than a first threshold, determining a growth acceleration greater than a second threshold from the matrix comprises:

wherein N is an integer greater than 1; the fourth threshold is a threshold set for an eigenvalue of the submatrix.

3. The method for detecting a sudden topic according to claim 1, wherein the step of performing segmentation processing again on the topic data including the segmentation word and calculating a word frequency feature value of each segmentation word obtained after the segmentation processing comprises:

4. The method for detecting a sudden topic according to claim 1, wherein the outputting a plurality of segmented words included in the segmented word result with the highest matching degree as the keyword corresponding to the topic data includes:

calculating the average number of word segmentation characters of each word segmentation result according to the total number of characters corresponding to each word segmentation in each word segmentation result and the total number of words segmentation corresponding to each word segmentation result;

weighting the word segmentation character average number and the word segmentation total number corresponding to each word segmentation result to output the matching degree of each word segmentation result;

and outputting a plurality of word segmentation included in the word segmentation result with the highest matching degree as the keyword corresponding to the topic data.

5. A computer-readable storage medium storing a sudden-topic detection program, wherein the sudden-topic detection program, when executed by at least one processor, implements the steps of the sudden-topic detection method as recited in any one of claims 1-4.

6. A sudden topic detection device, characterized in that the sudden topic detection device comprises a memory, a processor and a sudden topic detection program stored on the memory and operable on the processor, the processor implementing the following steps when executing the sudden topic detection program:

continuously acquiring topic data in the information sharing platform;

the step of updating the summary information associated with the topic data according to the keyword specifically includes:

7. The device for detecting the unexpected topic according to claim 6, wherein the step of calculating the eigenvalue of the matrix at the current time, and when the eigenvalue is greater than a first threshold, determining the growth acceleration greater than a second threshold from the matrix specifically includes:

8. The apparatus for detecting a sudden topic according to claim 6, wherein the step of performing segmentation processing again on the topic data including the segmentation word and calculating a word frequency feature value of each segmentation word obtained after the segmentation processing specifically includes: