CN112148946A - Microblog-based analysis and view display method and system - Google Patents

Microblog-based analysis and view display method and system

Info

Publication number
CN112148946A
Authority
CN
China
Prior art keywords
microblog
forwarding
text information
user
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011035414.XA
Other languages
Chinese (zh)
Inventor
王天宇
郭凌峰
杨镭
黄北辰
齐婧含
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202011035414.XA
Publication of CN112148946A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/20 Drawing from basic elements, e.g. lines or circles
    • G06T11/206 Drawing of charts or graphs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a microblog-based analysis and view display method, which comprises the following steps: crawling a plurality of microblog articles related to keywords and hotspot data corresponding to each microblog article, wherein the hotspot data comprises release time, forwarding users, text information and forwarding amount; analyzing the forwarding amount, the release time, the text information and the forwarding users of each microblog article so as to draw a visual graph according to the analysis result; setting a label for the text information according to the forwarding amount and the forwarding users; segmenting the text information into words and applying a preset weighting algorithm to obtain target keywords; analyzing the target keywords based on a classifier and the labels to obtain influence coefficients, and drawing an early warning graph according to the influence coefficients; and sending the visual graph and the early warning graph to a front end for view display. The invention makes microblog analysis more comprehensive and thereby facilitates information monitoring.

Description

Microblog-based analysis and view display method and system
Technical Field
Embodiments of the invention relate to the field of data visualization, and in particular to a microblog-based analysis and view display method and system.
Background
With the rapid development of the internet, social networking platforms occupy more and more of people's time and attention. Because the internet is free and open, anyone can publicly publish opinions on a social networking platform, so online public opinion has an ever greater influence on the stable development of society.
Network monitoring departments need not only a public opinion analysis method but also a multidimensional public opinion data visualization system that assists the analysis, so that online public opinion can be monitored more efficiently and corresponding measures can be taken in time.
However, network monitoring faces several difficulties:
unstructured data is difficult to process: microblogs are a main venue for communication among domestic internet users and accumulate a large amount of text and user behavior information, such as microblogs, comments, forwards and likes, from which feature data is difficult to collect and organize;
the angles of public opinion analysis are complex and scattered: it is difficult to distill the analysis dimensions that actually benefit public opinion management, and a presentation of the complete analysis flow is lacking;
negative public opinion is often discovered late: the discovery of negative public opinion tends to lag behind its fermentation and propagation, so that rumors have already spread widely by the time they are noticed.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a method and a system for analyzing and displaying views based on microblogs, which can perform comprehensive analysis on the microblogs and facilitate information monitoring.
In order to achieve the above object, an embodiment of the present invention provides a microblog-based analysis and view display method, including:
crawling a plurality of microblog articles related to keywords and hotspot data corresponding to each microblog article, wherein the hotspot data comprises release time, forwarding users, text information and forwarding amount;
analyzing the forwarding amount, the release time, the text information and the forwarding users of each microblog article so as to draw a plurality of visual graphs according to the analysis result;
setting a corresponding label for the text information according to the forwarding amount and the forwarding user;
segmenting the text information according to a preset weighting algorithm to obtain target keywords;
analyzing the target keywords based on a classifier and the labels to obtain influence coefficients, and drawing an early warning graph according to the influence coefficients;
and sending the plurality of visual graphs and the early warning graph to a front end for view display.
Further, the analyzing the forwarding amount, the release time, the text information and the forwarding users of each microblog article to draw a plurality of visual graphs according to the analysis result comprises:
counting the forwarding amount of each microblog article at a preset time interval, sorting the microblog articles by release time, and drawing a time flow chart;
analyzing the forwarding amount, the text information and the forwarding users of each microblog article to obtain the exposure, user quality value, propagation range value and emotion index corresponding to each microblog article;
drawing an analysis graph according to the exposure, the user quality value, the propagation range value and the emotion index;
analyzing the text information and the emotion index corresponding to the text information to obtain a detonation point index, generating early warning information according to the detonation point index, and drawing a detonation point propagation diagram, wherein the plurality of visual graphs comprise the time flow chart, the analysis graph and the detonation point propagation diagram.
Further, the analyzing the forwarding amount and the forwarding users of each microblog article to obtain a user quality value corresponding to each microblog article comprises:
acquiring the forwarding users of each microblog article, and identifying non-noise users according to the number of fans of the forwarding users;
calculating the number ratio of non-noise users among the forwarding users;
acquiring the daily forwarding amount of the non-noise users;
and calculating the user quality value of each microblog article according to the daily forwarding amount and the number ratio.
Further, the analyzing the forwarding users of each microblog article to obtain a propagation range value corresponding to each microblog article comprises:
counting the propagation range value of each microblog article according to the locations of the non-noise users.
Further, the analyzing the text information of each microblog article to obtain an emotion index corresponding to each microblog article comprises:
acquiring the text information of each microblog article;
calculating a similarity value between pieces of text information according to a similarity algorithm, and removing repeated text information according to the similarity value to obtain a real text;
performing natural language processing on the real text to obtain a plurality of emotion texts;
and evaluating the emotion texts with an emotion analysis model to obtain the emotion index of each microblog article.
Further, the setting a corresponding label for the text information according to the forwarding amount and the forwarding user comprises:
when the forwarding amount of a microblog article is larger than a first preset threshold value and the number of fans of the forwarding user is smaller than a second preset threshold value, setting a first label for the text information corresponding to the microblog article;
when the forwarding amount of a microblog article is smaller than the first preset threshold value and the number of fans of the forwarding user is larger than the second preset threshold value, determining the text information to be second text information, and setting a second label for the text information corresponding to the microblog article.
Further, the method further comprises:
and uploading the plurality of visualization graphs and the early warning graph to a block chain.
In order to achieve the above object, an embodiment of the present invention provides a microblog-based analysis and view display system, including:
a crawling module, used for crawling a plurality of microblog articles related to keywords and hotspot data corresponding to each microblog article, wherein the hotspot data comprises release time, forwarding users, text information and forwarding amount;
a first analysis module, used for analyzing the forwarding amount, the release time, the text information and the forwarding users of each microblog article so as to draw a plurality of visual graphs according to the analysis result;
a setting module, used for setting a corresponding label for the text information according to the forwarding amount and the forwarding user;
the word segmentation module is used for segmenting words of the text information according to a preset weighting algorithm to obtain target keywords;
the second analysis module is used for analyzing the target keywords based on the classifier and the labels to obtain influence coefficients and drawing an early warning graph according to the influence coefficients;
and the display module is used for sending the plurality of visual graphs and the early warning graph to a front end for view display.
In order to achieve the above object, an embodiment of the present invention provides a computer device, which includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the computer program, when executed by the processor, implements the steps of the microblog-based analysis and view displaying method as described above.
To achieve the above object, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, where the computer program is executable by at least one processor, so as to cause the at least one processor to execute the steps of the microblog-based analysis and view displaying method as described above.
According to the microblog-based analysis and view display method and system provided by the embodiments of the invention, microblog articles are counted in real time, and the release time, forwarding users, forwarded text and forwarding amount of the microblog articles are analyzed to draw visual views that the user can conveniently inspect. The views show not only currently popular content: while capturing the propagation nodes that have already had wide influence, the system also monitors the content most likely to become influential in the future and provides an early warning function.
Drawings
Fig. 1 is a flowchart of a first embodiment of a microblog-based analysis and view display method according to the present invention.
Fig. 2 is a flowchart of step S102 according to an embodiment of the present invention.
Fig. 3 is a flowchart of step S102B according to an embodiment of the invention.
Fig. 4 is a flowchart of step S102 according to another embodiment of the present invention.
Fig. 5 is a flowchart of step S104 according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of the program modules of a second embodiment of the microblog-based analysis and view display system according to the present invention.
Fig. 7 is a schematic diagram of the hardware structure of a third embodiment of the computer device according to the present invention.
Fig. 8 is a first effect diagram of the microblog-based analysis and view display method according to the first embodiment of the invention.
Fig. 9 is a second effect diagram of the microblog-based analysis and view display method according to the first embodiment of the invention.
Fig. 10 is a third effect diagram of the microblog-based analysis and view display method according to the first embodiment of the invention.
Fig. 11 is a fourth effect diagram of the microblog-based analysis and view display method according to the first embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, a flowchart of steps of a microblog-based analysis and view display method according to a first embodiment of the invention is shown. It is to be understood that the flow charts in the embodiments of the present method are not intended to limit the order in which the steps are performed. The following description is made by way of example with the computer device 2 as the execution subject. The details are as follows.
Step S100, crawling a plurality of microblog articles related to keywords and hotspot data corresponding to each microblog article, wherein the hotspot data comprises release time, forwarding users, text information and forwarding amount.
Specifically, the keywords are provided by the user and may be a short description of an event, for example: #Shuanghuanglian oral liquid can inhibit the novel coronavirus#. Microblog content is searched based on the input keywords, all hot microblogs under the current keyword topic are crawled, and the release time, forwarding users, forwarded text and forwarding amount corresponding to each microblog article are crawled. Some forwarding users forward a microblog article without adding any text, so forwarded text may exist for only some of the forwarding users.
Step S102, analyzing the forwarding amount, the release time, the text information and the forwarding users of each microblog article so as to draw a plurality of visual graphs according to the analysis result.
Specifically, the forwarding users participating in the discussion of each microblog article are analyzed along the following dimensions:
location: a histogram of the number of participants in each province is drawn;
gender: a pie chart of the number of people is drawn for three categories: male, female and other;
device used: a pie chart of the number of people is drawn by device used;
activity level: the number of microblogs released by each participating user in the last thirty days is crawled, the users are divided into high, medium and low activity, and a histogram of the number of people is drawn;
fan count level: the number of fans of each participating user is crawled, the users are divided into four levels (fewer than 10,000; 10,000 to 100,000; 100,000 to 1,000,000; more than 1,000,000), and a histogram of the number of people is drawn.
The text information under each microblog article, which comprises all comments and forwarded content, is segmented into words, the frequency of each word is counted to obtain its weight, and the corresponding word cloud is drawn. The emotion words in the text information of each microblog article are displayed in six categories: love, happiness, sadness, anger, disgust and other, and are visualized as a pie chart according to the emotion coefficient of each category.
Exemplarily, referring to fig. 2, the step S102 includes:
Step S102A, counting the forwarding amount of each microblog article at a preset time interval, sorting the microblog articles by release time, and drawing a time flow chart.
Specifically, all crawled hot microblogs are sorted by release time. The preset time interval may be five minutes: the forwarding amount is counted every five minutes and a time flow chart, that is, a forwarding amount trend chart, is drawn, so that the propagation trend of the microblog articles is easy to observe. Referring to fig. 9, all hot microblogs related to the keyword are displayed on the same timeline in order of publication. The hotter a microblog is, that is, the more times it has been forwarded, the darker its background color; the arrow indicates the direction of time. The hot microblog articles that spread most within the preset time interval are displayed, so it is clearly visible from which microblog the event was first published and from which it spread widely.
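As a minimal sketch only, and not part of the disclosed embodiment, the per-interval counting described above could look as follows. The function name `forwarding_trend`, its parameters and the choice of anchoring the first bucket at the earliest forward are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def forwarding_trend(forward_times, interval_minutes=5):
    """Count forwards per fixed-width time bucket (hypothetical helper).

    Returns a list of (bucket_start, count) pairs in chronological order,
    including empty buckets so the trend line has no gaps.
    """
    if not forward_times:
        return []
    width = timedelta(minutes=interval_minutes)
    start = min(forward_times)  # assumption: buckets start at the first forward
    counts = Counter((t - start) // width for t in forward_times)
    return [(start + i * width, counts.get(i, 0)) for i in range(max(counts) + 1)]


t0 = datetime(2020, 1, 31, 22, 0)
times = [t0, t0 + timedelta(minutes=1), t0 + timedelta(minutes=6), t0 + timedelta(minutes=16)]
for bucket_start, count in forwarding_trend(times):
    print(bucket_start.strftime("%H:%M"), count)
```

The resulting pairs can be handed directly to any charting front end as the points of the forwarding amount trend chart.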
Step S102B, analyzing the forwarding amount, the text information and the forwarding users of each microblog article to obtain the exposure, user quality value, propagation range value and emotion index corresponding to each microblog article.
Specifically, the forwarding amount of each microblog article is analyzed to obtain the exposure corresponding to each microblog article; the forwarded text of each microblog article is analyzed to obtain the emotion index corresponding to each microblog article; and the forwarding users of each microblog article are analyzed to obtain the user quality value and the propagation range value corresponding to each microblog article.
Exemplarily, referring to fig. 3, the step S102B includes:
Step S102B11, acquiring the forwarding users of each microblog article, and identifying non-noise users according to the number of fans of the forwarding users.
Specifically, a noise user may be a zombie account. A noise user is defined as follows: the number of fans is less than 10, the number of followed accounts is greater than 100, the number of microblogs is less than 10, or more than 90% of the microblogs are forwarded microblogs. Non-noise users are users that are not zombie accounts.
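The noise-user definition above could be sketched as follows. The text does not state how the four conditions combine, so this sketch assumes that the three profile conditions must hold together, with the forwarding share as an alternative trigger; the `Account` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    fans: int        # number of fans (followers)
    following: int   # number of accounts this user follows
    posts: int       # total microblogs published
    reposts: int     # how many of `posts` are forwarded microblogs


def is_noise_user(a: Account) -> bool:
    """Flag a likely zombie account under the assumed reading of the rule."""
    looks_empty = a.fans < 10 and a.following > 100 and a.posts < 10
    mostly_reposts = a.posts > 0 and a.reposts / a.posts > 0.9
    return looks_empty or mostly_reposts


users = [Account(3, 500, 2, 0), Account(2000, 150, 400, 100), Account(2000, 150, 100, 95)]
non_noise = [u for u in users if not is_noise_user(u)]
print(len(non_noise) / len(users))  # number ratio of non-noise users
```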
Step S102B12, calculating the number ratio of non-noise users among the forwarding users.
Specifically, the ratio of non-noise users is calculated from the number of non-noise users and the total number of forwarding users: ratio of non-noise users = number of non-noise users / total number of forwarding users.
Step S102B13, acquiring the daily forwarding amount of the non-noise users.
Specifically, the number of microblogs forwarded per day by each non-noise user is crawled and averaged to obtain the daily forwarding amount.
Step S102B14, calculating the user quality value of each microblog article according to the daily forwarding amount and the number ratio.
Specifically, the user quality value is calculated from two perspectives: if the score for the ratio of non-noise users is a and the score for the daily forwarding amount of the users is b, the user quality value is 0.5 × a + 0.5 × b.
For example:
Non-noise user ratio: a score is assigned according to the proportion of noise user accounts among all forwarding users. The detailed rule is as follows:
less than 1%: 100 points;
1%-3%: 90 points;
3%-5%: 80 points;
5%-10%: 70 points;
10%-20%: 60 points;
20%-30%: 50 points;
30%-40%: 40 points;
40%-50%: 30 points;
50%-60%: 20 points;
60%-70%: 10 points;
more than 70%: 0 points.
Number of microblogs released by the user per day: a score is assigned according to the average number of microblogs released by the user per day. The scoring rule is as follows:
0 posts: 0 points;
0-0.5 posts: 60 points;
0.5-1 posts: 70 points;
1-2 posts: 80 points;
2-5 posts: 90 points;
more than 5 posts: 100 points.
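A minimal sketch of this scoring, assuming the two banded scores are looked up and then averaged with equal weights as in the formula above. The source does not say which band a boundary value belongs to, so the boundary handling noted in the comments, like all function and constant names, is an assumption.

```python
from bisect import bisect_left, bisect_right

# Share of noise accounts among all forwarding users, in percent.
NOISE_BOUNDS = [1, 3, 5, 10, 20, 30, 40, 50, 60, 70]
NOISE_SCORES = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0]

# Average number of microblogs per user per day.
DAILY_BOUNDS = [0.5, 1, 2, 5]
DAILY_SCORES = [60, 70, 80, 90, 100]


def noise_share_score(noise_share_pct: float) -> int:
    # Assumption: a boundary value falls into the higher band (1% scores 90).
    return NOISE_SCORES[bisect_right(NOISE_BOUNDS, noise_share_pct)]


def daily_activity_score(posts_per_day: float) -> int:
    if posts_per_day <= 0:
        return 0
    # Assumption: a boundary value falls into the lower band (5 per day scores 90).
    return DAILY_SCORES[bisect_left(DAILY_BOUNDS, posts_per_day)]


def user_quality_value(noise_share_pct: float, posts_per_day: float) -> float:
    a = noise_share_score(noise_share_pct)
    b = daily_activity_score(posts_per_day)
    return 0.5 * a + 0.5 * b


print(user_quality_value(2.0, 1.5))  # a = 90, b = 80
```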
Illustratively, the step S102B includes:
counting the propagation range value of each microblog article according to the locations of the non-noise users.
Specifically, according to the province corresponding to the location of each forwarding user, the provinces in which users have forwarded each microblog article are counted, and a score is assigned according to the number of provinces to obtain the propagation range value: 0-3 provinces: 20 points; 3-5 provinces: 30 points; 5-10 provinces: 40 points; 10-15 provinces: 50 points; 15-20 provinces: 60 points; 20-25 provinces: 70 points; 25-30 provinces: 80 points; 30-34 provinces: 90 points; 35 provinces: 100 points.
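The province-count scoring above can be sketched as a simple band lookup. The source bands overlap at their edges, so this sketch assumes that a boundary count belongs to the lower band; the function and constant names are hypothetical.

```python
from bisect import bisect_left

# Upper edge of each band of distinct provinces, and the score for each band.
PROVINCE_UPPER = [3, 5, 10, 15, 20, 25, 30, 34]
RANGE_SCORES = [20, 30, 40, 50, 60, 70, 80, 90, 100]


def propagation_range_value(provinces) -> int:
    """Score how widely an article spread from the provinces of its forwarders."""
    distinct = len(set(provinces))
    # Assumption: a count equal to a band edge scores in the lower band.
    return RANGE_SCORES[bisect_left(PROVINCE_UPPER, distinct)]


print(propagation_range_value(["Beijing", "Shanghai", "Beijing", "Hubei", "Guangdong"]))
```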
Exemplarily, referring to fig. 4, the step S102B includes:
Step S102B21, acquiring the text information of each microblog article.
Specifically, the text information includes comments and forwarded text and is obtained by a crawler.
Step S102B22, calculating a similarity value between pieces of text information according to a similarity algorithm, and removing repeated text information according to the similarity value to obtain a real text.
Specifically, the similarity between pieces of text information is calculated with a similarity algorithm such as cosine similarity or the Pearson correlation coefficient. Before the calculation, the text information is vectorized with a word2vec model; the similarity values are then calculated, and repeated text information whose similarity value is greater than a preset threshold is de-duplicated to obtain the real text.
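A minimal sketch of threshold-based de-duplication with cosine similarity. To stay self-contained it swaps the word2vec vectors mentioned above for plain bag-of-words counts and splits on whitespace, which would not work for unsegmented Chinese text; the threshold of 0.9 and all names are assumptions.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def deduplicate(texts, threshold=0.9):
    """Keep a text only if it is not too similar to any text already kept."""
    kept, kept_vectors = [], []
    for text in texts:
        vector = Counter(text.lower().split())  # stand-in for a word2vec embedding
        if all(cosine(vector, other) <= threshold for other in kept_vectors):
            kept.append(text)
            kept_vectors.append(vector)
    return kept


texts = [
    "this drug cures everything",
    "This drug cures everything",
    "please wait for official results",
]
print(deduplicate(texts))
```

Comparing every text against every kept text is quadratic; for large crawls an approximate nearest-neighbour index would be the usual replacement.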
Step S102B23, performing natural language processing on the real text to obtain a plurality of emotion texts.
Specifically, data cleaning and word segmentation are performed on the real text and stop words are removed to obtain a plurality of words. An emotion lexicon is preset, and an inverted index table may be established for it; when a word is found in the emotion lexicon, it is determined to be an emotion word, and the emotion text is thereby obtained.
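The lexicon lookup could be sketched as below. The stop-word list, the tiny English lexicon and the whitespace tokenizer are illustrative assumptions; a real system would use a Chinese word segmenter and a full emotion lexicon. The word-to-category dictionary stands in for the inverted index table.

```python
STOP_WORDS = {"the", "a", "is", "this", "and", "i", "am"}

# Hypothetical emotion lexicon: category -> words.
EMOTION_LEXICON = {
    "anger": {"furious", "outrageous"},
    "happiness": {"glad", "great"},
}

# Inverted lookup: word -> category, built once from the lexicon.
EMOTION_INDEX = {word: emotion for emotion, words in EMOTION_LEXICON.items() for word in words}


def emotion_words(text: str):
    """Return (word, emotion) pairs for the emotion words found in the text."""
    tokens = [token.strip(".,!?#").lower() for token in text.split()]
    return [
        (token, EMOTION_INDEX[token])
        for token in tokens
        if token and token not in STOP_WORDS and token in EMOTION_INDEX
    ]


print(emotion_words("This is outrageous, and I am furious!"))
```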
Step S102B24, evaluating the emotion texts with an emotion analysis model to obtain the emotion index of each microblog article.
Specifically, after natural language processing is performed on the forwarded text of each microblog article, an emotion analysis model judges whether the emotion of the forwarded text is positive or negative and outputs an emotion coefficient, where 100 points represents a fully positive index and 0 points represents a fully negative index. The emotion analysis model may be obtained by training a deep learning network.
Step S102C, drawing an analysis graph according to the exposure, the user quality value, the propagation range value and the emotion index.
Specifically, the forwarding users participating in the discussion of each microblog article are analyzed along the following dimensions:
location: a histogram of the number of participants in each province is drawn;
gender: a pie chart of the number of people is drawn for three categories: male, female and other;
device used: a pie chart of the number of people is drawn by device used;
activity level: the number of microblogs released by each participating user in the last thirty days is crawled, the users are divided into high, medium and low activity, and a histogram of the number of people is drawn;
fan count level: the number of fans of each participating user is crawled, the users are divided into four levels (fewer than 10,000; 10,000 to 100,000; 100,000 to 1,000,000; more than 1,000,000), and a histogram of the number of people is drawn.
Referring to fig. 11, the emotion coefficients of a hot microblog text are displayed in six categories: love, happiness, sadness, anger, disgust and other, and are visualized as a pie chart. All comments and forwarded content under the microblog are segmented into words, the frequency of each word is counted to obtain its weight, and the corresponding word cloud is drawn: the words with larger weights are placed in the middle of the view and displayed in a larger font, and the font size of the other words is adjusted according to their weights.
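The word-cloud weighting above, in which a word's frequency determines its font size, might be sketched as follows. The linear mapping and the 12 to 48 point range are assumptions, as is the function name.

```python
from collections import Counter


def word_cloud_layout(words, min_pt=12, max_pt=48):
    """Map word frequencies to font sizes (hypothetical linear scaling).

    Returns (word, frequency, font_size) tuples, most frequent first, so the
    front end can place the first entries in the middle of the view.
    """
    counts = Counter(words)
    if not counts:
        return []
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid division by zero when all counts are equal
    return [
        (word, count, round(min_pt + (count - low) / span * (max_pt - min_pt)))
        for word, count in counts.most_common()
    ]


segmented = ["virus"] * 3 + ["rumor"] * 2 + ["pharmacy"]
print(word_cloud_layout(segmented))
```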
Step S102D, analyzing the text information and the emotion index corresponding to the text information to obtain a detonation point index, generating early warning information according to the detonation point index, and drawing a detonation point propagation diagram, wherein the plurality of visual graphs comprise the time flow chart, the analysis graph and the detonation point propagation diagram.
Specifically, the propagation path of each microblog article is visualized as a force-directed graph. The number of times each forwarding user (propagator) is forwarded is used as the weight of the corresponding node: the more times the propagator is forwarded, the larger the node radius (detonation point index). If the forwarding count of one propagator is far greater than that of the other nodes, that forwarding user (the detonation point) is highlighted. Referring to fig. 10, the force-directed graph is drawn as the detonation point propagation diagram; the more times a forwarded microblog is itself forwarded again, the larger the radius (detonation point index) of its node.
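A minimal sketch of deriving node weights and highlighted detonation points from forwarding edges. The text only says "far greater than" the other nodes, so the rule used here (more than `factor` times the mean count of the other propagators) is an assumption, as are all names.

```python
from collections import Counter


def detonation_points(edges, factor=3.0):
    """Compute per-propagator forward counts and flag outliers.

    `edges` is a list of (forwarded_from, forwarded_by) user pairs. The count
    per `forwarded_from` user is the node weight (detonation point index).
    """
    index = Counter(parent for parent, _child in edges)
    flagged = []
    for user, count in index.items():
        others = [c for u, c in index.items() if u != user]
        if others and count > factor * (sum(others) / len(others)):
            flagged.append(user)
    return dict(index), flagged


edges = [("A", x) for x in "BCDEFGH"] + [("B", "I"), ("C", "J")]
weights, highlighted = detonation_points(edges)
print(weights, highlighted)
```

The `weights` mapping can drive the node radius in a force-directed layout, and `highlighted` marks the nodes to emphasize.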
Step S104, setting a corresponding label for the text information according to the forwarding amount and the forwarding user.
Specifically, among the microblog articles collected under the same topic event, labels are set for two kinds of text information: text information whose forwarding user has few fans (below 500) but whose forwarded text has a large influence (forwarding amount above 500), and text information whose forwarding user has the same fan magnitude but which did not spread (forwarding amount below 500). Influential text information is labeled 1 and non-influential text information is labeled 0.
Exemplarily, referring to fig. 5, the step S104 includes:
Step S104A, when the forwarding amount of a microblog article is larger than a first preset threshold value and the number of fans of the forwarding user is smaller than a second preset threshold value, setting a first label for the text information corresponding to the microblog article.
Specifically, the first preset threshold and the second preset threshold may both be 500, and the first label is 1. Text information in the microblog articles whose forwarding user has few fans (below 500) but whose forwarded text has a large influence (forwarding amount above 500) is queried, and the first label is set for that text information.
Step S104B, when the forwarding amount of a microblog article is smaller than the first preset threshold value and the number of fans of the forwarding user is larger than the second preset threshold value, determining the text information to be second text information, and setting a second label for the text information corresponding to the microblog article.
Specifically, the second label is 0. Text information in the microblog articles whose forwarding user has the same fan magnitude but which did not spread (forwarding amount below 500) is queried, and the second label is set for that text information.
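The two threshold rules of steps S104A and S104B can be sketched as a single function. Returning `None` for text that matches neither rule is an assumption, since the text does not say how such cases are handled; the function and parameter names are hypothetical.

```python
def label_text(forwarding_amount, fan_count, first_threshold=500, second_threshold=500):
    """Return 1 (first label), 0 (second label) or None (left unlabeled)."""
    if forwarding_amount > first_threshold and fan_count < second_threshold:
        return 1  # few fans, yet the text spread widely
    if forwarding_amount < first_threshold and fan_count > second_threshold:
        return 0  # did not spread
    return None   # assumption: neither rule applies, so no training label


print(label_text(1200, 80), label_text(40, 9000), label_text(1200, 9000))
```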
Step S106, performing word segmentation on the text information according to a preset weighting algorithm to obtain target keywords.
Specifically, the text information is segmented into words, the TF-IDF weighting algorithm is used to calculate the feature weights of the words, and the words with representative features are selected as target keywords (if the term frequency TF of a word is high in one class of microblogs but the word is rare in the other class, the word is considered to have good category-distinguishing ability).
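A minimal pure-Python sketch of TF-IDF keyword selection over already segmented texts. It uses the unsmoothed form tf × log(N / df); production libraries typically add smoothing, and the top-k cut-off and function names are assumptions.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {word: weight} dict per document (unsmoothed TF-IDF)."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return weights


def target_keywords(docs, k=2):
    """Pick the k highest-weighted words of each document (ties alphabetical)."""
    return [sorted(w, key=lambda word: (-w[word], word))[:k] for w in tfidf(docs)]


docs = [["drug", "sold", "out", "drug"], ["drug", "study", "pending"]]
print(target_keywords(docs))
```

A word that occurs in every document, such as "drug" here, gets weight zero and is never selected, which matches the category-distinguishing intuition described above.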
Step S108, analyzing the target keywords based on a classifier and the labels to obtain an influence coefficient, and drawing an early warning graph according to the influence coefficient.
Specifically, after the feature weights are calculated by TF-IDF, a naive Bayes classifier estimates the class probability of the target keywords from the joint probability of feature items and classes. This allows real-time monitoring of whether a microblog article may have potential influence; if the prediction is 1, an early warning is displayed. The early warning graph shows not only high-heat microblogs that have already spread but also early warnings for forwarded texts that may potentially have a large influence.
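A naive Bayes classifier of the kind mentioned above can be sketched in a few lines. This is a multinomial variant with Laplace smoothing trained on raw keyword counts, which is an assumption: the text does not state which variant is used or how the TF-IDF weights enter the model. The names `train_naive_bayes` and `predict` and the sample keywords are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train_naive_bayes(samples: List[Tuple[List[str], int]]):
    """Collect class counts and per-class token counts from labeled keyword lists."""
    class_counts: Counter = Counter()
    token_counts: Dict[int, Counter] = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def predict(model, tokens: List[str]) -> int:
    """Return the label with the highest log posterior (Laplace smoothing)."""
    class_counts, token_counts, vocab = model
    total_samples = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total_samples)  # log prior of the class
        denom = sum(token_counts[label].values()) + len(vocab)
        for token in tokens:
            score += math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


samples = [
    (["shocking", "cure"], 1),
    (["shocking", "secret"], 1),
    (["official", "report"], 0),
    (["daily", "report"], 0),
]
model = train_naive_bayes(samples)
```

With this toy training set, a text whose keyword is "shocking" is predicted as label 1 (an early warning would be shown), while "report" is predicted as label 0.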
Step S110, sending the plurality of visualization graphs and the early warning graph to a front end for view display.
Specifically, referring to fig. 8, the views are sent to the front end for display so that the user can analyze them from multiple aspects and thereby monitor how the situation is developing. Some inflammatory speech fuels the spread of an event, so when content with the potential to spread a negative influence appears, a public opinion monitoring system needs to give an early warning and visualize it. Forwarding users (propagators) who post inflammatory speech may not have many fans, and the forwarding amount in the early period may not be high; however, because the text carries strongly inflammatory wording, it is likely to take off at any moment, so the forwarded content is monitored and displayed on a timeline.
Illustratively, the method further comprises:
uploading the plurality of visualization graphs and the early warning graph to a blockchain.
Specifically, uploading the plurality of visualization graphs and the early warning graph to the blockchain ensures their security as well as fairness and transparency for the user. The user equipment can download the visualization graphs and the early warning graph from the blockchain to verify whether they have been tampered with. The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
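The tamper check described above can be illustrated with a local hash chain. This sketch is only an analogy under stated assumptions: it stores a SHA-256 digest of each chart in a block that also records the hash of the previous block, and it omits distribution, consensus, and any real blockchain platform. The function names and block fields are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV = "0" * 64  # placeholder previous hash for the first block


def block_hash(block: Dict) -> str:
    """Hash the block fields other than the block's own hash."""
    payload = json.dumps(
        {k: block[k] for k in ("index", "prev_hash", "chart_digest")},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_chart(chain: List[Dict], chart_bytes: bytes) -> None:
    """Append a block that records the SHA-256 digest of one chart."""
    block = {
        "index": len(chain),
        "prev_hash": chain[-1]["hash"] if chain else GENESIS_PREV,
        "chart_digest": hashlib.sha256(chart_bytes).hexdigest(),
    }
    block["hash"] = block_hash(block)
    chain.append(block)


def chart_is_untampered(chain: List[Dict], index: int, chart_bytes: bytes) -> bool:
    """Check the chain linkage up to `index` and the digest of the downloaded chart."""
    for i, block in enumerate(chain[: index + 1]):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["prev_hash"] != expected_prev or block["hash"] != block_hash(block):
            return False
    return chain[index]["chart_digest"] == hashlib.sha256(chart_bytes).hexdigest()


chain: List[Dict] = []
append_chart(chain, b"time-flow-chart")
append_chart(chain, b"early-warning-graph")
```

A downloaded chart whose bytes differ from the uploaded ones, or a chain in which an earlier block was modified, fails the check.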
Example two
With continued reference to fig. 6, a schematic diagram of the program modules of the second embodiment of the microblog-based analysis and view display system of the present invention is shown. In this embodiment, the microblog-based analysis and view display system 20 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to carry out the present invention and implement the microblog-based analysis and view display method described above. A program module in the embodiments of the present invention refers to a series of computer program instruction segments capable of performing a specific function, and is better suited than the program itself for describing how the microblog-based analysis and view display system 20 executes in the storage medium. The functions of the program modules of this embodiment are described in detail below:
The crawling module 200 is configured to crawl a plurality of microblog articles related to keywords and the hotspot data corresponding to each microblog article, where the hotspot data includes release time, forwarding users, text information, and forwarding amount.
Specifically, the keywords are provided by the user and may be a short description of an event, for example: #Shuanghuanglian oral liquid can inhibit the novel coronavirus#. Microblog content is searched based on the input keywords, all hot microblogs under the current keyword topic are crawled by a crawler, and the release time, forwarding users, forwarded text, and forwarding amount corresponding to each microblog article are crawled. Some forwarding users forward a microblog article without adding any text, so forwarded text may exist for only some of the forwarding users.
A first analysis module 202, configured to analyze the forwarding amount, the release time, the text information, and the forwarding users of each microblog article, so as to draw a plurality of visualization graphs according to the analysis result.
Specifically, analyzing the forwarding users participating in the discussion of each microblog article covers the following dimensions. Location: a histogram of the number of participants in each province. Gender: a pie chart of the number of users in three categories (male, female, and other). Device: a pie chart of the number of users by the device used. Activity: the number of microblogs each participating user has posted in the last thirty days is crawled and divided into three levels (high, medium, and low), and a histogram of the number of users is drawn. Fan-count level: the fan count of each participating user is crawled and divided into four levels (fewer than 10,000; 10,000 to 100,000; 100,000 to 1,000,000; more than 1,000,000), and a histogram of the number of users is drawn. The text information under each microblog article, which includes all comments and forwarded content, is segmented into words, the frequency of each word is counted to obtain its weight, and the corresponding word cloud is drawn. The emotion words in the text information of each microblog article are also displayed: the emotions fall into six items (love, happiness, sadness, anger, disgust, and other), and the emotion coefficient of each item is visualized as a pie chart.
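The fan-count levels described above amount to bucketing each user and counting users per bucket before plotting. The sketch below shows only that counting step; the bucket labels, the function names, and the treatment of the exact boundary values (10,000, 100,000, and 1,000,000 fall into the higher bucket) are assumptions, since the text does not specify them.

```python
from collections import Counter
from typing import Iterable


def fan_level(fan_count: int) -> str:
    """Map a fan count to one of the four levels described in the text."""
    if fan_count < 10_000:
        return "<10k"
    if fan_count < 100_000:
        return "10k-100k"
    if fan_count < 1_000_000:
        return "100k-1M"
    return "1M+"


def level_histogram(fan_counts: Iterable[int]) -> Counter:
    """Count how many users fall into each fan-count level."""
    return Counter(fan_level(n) for n in fan_counts)


histogram = level_histogram([50, 999, 20_000, 150_000, 2_000_000])
```

The resulting counts can be passed to any plotting library to draw the histogram of the number of users per level.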
Illustratively, the first analysis module 202 is further configured for:
counting the forwarding amount of each microblog article at a preset time interval, sorting the microblog articles by release time, and drawing a time flow chart.
Specifically, all the crawled hot microblogs are sorted by release time. The preset time interval may be five minutes: the forwarding amount is calculated every five minutes and a time flow chart (that is, a forwarding amount trend chart) is drawn, which makes the propagation trend of the microblog articles easy to observe.
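Counting forwards per five-minute interval can be sketched as follows. The sketch assumes that a timestamp is available for every forward and that the timestamps are naive `datetime` objects; neither detail is stated in the text. The function name `forwards_per_interval` is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def forwards_per_interval(
    forward_times: Iterable[datetime], minutes: int = 5
) -> List[Tuple[datetime, int]]:
    """Count forwards in fixed-width time buckets, ordered by bucket start."""
    width = timedelta(minutes=minutes)
    buckets: Counter = Counter()
    for ts in forward_times:
        # Floor each timestamp to the start of its bucket.
        offset = (ts - datetime.min) % width
        buckets[ts - offset] += 1
    return sorted(buckets.items())


times = [
    datetime(2020, 9, 27, 10, 1),
    datetime(2020, 9, 27, 10, 4, 59),
    datetime(2020, 9, 27, 10, 5),
]
trend = forwards_per_interval(times)
```

Each pair in `trend` holds the start of a five-minute bucket and the number of forwards in it, which is the series a forwarding amount trend chart would plot.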
analyzing the forwarding amount, the text information, and the forwarding users of each microblog article to obtain the exposure, the user quality value, the propagation range value, and the emotion index corresponding to each microblog article.
Specifically, the forwarding amount of each microblog article is analyzed to obtain its exposure; the forwarded text of each microblog article is analyzed to obtain its emotion index; and the forwarding users of each microblog article are analyzed to obtain its user quality value and propagation range value.
drawing an analysis graph according to the exposure, the user quality value, the propagation range value, and the emotion index.
Specifically, analyzing the forwarding users participating in the discussion of each microblog article covers the following dimensions. Location: a histogram of the number of participants in each province. Gender: a pie chart of the number of users in three categories (male, female, and other). Device: a pie chart of the number of users by the device used. Activity: the number of microblogs each participating user has posted in the last thirty days is crawled and divided into three levels (high, medium, and low), and a histogram of the number of users is drawn. Fan-count level: the fan count of each participating user is crawled and divided into four levels (fewer than 10,000; 10,000 to 100,000; 100,000 to 1,000,000; more than 1,000,000), and a histogram of the number of users is drawn. The emotion coefficients of a hot microblog text cover six items (love, happiness, sadness, anger, disgust, and other) and are visualized as a pie chart. All comments and forwarded content under the microblog are segmented into words, the frequency of each word is counted to obtain its weight, and the corresponding word cloud is drawn.
analyzing the text information and the emotion indexes corresponding to the text information to obtain a detonation index, generating early warning information according to the detonation index, and drawing a detonation propagation graph, wherein the plurality of visualization graphs include the time flow chart, the analysis graph, and the detonation propagation graph.
Specifically, the propagation path of each microblog article is visualized as a force-directed graph. The number of times each forwarding user (propagator) is forwarded serves as the weight of that user's node: the more times a propagator is forwarded, the larger the node radius (detonation index). If one propagator's forward count is far greater than that of the other nodes, that forwarding user (the detonation point) is highlighted. When the force-directed graph is drawn as the detonation propagation graph, the more times a forwarded microblog is itself forwarded again, the larger the radius (detonation index) of its node.
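The node weights of the force-directed graph and the selection of detonation points can be sketched as follows. The edge representation (source user, forwarding user) and the function names are hypothetical, and the highlighting rule is an assumption: the text only says "far greater", so the sketch flags users whose weight is at least `factor` times the median weight of all users that were forwarded at least once.

```python
from collections import Counter
from statistics import median
from typing import Dict, Iterable, List, Tuple


def node_weights(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each user to the number of times that user's post was forwarded.

    Each edge is (source_user, forwarding_user).
    """
    weights: Counter = Counter()
    for source, forwarder in edges:
        weights[source] += 1
        weights[forwarder] += 0  # make sure leaf users appear as nodes
    return dict(weights)


def detonation_points(weights: Dict[str, int], factor: float = 5.0) -> List[str]:
    """Return users whose weight is at least `factor` times the median positive weight."""
    positive = [w for w in weights.values() if w > 0]
    if not positive:
        return []
    threshold = factor * median(positive)
    return sorted(user for user, w in weights.items() if w >= threshold)


edges = [("origin", "a"), ("origin", "b"), ("b", "c")]
edges += [("a", f"u{i}") for i in range(12)]
weights = node_weights(edges)
points = detonation_points(weights)
```

In this toy propagation tree, user "a" is forwarded twelve times while the other forwarded users are forwarded once or twice, so "a" is the only node flagged for highlighting; the weights can also be used directly as node radii.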
A setting module 204, configured to set a corresponding label for the text information according to the forwarding amount and the forwarding user.
Specifically, among the microblog articles collected under the same topic event, two kinds of text information are labeled: text information whose forwarding user has few fans (fewer than 500) but whose forwarded text has a large influence (a forwarding amount above 500), and text information whose forwarding user has the same fan magnitude but which did not spread (a forwarding amount below 500). The label is 1 for influential text and 0 for text without influence.
Illustratively, the setting module 204 is further configured for:
when the forwarding amount of the microblog article is greater than a first preset threshold and the number of fans of the forwarding user is less than a second preset threshold, setting a first label for the text information corresponding to the microblog article.
Specifically, the first preset threshold and the second preset threshold may both be 500, and the first label is 1. Text information in the microblog articles whose forwarding user has few fans (fewer than 500) but whose forwarded text has a large influence (a forwarding amount above 500) is queried, and the first label is applied to that text information.
when the forwarding amount of the microblog article is less than the first preset threshold and the number of fans of the forwarding user is greater than the second preset threshold, determining that the text information is second text information, and setting a second label for the text information corresponding to the microblog article.
Specifically, the second label is 0. Text information in the microblog articles whose forwarding user has the same fan magnitude but which did not spread (a forwarding amount below 500) is queried, and the second label is applied to that text information.
The word segmentation module 206 is configured to perform word segmentation on the text information according to a preset weighting algorithm to obtain target keywords.
Specifically, the text information is segmented into words, and the TF-IDF weighting algorithm is used to select words that are representative features of the text information as target keywords. (If a word has a high term frequency TF in one class of microblogs but appears rarely in the other class, the word is considered to have good class-discriminating ability.)
The second analysis module 208 is configured to analyze the target keywords based on a classifier and the labels to obtain an influence coefficient, and to draw an early warning graph according to the influence coefficient.
Specifically, after the feature weights are calculated by TF-IDF, a naive Bayes classifier estimates the class probability of the target keywords from the joint probability of feature items and classes. This allows real-time monitoring of whether a microblog article may have potential influence; if the prediction is 1, an early warning is displayed. The early warning graph shows not only high-heat microblogs that have already spread but also early warnings for forwarded texts that may potentially have a large influence.
The display module 210 is configured to send the plurality of visualization graphs and the early warning graph to a front end for view display.
Specifically, the views are sent to the front end for display so that the user can analyze them from multiple aspects and thereby monitor how the situation is developing. Some inflammatory speech fuels the spread of an event, so when content with the potential to spread a negative influence appears, a public opinion monitoring system needs to give an early warning and visualize it. Forwarding users (propagators) who post inflammatory speech may not have many fans, and the forwarding amount in the early period may not be high; however, because the text carries strongly inflammatory wording, it is likely to take off at any moment, so the forwarded content is monitored and displayed on a timeline.
Example three
Fig. 7 is a schematic diagram of the hardware architecture of a computer device according to the third embodiment of the present invention. In this embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. The computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers), and the like. As shown in fig. 7, the computer device 2 includes at least, but is not limited to, a memory 21, a processor 22, a network interface 23, and the microblog-based analysis and view display system 20, which are communicatively connected to each other via a system bus. Wherein:
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or an internal memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the computer device 2. Of course, the memory 21 may also comprise both an internal storage unit and an external storage device of the computer device 2. In this embodiment, the memory 21 is generally used for storing the operating system and the various application software installed in the computer device 2, such as the program code of the microblog-based analysis and view display system 20 of the second embodiment. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 22 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is typically used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is configured to run the program code stored in the memory 21 or to process data, for example, to run the microblog-based analysis and view display system 20, so as to implement the microblog-based analysis and view display method of the first embodiment.
The network interface 23 may comprise a wireless network interface or a wired network interface, and is generally used for establishing a communication connection between the computer device 2 and other electronic devices. For example, the network interface 23 is used to connect the computer device 2 to an external terminal via a network and to establish a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, and the like. It is noted that fig. 7 only shows the computer device 2 with components 20-23, but it is to be understood that not all of the shown components are required, and that more or fewer components may be implemented instead.
In this embodiment, the microblog-based analysis and view presentation system 20 stored in the memory 21 may be further divided into one or more program modules, and the one or more program modules are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 22) to complete the present invention.
For example, fig. 6 shows a schematic diagram of the program modules of the second embodiment implementing the microblog-based analysis and view display system 20. In that embodiment, the microblog-based analysis and view display system 20 can be divided into a crawling module 200, a first analysis module 202, a setting module 204, a word segmentation module 206, a second analysis module 208, and a display module 210. A program module in the present invention refers to a series of computer program instruction segments capable of performing a specific function, and is better suited than the program itself for describing how the microblog-based analysis and view display system 20 executes in the computer device 2. The specific functions of the program modules 200-210 have been described in detail in the second embodiment and are not repeated here.
Example four
The present embodiment also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an app store, etc., on which a computer program is stored that implements the corresponding functions when executed by a processor. The computer-readable storage medium of this embodiment stores a computer program that, when executed by a processor, implements the microblog-based analysis and view display method of the first embodiment.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A microblog-based analysis and view display method is characterized by comprising the following steps:
crawling a plurality of microblog articles related to keywords and hotspot data corresponding to each microblog article, wherein the hotspot data comprises release time, forwarding users, text information and forwarding amount;
analyzing the forwarding amount, the release time, the text information and the forwarding user of each microblog passage so as to draw a plurality of visual graphs according to the analysis result;
setting a corresponding label for the text information according to the forwarding amount and the forwarding user;
segmenting the text information according to a preset weighting algorithm to obtain target keywords;
analyzing the target keywords based on a classifier and the labels to obtain influence coefficients, and drawing an early warning graph according to the influence coefficients;
and sending the plurality of visual images and the early warning image to a front end for view display.
2. The analysis and view presentation method of claim 1, wherein the analyzing the forwarding amount, the publishing time, the text information and the forwarding user of each microblog passage to draw a plurality of visual graphs according to the analysis result comprises:
counting the forwarding amount of each microblog passage according to a preset time interval, sequencing each microblog passage according to the release time, and drawing a time flow chart;
analyzing the forwarding amount, the text information and the forwarding user of each microblog passage to obtain exposure, a user quality value, a propagation range value and an emotion index corresponding to each microblog passage;
drawing according to the exposure, the user quality value, the propagation range value and the emotion index to obtain an analysis graph;
analyzing the text information and the emotion indexes corresponding to the text information to obtain a detonation index, generating early warning information according to the detonation index, and drawing a detonation propagation diagram, wherein the plurality of visual diagrams comprise the time flow diagram, the analysis diagram and the detonation propagation diagram.
3. The method for analyzing and displaying a view of claim 2, wherein the analyzing the forwarding amount and the forwarding users of each microblog passage to obtain the user quality value corresponding to each microblog passage comprises:
acquiring forwarding users of each microblog article, and acquiring non-noise users according to the number of fans of the forwarding users;
calculating the number ratio of non-noise users in the forwarding users;
acquiring the daily forwarding amount of the non-noise user;
and calculating the user quality value of each microblog passage according to the daily forwarding amount and the number ratio.
4. The analysis and view display method of claim 3, wherein the analyzing the forwarding user of each microblog passage to obtain a corresponding propagation range value of each microblog passage comprises:
counting the propagation range value of each microblog passage according to the target position of the non-noise user.
5. The method for analyzing and displaying a view of claim 1, wherein the analyzing the text information of each microblog passage to obtain an emotion index corresponding to each microblog passage comprises:
acquiring text information of each microblog passage;
calculating a similarity value between each piece of text information according to a similarity algorithm, and removing repeated text information according to the similarity value to obtain a real text;
performing natural language processing on the real text to obtain a plurality of emotion texts;
and calculating the emotional text according to an emotional analysis model to obtain the emotional index of each microblog article.
6. The analysis and view presentation method of claim 5, wherein said setting a corresponding label for said text message according to said forwarding amount and said forwarding user comprises:
when the forwarding amount of the microblog articles is larger than a first preset threshold value and the number of fans of the forwarding user is smaller than a second preset threshold value, setting a first label for the text information corresponding to the microblog articles;
and when the forwarding amount of the microblog articles is smaller than the first preset threshold value and the number of fans of the forwarding user is larger than the second preset threshold value, determining that the text information is second text information, and setting a second label for the text information corresponding to the microblog articles.
7. The analysis and view presentation method of claim 1, wherein said method further comprises:
and uploading the plurality of visualization graphs and the early warning graph to a block chain.
8. A microblog-based analysis and view display system, characterized by comprising:
a crawling module, used for crawling a plurality of microblog articles related to keywords and hotspot data corresponding to each microblog article, wherein the hotspot data comprises release time, forwarding users, text information and forwarding amount;
a first analysis module, used for analyzing the forwarding amount, the release time, the text information and the forwarding user of each microblog passage so as to draw a plurality of visual graphs according to the analysis result;
a setting module, used for setting a corresponding label for the text information according to the forwarding amount and the forwarding user;
a word segmentation module, used for segmenting words of the text information according to a preset weighting algorithm to obtain target keywords;
a second analysis module, used for analyzing the target keywords based on a classifier and the labels to obtain influence coefficients and drawing an early warning graph according to the influence coefficients;
and a display module, used for sending the plurality of visual graphs and the early warning graph to a front end for view display.
9. A computer arrangement, characterized in that the computer arrangement comprises a memory, a processor, the memory having stored thereon a computer program being executable on the processor, the computer program, when being executed by the processor, realizing the steps of the microblog based analysis and view showing method according to any one of the claims 1-7.
10. A computer-readable storage medium, having stored thereon a computer program for execution by at least one processor to cause the at least one processor to perform the steps of the microblog-based analysis and view presentation method according to any one of claims 1-7.
CN202011035414.XA 2020-09-27 2020-09-27 Microblog-based analysis and view display method and system Pending CN112148946A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011035414.XA CN112148946A (en) 2020-09-27 2020-09-27 Microblog-based analysis and view display method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011035414.XA CN112148946A (en) 2020-09-27 2020-09-27 Microblog-based analysis and view display method and system

Publications (1)

Publication Number Publication Date
CN112148946A true CN112148946A (en) 2020-12-29

Family

ID=73895609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011035414.XA Pending CN112148946A (en) 2020-09-27 2020-09-27 Microblog-based analysis and view display method and system

Country Status (1)

Country Link
CN (1) CN112148946A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076415A (en) * 2021-04-30 2021-07-06 中国平安人寿保险股份有限公司 Article sorting method and device, computer equipment and readable storage medium
CN113076415B (en) * 2021-04-30 2024-04-02 中国平安人寿保险股份有限公司 Article ordering method and device, computer equipment and readable storage medium
CN113723991A (en) * 2021-08-10 2021-11-30 上海原圈网络科技有限公司 Marketing article influence analysis processing method and device
CN113723991B (en) * 2021-08-10 2024-04-19 上海原圈网络科技有限公司 Marketing article influence analysis processing method and device
CN117093762A (en) * 2023-07-18 2023-11-21 南京特尔顿信息科技有限公司 Public opinion data evaluation analysis system and method
CN117093762B (en) * 2023-07-18 2024-02-13 南京特尔顿信息科技有限公司 Public opinion data evaluation analysis system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination