CN108052586A - Public opinion analysis method, system, computer device and storage medium - Google Patents


Info

Publication number
CN108052586A
Authority
CN
China
Prior art keywords
analysis
public
public sentiment
public opinion
sentiment data
Prior art date
Legal status
Pending
Application number
CN201711307926.5A
Other languages
Chinese (zh)
Inventor
谢家杰
Current Assignee
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN201711307926.5A
Publication of CN108052586A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/35 Clustering; Classification
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a public opinion analysis method, system, computer device and storage medium directed at the behavior of using internet marketing campaigns to obtain large rewards at low or even zero cost. The method comprises: S101: according to a predefined search strategy, searching for and reading web page files with a web crawler, and extracting public opinion data from the web page files; S102: filtering the extracted public opinion data to remove junk information; S103: sorting and classifying the filtered public opinion data; S104: analyzing and processing the public opinion data in each classification result; S105: displaying and outputting the public opinion analysis results obtained in S104 in the form of charts and reports. The public opinion analysis system comprises a crawler module, a filtering module, a classification module, an analysis module and a display module. The invention improves the precision of data collection and the accuracy of public opinion analysis, and, by perceiving risk proactively, allows risk points to be handled effectively and early.

Description

Public opinion analysis method, system, computer device and storage medium
Technical field
The present invention relates to the technical field of information analysis, and in particular to a public opinion analysis method, system, computer device and storage medium directed at the behavior of using internet marketing campaigns to obtain large rewards at low or even zero cost.
Background
A public opinion monitoring system, also called an internet public opinion monitoring system, uses dedicated public opinion software to crawl the information of interest out of the vast and cluttered information on the internet according to certain rules and methods, processes it by analysis, filtering and similar means, and finally presents the public opinion information that matches the user's needs. In recent years, a group typified by post-80s white-collar workers has taken a strong interest in collecting information about promotional campaigns and free services offered through channels such as major e-commerce sites, banks and brick-and-mortar stores. They selectively take part in these campaigns to obtain material benefits at a relatively low or even zero cost. This behavior is called "wool pulling", and the group that follows and is keen on it is called the "wool party".
With the rapid growth of internet finance, some online lending platforms often launch lucrative campaigns to attract investors, such as rewards for identity verification and registration, cashback on account top-ups, and rebates on bids, and the group described above has flourished by feeding on them. For example, if one campaign pays 20 yuan per bank card, a person with three bank cards earns 60 yuan from a single campaign; taking part in three campaigns a day earns 3 × 60 = 180 yuan, roughly 200 yuan a day. Moreover, 90% of campaigns also reward invitations, so by inviting friends or their own secondary accounts, participants earn even more.
Most public opinion analysis methods and systems currently used in the industry implement four basic functions: internet information acquisition, internet information processing, public opinion analysis, and decision support. Faced with the complex internet environment, however, they have three shortcomings. First, comments deliberately written in special ways, for example with substituted homophones or interfering special characters, make crawling difficult, so key information is missed while junk information is crawled indiscriminately. Second, because collection from the data sources is disrupted, a dynamic tendency cannot be analyzed accurately from the limited information that has been crawled. Third, the analysis lacks domain expertise, so it cannot accurately perceive the risk posed by the high-risk group that uses internet marketing campaigns to obtain large rewards at low or even zero cost.
Summary of the invention
The present invention aims to provide a public opinion analysis method, system, computer device and storage medium that overcome the shortcomings described in the background: crawling difficulties, missed key information and indiscriminately crawled junk caused by deliberately written comments (substituted homophones, interfering special characters); the inability to analyze a dynamic tendency accurately from limited crawled information when data source collection is disrupted; and the lack of the domain expertise needed to accurately perceive the risk posed by the high-risk group that uses internet marketing campaigns to obtain large rewards at low or even zero cost.
To achieve these goals, the present invention provides the following technical solution:
A public opinion analysis method, comprising the following steps:
S101: according to a predefined search strategy, search for and read web page files with a web crawler, and extract public opinion data from the web page files;
S102: filter the extracted public opinion data to remove junk information;
S103: sort and classify the filtered public opinion data, where the classification types include source, strong correlation, and posts by active users;
S104: analyze and process the public opinion data in each classification result to obtain the origin, public opinion emotional color (sentiment), network diffusion state, development trend, regional distribution information, age range information, and focus of attention;
S105: display and output the public opinion analysis results obtained in step S104 in the form of charts and reports.
Preferably, the public opinion data includes the URL, title, time, author, source, body text, comments, click count, reply count, and repost count.
Preferably, in step S102, filtering the public opinion data includes: when a preset condition is triggered, determining that the public opinion data is junk information and filtering it out, where junk information = A || B || C || D, with A = the Chinese text is shorter than 4 characters, B = a run of consecutive English letters is longer than 15, C = the text contains a blacklisted word, and D = the text contains any of the symbols * & ^ % $ # @.
Preferably, in step S104, analyzing and processing the public opinion data in each classification result includes: S401: analyzing the crawl source to obtain the origin of the public opinion data; S402: performing sentiment analysis on the public opinion data within a unit of time to obtain the public opinion emotional color; S403: analyzing whether each crawler source contains the public opinion event to obtain the network diffusion state of the event; S404: analyzing how often keywords appear within a unit of time to obtain the development trend of the event; S405: analyzing the login IPs and age information of the users participating in the event to obtain the regional distribution and age range information of the event; S406: analyzing how often words appear within a unit of time to obtain the focus of attention.
Preferably, in step S402, performing sentiment analysis on the public opinion data within a unit of time includes using a dictionary-based sentiment analysis method built on a sentence weighting algorithm.
Preferably, the public opinion emotional color is happy, neutral, or angry, and the network diffusion state is early diffusion, mid diffusion, or late diffusion.
Preferably, in step S105, the charts include one or more of pie charts, line charts, column charts, bar charts, area charts, scatter plots and tables, or a composite chart formed by overlaying two or more of them.
Based on the same technical concept, the present invention also provides a public opinion analysis system comprising a crawler module, a filtering module, a classification module, an analysis module and a display module.
The crawler module searches for and reads web page files with a web crawler according to a predefined search strategy, and extracts public opinion data from the web page files;
The filtering module filters the extracted public opinion data to remove junk information;
The classification module sorts and classifies the filtered public opinion data, where the classification types include source, strong correlation, and posts by active users;
The analysis module analyzes and processes the public opinion data in each classification result to obtain the public opinion analysis results, including the origin, emotional color, network diffusion state, development trend, regional distribution information, age range information, and focus of attention;
The display module displays and outputs the public opinion analysis results obtained in step S104 in the form of charts and reports.
Based on the same technical concept, the present invention also provides a computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the public opinion analysis method described above.
Based on the same technical concept, the present invention also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the public opinion analysis method described above.
The above public opinion analysis method, system, computer device and storage medium search for and read web page files with a web crawler according to a predefined search strategy and extract public opinion data from them; filter the extracted data, determining it to be junk information when a preset condition is triggered (junk information = A || B || C || D, with A = the Chinese text is shorter than 4 characters, B = a run of consecutive English letters is longer than 15, C = the text contains a blacklisted word, D = the text contains any of the symbols * & ^ % $ # @) and removing it; sort and classify the filtered data, where the classification types include source, strong correlation, and posts by active users; analyze and process the data in each classification result to obtain the public opinion analysis results, including the origin, emotional color, network diffusion state, development trend, regional distribution information, age range information, and focus of attention; and display and output the results obtained in step S104 in the form of charts and reports. Compared with the prior art, the beneficial effects of the invention are as follows. Data collection is more precise: formatting the data and proactively raising the word frequency of hot-topic terms increases the accuracy of public opinion analysis. The tendency of attention is captured: collecting speech from cluttered community forums reveals what people are paying attention to and how they feel. Risk is perceived proactively: the invention tracks the posting behavior of the top 50 members by post views or the top 50 members by reply count on forums, public accounts, microblogs, and newspapers and periodicals, who are the core members of the group that uses internet marketing campaigns to obtain large rewards at low or even zero cost, and predicts the group's next major campaign targets, so that risk is perceived proactively and risk points can be handled effectively and early.
Brief description of the drawings
Fig. 1 is a flowchart of the public opinion analysis method of the present invention;
Fig. 2 is a structural diagram of the public opinion analysis system of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, the present invention provides the following technical solution:
A public opinion analysis method, comprising the following steps:
S101: according to a predefined search strategy, search for and read web page files with a web crawler, and extract public opinion data from the web page files.
Several forums are selected where the high-risk group that uses internet marketing campaigns to obtain large rewards at low or even zero cost (wool-pulling behavior) is active, and a targeted crawler is written on top of an open-source crawler framework; this embodiment uses the Scrapy framework. Every day the crawler searches for and reads all posts and comments and extracts public opinion data from all of the forum information, including the URL, title, time, author, source, body text, comments, read count, and reply count.
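The extraction step can be sketched with Python's standard-library `html.parser` standing in for a Scrapy parse callback. This is a minimal sketch, not the embodiment's crawler: the CSS class names (`title`, `author`, `time`, `content`, `read-count`, `reply-count`, `comment`), the `PublicOpinionRecord` type and the `extract_record` helper are all hypothetical, and real forum markup will differ.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical class names for single-valued fields on a forum post page.
FIELD_CLASSES = {"title", "author", "time", "content", "read-count", "reply-count"}


class PostParser(HTMLParser):
    """Collects text from elements whose class attribute names a known field.

    Assumes flat markup: any end tag inside a field element ends the capture.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self.comments = []
        self._current = None  # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == "comment":
            self.comments.append("")  # each comment element starts a new entry
            self._current = "comment"
        elif cls in FIELD_CLASSES:
            self._current = cls

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current == "comment":
            self.comments[-1] += data.strip()
        elif self._current:
            self.fields[self._current] = self.fields.get(self._current, "") + data.strip()


@dataclass
class PublicOpinionRecord:
    url: str
    source: str
    title: str
    author: str
    time: str
    content: str
    read_count: int
    reply_count: int
    comments: list = field(default_factory=list)


def extract_record(url, source, html):
    """Parse one post page into a record; missing counts default to 0."""
    parser = PostParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    f = parser.fields
    return PublicOpinionRecord(
        url=url,
        source=source,
        title=f.get("title", ""),
        author=f.get("author", ""),
        time=f.get("time", ""),
        content=f.get("content", ""),
        read_count=int(f.get("read-count") or 0),
        reply_count=int(f.get("reply-count") or 0),
        comments=parser.comments,
    )
```

In a Scrapy project the same fields would more naturally be pulled with the framework's own selectors inside a spider callback; the stdlib parser is used here only to keep the sketch self-contained.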
A web crawler is a program or script that automatically fetches information from the web according to certain rules. It traverses the resources of a website and downloads them locally: specifically, it analyzes every valid URL of the site, submits an HTTP request for each to obtain the corresponding result, and generates local files and the corresponding log information. The crawl policy relies mainly on the mapping between hyperlinks and the web pages they point to, and may be a depth-first search strategy, a breadth-first search strategy, or a best-first search strategy.
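The difference between the breadth-first and depth-first crawl policies can be illustrated on an in-memory link graph. In this sketch the `links` dictionary is a hypothetical stand-in for fetching each page over HTTP and extracting its hyperlinks; `max_pages` is an illustrative crawl budget.

```python
from collections import deque


def bfs_crawl_order(start, links, max_pages=100):
    """Visit pages level by level: all links of a page before their children."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:  # enqueue each URL at most once
                seen.add(nxt)
                queue.append(nxt)
    return order


def dfs_crawl_order(start, links, max_pages=100):
    """Follow one branch of links as deep as possible before backtracking."""
    seen = set()
    stack = [start]
    order = []
    while stack and len(order) < max_pages:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        order.append(url)
        # Push in reverse so the first link on the page is visited first.
        for nxt in reversed(links.get(url, [])):
            if nxt not in seen:
                stack.append(nxt)
    return order
```

For a forum, breadth-first tends to cover all board index pages before diving into threads, while depth-first finishes one thread chain before moving on; a best-first crawler would replace the queue with a priority queue ordered by some relevance score.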
In this embodiment, besides Scrapy, the open-source crawler framework may also be PySpider, Nutch, Crawler4j, WebMagic, WebCollector, or another open-source crawler framework.
The title and body contain essentially all the information of a web page and are the focus of raw text data collection. The page's publication time makes it easy to retrieve content in chronological order when public opinion information appears. Netizen participation information, shown as comment count, repost count, click count and so on, can be used to analyze how much attention a topic receives.
In this embodiment, besides the URL, title, time, author, source, body text, comments, read count and reply count, the extracted public opinion data may also include other extractable information, such as the crawl interval.
S102: filter the extracted public opinion data to remove junk information.
The method of filtering the public opinion data may include:
When a preset condition is triggered, the public opinion data is determined to be junk information and is filtered out, where junk information = A || B || C || D, with A = the Chinese text is shorter than 4 characters, B = a run of consecutive English letters is longer than 15, C = the text contains a blacklisted word, and D = the text contains any of the symbols * & ^ % $ # @.
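Read literally, the junk rule is a single predicate that ORs four conditions. This is a hedged sketch: counting "Chinese length" as the number of characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is an assumption, the blacklist contents are hypothetical, and under the literal rule A any text with fewer than four Chinese characters, including purely English text, is treated as junk.

```python
import re

CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")  # assumption: basic CJK block only
LONG_LATIN_RUN = re.compile(r"[A-Za-z]{16,}")  # more than 15 consecutive letters
SPAM_SYMBOLS = set("*&^%$#@")


def is_junk(text, blacklist):
    """junk = A or B or C or D, following the rule in the text."""
    a = len(CJK_CHAR.findall(text)) < 4
    b = LONG_LATIN_RUN.search(text) is not None
    c = any(word in text for word in blacklist)
    d = any(ch in SPAM_SYMBOLS for ch in text)
    return a or b or c or d


def filter_junk(texts, blacklist):
    """Keep only the texts that do not trigger any junk condition."""
    return [t for t in texts if not is_junk(t, blacklist)]
```

Usage, with a hypothetical one-word blacklist: `filter_junk(["今天银行卡活动返现很多", "赞赞赞"], {"代刷"})` keeps only the first comment, since "赞赞赞" has just three Chinese characters.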
This embodiment filters out useless information in the web page content, such as audio, video and the page's own markup, and keeps only the text content, saving storage space. Information that does not meet the required format must be converted to it. This embodiment also targets comments written in special ways, such as substituted homophones, interfering special characters, and high-frequency blacklisted words: it filters out and cleans the noise that netizens produce by heavily using homophones, near-homophones and special characters in place of the intended words, as well as high-frequency blacklisted words, so that public opinion data on the web is obtained efficiently and correctly. This filtering method is specifically targeted at the high-risk group that uses internet marketing campaigns to obtain large rewards at low or even zero cost.
For example, netizens write very casually and in varied ways online, mixing digits, letters and symbols into Chinese text; sentences and paragraphs are fragmentary and incomplete, and there are many repeated phrases and short sentences. Someone might comment "Zan Zan Zan" ("like like like"), "ddddddddddddddd", or "A&B". Text cleaning removes such noise data.
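A minimal text-cleaning sketch along these lines strips markup, decodes HTML entities, maps known variant spellings to canonical words, and collapses runs of a repeated non-digit character. The variant mapping and the three-repeat threshold are assumptions for illustration; the embodiment does not specify them.

```python
import html
import re

TAG = re.compile(r"<[^>]+>")
REPEATED_CHAR = re.compile(r"(\D)\1{2,}")  # a non-digit repeated 3+ times in a row
WHITESPACE = re.compile(r"\s+")


def clean_text(raw, variants=None):
    """Return plain, normalized text from a crawled comment.

    `variants` is a hypothetical mapping from homophone or deliberately
    misspelled forms to the canonical word.
    """
    text = TAG.sub(" ", raw)    # drop markup, keep word boundaries
    text = html.unescape(text)  # "&amp;" -> "&"
    for variant, canonical in (variants or {}).items():
        text = text.replace(variant, canonical)
    text = REPEATED_CHAR.sub(r"\1", text)  # "ddddd" -> "d", "赞赞赞" -> "赞"
    return WHITESPACE.sub(" ", text).strip()
```

Digits are excluded from the collapse so that amounts such as "1000" survive; legitimate letter repeats (for example laughter written as "哈哈哈") are still shortened, which is a trade-off of this simple rule.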
S103: sort and classify the filtered public opinion data, where the classification types include source, strong correlation, and posts by active users.
Source distinguishes where each piece of data was published; strong correlation matches sensitive words and tags whether the data is strongly correlated; active user matches the IDs of active users, identified from accumulated data, and tags data posted by them.
For example, suppose Xiao Zhang publishes on website A a post about targeted wool pulling that contains the keyword "data traffic". The post is then tagged in the database with three labels: it comes from website A; it is strongly correlated data because it contains a preset sensitive keyword; and because Xiao Zhang's ID is among the active user IDs, it is content posted by an active user.
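The three tags in this example can be assigned by one small function. In this sketch the `TaggedPost` type, the use of the URL host as the source label, and the contents of `sensitive_words` and `active_ids` are hypothetical; the embodiment derives the latter two statistically from a full crawl.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class TaggedPost:
    url: str
    author_id: str
    text: str
    tags: dict = field(default_factory=dict)


def tag_post(post, sensitive_words, active_ids):
    """Attach source, strong-correlation and active-user tags to a post."""
    post.tags["source"] = urlsplit(post.url).hostname or ""
    post.tags["strong_correlation"] = any(w in post.text for w in sensitive_words)
    post.tags["active_user"] = post.author_id in active_ids
    return post
```

Usage mirroring the example: `tag_post(TaggedPost("https://www.site-a.example/thread/42", "xiaozhang", "流量 活动 今天开始"), {"流量"}, {"xiaozhang"})` marks the post as coming from `www.site-a.example`, strongly correlated, and written by an active user.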
This classification method relies on a full crawl of the data, from which the sensitive keywords and active user IDs are derived statistically; subsequent public opinion data can then be tagged and classified as it is collected, further increasing its value.
S104: analyze and process the public opinion data in each classification result to obtain the public opinion analysis results, including the origin, emotional color, network diffusion state, development trend, regional distribution information, age range information, and focus of attention.
When the public opinion data in each classification result is analyzed, specialized analysis rules can be set for the forums where the high-risk group that uses internet marketing campaigns to obtain large rewards at low or even zero cost is active, in order to learn each forum's public opinion trend: which posts have large reply counts, which posts are featured posts, whether the number of posts surges before holidays, which types of this reward-seeking behavior the posts focus on, and which platforms are newly becoming active targets of the behavior.
In one embodiment, step S104 applies the same approach to blogs where the high-risk group is active: specialized analysis rules are set to learn each blog's public opinion trend, including which blog articles have large reply counts, which are featured articles, whether the number of blog articles surges before holidays, which types of reward-seeking behavior the articles focus on, and which blogs are newly active for this behavior.
In one embodiment, step S104 applies the same approach to microblogs where the high-risk group is active: specialized analysis rules are set to learn the public opinion trend on microblogs, including which microblog posts have large reply counts, which posts are pinned, whether the number of microblog posts surges before holidays, which types of reward-seeking behavior the posts focus on, and which microblog home pages are newly active for this behavior.
In one embodiment, step S104 applies the same approach to WeChat public accounts where the high-risk group is active: specialized analysis rules are set to learn the public opinion trend of the accounts, including which public account articles have large reply counts, whether the number of articles surges before holidays, which types of reward-seeking behavior the accounts focus on, and which WeChat public accounts are newly active for this behavior.
S401: analyze the crawl source to obtain the origin of the public opinion data.
The URL is the basis for determining a web page's source; because a URL uniquely identifies a page, it is used to analyze websites and compile statistics on them.
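Per-site statistics can be derived directly from the crawled URLs. This minimal sketch assumes the host name is the unit of "source"; grouping several hosts that belong to the same platform would need an extra, hypothetical mapping.

```python
from collections import Counter
from urllib.parse import urlsplit


def source_of(url):
    """Host part of a URL; urlsplit lowercases it and drops any port."""
    return urlsplit(url).hostname or ""


def count_sources(urls):
    """Number of crawled records per website host."""
    return Counter(source_of(u) for u in urls)
```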
S402: perform sentiment analysis on the public opinion data within a unit of time to obtain the public opinion emotional color.
Specifically, a dictionary-based sentiment analysis method built on a sentence weighting algorithm may be used. The resulting emotional color is happy, neutral, or angry.
For example, a sentiment dictionary, a degree adverb table, a summary word list, a conjunction list, a negation word list and so on are preset; each word is assigned a corresponding sentiment weight, and the final sentiment of the sentence is derived from the word weights.
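A minimal sketch of dictionary-based scoring with sentence weighting follows. All lexicons are hypothetical English toy entries (a real system would load Chinese dictionaries and segment text into tokens first); giving sentences that contain a summary word double weight, and mapping positive scores to "happy" and negative ones to "angry" with a ±0.5 threshold, are assumptions rather than the embodiment's parameters.

```python
# Hypothetical toy lexicons.
SENTIMENT = {"great": 2.0, "good": 1.0, "bad": -1.0, "scam": -2.0}
DEGREE = {"very": 2.0, "slightly": 0.5}
NEGATION = {"not", "never"}
SUMMARY = {"overall", "anyway"}  # sentences containing these get extra weight


def sentence_score(tokens):
    """Sum of word weights, each scaled by preceding degree/negation words."""
    score = 0.0
    modifier = 1.0
    for tok in tokens:
        if tok in DEGREE:
            modifier *= DEGREE[tok]
        elif tok in NEGATION:
            modifier = -modifier
        elif tok in SENTIMENT:
            score += modifier * SENTIMENT[tok]
            modifier = 1.0  # modifiers apply only to the next sentiment word
    return score


def emotional_color(sentences, summary_weight=2.0, threshold=0.5):
    """Weighted average of sentence scores mapped to happy/neutral/angry."""
    if not sentences:
        return "neutral"
    total = weights = 0.0
    for tokens in sentences:
        w = summary_weight if SUMMARY.intersection(tokens) else 1.0
        total += w * sentence_score(tokens)
        weights += w
    avg = total / weights
    if avg > threshold:
        return "happy"
    if avg < -threshold:
        return "angry"
    return "neutral"
```

For instance, `sentence_score(["not", "very", "bad"])` flips and doubles the weight of "bad" to give 2.0, and `emotional_color([["good"], ["overall", "scam"]])` averages 1.0 and a double-weighted -2.0 to -1.0, which maps to "angry".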
S403: analyze whether each crawler source contains the public opinion event, to obtain the network diffusion state of the event.
The network diffusion state is early diffusion, mid diffusion, or late diffusion:
Early diffusion: the event subject has only just started to appear in the public opinion data and has not yet been found in more than half of the data sources;
Mid diffusion: the event subject appears in more than half of the data sources, and the data volume is rising noticeably;
Late diffusion: the event subject appears in more than half of the data sources, and the data volume is growing by almost zero.
Public opinion events of this kind that are not especially large may not be suitable for this diffusion analysis.
For example, suppose a mobile operator's free data-traffic campaign goes wrong and is exploited by the high-risk group that uses internet marketing campaigns to obtain large rewards at low or even zero cost, who grab large amounts of mobile data traffic and spread the information at locations A, B, C, D and E. If 8 data sources are currently collected and the information appeared in 5 of them today, the event subject appears in more than half of the data sources and the data volume is rising noticeably, so the event is judged to be in mid diffusion.
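The three diffusion rules can be sketched as a small classifier. The text does not quantify "rising noticeably" or "almost zero", so measuring growth as the relative change in data volume between two periods and using a 5% tolerance to separate mid from late diffusion are assumptions; declining volume is also treated as late diffusion here.

```python
def diffusion_stage(sources_with_event, total_sources, prev_volume, curr_volume,
                    flat_tolerance=0.05):
    """Classify an event as 'early', 'mid' or 'late' diffusion.

    flat_tolerance is a hypothetical threshold for "growth of almost zero".
    """
    if sources_with_event * 2 <= total_sources:  # not in more than half
        return "early"
    if prev_volume == 0:
        growth = 1.0 if curr_volume > 0 else 0.0
    else:
        growth = (curr_volume - prev_volume) / prev_volume
    return "mid" if growth > flat_tolerance else "late"
```

For the operator example, 5 of 8 sources with volume climbing from, say, 40 to 120 records gives `diffusion_stage(5, 8, 40, 120) == "mid"`; the volumes are illustrative, not from the text.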
S404: analyze how often keywords appear within a unit of time to obtain the development trend of the public opinion event.
The development trend of the event is given by the difference between a word's frequency at the end and at the start of a specific time period: if the difference is positive, the event is trending upward; if negative, downward; if zero, flat.
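The trend rule is a sign test on the change in a keyword's frequency, computed here as end-of-period frequency minus start-of-period frequency so that a positive value means rising. Normalizing each count by the number of tokens in its window is an assumption made so that windows of different sizes are comparable; the text only speaks of "frequency".

```python
from collections import Counter


def relative_frequency(tokens, keyword):
    """Share of tokens equal to keyword (0.0 for an empty window)."""
    return Counter(tokens)[keyword] / len(tokens) if tokens else 0.0


def keyword_trend(start_tokens, end_tokens, keyword):
    """'rising', 'falling' or 'flat' from the sign of the frequency change."""
    delta = relative_frequency(end_tokens, keyword) - relative_frequency(start_tokens, keyword)
    if delta > 0:
        return "rising"
    if delta < 0:
        return "falling"
    return "flat"
```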
S405: analyze the login IPs and age information of the users participating in the event, to obtain the regional distribution and age range information of the event.
For example, for IP 116.238.88.116 and age 19, the analysis yields Shanghai as the region and 19 years old as the age range information.
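A sketch of the aggregation using the standard `ipaddress` module follows. The prefix table is hypothetical: its single entry only mirrors the example above and is not a claim about real address allocation, and a production system would query a geo-IP database instead. The age brackets are likewise illustrative.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix table; the entry mirrors the example in the text only.
REGION_PREFIXES = {ipaddress.ip_network("116.238.88.0/24"): "Shanghai"}
# Hypothetical age brackets: (lowest age, highest age, label).
AGE_BRACKETS = [(0, 17, "<18"), (18, 24, "18-24"), (25, 34, "25-34"), (35, 200, "35+")]


def region_of(ip):
    addr = ipaddress.ip_address(ip)
    for net, region in REGION_PREFIXES.items():
        if addr in net:
            return region
    return "unknown"


def age_bracket(age):
    for low, high, label in AGE_BRACKETS:
        if low <= age <= high:
            return label
    return "unknown"


def distribution(users):
    """users: iterable of (login_ip, age); returns (region counts, age counts)."""
    users = list(users)
    regions = Counter(region_of(ip) for ip, _ in users)
    ages = Counter(age_bracket(age) for _, age in users)
    return regions, ages
```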
S406: analyze how often words appear within a unit of time to obtain the focus of attention.
Basis for determination: the top 10 words appearing within the unit of time, excluding irrelevant general words such as "WeChat" and "mobile phone", are the current focus of attention of the high-risk group that uses internet marketing campaigns to obtain large rewards at low or even zero cost.
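The top-10 rule maps directly onto `collections.Counter.most_common`. In this minimal sketch the tokens are assumed to be already segmented, and the exclusion set is hypothetical apart from the two examples given in the text.

```python
from collections import Counter


def focus_of_attention(tokens, exclude, k=10):
    """The k most frequent tokens in the window, ignoring excluded words."""
    counts = Counter(t for t in tokens if t not in exclude)
    return [word for word, _ in counts.most_common(k)]
```

For example, with tokens `["微信", "流量", "流量", "银行卡", "手机", "流量", "银行卡", "返现"]`, excluding `{"微信", "手机"}` and `k=2`, the result is `["流量", "银行卡"]`.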
S105: display and output the public opinion analysis results obtained in step S104 in the form of charts and reports.
When the public opinion analysis results for the high-risk group that uses internet marketing campaigns to obtain large rewards at low or even zero cost are displayed, some of the screening rules in the analysis module are invoked to assemble a complete public opinion analysis report on the group, which is presented in the display module.
The public opinion analysis results obtained in step S104, including the origin, emotional color, network diffusion state, development trend, regional distribution information, age range information, and focus of attention, are displayed and output in the form of charts and reports.
Specifically, the charts in step S105 are one or more of pie charts, line charts, column charts, bar charts, area charts, scatter plots and tables, or a composite chart formed by overlaying two or more of them.
An illustrative public opinion state comparison table is shown in Table 1.
Table 1:
In this embodiment, the network public opinion popularity trend is predicted 3 to 5 days ahead with small prediction errors and good predictive performance.
Based on identical technical concept, the embodiment of the present invention additionally provides a kind of the analysis of public opinion system, as shown in Fig. 2, should System includes:Reptile module, filtering module, sort module, analysis module and display module.
Specifically, the crawler module is configured to search for and read web page files by a web crawler according to a predefined search strategy, and to extract public sentiment data from the web page files;
Specifically, the filtering module is configured to filter the extracted public sentiment data and remove junk information;
Specifically, the classification module is configured to sort and classify the filtered public sentiment data, the classification types including source, strong relevance, and posts by active users;
Specifically, the analysis module is configured to analyze and process the public sentiment data in each classification result to obtain public opinion analysis results, including origin, public opinion emotional tone, network diffusion state, development trend, regional distribution information, age-range information and hot topics;
Specifically, the display module is configured to display and output, in the form of charts and reports, the public opinion analysis results obtained in step S104.
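The five modules form a linear pipeline. The sketch below shows one possible wiring, assuming each module is supplied as a plain callable; the class name `PublicOpinionPipeline` and its field names are hypothetical and only illustrate the data flow from crawling (S101) to display (S105), not the patent's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]  # assumed: one crawled post as a field -> value mapping


@dataclass
class PublicOpinionPipeline:
    """Hypothetical wiring of the five modules; each stage is an injected callable."""
    crawl: Callable[[], Iterable[Record]]         # crawler module (S101)
    is_junk: Callable[[Record], bool]             # filtering module (S102)
    classify: Callable[[Record], str]             # classification module (S103)
    analyze: Callable[[str, List[Record]], dict]  # analysis module (S104)
    display: Callable[[Dict[str, dict]], None]    # display module (S105)

    def run(self) -> Dict[str, dict]:
        # S101 + S102: crawl, then drop records flagged as junk.
        records = [r for r in self.crawl() if not self.is_junk(r)]
        # S103: group the remaining records by classification type.
        groups: Dict[str, List[Record]] = {}
        for r in records:
            groups.setdefault(self.classify(r), []).append(r)
        # S104: analyze each group independently.
        results = {name: self.analyze(name, rs) for name, rs in groups.items()}
        # S105: hand the results to the display stage.
        self.display(results)
        return results
```

Injecting callables keeps each module replaceable, for example swapping the junk filter or the classifier without touching the rest of the flow.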
Based on the same technical concept, an embodiment of the present invention further provides a computer device. The computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of the above public opinion analysis method: searching for and reading web page files by a web crawler according to a predefined search strategy, and extracting public sentiment data from the web page files; filtering the extracted public sentiment data to remove junk information; sorting and classifying the filtered public sentiment data, the classification types including source, strong relevance, and posts by active users; analyzing and processing the public sentiment data in each classification result to obtain public opinion analysis results, including origin, public opinion emotional tone, network diffusion state, development trend, regional distribution information, age-range information and hot topics; and displaying and outputting, in the form of charts and reports, the public opinion analysis results obtained in step S104.
Specifically, the public sentiment data includes the URL, title, time, author, source, body text, comments, click rate, number of replies and number of reposts.
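One way to hold a single public sentiment record is a small dataclass with one field per item listed above. This is a sketch only: the field names, the types (for example treating the click rate as a float and the time as an ISO 8601 string on input) and the `from_dict` helper are assumptions, since the patent lists the fields but not their formats.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class PublicSentimentRecord:
    """Hypothetical container for one crawled item of public sentiment data."""
    url: str
    title: str
    time: datetime
    author: str
    source: str
    text: str
    comments: List[str] = field(default_factory=list)
    click_rate: float = 0.0
    reply_count: int = 0
    repost_count: int = 0

    @classmethod
    def from_dict(cls, d: dict) -> "PublicSentimentRecord":
        """Build a record from raw crawler output, coercing numeric fields."""
        return cls(
            url=d["url"],
            title=d["title"],
            time=datetime.fromisoformat(d["time"]),  # assumed ISO 8601 timestamps
            author=d.get("author", ""),
            source=d["source"],
            text=d["text"],
            comments=list(d.get("comments", [])),
            click_rate=float(d.get("click_rate", 0)),
            reply_count=int(d.get("reply_count", 0)),
            repost_count=int(d.get("repost_count", 0)),
        )
```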
Specifically, filtering the extracted public sentiment data in step S102 includes: when a preset condition is triggered, determining that the public sentiment data is junk information and filtering it out, where junk information = A || B || C || D, with A = the Chinese text is shorter than 4 characters, B = a run of consecutive English characters is longer than 15, C = the data contains a blacklisted word, and D = the data contains any of the symbols * & ^ % $ # @.
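The junk rule A || B || C || D can be sketched directly. The interpretation here is an assumption: "Chinese length" is taken as the number of CJK Unified Ideographs (U+4E00 to U+9FFF) in the text, and "consecutive English" as a run of ASCII letters without spaces. Under this reading an English-only post has zero Chinese characters and is therefore treated as junk. The blacklist contents are hypothetical.

```python
import re
from typing import Iterable

CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")  # assumed definition of a "Chinese" character
ENGLISH_RUN = re.compile(r"[A-Za-z]+")     # assumed: consecutive ASCII letters, no spaces
SPAM_SYMBOLS = set("*&^%$#@")              # the symbols listed for condition D


def is_junk(text: str, blacklist: Iterable[str] = ()) -> bool:
    """Return True if the text meets any of conditions A, B, C or D."""
    a = len(CJK_CHAR.findall(text)) < 4                             # A: too little Chinese text
    b = any(len(run) > 15 for run in ENGLISH_RUN.findall(text))     # B: long English run
    c = any(word in text for word in blacklist)                     # C: blacklisted word
    d = any(ch in SPAM_SYMBOLS for ch in text)                      # D: suspicious symbol
    return a or b or c or d
```

Because the four conditions are combined with a logical OR, a record is discarded as soon as any single condition holds.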
Specifically, analyzing and processing the public sentiment data in each classification result in step S104 includes:
S401: analyzing the crawl source to obtain the origin corresponding to the public sentiment data;
S402: performing sentiment analysis on the public sentiment data within a statistical time unit to obtain the public opinion emotional tone;
S403: analyzing whether each crawler source contains the public opinion event to obtain the network diffusion state of the public opinion event;
S404: analyzing the frequency of occurrence of keywords within a time unit to obtain the development trend of the public opinion event;
S405: analyzing the login IP addresses and age information of users participating in the public opinion event to obtain the regional distribution information and the age-range information of the public opinion event;
S406: analyzing the frequency of occurrence of words within a time unit to obtain the hot topics.
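Steps S404 to S406 reduce to counting. The sketch below assumes posts are already word-segmented (a Chinese system would need a word segmenter and stop-word removal first, both omitted here) and uses one day as the time unit; `keyword_trend`, `hot_topics` and `age_brackets` are hypothetical names. Regional distribution from login IP addresses (S405) would additionally need an IP geolocation database and is not shown.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Post = Tuple[datetime, List[str]]  # (post time, already-segmented words)


def keyword_trend(posts: Iterable[Post], keyword: str) -> List[Tuple[str, int]]:
    """S404 sketch: occurrences of `keyword` per day, in chronological order."""
    per_day: Dict[str, int] = defaultdict(int)
    for ts, words in posts:
        per_day[ts.strftime("%Y-%m-%d")] += words.count(keyword)
    return sorted(per_day.items())


def hot_topics(posts: Iterable[Post], start: datetime, end: datetime,
               top_n: int = 3) -> List[Tuple[str, int]]:
    """S406 sketch: most frequent words among posts with start <= time < end."""
    counts: Counter = Counter()
    for ts, words in posts:
        if start <= ts < end:
            counts.update(words)
    return counts.most_common(top_n)


def age_brackets(ages: Iterable[int], width: int = 10) -> Dict[str, int]:
    """S405 sketch: count participants per age bracket such as '20-29'."""
    counts: Counter = Counter()
    for age in ages:
        low = (age // width) * width
        counts[f"{low}-{low + width - 1}"] += 1
    # Sort brackets numerically by their lower bound.
    return dict(sorted(counts.items(), key=lambda kv: int(kv[0].split("-")[0])))
```

A rising daily count from `keyword_trend` indicates a growing event, while `hot_topics` over the same window surfaces what participants are discussing most.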
Further, in step S402, performing the sentiment analysis on the public sentiment data within the statistical time unit includes using a dictionary-based sentiment analysis method built on a sentence weighting algorithm.
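The patent names a dictionary-based, sentence-weighted method without giving its details. The sketch below shows the general idea only: score each sentence with a sentiment lexicon (flipping polarity after a negation word), then combine sentence scores with weights. The toy English lexicon, the negation list and the choice to give the first and last sentences double weight are illustrative assumptions, not the weighting scheme of the cited algorithm; a Chinese system would need a Chinese sentiment dictionary and a word segmenter.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon: word -> polarity score.
LEXICON: Dict[str, float] = {"good": 1.0, "great": 2.0, "bad": -1.0,
                             "terrible": -2.0, "angry": -2.0}
NEGATIONS = {"not", "never", "no"}


def split_sentences(text: str) -> List[str]:
    """Split on Western and Chinese sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"[.!?。！？]+", text) if s.strip()]


def sentence_score(sentence: str) -> float:
    """Sum lexicon scores; a negation word flips the next sentiment word."""
    score, negate = 0.0, False
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word in NEGATIONS:
            negate = True
            continue
        if word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
            negate = False
    return score


def document_score(text: str, edge_weight: float = 2.0) -> float:
    """Weighted mean of sentence scores; first and last sentences weigh more (assumed)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    weights = [1.0] * len(sentences)
    weights[0] = weights[-1] = edge_weight
    total = sum(w * sentence_score(s) for w, s in zip(weights, sentences))
    return total / sum(weights)
```

For "The service is good. The app is not bad. Overall great!" the sentence scores are 1, 1 and 2 with weights 2, 1 and 2, giving (2 + 1 + 4) / 5 = 1.4.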
Further, the public opinion emotional tone is happy, neutral or angry, and the network diffusion state is the early, middle or late diffusion stage.
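The three emotional tones and three diffusion stages are discrete categories, so a numeric result has to be mapped onto them. In this sketch the tone comes from a sentiment score with a symmetric threshold, and the diffusion stage (S403) from the share of crawler sources that mention the event. All thresholds (0.5, 20% and 60%) and function names are hypothetical values chosen for illustration; the patent does not specify them.

```python
def emotional_tone(score: float, threshold: float = 0.5) -> str:
    """Map a sentiment score to happy / neutral / angry (threshold is assumed)."""
    if score >= threshold:
        return "happy"
    if score <= -threshold:
        return "angry"
    return "neutral"


def diffusion_stage(sources_with_event: int, total_sources: int,
                    early_max: float = 0.2, middle_max: float = 0.6) -> str:
    """S403 sketch: stage from the share of crawler sources mentioning the event."""
    if total_sources <= 0:
        raise ValueError("total_sources must be positive")
    share = sources_with_event / total_sources
    if share <= early_max:
        return "early diffusion"
    if share <= middle_max:
        return "middle diffusion"
    return "late diffusion"
```

The intuition is that an event reported by only a few sources is just starting to spread, while one found in most sources has largely saturated the network.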
Specifically, in step S105, the chart is one or more of a pie chart, line chart, column chart, bar chart, area chart, scatter plot and table, or a composite chart formed by overlaying two or more of these.
Based on the same technical concept, an embodiment of the present invention further provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above public opinion analysis method: searching for and reading web page files by a web crawler according to a predefined search strategy, and extracting public sentiment data from the web page files; filtering the extracted public sentiment data to remove junk information; sorting and classifying the filtered public sentiment data, the classification types including source, strong relevance, and posts by active users; analyzing and processing the public sentiment data in each classification result to obtain public opinion analysis results, including origin, public opinion emotional tone, network diffusion state, development trend, regional distribution information, age-range information and hot topics; and displaying and outputting, in the form of charts and reports, the public opinion analysis results obtained in step S104.
Specifically, for the storage medium, the content of the public sentiment data, the junk-information filtering of step S102, sub-steps S401 to S406 of step S104, the dictionary-based sentence-weighted sentiment analysis of step S402, the emotional tones and network diffusion stages, and the chart types of step S105 are the same as described above for the computer device.
The embodiments described above express only several implementations of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the claims of the present invention. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be determined by the appended claims.

Claims (10)

  1. A public opinion analysis method, characterized in that the public opinion analysis method comprises:
    S101: searching for and reading web page files by a web crawler according to a predefined search strategy, and extracting public sentiment data from the web page files;
    S102: filtering the extracted public sentiment data to remove junk information;
    S103: sorting and classifying the filtered public sentiment data, the classification types including source, strong relevance, and posts by active users;
    S104: analyzing and processing the public sentiment data in each classification result to obtain public opinion analysis results, including origin, public opinion emotional tone, network diffusion state, development trend, regional distribution information, age-range information and hot topics;
    S105: displaying and outputting, in the form of charts and reports, the public opinion analysis results obtained in step S104.
  2. The public opinion analysis method according to claim 1, characterized in that the public sentiment data includes the URL, title, time, author, source, body text, comments, click rate, number of replies and number of reposts.
  3. The public opinion analysis method according to claim 1, characterized in that filtering the extracted public sentiment data in step S102 comprises:
    when a preset condition is triggered, determining that the public sentiment data is junk information and filtering it out, where junk information = A || B || C || D, with A = the Chinese text is shorter than 4 characters, B = a run of consecutive English characters is longer than 15, C = the data contains a blacklisted word, and D = the data contains any of the symbols * & ^ % $ # @.
  4. The public opinion analysis method according to claim 1, characterized in that analyzing and processing the public sentiment data in each classification result in step S104 comprises:
    S401: analyzing the crawl source to obtain the origin corresponding to the public sentiment data;
    S402: performing sentiment analysis on the public sentiment data within a statistical time unit to obtain the public opinion emotional tone;
    S403: analyzing whether each crawler source contains the public opinion event to obtain the network diffusion state of the public opinion event;
    S404: analyzing the frequency of occurrence of keywords within a time unit to obtain the development trend of the public opinion event;
    S405: analyzing the login IP addresses and age information of users participating in the public opinion event to obtain the regional distribution information and the age-range information of the public opinion event;
    S406: analyzing the frequency of occurrence of words within a time unit to obtain the hot topics.
  5. The public opinion analysis method according to claim 4, characterized in that performing the sentiment analysis on the public sentiment data within the statistical time unit in step S402 comprises: using a dictionary-based sentiment analysis method built on a sentence weighting algorithm.
  6. The public opinion analysis method according to claim 4, characterized in that the public opinion emotional tone is happy, neutral or angry, and the network diffusion state is the early, middle or late diffusion stage.
  7. The public opinion analysis method according to claim 1, characterized in that, in step S105, the chart is one or more of a pie chart, line chart, column chart, bar chart, area chart, scatter plot and table, or a composite chart formed by overlaying two or more of these.
  8. A public opinion analysis system, characterized in that the public opinion analysis system comprises a crawler module, a filtering module, a classification module, an analysis module and a display module;
    the crawler module is configured to search for and read web page files by a web crawler according to a predefined search strategy, and to extract public sentiment data from the web page files;
    the filtering module is configured to filter the extracted public sentiment data and remove junk information;
    the classification module is configured to sort and classify the filtered public sentiment data, the classification types including source, strong relevance, and posts by active users;
    the analysis module is configured to analyze and process the public sentiment data in each classification result to obtain public opinion analysis results, including origin, public opinion emotional tone, network diffusion state, development trend, regional distribution information, age-range information and hot topics;
    the display module is configured to display and output, in the form of charts and reports, the public opinion analysis results obtained in step S104.
  9. A computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the public opinion analysis method according to any one of claims 1 to 7.
  10. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the public opinion analysis method according to any one of claims 1 to 7.
CN201711307926.5A 2017-12-11 2017-12-11 The analysis of public opinion method, system, computer equipment and storage medium Pending CN108052586A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711307926.5A CN108052586A (en) 2017-12-11 2017-12-11 The analysis of public opinion method, system, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN108052586A true CN108052586A (en) 2018-05-18

Family

ID=62123476

Country Status (1)

Country Link
CN (1) CN108052586A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8751511B2 (en) * 2010-03-30 2014-06-10 Yahoo! Inc. Ranking of search results based on microblog data
CN104881417A (en) * 2014-02-28 2015-09-02 深圳市网安计算机安全检测技术有限公司 Public opinion analyzing method and system
CN104933093A (en) * 2015-05-19 2015-09-23 武汉泰迪智慧科技有限公司 Regional public opinion monitoring and decision-making auxiliary system and method based on big data
CN104965931A (en) * 2015-07-30 2015-10-07 成都布林特信息技术有限公司 Big data based public opinion analysis method

Non-Patent Citations (1)

Title
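Li Aiping (李爱萍), Di Peng (邸鹏), Duan Liguo (段利国): "Document-level sentiment analysis based on a sentence sentiment weighting algorithm" (基于句子情感加权算法的篇章情感分析), 《小型微型计算机***》 *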

Cited By (15)

Publication number Priority date Publication date Assignee Title
CN110750636A (en) * 2018-07-04 2020-02-04 百度在线网络技术(北京)有限公司 Network public opinion information processing method and device
CN109030480B (en) * 2018-08-16 2021-03-19 湖南友哲科技有限公司 Sample analysis method, sample analysis device, readable storage medium and computer equipment
CN109030480A (en) * 2018-08-16 2018-12-18 湖南友哲科技有限公司 Sample analysis method, device, readable storage medium storing program for executing and computer equipment
CN109446394A (en) * 2018-09-27 2019-03-08 武汉大学 For network public-opinion event based on modular public sentiment monitoring method and system
CN110069686A (en) * 2019-03-15 2019-07-30 平安科技(深圳)有限公司 User behavior analysis method, apparatus, computer installation and storage medium
CN110097250A (en) * 2019-03-20 2019-08-06 平安直通咨询有限公司上海分公司 Product risks prediction technique, device, computer equipment and storage medium
CN111581500A (en) * 2020-04-24 2020-08-25 贵州力创科技发展有限公司 Network public opinion-oriented data distributed directional storage method and device
CN111639183A (en) * 2020-05-19 2020-09-08 民生科技有限责任公司 Financial industry consensus public opinion analysis method and system based on deep learning algorithm
CN111639183B (en) * 2020-05-19 2023-11-28 民生科技有限责任公司 Financial co-industry public opinion analysis method and system based on deep learning algorithm
CN111784492A (en) * 2020-07-10 2020-10-16 讯飞智元信息科技有限公司 Public opinion analysis and financial early warning method, device, electronic equipment and storage medium
CN113254746A (en) * 2021-05-24 2021-08-13 华北科技学院(中国煤矿安全技术培训中心) Online public opinion shows system based on raspberry group
CN113254746B (en) * 2021-05-24 2023-07-18 华北科技学院(中国煤矿安全技术培训中心) Internet public opinion display system based on raspberry group
CN113449169A (en) * 2021-09-01 2021-09-28 广州越创智数信息科技有限公司 Public opinion data acquisition method and system based on RPA
CN116861058A (en) * 2023-09-04 2023-10-10 浪潮软件股份有限公司 Public opinion monitoring system and method applied to government affair field
CN116861058B (en) * 2023-09-04 2024-04-12 浪潮软件股份有限公司 Public opinion monitoring system and method applied to government affair field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20180628

Address after: 518052 Room 201, Building A, Qianwan Road, Qianhai Shenzhen-Hong Kong Cooperation Zone, Shenzhen, Guangdong

Applicant after: OneConnect Smart Technology Co., Ltd. (Shenzhen)

Address before: 200030 Floors 9 and 10, No. 166 Kaibin Road, Xuhui District, Shanghai

Applicant before: OneConnect Financial Technology Co., Ltd. (Shanghai)

REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1251050

Country of ref document: HK

RJ01 Rejection of invention patent application after publication

Application publication date: 20180518