CN107943966A - Abnormal individual character decision method and device based on microblogging text - Google Patents


Info

Publication number
CN107943966A
Authority
CN
China
Prior art keywords
preset time
microblogging text
user
Legal status
Pending
Application number
CN201711211558.4A
Other languages
Chinese (zh)
Inventor
孙晓
张陈
丁帅
杨善林
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Application filed by Hefei University of Technology
Priority to CN201711211558.4A
Publication of CN107943966A
Legal status: Pending (current)


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01: Social networking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00: Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03: Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Primary Health Care (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Computational Linguistics (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method and device for determining abnormal personality based on microblog text. The method includes the following steps:
- obtaining a second preset number of microblog posts published by a first preset number of users within a preset time period;
- performing emotion recognition on these posts with a support vector machine (SVM) and labeling each post with one of a third preset number of emotion types;
- counting the emotion-labeled posts per preset time unit to obtain multidimensional datasets;
- computing the joint probability density of each multidimensional dataset;
- when a joint probability density value is below a density threshold, determining that the user's emotion within that preset time unit is abnormal.
The invention thus converts the emotions expressed in the public's microblog posts into multidimensional datasets and computes their joint probability density values in batch. This allows abnormal individuals to be detected quantitatively, with a relatively simple implementation.

Description

Method and device for determining abnormal personality based on microblog text
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and device for determining abnormal personality based on microblog text.
Background
Current microblog-based methods for determining abnormal personality fall into two main categories. The first mines a user's abnormal personality from the user's behavior patterns. This approach holds that the key problems in anomaly detection are establishing normal usage profiles and then using those profiles to compare and judge the user's current behavior. A behavior pattern is a regularity exhibited during program execution or user operation. Once behavior or wording appears that deviates from the user's usual normal pattern, it must be considered whether the user's mood has become abnormal. However, mining abnormal personality from behavior patterns is user-specific. Each user's habitual social behavior must first be tracked, described, and modeled before potentially abnormal behavior can be compared against it and detected, which makes implementation time-consuming.
The second category detects abnormal personality from interactions between users on social networks. A relationship network (a social network graph) is built from users' microblog posts and their interactions with friends, such as likes and comments. Individuals with abnormal emotions are then detected from this emotional-interaction graph. The premise is that a user's abnormal state is closely related to his or her friends on social media.

One such method, based on a large-scale Twitter dataset from a real-world social platform, systematically studies the correlation between users' stress states and their social interactions. It first defines a set of stress-related textual, visual, and social attributes. It then proposes a hybrid model that combines a factor graph model with a convolutional neural network and uses tweet content and social interaction information for stress detection. Its experiments show that the proportion of sparse connections (connections that form no triangles) in the social structures of users with abnormal personality is 14% higher than for other users. This indicates that the friend networks of such users tend to be less connected and less complex.

However, detecting abnormal personality from social-network interactions requires mining the interactions between a user and the user's friends. Such emotion mining is often difficult, and the social structural features are not distinctive: beyond sparse connections and triangles, many complex structures may be involved, which makes it hard to discover regularities and detect abnormal personality.
Summary of the invention
To address the above defects in the prior art, the present invention provides a method and device for determining abnormal personality based on microblog text. It addresses two problems in the prior art: determining abnormal personality is time-consuming and laborious, and the complexity of social interactions in emotion mining makes it hard to discover regularities and detect abnormal personality.
In a first aspect, an embodiment of the present invention provides a method for determining abnormal personality based on microblog text, the method comprising:
obtaining a second preset number of microblog posts of a first preset number of users within a preset time period;
performing emotion recognition on the second preset number of microblog posts using a support vector machine and labeling them, to obtain a third preset number of emotion types;
counting the emotion-labeled microblog posts per preset time unit to obtain multidimensional datasets, wherein the length of the preset time period is a multiple of the preset time unit;
computing the joint probability density of each multidimensional dataset to obtain its joint probability density value;
when a joint probability density value is below a density threshold, determining that the user's emotion within the corresponding preset time unit is abnormal.
Optionally, the third preset number of emotion types is 5: neutral, happy, surprised, sad, and angry, with corresponding labels 0, 1, 2, 3, and 4.
Optionally, counting the emotion-labeled microblog posts per preset time unit to obtain multidimensional datasets includes:
classifying the second preset number of microblog posts with the support vector machine;
for each user among the first preset number of users, determining that user's five-dimensional dataset.
Optionally, selecting the density threshold includes:
based on the second preset number of microblog posts, obtaining multiple five-dimensional datasets for the first preset number of users over all preset time units in the preset time period;
computing the joint probability densities of the multiple five-dimensional datasets in batch;
dividing the multiple five-dimensional datasets into a cross-validation set and a test set;
testing the cross-validation set with the joint probability density function under different thresholds to obtain multiple sets of experimental results;
using the threshold with the highest accuracy among the experimental results as the density threshold for the test set.
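Optionally, the joint probability density function is given by the following formula:
$$p\big(X^{(k)}\big)=\frac{1}{(2\pi)^{n/2}\,\lvert\Sigma\rvert^{1/2}}\exp\Big(-\frac{1}{2}\big(X^{(k)}-\mu\big)^{\mathsf T}\Sigma^{-1}\big(X^{(k)}-\mu\big)\Big)$$
where X(k) is a five-dimensional data point, μ is the mean vector whose j-th entry is the mean of the j-th column of the dataset, Σ is the covariance matrix of the five-dimensional dataset, and n = 5 is the number of dimensions.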
In a second aspect, an embodiment of the present invention provides a device for determining abnormal personality based on microblog text, the device comprising:
a text data acquisition module, configured to obtain a second preset number of microblog posts of a first preset number of users within a preset time period;
a text emotion recognition module, configured to perform emotion recognition on the second preset number of microblog posts using a support vector machine and label them, obtaining a third preset number of emotion types;
a dataset statistics module, configured to count the emotion-labeled microblog posts per preset time unit to obtain multidimensional datasets, wherein the length of the preset time period is a multiple of the preset time unit;
a density value computing module, configured to compute the joint probability density of each multidimensional dataset to obtain its joint probability density value;
a determination module, configured to determine, when a joint probability density value is below a density threshold, that the user's emotion within the preset time unit is abnormal.
Optionally, the third preset number of emotion types is 5: neutral, happy, surprised, sad, and angry, with corresponding labels 0, 1, 2, 3, and 4.
Optionally, the dataset statistics module includes:
a microblog text classification unit, configured to classify the second preset number of microblog posts with the support vector machine;
a dataset determination unit, configured to determine, for each user among the first preset number of users, that user's five-dimensional dataset.
Optionally, the device further includes a density threshold acquisition module, which includes:
a dataset acquisition unit, configured to obtain, based on the second preset number of microblog posts, multiple five-dimensional datasets for the first preset number of users over all preset time units in the preset time period;
a density value computing unit, configured to compute the joint probability density values of the multiple five-dimensional datasets in batch;
a dataset grouping unit, configured to divide the multiple five-dimensional datasets into a cross-validation set and a test set;
an experiment unit, configured to test the cross-validation set with the joint probability density function under different thresholds, obtaining multiple sets of experimental results;
a density threshold determination unit, configured to take the threshold with the highest accuracy among the experimental results as the density threshold for the test set.
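Optionally, the joint probability density function is given by the following formula:
$$p\big(X^{(k)}\big)=\frac{1}{(2\pi)^{n/2}\,\lvert\Sigma\rvert^{1/2}}\exp\Big(-\frac{1}{2}\big(X^{(k)}-\mu\big)^{\mathsf T}\Sigma^{-1}\big(X^{(k)}-\mu\big)\Big)$$
where X(k) is a five-dimensional data point, μ is the mean vector whose j-th entry is the mean of the j-th column of the dataset, Σ is the covariance matrix of the five-dimensional dataset, and n = 5 is the number of dimensions.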
As the above technical solution shows, the embodiments of the present invention proceed as follows:
- obtain a second preset number of microblog posts of a first preset number of users within a preset time period;
- perform emotion recognition on these posts with a support vector machine and label them, obtaining a third preset number of emotion types;
- count the emotion-labeled posts per preset time unit to obtain multidimensional datasets, the length of the preset time period being a multiple of the preset time unit;
- compute the joint probability density of each multidimensional dataset;
- when a joint probability density value is below the density threshold, determine that the user's emotion within that preset time unit is abnormal.
The invention thus converts the emotions in the public's microblog posts into multidimensional datasets and computes their joint probability density values in batch. Abnormal individuals can therefore be detected quantitatively, with a relatively simple implementation.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention. Those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the method for determining abnormal personality based on microblog text provided by an embodiment of the present invention;
Fig. 2 shows the classification model with which the support vector machine provided by an embodiment of the present invention processes microblog text data;
Fig. 3 is a schematic flowchart of the method for determining abnormal personality based on microblog text provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the result of multivariate Gaussian distribution processing;
Fig. 5 is the anomaly detection verification diagram of Embodiment 1;
Fig. 6 is the anomaly detection verification diagram of Embodiment 2;
Fig. 7 is a block diagram of the device for determining abnormal personality based on microblog text provided by an embodiment of the present invention.
Detailed description of embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of the method for determining abnormal personality based on microblog text provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:
101: obtaining a second preset number of microblog posts of a first preset number of users within a preset time period;
102: performing emotion recognition on the second preset number of microblog posts using a support vector machine and labeling them, to obtain a third preset number of emotion types;
103: counting the emotion-labeled microblog posts per preset time unit to obtain multidimensional datasets, wherein the length of the preset time period is a multiple of the preset time unit;
104: computing the joint probability density of each multidimensional dataset to obtain its joint probability density value;
105: when a joint probability density value is below a density threshold, determining that the user's emotion within the preset time unit is abnormal.
The present invention thus converts the emotions in the public's microblog posts into multidimensional datasets and computes their joint probability density values in batch, so that abnormal individuals can be detected quantitatively with a relatively simple implementation.
Each step of the method for determining abnormal personality based on microblog text provided by the embodiments of the present invention is described in detail below with reference to the drawings and embodiments.
First, step 101: obtaining a second preset number of microblog posts of a first preset number of users within a preset time period.
The preset time period may be one day, one month, one year, etc. Those skilled in the art may set it according to the specific scenario, and it is not limited here. In one embodiment, the preset time period is one calendar month.
The first preset number may be 100, 1000, 10000, 100000, etc. Those skilled in the art may set it according to the specific scenario, and it is not limited here. In one embodiment, the first preset number is 100.
Similarly, the second preset number may be 100, 1000, 10000, 100000, etc. Those skilled in the art may set it according to the specific scenario, and it is not limited here. In one embodiment, the second preset number is 10000.
In this embodiment of the present invention, 10000 microblog posts from 100 microblog users are collected.
Second, step 102: performing emotion recognition on the second preset number of microblog posts using a support vector machine and labeling them, to obtain a third preset number of emotion types.
The third preset number may be 3, 4, 5, or more, and those skilled in the art may set it according to the specific situation. In one embodiment, the third preset number is 5, i.e., the emotion types are neutral, happy, surprised, sad, and angry.
In one embodiment of the present invention, a support vector machine performs emotion recognition on the above 10000 posts, so that each post is classified as neutral, happy, surprised, sad, or angry. As shown in Fig. 2, the support vector machine sorts the input microblog text data into these 5 types. To simplify subsequent quantitative calculation, the embodiment replaces the 5 emotions with the labels 0, 1, 2, 3, and 4, marking posts that express neutral, happy, surprised, sad, and angry emotion respectively. The labeled posts are then processed in four stages. First, they are converted into text vectors. Next, feature selection is performed. The weight of each feature is then computed as TF*IDF. Finally, the model is trained and used to predict the emotion class of each microblog post.
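The TF*IDF weighting step can be illustrated with a minimal plain-Python sketch. It assumes the posts have already been segmented into words, since Chinese text needs a word segmenter first. It uses the textbook variant tf = count / post length and idf = log(N / df). The patent does not state which TF or IDF variant it uses, and the function name `tfidf_vectors` is hypothetical. The resulting sparse vectors would then be passed to the SVM.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Compute TF*IDF weights for pre-segmented documents (illustrative sketch).

    docs: list of token lists, one per microblog post (word segmentation is
    assumed to have been done already). Returns one {token: weight} dict per
    document, with tf = count / document length and idf = log(N / df).
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))      # document frequency: count each token once per post
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        length = len(tokens) or 1   # guard against empty posts
        vectors.append({tok: (cnt / length) * math.log(n_docs / df[tok])
                        for tok, cnt in tf.items()})
    return vectors
```

A token that occurs in every post receives weight 0 because log(N / N) = 0, so it cannot help separate the emotion classes. Library implementations differ in detail. For example, scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2 normalisation by default, so its weights will not match this sketch exactly. It is commonly paired with `LinearSVC` for the classification step.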
Thus, in this embodiment the support vector machine extracts vector features from the microblog texts of the training set and the test set, and then outputs emotion classification results for the test set. This helps ensure the accuracy of the emotion classification.
Third, step 103: counting the emotion-labeled microblog posts per preset time unit to obtain multidimensional datasets, wherein the length of the preset time period is a multiple of the preset time unit.
The preset time unit may be one day, one month, one year, etc. It is a subdivision of the preset time period, i.e., the preset time period is a multiple of the preset time unit. Those skilled in the art may set it according to the specific scenario, and it is not limited here. In one embodiment, the preset time unit is one calendar month.
In one embodiment of the present invention, the labeled microblog posts are counted by category. For each user among the first preset number of users, the user's five-dimensional dataset is determined, giving each user's posting statistics for each month.
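The counting in step 103 can be sketched as follows. This is a minimal illustration under three assumptions. Each classified post arrives as a hypothetical `(user_id, posted_on, label)` tuple. `label` is in 0..4 as defined above. The time unit is a calendar month. The function names are illustrative, not taken from the patent.

```python
from collections import defaultdict
from datetime import date

NUM_EMOTIONS = 5  # labels 0..4: neutral, happy, surprised, sad, angry


def month_unit(posted_on: date):
    """Map a post date to its time unit; a calendar month is assumed here."""
    return (posted_on.year, posted_on.month)


def build_emotion_vectors(posts, unit=month_unit):
    """Count labelled posts per (user, time unit).

    Returns {(user_id, unit_key): [n_neutral, n_happy, n_surprised, n_sad, n_angry]}.
    Users with no posts in a unit get no entry.
    """
    counts = defaultdict(lambda: [0] * NUM_EMOTIONS)
    for user_id, posted_on, label in posts:
        if not 0 <= label < NUM_EMOTIONS:
            raise ValueError(f"unexpected emotion label: {label}")
        counts[(user_id, unit(posted_on))][label] += 1
    return dict(counts)
```

For example, three posts by user `u1` in May 2016 labelled 1, 3, and 1 yield the vector `[0, 2, 0, 1, 0]` under the key `("u1", (2016, 5))`. Whether users who posted nothing in a unit should instead get an all-zero vector is a design choice the text leaves open.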
Fourth, step 104: computing the joint probability density of each multidimensional dataset to obtain its joint probability density value.
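In one embodiment of the present invention, joint probability density values are computed in batch for the five-dimensional datasets using the formula:
$$p\big(X^{(k)}\big)=\frac{1}{(2\pi)^{n/2}\,\lvert\Sigma\rvert^{1/2}}\exp\Big(-\frac{1}{2}\big(X^{(k)}-\mu\big)^{\mathsf T}\Sigma^{-1}\big(X^{(k)}-\mu\big)\Big)$$
where X(k) is a five-dimensional data point, μ is the mean vector whose j-th entry is the mean of the j-th column of the dataset, Σ is the covariance matrix of the five-dimensional dataset, and n = 5 is the number of dimensions.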
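A minimal pure-Python sketch of this density computation follows. It assumes each sample is a list of five emotion counts, and that the mean and covariance are estimated from the same batch of samples, as the MATLAB code in the embodiments does. The covariance uses the N - 1 divisor, matching MATLAB's `cov`. The helper names (`fit_gaussian`, `gaussian_density`, `_solve_and_det`) are hypothetical. The linear algebra uses Gaussian elimination only to keep the sketch dependency-free. In practice a library routine such as SciPy's `multivariate_normal.pdf` would normally be used.

```python
import math


def fit_gaussian(rows):
    """Estimate the mean vector and sample covariance (N - 1 divisor) of the rows."""
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two samples")
    mu = [sum(col) / n for col in zip(*rows)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [r[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    cov = [[v / (n - 1) for v in row] for row in cov]
    return mu, cov


def _solve_and_det(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting; return (x, det(a))."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    det = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is (nearly) singular")
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            det = -det                      # a row swap flips the sign of the determinant
        det *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):          # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x, det


def gaussian_density(x, mu, cov):
    """Multivariate normal density p(x) = exp(-q/2) / sqrt((2*pi)^d * det(cov))."""
    d = len(mu)
    diff = [x[i] - mu[i] for i in range(d)]
    y, det = _solve_and_det(cov, diff)          # y = cov^{-1} (x - mu)
    q = sum(diff[i] * y[i] for i in range(d))   # squared Mahalanobis distance
    return math.exp(-0.5 * q) / math.sqrt((2 * math.pi) ** d * det)
```

One practical caveat applies when the five-dimensional counts come from a single month. If an emotion column is constant (for instance, nobody posted anything "surprised"), the covariance matrix becomes singular and the sketch raises an error. Adding a small constant to the diagonal is one common workaround, though the text does not specify one.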
Finally, step 105: when a joint probability density value is below the density threshold, determining that the user's emotion within the preset time unit is abnormal.
In this embodiment, a suitable density threshold is selected based on the batch-computed joint probability density values. When a joint probability density is below this threshold, the user's emotion is judged to be abnormal in that month or in some period within it, and the user is marked as abnormal. The user's microblog posts from that month or period can also be examined to check whether the user really exhibited abnormal emotion, which improves the accuracy of the alerts.
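The decision rule of step 105 amounts to a simple comparison. The sketch below assumes densities are held in a dict keyed by hypothetical `(user_id, unit)` pairs. It uses the 4e-05 threshold from the embodiments only as an illustrative default. It returns the flagged entries sorted from least to most probable, which is convenient for the manual verification step described above.

```python
def flag_abnormal(densities, threshold=4e-05):
    """Return [((user_id, unit), density), ...] for densities below the threshold.

    densities: dict mapping (user_id, unit) -> joint probability density.
    Entries are sorted by ascending density, i.e. most anomalous first.
    """
    flagged = [(key, p) for key, p in densities.items() if p < threshold]
    flagged.sort(key=lambda item: item[1])
    return flagged
```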
It should be noted that in this embodiment the density threshold is selected through the following steps:
based on the second preset number of microblog posts, obtaining multiple five-dimensional datasets for the first preset number of users over all preset time units in the preset time period;
computing the joint probability densities of the multiple five-dimensional datasets in batch;
dividing the multiple five-dimensional datasets into a cross-validation set and a test set;
testing the cross-validation set with the joint probability density function under different thresholds to obtain multiple sets of experimental results;
using the threshold with the highest accuracy among the experimental results as the density threshold for the test set.
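The threshold search above can be sketched as follows. It assumes the cross-validation samples carry ground-truth abnormal/normal flags. The text does not say how these are obtained; manual review of the posts, as in the embodiments, is one possibility. The candidate thresholds are supplied by the caller. A sample is predicted abnormal when its density is below the threshold, matching step 105. The function name `select_threshold` is hypothetical.

```python
def select_threshold(densities, is_abnormal, candidates):
    """Return (best_threshold, best_accuracy) over the candidate thresholds.

    densities   : joint probability densities of the cross-validation samples
    is_abnormal : assumed ground-truth flags for the same samples (True = abnormal)
    candidates  : thresholds to try; on ties the earliest candidate is kept
    """
    if not densities or len(densities) != len(is_abnormal):
        raise ValueError("need one ground-truth flag per density and at least one sample")
    best_eps, best_acc = None, -1.0
    for eps in candidates:
        # Predict "abnormal" when the density falls below the threshold (step 105).
        correct = sum((p < eps) == truth for p, truth in zip(densities, is_abnormal))
        acc = correct / len(densities)
        if acc > best_acc:
            best_eps, best_acc = eps, acc
    return best_eps, best_acc
```

A log-spaced grid such as `[10 ** -k for k in range(2, 10)]` is a natural set of candidates, since the densities in the embodiments span several orders of magnitude. The text specifies accuracy as the selection criterion. When abnormal samples are rare, accuracy can favour a threshold that flags almost nothing, and F1 score is a common alternative in that situation.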
Embodiment 1
This embodiment takes the public's microblog emotion data from May 2016, shown in Fig. 3, as an example. The emotions of the microblog posts are identified with the support vector machine, yielding the five-dimensional dataset (partial) shown in Table 1.
Table 1 Emotion classification statistics of the microblog posts published by users
In this embodiment, multivariate Gaussian distribution processing is applied to the above five-dimensional dataset, as shown in Fig. 4. The joint probability density of the five-dimensional data in this case is computed as follows:
Input:
o Data: D x N array holding N samples of dimension D; in this embodiment the data form a 21 x 5 matrix (21 samples, 5 emotion dimensions), passed in transposed so that D = 5 and N = 21
o Mu: D x 1 array, the mean vector of the dataset
o Sigma: D x D array, the covariance matrix of the dataset
Output:
o prob: N x 1 array, the probability density of each of the N data points.
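The MATLAB code for computing the joint probability density is as follows:
[nbVar, nbData] = size(Data);                % nbVar = D dimensions, nbData = N samples
Mu = mean(Data, 2);                          % mean of each dimension (D x 1)
Sigma = cov(Data');                          % covariance matrix (D x D); cov expects one sample per row
Data = Data' - repmat(Mu', nbData, 1);       % center the samples (N x D)
% Compute the joint probability density
prob = sum((Data * inv(Sigma)) .* Data, 2);  % squared Mahalanobis distance of each sample
prob = exp(-0.5 * prob) / sqrt((2*pi)^nbVar * (abs(det(Sigma)) + realmin));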
Using the joint probability density formula, this embodiment computes the joint probability densities of the above five-dimensional dataset, as shown in Table 2.
Table 2 Joint probability density values
Abnormal users are marked according to the preset density threshold (4e-05). The user marked in Table 2 has a joint probability density value of 2.13e-06.
Finally, the user's microblog posts from May 2016 are examined to verify whether abnormal emotion actually occurred, as shown in Fig. 5.
Embodiment 2
This embodiment uses the public's microblog emotion data from January 2016. The emotions of the microblog posts are identified with the support vector machine, yielding the five-dimensional dataset (partial) shown in Table 3.
Table 3 Emotion classification statistics of the microblog posts published by users
In this embodiment, multivariate Gaussian distribution processing is applied to the above five-dimensional dataset, as shown in Fig. 4. The joint probability density of the five-dimensional data in this case is computed as follows:
Input:
o Data: D x N array holding N samples of dimension D; in this case the data form a 15 x 5 matrix (15 samples, 5 emotion dimensions), passed in transposed so that D = 5 and N = 15
o Mu: D x 1 array, the mean vector of the dataset
o Sigma: D x D array, the covariance matrix of the dataset
Output:
o prob: N x 1 array, the probability density of each of the N data points.
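The MATLAB code for computing the joint probability density is as follows:
[nbVar, nbData] = size(Data);                % nbVar = D dimensions, nbData = N samples
Mu = mean(Data, 2);                          % mean of each dimension (D x 1)
Sigma = cov(Data');                          % covariance matrix (D x D); cov expects one sample per row
Data = Data' - repmat(Mu', nbData, 1);       % center the samples (N x D)
% Compute the joint probability density
prob = sum((Data * inv(Sigma)) .* Data, 2);  % squared Mahalanobis distance of each sample
prob = exp(-0.5 * prob) / sqrt((2*pi)^nbVar * (abs(det(Sigma)) + realmin));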
Using the joint probability density code, this embodiment computes the joint probability densities of the above five-dimensional dataset, as shown in Table 4.
Table 4 Joint probability density values
Abnormal users are marked according to the preset density threshold (4e-05). The users marked in Table 4 have joint probability density values of 1.46e-08, 1.09e-07, and 7.65e-08.
Finally, the users' microblog posts from January 2016 are examined to verify whether abnormal emotion actually occurred, as shown by the shaded content in Fig. 6.
An embodiment of the present invention further provides a device for determining abnormal personality based on microblog text. As shown in Fig. 7, the device includes:
a text data acquisition module 701, configured to obtain a second preset number of microblog posts of a first preset number of users within a preset time period;
a text emotion recognition module 702, configured to perform emotion recognition on the second preset number of microblog posts using a support vector machine and label them, obtaining a third preset number of emotion types;
a dataset statistics module 703, configured to count the emotion-labeled microblog posts per preset time unit to obtain multidimensional datasets, wherein the length of the preset time period is a multiple of the preset time unit;
a density value computing module 704, configured to compute the joint probability density of each multidimensional dataset to obtain its joint probability density value;
a determination module 705, configured to determine, when a joint probability density value is below a density threshold, that the user's emotion within the preset time unit is abnormal.
In one embodiment, the third preset number of emotion types is 5: neutral, happy, surprised, sad, and angry, with corresponding labels 0, 1, 2, 3, and 4.
In one embodiment, the dataset statistics module includes:
a microblog text classification unit, configured to classify the second preset number of microblog posts with the support vector machine;
a dataset determination unit, configured to determine, for each user among the first preset number of users, that user's five-dimensional dataset.
In one embodiment, the device further includes a density threshold acquisition module, which includes:
a dataset acquisition unit, configured to obtain, based on the second preset number of microblog posts, multiple five-dimensional datasets for the first preset number of users over all preset time units in the preset time period;
a density value computing unit, configured to compute the joint probability density values of the multiple five-dimensional datasets in batch;
a dataset grouping unit, configured to divide the multiple five-dimensional datasets into a cross-validation set and a test set;
an experiment unit, configured to test the cross-validation set with the joint probability density function under different thresholds, obtaining multiple sets of experimental results;
a density threshold determination unit, configured to take the threshold with the highest accuracy among the experimental results as the density threshold for the test set.
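In one embodiment, the joint probability density function is given by the following formula:
$$p\big(X^{(k)}\big)=\frac{1}{(2\pi)^{n/2}\,\lvert\Sigma\rvert^{1/2}}\exp\Big(-\frac{1}{2}\big(X^{(k)}-\mu\big)^{\mathsf T}\Sigma^{-1}\big(X^{(k)}-\mu\big)\Big)$$
where X(k) is a five-dimensional data point, μ is the mean vector whose j-th entry is the mean of the j-th column of the dataset, and n = 5 is the number of dimensions. Σ is the covariance matrix of the five-dimensional dataset: each element Σij is the covariance between the i-th and j-th components of the five-dimensional data, with the variances on the diagonal.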
It should be noted that the microblog-text-based abnormal personality determination device provided in the embodiments of the present invention corresponds one-to-one with the method described above, and the implementation details of the method apply equally to the device; the device is therefore not described again in detail here.
Numerous specific details are set forth in this specification. It is to be understood, however, that the embodiments of the present invention can be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this description.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some or all of their technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and shall all be covered by the scope of the claims and the specification of the present invention.

Claims (10)

  1. An abnormal personality determination method based on microblog text, characterized in that the method comprises:
    Obtaining a second preset number of microblog text items posted by a first preset number of users within a preset time period;
    Performing emotion recognition and labeling on the second preset number of microblog text items using a support vector machine, obtaining a third preset number of emotion categories;
    Counting the emotion-labeled microblog text data by preset time unit to obtain multidimensional data sets, wherein the length of the preset time period is a multiple of the preset time unit;
    Computing the joint probability density of the multidimensional data sets to obtain the joint probability density value of each multidimensional data set;
    When a joint probability density value is less than a density threshold, determining that the emotion of the user within the corresponding preset time unit is abnormal.
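The time bucketing and the final decision rule of claim 1 can be sketched as follows. The 28-day period, the 7-day time unit, the dictionary keyed by `(user_id, unit_index)`, and all function names are hypothetical; the density values would come from the joint probability density step and the threshold from the threshold selection procedure.

```python
from datetime import datetime, timedelta


def num_time_units(period, unit):
    """Number of preset time units in the period; the period must be a whole multiple."""
    if unit <= timedelta(0) or period % unit != timedelta(0):
        raise ValueError("preset time period must be a whole multiple of the time unit")
    return period // unit


def unit_index(ts, period_start, unit):
    """Index of the preset time unit that contains timestamp ts."""
    return (ts - period_start) // unit


def flag_abnormal(densities, threshold):
    """Return sorted (user_id, unit_index) keys whose density is below the threshold."""
    return sorted(key for key, p in densities.items() if p < threshold)


start, period, unit = datetime(2017, 11, 1), timedelta(days=28), timedelta(days=7)
print(num_time_units(period, unit))                        # 4
print(unit_index(datetime(2017, 11, 9, 12), start, unit))  # 1
print(flag_abnormal({("u1", 0): 0.03, ("u1", 1): 0.0004, ("u2", 0): 0.01}, 0.001))
# [('u1', 1)]
```

A low density means the user's emotion distribution in that time unit is unlikely under the fitted model, which is why the comparison is "less than" rather than "greater than".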
  2. The abnormal personality determination method according to claim 1, characterized in that the second preset number of emotion categories is five, namely neutral, happy, surprised, sad and angry, with corresponding labels 0, 1, 2, 3 and 4.
  3. The abnormal personality determination method according to claim 1, characterized in that counting the emotion-labeled microblog text data by preset time unit to obtain multidimensional data sets comprises:
    Classifying the second preset number of microblog text items using a support vector machine;
    For each user among the first preset number of users, determining the five-dimensional data set of that user.
  4. The abnormal personality determination method according to claim 1, characterized in that selecting the density threshold comprises:
    Obtaining multiple five-dimensional data sets from the second preset number of microblog text items, according to the first preset number of users and all the preset time units in the preset time period;
    Computing, in batches, the joint probability densities of the multiple five-dimensional data sets;
    Dividing the multiple five-dimensional data sets into a cross-validation set and a test set;
    Testing the cross-validation set with the joint probability density function under different thresholds, obtaining multiple groups of experimental results;
    Taking the threshold corresponding to the highest accuracy among the groups of experimental results as the density threshold for the test set.
  5. The abnormal personality determination method according to claim 4, characterized in that the joint probability density function is expressed by the following formula:
    $$P\left(x^{(k)};\mu,\Sigma\right)=\frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}\exp\left[-\frac{1}{2}\left(x^{(k)}-\mu\right)^{T}\Sigma^{-1}\left(x^{(k)}-\mu\right)\right],$$
    where $x^{(k)}$ is the five-dimensional variable set, $n$ is its dimension, $\mu$ is the column-wise mean of the data, and $\Sigma$ is the covariance matrix of the five-dimensional data set; covariance measures how the dimensions jointly deviate from their means.
  6. An abnormal personality determination device based on microblog text, characterized in that the device comprises:
    A text data acquisition module, configured to obtain a second preset number of microblog text items posted by a first preset number of users within a preset time period;
    A text emotion recognition module, configured to perform emotion recognition and labeling on the second preset number of microblog text items using a support vector machine, obtaining a third preset number of emotion categories;
    A data set statistics module, configured to count the emotion-labeled microblog text data by preset time unit to obtain multidimensional data sets, wherein the length of the preset time period is a multiple of the preset time unit;
    A density value computing module, configured to compute the joint probability density of the multidimensional data sets to obtain the joint probability density value of each multidimensional data set;
    A determination module, configured to determine, when a joint probability density value is less than a density threshold, that the emotion of the user within the corresponding preset time unit is abnormal.
  7. The abnormal personality determination device according to claim 6, characterized in that the second preset number of emotion categories is five, namely neutral, happy, surprised, sad and angry, with corresponding labels 0, 1, 2, 3 and 4.
  8. The abnormal personality determination device according to claim 6, characterized in that the data set statistics module comprises:
    A microblog text classification unit, configured to classify the second preset number of microblog text items using a support vector machine;
    A data set determination unit, configured to determine, for each user among the first preset number of users, the five-dimensional data set of that user.
  9. The abnormal personality determination device according to claim 6, characterized in that the device further comprises a density threshold acquisition module, which comprises:
    A data set acquisition unit, configured to obtain multiple five-dimensional data sets from the second preset number of microblog text items, according to the first preset number of users and all the preset time units in the preset time period;
    A density value computing unit, configured to compute the joint probability density values of the multiple five-dimensional data sets in batches;
    A data set grouping unit, configured to divide the multiple five-dimensional data sets into a cross-validation set and a test set;
    An experiment unit, configured to test the cross-validation set with the joint probability density function under different thresholds, obtaining multiple groups of experimental results;
    A density threshold determination unit, configured to take the threshold corresponding to the highest accuracy among the groups of experimental results as the density threshold for the test set.
  10. The abnormal personality determination device according to claim 9, characterized in that the joint probability density function is expressed by the following formula:
    $$P\left(x^{(k)};\mu,\Sigma\right)=\frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}\exp\left[-\frac{1}{2}\left(x^{(k)}-\mu\right)^{T}\Sigma^{-1}\left(x^{(k)}-\mu\right)\right],$$
    where $x^{(k)}$ is the five-dimensional variable set, $n$ is its dimension, $\mu$ is the column-wise mean of the data, and $\Sigma$ is the covariance matrix of the five-dimensional data set.
CN201711211558.4A 2017-11-28 2017-11-28 Abnormal individual character decision method and device based on microblogging text Pending CN107943966A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711211558.4A CN107943966A (en) 2017-11-28 2017-11-28 Abnormal individual character decision method and device based on microblogging text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711211558.4A CN107943966A (en) 2017-11-28 2017-11-28 Abnormal individual character decision method and device based on microblogging text

Publications (1)

Publication Number Publication Date
CN107943966A (en) 2018-04-20

Family

ID=61950153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711211558.4A Pending CN107943966A (en) 2017-11-28 2017-11-28 Abnormal individual character decision method and device based on microblogging text

Country Status (1)

Country Link
CN (1) CN107943966A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268197A (en) * 2013-09-22 2015-01-07 中科嘉速(北京)并行软件有限公司 Industry comment data fine grain sentiment analysis method
WO2016182156A1 (en) * 2015-05-14 2016-11-17 디투이모션 주식회사 Mobile terminal for detecting abnormal activity and system including same

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIAO SUN et al.: "Detecting users' anomalous emotion using social media for business intelligence", JOURNAL OF COMPUTATIONAL SCIENCE *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110597703A (en) * 2018-06-13 2019-12-20 ***通信集团浙江有限公司 Regression testing method and device
CN109492135A (en) * 2018-10-27 2019-03-19 平安科技(深圳)有限公司 A kind of data checking method and device based on data processing
CN109492135B (en) * 2018-10-27 2024-03-19 平安科技(深圳)有限公司 Data auditing method and device based on data processing
CN109522556A (en) * 2018-11-16 2019-03-26 北京九狐时代智能科技有限公司 A kind of intension recognizing method and device
CN109522556B (en) * 2018-11-16 2024-03-12 北京九狐时代智能科技有限公司 Intention recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180420