CN107622333A - Event prediction method, apparatus and system - Google Patents

Event prediction method, apparatus and system

Info

Publication number
CN107622333A
Authority
CN
China
Prior art keywords
data
text data
text
feature vector
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711064205.6A
Other languages
Chinese (zh)
Other versions
CN107622333B (en)
Inventor
苏萌
刘天旸
高体伟
刘译璟
边蓓蕾
杜晓梦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Percent Technology Group Co ltd
Original Assignee
Beijing Baifendian Information Science & Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baifendian Information Science & Technology Co Ltd
Priority to CN201711064205.6A
Publication of CN107622333A
Application granted
Publication of CN107622333B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses an event prediction method, apparatus and system. The method includes: obtaining text data from social network data; vectorizing the text data to obtain a feature vector corresponding to the text data; and inputting the feature vector into a pre-established classification model, which uses the feature vector of the text data as a feature to determine the probability that the text data corresponds to a suspicious event. By crawling massive social network data and applying natural language processing to the text data in it, the application finds the feature vectors that act as key influencing factors and makes predictions on the text data from those features, so as to predict suspicious events accurately.

Description

Event prediction method, apparatus and system
Technical field
This application relates to the field of computer technology, and in particular to an event prediction method, apparatus and system.
Background
With the development of Internet technology, the means of crime and terrorist attack have become increasingly sophisticated. Many terrorist organizations are active on the Internet and use it to organize and plan terrorist attacks.
In the prior art, follow-up reassurance work is usually carried out after an event such as a crime or a terrorist attack has occurred, based on an analysis of Internet users' sentiment. For example, after an event occurs, the relevant departments study the public opinion data on that event and analyze national sentiment from it. This approach of responding only after the event has occurred cannot prevent the event.
Therefore, a scheme that can prevent such events from occurring is needed.
Summary of the invention
The embodiments of this application provide an event prediction method, apparatus and system to solve the problem that the prior art cannot predict the occurrence of events.
An embodiment of this application provides an event prediction method, including:
obtaining text data from social network data;
vectorizing the text data to obtain a feature vector corresponding to the text data;
inputting the feature vector into a pre-established classification model, where the classification model uses the feature vector of the text data as a feature to determine the probability that the text data corresponds to a suspicious event.
Optionally, before obtaining the text data from the social network data, the method further includes:
obtaining social network data from a social network;
converting the non-text data in the unstructured data of the social network data into text data.
Optionally, vectorizing the text data to obtain the feature vector corresponding to the text data includes:
vectorizing the words in the text data to obtain the word vector corresponding to each word;
determining the feature vector corresponding to the text data from the word vectors corresponding to the words in the text data.
Optionally, vectorizing the words in the text data to obtain the word vector corresponding to each word includes:
training on the words in the text data with a deep text representation model, and obtaining the word vectors output by the deep text representation model.
Optionally, obtaining the feature vector corresponding to the text data from the word vectors corresponding to the words in the text data includes:
averaging the word vectors corresponding to the words in the text data, and using the result as the feature vector corresponding to the text data.
Optionally, before inputting the feature vector as a feature into the pre-established classification model, the method further includes:
obtaining the user behavior data associated with the text data in the social network data;
performing feature selection on the user behavior data to obtain the corresponding feature variables;
where inputting the feature vector as a feature into the pre-established classification model includes:
inputting the feature vector and the feature variables as features into the pre-established classification model.
Optionally, performing feature selection on the user behavior data to obtain the corresponding feature variables includes:
determining the variables in the user behavior data;
scoring the variables with a predetermined feature selection method to determine each variable's degree of influence on the event corresponding to the text data;
selecting, from the variables in the user behavior data, the variables whose degree of influence meets a predetermined condition as the feature variables.
Optionally, the predetermined feature selection method is at least one of a filter feature selection method, a wrapper feature selection method, and an embedded feature selection method.
Optionally, the user behavior data includes entity data and/or label data, where the entity data represents the set of data related to the text data, and the label data represents the labels corresponding to the text data or to words in the text data, together with the data corresponding to those labels.
Optionally, after obtaining the prediction result output by the classification model, the method further includes:
determining, from the prediction result, the suspicion probability of the entities related to the text data.
Optionally, before inputting the feature vector as a feature into the pre-established classification model, the method further includes:
obtaining sample data, the sample data including sample events and the text data and/or user behavior data corresponding to the sample events;
vectorizing the text data to obtain the feature vector corresponding to the text data; and/or performing feature selection on the user behavior data to obtain the feature variables corresponding to the user behavior data;
establishing the classification model with the feature vectors and/or feature variables corresponding to the sample events as features.
Optionally, the classification model is at least one of a Bayes-based classification model, a support-vector-machine-based classification model, a convolutional-neural-network-based classification model, and a recurrent-neural-network-based classification model.
An embodiment of this application also provides an event prediction apparatus, including:
a first acquisition unit, configured to obtain text data from social network data;
a first processing unit, configured to vectorize the text data to obtain a feature vector corresponding to the text data;
a second processing unit, configured to input the feature vector into a pre-established classification model, where the classification model uses the feature vector of the text data as a feature to determine the probability that the text data corresponds to a suspicious event.
Optionally, the apparatus further includes:
a second acquisition unit, configured to obtain the user behavior data associated with the text data in the social network data;
where the first processing unit is further configured to perform feature selection on the user behavior data to obtain the corresponding feature variables;
and the second processing unit is further configured to input the feature vector and the feature variables as features into the pre-established classification model.
An embodiment of this application also provides an event prediction system, including a data warehouse, a Kafka cluster and a Storm cluster, where:
the data warehouse is configured to store social network data and to provide social network data to the producers of the Kafka cluster;
the Kafka cluster is configured to preprocess the social network data so as to extract the text data and/or user behavior data in the social network data;
the Storm cluster is configured to call the event prediction apparatus described in claim 13 or 14 to consume the text data and/or user behavior data in the Kafka cluster and to output the probability of correspondence to a suspicious event.
An embodiment of this application also provides an event prediction apparatus, including a memory and a processor, where:
the memory is configured to store a program;
the processor is configured to execute the program stored in the memory, and specifically to perform:
obtaining text data from social network data;
vectorizing the text data to obtain a feature vector corresponding to the text data;
inputting the feature vector into a pre-established classification model, where the classification model uses the feature vector of the text data as a feature to determine the probability that the text data corresponds to a suspicious event.
An embodiment of this application also provides a computer-readable storage medium storing one or more programs which, when executed by an electronic device that includes multiple application programs, cause the electronic device to perform the following method:
obtaining text data from social network data;
vectorizing the text data to obtain a feature vector corresponding to the text data;
inputting the feature vector into a pre-established classification model, where the classification model uses the feature vector of the text data as a feature to determine the probability that the text data corresponds to a suspicious event.
At least one of the above technical schemes adopted by the embodiments of this application can achieve the following beneficial effects:
By crawling massive social network data and applying natural language processing to the text data in it, the feature vectors that act as key influencing factors are found and used as the input of the classification model to make predictions on the text data, so that suspicious events are predicted accurately.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of this application and form a part of it. The illustrative embodiments of this application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic flowchart of an event prediction method provided by Embodiment 1 of this application;
Fig. 2 is a schematic flowchart of an event prediction method provided by Embodiment 2 of this application;
Fig. 3 is a schematic flowchart of an event prediction method provided by Embodiment 3 of this application;
Fig. 4 is a schematic diagram of the deep text representation model word2vec provided by Embodiment 3 of this application;
Fig. 5 is a schematic diagram of the recurrent neural network (RNN) provided by Embodiment 3 of this application;
Fig. 6 is a schematic structural diagram of the event prediction apparatus provided by Embodiment 4 of this application;
Fig. 7 is a schematic structural diagram of the event prediction apparatus provided by Embodiment 5 of this application;
Fig. 8 is a schematic structural diagram of the event prediction system provided by Embodiment 6 of this application;
Fig. 9 is a schematic structural diagram of an electronic device provided by Embodiment 7 of this application.
Detailed description
To make the purpose, technical scheme and advantages of this application clearer, the technical scheme of this application is described clearly and completely below with reference to specific embodiments and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative work fall within the scope of protection of this application.
The technical schemes provided by the embodiments of this application are described in detail below with reference to the drawings.
Embodiment 1
Fig. 1 is a schematic flowchart of an event prediction method provided by Embodiment 1 of this application. Referring to Fig. 1, the method may specifically include the following steps:
Step 120, obtain text data from social network data;
It should be noted that one implementation of step 120 can be:
First, social network data is crawled from a social platform. The crawled data is then preprocessed (converted, cleaned, parsed, classified and so on) to distinguish the structured data from the unstructured data in it, and to identify the text data within the unstructured data. When event prediction is carried out, the corresponding text data is obtained from the result.
Another implementation of step 120 can be:
First, social network data is crawled from a social platform. The crawled data is then preprocessed (converted, cleaned, parsed, classified and so on) to distinguish the structured data from the unstructured data in it, and to separate the text data from the non-text data within the unstructured data. Next, the non-text data in the unstructured data, such as pictures, audio and video, is converted into text data. When event prediction is carried out, the corresponding text data is obtained from the result. The techniques used to convert non-text data into text data include existing techniques such as those that recognize WeChat audio as text, or converting the subtitle file of a video into text data.
In addition, the social platform referred to in the two implementations above can be WeChat, QQ, Twitter, Facebook and so on; the tool for crawling social network data can be a web crawler or the like; and the text data can specifically be a dialogue, a document, a notice and so on. Accordingly, each piece of text data corresponds to an event.
Step 140, vectorize the text data to obtain the feature vector corresponding to the text data;
It should be noted that the feature vector is the input of the classification model, so whether the determined feature vector can represent the text data directly affects the prediction result output by the model. On this basis, one implementation of step 140 can be:
First, the words in the text data are vectorized to obtain the word vector corresponding to each word; then the feature vector corresponding to the text data is obtained from the word vectors corresponding to the words in the text data.
In this implementation, vectorization can consist of training on the words in the text data with the deep text representation model (the word2vec tool) and obtaining the word vectors output by the model. The word vectors corresponding to the words in the text data are then averaged, and the resulting vector is used as the feature vector corresponding to the text data. The core idea of the deep text representation model is that, through training, the processing of text data is reduced to vector operations in a K-dimensional vector space, and similarity in the vector space can be used to represent semantic similarity between texts.
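The averaging step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `text_feature_vector`, the toy 3-dimensional vectors and the choice to return a zero vector when no word is in the vocabulary are all assumptions made here. In practice the vectors would come from a trained word2vec model.

```python
from typing import Dict, List


def text_feature_vector(tokens: List[str],
                        word_vectors: Dict[str, List[float]],
                        dim: int) -> List[float]:
    """Average the word vectors of the tokens that are in the vocabulary."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        # Assumption: fall back to a zero vector for fully unknown text.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy 3-dimensional vectors standing in for trained word2vec output.
toy_vectors = {
    "meet": [1.0, 0.0, 0.0],
    "at": [0.0, 1.0, 0.0],
    "the": [0.0, 0.0, 1.0],
    "station": [1.0, 1.0, 1.0],
}
sentence_vector = text_feature_vector(
    ["meet", "at", "the", "station"], toy_vectors, dim=3)
unknown_vector = text_feature_vector(["xyz"], toy_vectors, dim=3)
```

Because every text is reduced to one vector of the same length, texts with different word counts can be fed to the same classification model.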
Step 160, input the feature vector into the pre-established classification model, where the classification model uses the feature vector of the text data as a feature to determine the probability that the text data corresponds to a suspicious event.
It should be noted that before steps 120 to 160 are performed, the model must first be established, which may specifically include the following steps:
First, sample data is obtained, the sample data including sample events and the text data corresponding to the sample events; the text data is vectorized to obtain the feature vector corresponding to the text data; and the classification model is established with the feature vectors corresponding to the sample events as features. New text data is then predicted with the established classification model.
It is easy to understand that the sample events include positive samples and negative samples. A positive sample is an event related to a suspicious event, such as a terrorist event; accordingly, its text data can be terrorists' conversation content, action planning information, crime routes and so on.
To improve the precision of the established model, the embodiments of this application, by way of example, establish 4 kinds of classification models with the feature vector corresponding to the text data as the feature: a Bayes-based classification model, a support-vector-machine-based classification model, a convolutional-neural-network-based classification model, and a recurrent-neural-network-based classification model.
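As a rough illustration of "establish a classification model from sample feature vectors, then output a probability for new text", the sketch below implements a two-class Gaussian naive Bayes classifier in plain Python. The patent only says "Bayes-based classification model"; the Gaussian variant, the class name `GaussianNaiveBayes`, the variance floor of 1e-6 and the toy data are assumptions made here.

```python
import math
from typing import Sequence


class GaussianNaiveBayes:
    """Minimal two-class Gaussian naive Bayes over dense feature vectors."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]):
        self.stats = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            n, dim = len(rows), len(rows[0])
            means = [sum(r[i] for r in rows) / n for i in range(dim)]
            # Small constant (assumed) keeps each variance strictly positive.
            variances = [sum((r[i] - means[i]) ** 2 for r in rows) / n + 1e-6
                         for i in range(dim)]
            self.stats[label] = (math.log(n / len(X)), means, variances)
        return self

    def _log_joint(self, x: Sequence[float], label: int) -> float:
        log_prior, means, variances = self.stats[label]
        log_lik = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                      for xi, m, v in zip(x, means, variances))
        return log_prior + log_lik

    def predict_proba(self, x: Sequence[float]) -> float:
        """Return P(label = 1 | x), read here as 'probability of suspicious'."""
        a, b = self._log_joint(x, 1), self._log_joint(x, 0)
        m = max(a, b)  # subtract the max for numerical stability
        return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))


# Toy sample feature vectors: label 1 = positive (suspicious) sample.
X_train = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y_train = [1, 1, 0, 0]
model = GaussianNaiveBayes().fit(X_train, y_train)
p_high = model.predict_proba([0.85, 0.85])
p_low = model.predict_proba([0.15, 0.15])
```

The sketch assumes both classes appear in the training data; a real system would also need to handle the strong class imbalance discussed in Embodiment 3.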
It can be seen that the embodiment of this application crawls massive social network data and applies natural language processing to the text data in it with a deep text representation model, so as to find the feature vectors that act as key influencing factors. The feature vectors are used as the input of a model established with deep learning to make predictions on the text data, so that suspicious events are predicted accurately.
Embodiment 2
Fig. 2 is a schematic flowchart of an event prediction method provided by Embodiment 2 of this application. Referring to Fig. 2, the method may specifically include the following steps:
Step 220, obtain the text data in social network data and the user behavior data associated with the text data;
It should be noted that social network data includes structured data and unstructured data. User behavior data belongs to the structured data and includes entity data and/or label data. The entity data represents the set of data related to the text data, and the label data represents the labels corresponding to the text data or to words in the text data, together with the data corresponding to those labels. Taking a dialogue as the text data, examples of its entity data are the entities participating in the dialogue, the scene and time of the dialogue, and behavior data such as the web browsing, searches and clicks performed by a participating person A. Examples of label data are an article B corresponding to a word that occurs in the dialogue, the category that article B belongs to, and the nature of that category, such as whether it is contraband.
Step 240, vectorize the text data to obtain the feature vector corresponding to the text data;
It should be noted that step 240 is similar to step 140 in Embodiment 1, so it is not described further here.
Step 260, perform feature selection on the user behavior data to obtain the corresponding feature variables;
It should be noted that one implementation of step 260 can be:
First, the variables in the user behavior data are determined; the variables can be the participating subjects, the time, the place, the articles involved and so on. Then the variables are scored with a predetermined feature selection method to determine each variable's degree of influence on the event corresponding to the text data. Finally, the variables whose degree of influence meets a predetermined condition are selected from the variables in the user behavior data as the feature variables.
In this implementation, the smaller a variable's degree of influence, the smaller its influence on the user behavior data is considered to be. For example, the article "water" occurring in the data is generally not considered to have any relation to a suspicious event, so its influence on the user behavior data is small; the opposite holds for "pistol", "grenade", "firearm model" and the like.
To improve the precision of the selected feature variables, the feature selection method proposed by the embodiments of this application can specifically be at least one of a filter feature selection method, a wrapper feature selection method, and an embedded feature selection method. The selection process includes: (1) scoring the variables with the filter method by computing correlation coefficients and chi-square values; (2) scoring each variable by recursive forward selection based on a decision tree algorithm; (3) performing variable selection with lasso regression plus decision trees, where a penalty term is introduced that shrinks the coefficients of some variables to 0. One or more of the results of these three methods are chosen as appropriate to obtain the feature variables that the final model uses. For example, the variables of a piece of text data may include sex, age, geographical location, Internet access device, time spent online, number of message hops and so on; with the feature selection methods above, the factors that significantly affect the suspicion probability of the text data can be selected from them as feature variables.
The principle of each feature selection method is as follows:
Filter method: for continuous variables, variance-based selection can be used to keep the variables whose variance exceeds a certain threshold, and the correlation coefficient between a feature variable and the target variable can be computed. When the feature variable and the target variable are both qualitative variables, the chi-square test or mutual information can be used to characterize the correlation between them.
Wrapper method: the quality of a feature subset is evaluated by the performance of a learning algorithm. The wrapper method needs to train a learner and selects the feature subset according to the learner's performance; the available algorithms include decision trees, neural networks, KNN and so on.
Embedded method: the embedded method integrates the feature selection algorithm with the learning algorithm, for example variable selection based on lasso and variable selection based on tree models.
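The filter method can be sketched with a correlation-based score. The following is a minimal illustration under stated assumptions: the variable names (`mentions_weapon`, `online_hours`, `constant`), the toy values and the 0.6 threshold are hypothetical, and only the Pearson correlation coefficient is shown (the chi-square test and mutual information mentioned above are not).

```python
import math
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def filter_select(columns: Dict[str, List[float]],
                  target: List[float],
                  min_abs_corr: float) -> List[str]:
    """Keep variables whose |correlation| with the target meets the threshold."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    kept = [name for name, score in scores.items() if score >= min_abs_corr]
    return sorted(kept, key=lambda name: -scores[name])


# Hypothetical user behavior variables for six sample events.
columns = {
    "mentions_weapon": [1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    "online_hours": [2.0, 5.0, 3.0, 4.0, 1.0, 6.0],
    "constant": [1.0] * 6,  # zero variance, carries no information
}
target = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]  # 1 = suspicious event
selected = filter_select(columns, target, min_abs_corr=0.6)
```

Here the constant column scores 0 because it has no variance, which mirrors the variance-threshold idea in the filter method.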
Step 280, input the feature vector and the feature variables as features into the pre-established classification model, and obtain the prediction result output by the classification model, where the prediction result represents the probability that the text data corresponds to a suspicious event.
It should be noted that, as in the description of step 160 in Embodiment 1, the classification model must be established before step 280 is performed. This can specifically be:
Sample data is obtained, the sample data including sample events and the text data and/or user behavior data corresponding to the sample events; the text data is vectorized to obtain the feature vector corresponding to the text data, and/or feature selection is performed on the user behavior data to obtain the feature variables corresponding to the user behavior data; and the classification model is established with the feature vectors and/or feature variables corresponding to the sample events as features.
In addition, after the prediction of the suspicious event is completed, this application can further predict the suspicious entities corresponding to the suspicious event. This can specifically be:
After the probability that the text data corresponds to a suspicious event is determined, if the probability meets a predetermined standard, predictions are made on the related data of the entities (persons) involved in the text data, so as to further uncover criminal groups and the like and to further improve the prevention of events. The related data of an entity can be its basic information, its social data, its activities at home and abroad and so on. In analyzing a suspicious entity, its activities, whereabouts and so on can be analyzed further on top of the prediction on its social data, so that the entity's degree of suspicion is predicted from multiple dimensions.
It can be seen that the embodiment of this application considers features from two angles, the feature vector of the text information and the feature variables of the corresponding user behavior data, when predicting suspicious events, which further improves prediction precision on the basis of Embodiment 1.
Embodiment 3
Fig. 3 is a schematic flowchart of an event prediction method provided by Embodiment 3 of this application. Referring to Fig. 3, the application is described in detail below from the angle of an example:
Step 320, crawl social network data from a social platform
The social network data is preprocessed to obtain training data, which includes message text (text data), entity data and label data. The preprocessing process has been described in Embodiments 1 and 2 and is not repeated here.
Step 340, vectorized representation of message text
Word vector representation has long been one of the most important tasks in natural language processing. To complete most natural language processing tasks well, the similarity and difference between words must be defined. This embodiment trains word vectors with word2vec, which has two basic models: the CBOW word vector model and the Skip-gram word vector model. Referring to Fig. 4, the process of computing word vectors is explained with Skip-gram as the example:
The Skip-gram model is a three-layer neural network. A single word w(t) is the input of the model; passing through the hidden layer and finally the softmax layer, the model outputs the probabilities of the context words w(t-2), w(t-1), w(t+1) and w(t+2). The corresponding hidden-layer weights are then taken as the word vector of the word w(t).
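To make the Skip-gram input and output concrete, the sketch below enumerates the (center word, context word) training pairs for a window of 2, matching w(t-2) to w(t+2) above. It only builds the pairs and does not train a network; the function name `skipgram_pairs` and the toy tokens are assumptions made for illustration.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every position in the token list."""
    pairs = []
    for t, center in enumerate(tokens):
        lo = max(0, t - window)
        hi = min(len(tokens), t + window + 1)
        for j in range(lo, hi):
            if j != t:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs(["a", "b", "c", "d", "e"], window=2)
contexts_of_c = [ctx for center, ctx in pairs if center == "c"]
```

Each pair is one training example: the center word is the model input, and the context word is the target that the softmax layer should assign high probability to.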
Word vectors are trained with word2vec on a terrorism-related corpus. However, the number of words differs from message to message. For example, the sentence "I have a book." has 4 words, each represented by a word vector, which gives 4 word vectors. So that the sentence can be represented by a single vector, the simple average of the 4 word vectors can be taken. In this way each message text is represented by one vector, which makes it easy to build the classification model afterwards.
Step 360, variable selection for entity data and label data
Because the entity data and label data are high-dimensional, feature engineering methods are needed to select the variables that significantly affect whether an event is suspicious, so that the model performs as well as possible. The entity data and label data form the input feature matrix, and whether the event is suspicious is the input target vector. Continuous variables need to be standardized and normalized, categorical variables need dummy-variable encoding, and some missing values need to be filled by interpolation. Common feature selection methods include the filter method, the wrapper method, the embedded method and dimension reduction. Because a large amount of social and user behavior data is involved in this patent, feature selection is necessary.
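The three preprocessing operations mentioned above can be sketched in plain Python. This is a simplified illustration under assumptions: missing values are filled with the column mean as a stand-in for the interpolation the text mentions, standardization uses the population standard deviation, and the function names and toy values are hypothetical.

```python
import math
from typing import List, Optional


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Fill missing entries (None) with the mean of the observed entries."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def standardize(values: List[float]) -> List[float]:
    """Z-score a continuous variable (zero mean, unit variance)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [0.0 if std == 0 else (v - mean) / std for v in values]


def one_hot(values: List[str]) -> List[List[int]]:
    """Dummy-variable encoding with categories in sorted order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


ages = impute_mean([20.0, None, 40.0])
ages_z = standardize(ages)
devices = one_hot(["phone", "pc", "phone"])
```

After these steps every column is numeric and on a comparable scale, which is what the feature matrix fed to feature selection and to the classifier requires.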
In addition, the variable selection process corresponds to the description of step 260 in Embodiment 2 and is not repeated here.
Step 380, establish the classification model
After the message text vectors and the selected features are obtained, a binary classification model can be established. Because the positive and negative classes of the target variable are extremely imbalanced, the imbalanced data must be handled; common methods include oversampling, SMOTE and so on. Next, models such as random forest, logistic regression and SVM are tried. The data set is first divided into a training set and a test set, the models are trained with sklearn, and the accuracy obtained under the different models is compared. The model evaluation methods include hold-out, cross validation, TPR, TNR and so on.
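The idea behind SMOTE, creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration and not a reference implementation: the function name `smote_like`, the default `k=2`, the fixed seed and the toy points are assumptions, and a production pipeline would more likely use an established library.

```python
import random
from typing import List


def smote_like(minority: List[List[float]], n_new: int,
               k: int = 2, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic points from the minority class."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like(minority_points, n_new=5)
```

Synthetic points should be generated from the training split only, so that the test set still reflects the real class ratio.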
In addition to conventional machine learning classification models, this patent also tries classifying text with deep learning models. Unlike a traditional feed-forward neural network, an RNN introduces directed cycles and can handle the dependencies between earlier and later inputs.
Referring to Fig. 5, an RNN is used to process sequence data. In a traditional neural network the nodes within a layer are not connected to each other, but in natural language processing the earlier and later words in a sentence are not independent. An RNN can remember earlier information and apply it to the computation of the current output; that is, the nodes of the hidden layer are no longer unconnected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step. Whether a message text is suspicious is determined with the RNN.
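The recurrence described above, where the hidden layer receives both the current input and its own previous output, can be sketched as a single step h_t = tanh(W_xh x_t + W_hh h_{t-1} + b). The tanh activation, the weight names and the toy values are assumptions made for illustration; the patent does not specify them, and the output layer that turns the final state into a suspicion probability is omitted.

```python
import math
from typing import List


def rnn_step(x: List[float], h_prev: List[float],
             W_xh: List[List[float]], W_hh: List[List[float]],
             b: List[float]) -> List[float]:
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    return [math.tanh(sum(w * xi for w, xi in zip(W_xh[i], x))
                      + sum(u * hj for u, hj in zip(W_hh[i], h_prev))
                      + b[i])
            for i in range(len(b))]


def run_rnn(sequence: List[List[float]], W_xh: List[List[float]],
            W_hh: List[List[float]], b: List[float]) -> List[float]:
    """Feed a sequence of word vectors and return the final hidden state."""
    h = [0.0] * len(b)
    for x in sequence:
        h = rnn_step(x, h, W_xh, W_hh, b)
    return h


# Toy weights for a 2-dimensional input and a 2-unit hidden layer.
W_xh = [[0.5, -0.5], [0.3, 0.8]]
W_hh = [[0.1, 0.2], [-0.3, 0.4]]
b = [0.0, 0.0]
h_ab = run_rnn([[1.0, 0.0], [0.0, 1.0]], W_xh, W_hh, b)
h_ba = run_rnn([[0.0, 1.0], [1.0, 0.0]], W_xh, W_hh, b)
```

The two runs contain the same word vectors in different order and end in different hidden states, which is the order sensitivity that averaging word vectors cannot provide.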
The established model is used to make predictions on new message texts, helping intelligence analysts make decisions so that terrorist attacks can be prevented in time.
Step 3100, model performance analysis
First, the feature selection methods described above are used to select the variables that significantly affect whether an event is suspicious. These are mainly structured data, including the incident location, weapon type, target, the number of historical events at the incident location, and so on. Word vectors are trained with word2vec, and the trained word vectors are used for text classification.
There are 60,000 model training samples in total, of which 120 are terrorist events, so the data is highly class-imbalanced. During modeling, the SMOTE algorithm is used to adjust for the class imbalance. Two scenarios are tried: 1. only word vectors are used as model feature variables to classify the message text; 2. some structured data variables are additionally included and input to the model together with the word vectors as features. For both scenarios, machine learning and deep learning models are built, cross-validation is used for model selection, and the overall accuracy, precision, and recall of each model are computed. The results are as follows:
Using only word vectors as the model's input features:
First, short messages are classified using only word vectors as feature variables. Two machine learning models (naive Bayes and support vector machine, SVM) and two deep learning models (convolutional neural network, CNN, and recurrent neural network, RNN) are tried. One third of the positive-class samples and one third of the negative-class samples are randomly drawn as the test set for model evaluation. The positive class corresponds to terrorist events and their related message text data; the negative class corresponds to non-terrorist events and their related message text data. The model accuracy, true positive rate (TPR), and true negative rate (TNR) are computed. Because the positive and negative classes are highly imbalanced, TPR and TNR are considered together, and G-means is computed as the final evaluation criterion.
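The evaluation criterion above can be sketched as follows, assuming the usual definition of G-means as the geometric mean of TPR and TNR (the patent does not spell out the formula). Labels are assumed to be 1 for the positive class and 0 for the negative class, with both classes present; the data is a toy example.

```python
import math

def tpr_tnr_gmean(y_true, y_pred):
    """Return (TPR, TNR, G-mean) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tpr = tp / (tp + fn)  # share of positive samples that were found
    tnr = tn / (tn + fp)  # share of negative samples correctly rejected
    return tpr, tnr, math.sqrt(tpr * tnr)

# Toy imbalanced data: 2 positive samples, 8 negative samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
tpr, tnr, gmean = tpr_tnr_gmean(y_true, always_negative)
```

A classifier that always predicts the negative class reaches 80% accuracy on this toy data but has a G-mean of zero, which is why accuracy alone is misleading on highly imbalanced data.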
Table 1: Model accuracy using word vectors as features
It can be seen from the results above that, when the classification accuracy of both the positive and negative classes is considered, the SVM performs poorly while the other three models perform well.
Combining feature variables as the model's input features:
Next, short texts are classified in combination with some structured data, such as incident location, weapon type, and target, in order to further improve model accuracy. The results are shown in the following table:
Table 2: Model accuracy after adding event features
The results above show that adding variables that describe event features improves model performance slightly. After an overall comparison, the RNN model that includes the event feature variables was chosen as the final classification model.
Step 3120, predict new events based on the classification model
The prediction of new events is similar to the description in embodiments 1 and 2 and is therefore not repeated here.
It should be noted that the steps of the methods provided in embodiments 1-3 may all be executed by the same device, or the methods may be executed by different devices. For example, steps 120 and 140 may be executed by device 1 and step 160 by device 2; or step 120 may be executed by device 1 and steps 140 and 160 by device 2; and so on.
In addition, for brevity, the above method embodiments are each described as a series of combined actions, but those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Embodiment 4
Fig. 6 is a schematic structural diagram of the event prediction apparatus provided by embodiment 4 of the present application. Referring to Fig. 6, the apparatus includes a first acquisition unit 61, a first processing unit 62, and a second processing unit 63, wherein:
The first acquisition unit 61 is configured to obtain the text data in social network data;
The first processing unit 62 is configured to vectorize the text data to obtain the feature vector corresponding to the text data;
The second processing unit 63 is configured to input the feature vector into a pre-established classification model, the classification model being used to determine, with the feature vector of the text data as the feature, the probability that the text data corresponds to a suspicious event.
The working principle of the first processing unit 62 is briefly described as follows:
The first processing unit 62 is configured to vectorize the words in the text data to obtain the word vectors corresponding to the words, and to obtain the feature vector corresponding to the text data from the word vectors of the words in the text data. Specifically, a deep text representation model is trained on the words in the text data, and the word vectors output by the deep text representation model are obtained. The word vectors corresponding to the words in the text data are then averaged, and the resulting vector is used as the feature vector corresponding to the text data.
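The averaging step can be sketched in a few lines of Python. The word vectors below are toy two-dimensional values standing in for the output of a trained text representation model such as word2vec; skipping out-of-vocabulary words is an assumption of this sketch, not something the patent states.

```python
def average_word_vectors(tokens, word_vectors):
    """Average the vectors of the known tokens to get one feature vector per text."""
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        raise ValueError("no token of the text has a word vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy vectors (hypothetical); a trained model would typically use hundreds of dimensions.
word_vectors = {
    "meet": [1.0, 0.0],
    "tonight": [0.0, 1.0],
    "market": [1.0, 1.0],
}

# "unknown" is not in the vocabulary and is skipped.
text_vector = average_word_vectors(["meet", "tonight", "unknown"], word_vectors)
```

Averaging yields a fixed-length vector regardless of message length, which is what allows texts of different lengths to be fed to the same classification model.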
It can be seen that this embodiment of the application captures massive social network data and performs natural language processing on the text data in it based on a deep text representation model, so as to find the feature vectors that act as key influencing factors. The feature vectors are then used as input to a model built with deep learning to make predictions on the text data, achieving the purpose of accurately predicting suspicious events.
Embodiment 5
Fig. 7 is a schematic structural diagram of the event prediction apparatus provided by embodiment 5 of the present application. Referring to Fig. 7, the apparatus includes a first acquisition unit 71, a second acquisition unit 72, a first processing unit 73, and a second processing unit 74, wherein:
The first acquisition unit 71 is configured to obtain the text data in social network data;
The second acquisition unit 72 is configured to obtain the user behavior data associated with the text data in the social network data;
The first processing unit 73 is configured to vectorize the text data to obtain the feature vector corresponding to the text data, and to perform feature selection on the user behavior data to obtain the corresponding feature variables;
The second processing unit 74 is configured to input the feature vector and the feature variables as features into the pre-established classification model.
The first processing unit 73 is configured to determine the variables in the user behavior data; score the variables based on a predetermined feature selection method to determine each variable's degree of influence on the event corresponding to the text data; and select, from the variables in the user behavior data, those whose degree of influence meets a predetermined condition as the feature variables.
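As an illustration of scoring variables and keeping those that meet a predetermined condition, here is a minimal filter-style sketch. It assumes the score is the absolute Pearson correlation between a variable and the event label and that the condition is a fixed threshold; the patent leaves both choices open, and the variable names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def select_features(columns, labels, threshold):
    """Score every variable against the labels and keep those at or above the threshold."""
    scores = {name: abs(pearson(vals, labels)) for name, vals in columns.items()}
    selected = [name for name, score in scores.items() if score >= threshold]
    return scores, selected

# Hypothetical user behavior variables and event labels (1 = suspicious).
labels = [0, 1, 0, 1, 1, 0]
columns = {
    "night_posts": [0, 1, 0, 1, 1, 0],
    "weak_signal": [1, 1, 0, 0, 1, 0],
    "constant": [2, 2, 2, 2, 2, 2],
}

scores, selected = select_features(columns, labels, threshold=0.5)
```

Here only `night_posts` passes the threshold. A wrapper or embedded method would replace the scoring function while keeping the same select-by-condition step.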
It can be seen that this embodiment of the application predicts suspicious events using features from two angles, the feature vector of the text message and the feature variables of the corresponding user behavior data, which can further improve prediction precision.
Embodiment 6
Fig. 8 is a schematic structural diagram of the event prediction system provided by embodiment 6 of the present application. Referring to Fig. 8, the system includes a data warehouse 81, a Kafka cluster 82, and a Storm cluster 83, wherein:
The data warehouse 81 is configured to store social network data and to provide the social network data to the producers of the Kafka cluster;
The Kafka cluster 82 is configured to pre-process the social network data so as to extract the text data and/or user behavior data from the social network data;
The Storm cluster 83 is configured to call the event prediction apparatus of embodiment 4 or 5 so as to consume the text data and/or user behavior data in the Kafka cluster and output the probability of a suspicious event.
It should be noted that the system works as follows:
Full social network data (Twitter and Facebook) is captured and ETL-processed, and the processed data is loaded into the data warehouse according to the predefined data warehouse module. Newly added messages are processed by the Kafka cluster. The corresponding entity and label data are obtained based on the message content. The message text is converted into structured word vectors, and feature selection is performed on them together with the entity and label data to find the key factors that influence whether an event is suspicious. Machine learning and deep learning models are established to predict whether unknown messages are suspicious.
External data (social media data) is parsed and cleaned into the data warehouse and then enters Kafka from the data warehouse. Consumers pull the message data that grows in real time from the Broker. Combined with the entity data and label data in Hive (a Hadoop-based data warehouse tool that maps structured data files to database tables and provides simple SQL query functionality by converting SQL statements into MapReduce jobs), the packaged algorithm package is called to compute the probability that an event is suspicious. Because the message data is added in real time, the suspicious probability of a message is computed as a stream using the Storm framework: external data flows from Kafka through a Spout into the Storm real-time computing cluster in the form of Tuples and is handed to a Topology in the cluster for processing. Each bolt node in the Topology acts as a specific task and calls the packaged algorithm package in parallel to compute the suspicious probability of the event. Finally, the last bolt stores the computed result in MySQL.
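The data flow described above (messages queued, emitted by a spout as tuples, scored by a bolt, stored by a final bolt) can be mimicked with a single-process sketch in plain Python. This does not use the Kafka or Storm APIs: the in-memory queue stands in for a Kafka topic, the dictionary stands in for the MySQL table, and `toy_score` is a placeholder for the packaged algorithm, not the patent's classifier. All names are hypothetical.

```python
from collections import deque

def spout(topic):
    """Emit queued messages one by one as (id, text) tuples."""
    while topic:
        message = topic.popleft()
        yield (message["id"], message["text"])

def score_bolt(tuples, score_fn):
    """Attach a suspicion probability to each tuple."""
    for message_id, text in tuples:
        yield (message_id, score_fn(text))

def store_bolt(scored, table):
    """Final step: persist each result (here, into a dictionary)."""
    for message_id, probability in scored:
        table[message_id] = probability

def toy_score(text):
    """Placeholder score: the share of words found in a hypothetical flag list."""
    flagged = {"attack", "weapon"}
    words = text.lower().split()
    return sum(w in flagged for w in words) / len(words) if words else 0.0

# In-memory stand-in for a topic holding newly added messages.
topic = deque([
    {"id": 1, "text": "weapon attack tonight"},
    {"id": 2, "text": "lunch at noon"},
])
results = {}
store_bolt(score_bolt(spout(topic), toy_score), results)
```

Each stage consumes the previous stage's output lazily, which mirrors the streaming behavior in miniature. In a real deployment the stages would run in parallel across cluster nodes rather than in a single process.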
Kafka, as used in this embodiment, is a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data of a consumer-scale website. Such actions (web page browsing, searches, and other user actions) are a key factor in many social functions on the modern web. Because of throughput requirements, this data is usually handled through log processing and log aggregation. For log data and offline analysis systems such as Hadoop that are nevertheless subject to real-time processing constraints, this is a feasible solution. Kafka aims to unify online and offline message processing through Hadoop's parallel loading mechanism, and also to provide real-time consumption across a cluster.
In addition, the system uses the Storm real-time computing framework, which features low latency, high performance, and distributed computation, so it can deliver recognition results in time for intelligence analysts to analyze. Moreover, this patent uses large-scale social network data and internet behavior data to analyze the characteristics of terrorist attacks and, combined with natural language processing techniques, can achieve the purpose of automatically identifying suspicious events.
Since the above apparatus embodiments are substantially similar to the method embodiments, their description is relatively brief; for related parts, refer to the description of the method embodiments. It should be noted that the components of the apparatus of the present invention are divided logically according to the functions they implement; however, the present invention is not limited to this division, and the components may be re-divided or combined as needed.
Embodiment 7
Fig. 9 is a schematic structural diagram of an electronic device provided by embodiment 7 of the present application. Referring to Fig. 9, the electronic device includes a processor, an internal bus, a network interface, memory, and non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into memory and then runs it, forming the event prediction apparatus at the logical level. Of course, besides a software implementation, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the executing entity of the following processing flow is not limited to logic units and may also be hardware or a logic device.
The network interface, the processor, and the memory may be connected to one another through a bus system. The bus may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one double-headed arrow is shown in Fig. 9, but this does not mean that there is only one bus or one type of bus.
The memory is used to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory may include read-only memory and random access memory, and provides instructions and data to the processor. The memory may include high-speed random access memory (RAM) and may also include non-volatile memory, for example at least one magnetic disk storage device.
The processor is configured to execute the program stored in the memory, and specifically to perform:
Obtaining the text data in social network data;
Vectorizing the text data to obtain the feature vector corresponding to the text data;
Inputting the feature vector into a pre-established classification model, the classification model being used to determine, with the feature vector of the text data as the feature, the probability that the text data corresponds to a suspicious event.
The method performed by the event prediction apparatus or by the manager (Master) node disclosed in the embodiments shown in Figs. 1-2 and Fig. 6 of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor or by instructions in software form. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The event prediction apparatus can also perform the method of Fig. 1 and implement the method performed by the manager node.
Based on the same inventive concept, the embodiments of the present application also provide a computer-readable storage medium that stores one or more programs. When the one or more programs are executed by an electronic device that includes multiple application programs, the electronic device performs the following method:
Obtaining the text data in social network data;
Vectorizing the text data to obtain the feature vector corresponding to the text data;
Inputting the feature vector into a pre-established classification model, the classification model being used to determine, with the feature vector of the text data as the feature, the probability that the text data corresponds to a suspicious event.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) that contain computer-usable program code.
The foregoing describes only embodiments of the present application and is not intended to limit the present application. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims (11)

  1. An event prediction method, characterized by comprising:
    obtaining text data in social network data;
    vectorizing the text data to obtain a feature vector corresponding to the text data;
    inputting the feature vector into a pre-established classification model, the classification model being used to determine, with the feature vector of the text data as a feature, the probability that the text data corresponds to a suspicious event.
  2. The method according to claim 1, characterized in that, before obtaining the text data in the social network data, the method further comprises:
    obtaining social network data from a social network;
    converting non-text data among the unstructured data in the social network data into text data.
  3. The method according to claim 1, characterized in that vectorizing the text data to obtain the feature vector corresponding to the text data comprises:
    training a deep text representation model on the words in the text data, and obtaining the word vectors output by the deep text representation model;
    averaging the word vectors corresponding to the words in the text data, and using the result as the feature vector corresponding to the text data.
  4. The method according to claim 1, characterized in that, before inputting the feature vector as a feature into the pre-established classification model, the method further comprises:
    obtaining user behavior data associated with the text data in the social network data;
    performing feature selection on the user behavior data to obtain corresponding feature variables;
    wherein inputting the feature vector as a feature into the pre-established classification model comprises:
    inputting the feature vector and the feature variables as features into the pre-established classification model.
  5. The method according to claim 4, characterized in that performing feature selection on the user behavior data to obtain the relevant variables comprises:
    determining the variables in the user behavior data;
    scoring the variables based on a predetermined feature selection method to determine each variable's degree of influence on the event corresponding to the text data;
    selecting, from the variables in the user behavior data, the variables whose degree of influence meets a predetermined condition as the feature variables.
  6. The method according to claim 5, characterized in that the predetermined feature selection method is at least one of a filter feature selection method, a wrapper feature selection method, and an embedded feature selection method.
  7. The method according to claim 6, characterized in that the user behavior data includes entity data and/or label data, the entity data representing the set of data related to the text data, and the label data representing the labels corresponding to the words in the text data or to the text data, and the data corresponding to the labels.
  8. The method according to claim 1, characterized in that, after obtaining the prediction result output by the classification model, the method further comprises:
    determining the suspicious probability of an entity related to the text data according to the prediction result.
  9. The method according to any one of claims 1-8, characterized in that, before inputting the feature vector as a feature into the pre-established classification model, the method further comprises:
    obtaining sample data, the sample data including sample events and the text data and/or user behavior data corresponding to the sample events;
    vectorizing the text data to obtain the feature vector corresponding to the text data; and/or performing feature selection on the user behavior data to obtain the feature variables corresponding to the user behavior data;
    establishing the classification model with the feature vectors and/or feature variables corresponding to the sample events as features.
  10. An event prediction apparatus, characterized by comprising:
    a first acquisition unit, configured to obtain text data in social network data;
    a first processing unit, configured to vectorize the text data to obtain a feature vector corresponding to the text data;
    a second processing unit, configured to input the feature vector into a pre-established classification model, the classification model being used to determine, with the feature vector of the text data as a feature, the probability that the text data corresponds to a suspicious event.
  11. An event prediction system, characterized by comprising a data warehouse, a Kafka cluster, and a Storm cluster, wherein:
    the data warehouse is configured to store social network data and to provide the social network data to the producers of the Kafka cluster;
    the Kafka cluster is configured to pre-process the social network data so as to extract text data and/or user behavior data from the social network data;
    the Storm cluster is configured to call the event prediction apparatus of claim 10 so as to consume the text data and/or user behavior data in the Kafka cluster and output the probability of a suspicious event.
CN201711064205.6A 2017-11-02 2017-11-02 Event prediction method, device and system Active CN107622333B (en)

Publications (2)

Publication Number Publication Date
CN107622333A true CN107622333A (en) 2018-01-23
CN107622333B CN107622333B (en) 2020-08-18

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182279A (en) * 2018-01-26 2018-06-19 有米科技股份有限公司 Object classification method, device and computer equipment based on text feature
CN108932530A (en) * 2018-06-29 2018-12-04 新华三大数据技术有限公司 The construction method and device of label system
CN108960291A (en) * 2018-06-08 2018-12-07 武汉科技大学 A kind of image processing method and system based on parallelization Softmax classification
CN109409529A (en) * 2018-09-13 2019-03-01 北京中科闻歌科技股份有限公司 A kind of event cognitive analysis method, system and storage medium
CN109543153A (en) * 2018-11-13 2019-03-29 成都数联铭品科技有限公司 A kind of sequence labelling system and method
CN109614541A (en) * 2018-12-04 2019-04-12 北京艾漫数据科技股份有限公司 A kind of event recognition method, medium, device and calculate equipment
CN109766429A (en) * 2019-02-19 2019-05-17 北京奇艺世纪科技有限公司 A kind of sentence retrieval method and device
CN109815415A (en) * 2019-01-23 2019-05-28 四川易诚智讯科技有限公司 Social media user interest recognition methods based on card side's word frequency analysis
CN109871889A (en) * 2019-01-31 2019-06-11 内蒙古工业大学 Mass psychology appraisal procedure under emergency event
CN110162558A (en) * 2019-04-01 2019-08-23 阿里巴巴集团控股有限公司 Structural data processing method and processing device
CN110210559A (en) * 2019-05-31 2019-09-06 北京小米移动软件有限公司 Object screening technique and device, storage medium
CN110491145A (en) * 2018-10-29 2019-11-22 魏天舒 A kind of traffic signal optimization control method and device
WO2020063071A1 (en) * 2018-09-27 2020-04-02 厦门快商通信息技术有限公司 Sentence vector calculation method based on chi-square test, and text classification method and system
CN111046179A (en) * 2019-12-03 2020-04-21 哈尔滨工程大学 Text classification method for open network question in specific field
CN111159166A (en) * 2019-12-27 2020-05-15 沃民高新科技(北京)股份有限公司 Event prediction method and device, storage medium and processor
WO2020124026A1 (en) * 2018-12-13 2020-06-18 SparkCognition, Inc. Security systems and methods
CN111459959A (en) * 2020-03-31 2020-07-28 北京百度网讯科技有限公司 Method and apparatus for updating event set
CN111477328A (en) * 2020-03-31 2020-07-31 北京智能工场科技有限公司 Non-contact psychological state prediction method
CN111626783A (en) * 2020-04-30 2020-09-04 贝壳技术有限公司 Offline information setting method and device for realizing event conversion probability prediction
CN111770097A (en) * 2020-06-29 2020-10-13 中国科学院计算技术研究所 Content lock firewall method and system based on white list
CN112101950A (en) * 2020-09-27 2020-12-18 中国建设银行股份有限公司 Suspicious transaction monitoring model feature extraction method and device
CN112233381A (en) * 2020-10-14 2021-01-15 中国科学院、水利部成都山地灾害与环境研究所 Debris flow early warning method and system based on mechanism and machine learning coupling
CN112487406A (en) * 2020-12-02 2021-03-12 中国电子科技集团公司第三十研究所 Network behavior analysis method based on machine learning
CN113190682A (en) * 2021-06-30 2021-07-30 平安科技(深圳)有限公司 Method and device for acquiring event influence degree based on tree model and computer equipment
CN114707685A (en) * 2021-12-17 2022-07-05 武汉烽火众智智慧之星科技有限公司 Event prediction method and device based on big data modeling analysis

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116605A (en) * 2013-01-17 2013-05-22 上海交通大学 Method and system of microblog hot events real-time detection based on detection subnet
CN103853841A (en) * 2014-03-19 2014-06-11 北京邮电大学 Method for analyzing abnormal behavior of user in social networking site
CN104281607A (en) * 2013-07-08 2015-01-14 上海锐英软件技术有限公司 Microblog hot topic analyzing method
CN107169629A (en) * 2017-04-17 2017-09-15 四川九洲电器集团有限责任公司 A kind of telecommunication fraud recognition methods and data processing equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
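Dong Jianfeng (董坚峰): "Research on Network Public Opinion Analysis for Public Crisis Early Warning", China Doctoral Dissertations Full-text Database, Information Science and Technology *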

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182279A (en) * 2018-01-26 2018-06-19 有米科技股份有限公司 Object classification method, device and computer equipment based on text feature
CN108960291A (en) * 2018-06-08 2018-12-07 武汉科技大学 A kind of image processing method and system based on parallelization Softmax classification
CN108932530A (en) * 2018-06-29 2018-12-04 新华三大数据技术有限公司 The construction method and device of label system
CN109409529A (en) * 2018-09-13 2019-03-01 北京中科闻歌科技股份有限公司 A kind of event cognitive analysis method, system and storage medium
CN109409529B (en) * 2018-09-13 2020-12-08 北京中科闻歌科技股份有限公司 Event cognitive analysis method, system and storage medium
WO2020063071A1 (en) * 2018-09-27 2020-04-02 厦门快商通信息技术有限公司 Sentence vector calculation method based on chi-square test, and text classification method and system
CN110491145A (en) * 2018-10-29 2019-11-22 魏天舒 A kind of traffic signal optimization control method and device
CN109543153B (en) * 2018-11-13 2023-08-18 成都数联铭品科技有限公司 Sequence labeling system and method
CN109543153A (en) * 2018-11-13 2019-03-29 成都数联铭品科技有限公司 A kind of sequence labelling system and method
CN109614541A (en) * 2018-12-04 2019-04-12 北京艾漫数据科技股份有限公司 A kind of event recognition method, medium, device and calculate equipment
GB2595088A (en) * 2018-12-13 2021-11-17 Sparkcognition Inc Security systems and methods
WO2020124026A1 (en) * 2018-12-13 2020-06-18 SparkCognition, Inc. Security systems and methods
CN109815415A (en) * 2019-01-23 2019-05-28 四川易诚智讯科技有限公司 Social media user interest recognition method based on chi-square word frequency analysis
CN109871889A (en) * 2019-01-31 2019-06-11 内蒙古工业大学 Mass psychology appraisal procedure under emergency event
CN109871889B (en) * 2019-01-31 2019-12-24 内蒙古工业大学 Public psychological assessment method under emergency
CN109766429A (en) * 2019-02-19 2019-05-17 北京奇艺世纪科技有限公司 A kind of sentence retrieval method and device
CN110162558A (en) * 2019-04-01 2019-08-23 阿里巴巴集团控股有限公司 Structural data processing method and processing device
CN110210559B (en) * 2019-05-31 2021-10-08 北京小米移动软件有限公司 Object screening method and device and storage medium
CN110210559A (en) * 2019-05-31 2019-09-06 北京小米移动软件有限公司 Object screening technique and device, storage medium
CN111046179A (en) * 2019-12-03 2020-04-21 哈尔滨工程大学 Text classification method for open network question in specific field
CN111046179B (en) * 2019-12-03 2022-07-15 哈尔滨工程大学 Text classification method for open network question in specific field
CN111159166A (en) * 2019-12-27 2020-05-15 沃民高新科技(北京)股份有限公司 Event prediction method and device, storage medium and processor
CN111459959A (en) * 2020-03-31 2020-07-28 北京百度网讯科技有限公司 Method and apparatus for updating event set
CN111477328A (en) * 2020-03-31 2020-07-31 北京智能工场科技有限公司 Non-contact psychological state prediction method
CN111626783A (en) * 2020-04-30 2020-09-04 贝壳技术有限公司 Offline information setting method and device for realizing event conversion probability prediction
CN111626783B (en) * 2020-04-30 2021-08-31 贝壳找房(北京)科技有限公司 Offline information setting method and device for realizing event conversion probability prediction
CN111770097B (en) * 2020-06-29 2021-04-23 中国科学院计算技术研究所 Content lock firewall method and system based on white list
CN111770097A (en) * 2020-06-29 2020-10-13 中国科学院计算技术研究所 Content lock firewall method and system based on white list
CN112101950A (en) * 2020-09-27 2020-12-18 中国建设银行股份有限公司 Suspicious transaction monitoring model feature extraction method and device
CN112101950B (en) * 2020-09-27 2024-05-10 中国建设银行股份有限公司 Suspicious transaction monitoring model feature extraction method and suspicious transaction monitoring model feature extraction device
CN112233381A (en) * 2020-10-14 2021-01-15 中国科学院、水利部成都山地灾害与环境研究所 Debris flow early warning method and system based on mechanism and machine learning coupling
CN112487406A (en) * 2020-12-02 2021-03-12 中国电子科技集团公司第三十研究所 Network behavior analysis method based on machine learning
CN113190682A (en) * 2021-06-30 2021-07-30 平安科技(深圳)有限公司 Method and device for acquiring event influence degree based on tree model and computer equipment
CN114707685A (en) * 2021-12-17 2022-07-05 武汉烽火众智智慧之星科技有限公司 Event prediction method and device based on big data modeling analysis

Also Published As

Publication number Publication date
CN107622333B (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN107622333A (en) A kind of event prediction method, apparatus and system
Banerjee et al. Detection of cyberbullying using deep neural network
CN112221156B (en) Data abnormality recognition method, data abnormality recognition device, storage medium, and electronic device
CN108920654A (en) Method and apparatus for semantic matching of question and answer text
ALRashdi et al. Deep learning and word embeddings for tweet classification for crisis response
Nagamanjula et al. A novel framework based on bi-objective optimization and LAN2FIS for Twitter sentiment analysis
US11200381B2 (en) Social content risk identification
CN108961032A (en) Borrow or lend money processing method, device and server
CN113761359B (en) Data packet recommendation method, device, electronic equipment and storage medium
CN113139052B (en) Rumor detection method and device based on graph neural network feature aggregation
CN111538794A (en) Data fusion method, device and equipment
WO2023236469A1 (en) Video action recognition method and apparatus, electronic device, and storage medium
CN107392311A (en) The method and apparatus of sequence cutting
Kumar et al. Content based bot detection using bot language model and bert embeddings
Lin et al. Social rumor detection based on multilayer transformer encoding blocks
Hossain et al. A study towards Bangla fake news detection using machine learning and deep learning
Gautam et al. A review on cyberstalking detection using machine learning techniques: Current trends and future direction
Rama et al. Deep learning to address candidate generation and cold start challenges in recommender systems: A research survey
CN116484105B (en) Service processing method, device, computer equipment, storage medium and program product
Xu et al. Rumor detection on microblogs using dual-grained feature via graph neural networks
Lan et al. Mining semantic variation in time series for rumor detection via recurrent neural networks
AlSulaim et al. Prediction of Anime Series' Success using Sentiment Analysis and Deep Learning
Dong et al. Rumor Detection with Adversarial Training and Supervised Contrastive Learning
Narayan et al. Fake news detection using hybrid of deep neural network and stacked lstm
Goldani et al. X-CapsNet For Fake News Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 100081 101 / F, building 14, 27 Jiancai Middle Road, Haidian District, Beijing

Patentee after: Beijing PERCENT Technology Group Co.,Ltd.

Address before: 100081 16 / F, block a, Beichen Century Center, building 2, courtyard 8, Beichen West Road, Chaoyang District, Beijing

Patentee before: BEIJING BAIFENDIAN INFORMATION SCIENCE & TECHNOLOGY Co.,Ltd.