CN109635207A

CN109635207A - A social network user personality prediction method based on Chinese text analysis

Info

Publication number: CN109635207A
Application number: CN201811553414.1A
Authority: CN
Inventors: 李岩锋; 高俊波; 孙伟; 李铁锋; 白静静
Original assignee: Shanghai Maritime University
Current assignee: Shanghai Maritime University
Priority date: 2018-12-18
Filing date: 2018-12-18
Publication date: 2019-04-16

Abstract

A social network user personality prediction method based on Chinese text analysis. The Chinese text data that a user has recently posted on a social network is processed and divided into three classes: basic user status information, user interaction information, and user text information. The text data is preprocessed to obtain a data set composed of words of various kinds. The text data is tagged by part of speech based on a sentiment dictionary, and the frequency of each part of speech in the text is calculated; the three classes of user information are combined and optimized, and the user's result on an expert scale is taken as ground truth, to construct a numeric data set. Feature engineering is applied to this data set to obtain the feature set used for personality prediction, a personality prediction model is trained with a BP neural network, and the model is used to predict the personality of social network users. The invention offers convenient data acquisition, does not depend on professional psychological expertise, requires no expenditure of manpower or material resources, and achieves high accuracy.

Description

A social network user personality prediction method based on Chinese text analysis

Technical field

The present invention relates to the field of Internet technology, and in particular to a social network user personality prediction method based on Chinese text analysis.

Background art

With the rapid development of Internet technology and the continuous expansion of its fields of application, social networks such as microblogs and friend circles (Moments) are changing people's lives. People can post information and interact on them, forming a vast network environment in which information spreads quickly and widely, with good timeliness, convenience, and efficiency.

The emotions expressed in information on social networks are mainly influenced by the user's personality; conversely, a user's personality is outwardly manifested mainly through the user's emotional expression. Analyzing the personality of social network users therefore makes it possible to understand their emotional information effectively, which is of great significance for promoting Internet products, targeting advertisements precisely, pushing personalized content, and providing after-sales service.

The mainstream model of human personality is the five-factor (Big Five) personality model, which regards a person's personality as a combination of the following five traits:

Neuroticism: High expression: easily angered, easily discouraged, uneasy, strongly self-conscious, impulsive, psychologically fragile. Low expression: secure, calm, less self-conscious, relatively self-satisfied. Neuroticism mainly reflects an individual's tendency to respond to things with unpleasant emotions and the extent of the person's mood swings; such individuals have relatively poor control over impulses and easily become weary of their surroundings.

Extraversion: High expression: outgoing, enthusiastic, energetic, fond of stimulation, fond of making friends, prone to positive emotions. Low expression: poor at communicating, poor at expressing emotions, serious. Extraversion is mainly reflected in an individual's self-confidence, talkativeness, sociability, and love of life, and in actively seeking positive emotions.

Openness: High expression: rich imagination, aesthetic sensitivity, willingness to try new things, rich feelings, thirst for knowledge, open values. Low expression: pragmatic, conforming, content with the status quo, conventional. Openness mainly reflects a person's curiosity about the outside world, love of life, and fondness for novelty.

Agreeableness (Compliance): High expression: trusting of others, willing to take others' advice, willing to help, sincere, compliant, sympathetic. Low expression: ruthless, suspicious, mocking of others, rebellious. Agreeableness mainly reflects trust between people, as opposed to suspicion of others and harsh treatment of them, and serves as a measure of whether an individual is willing to help others.

Conscientiousness: High expression: rigorous in handling matters, orderly, confident, responsible, self-disciplined, achievement-oriented, careful. Low expression: disorderly, weak-willed, careless. Conscientiousness mainly reflects how an individual tends to go about things: showing self-restraint, working carefully, and having confidence in one's own abilities.

At present, personality is mainly assessed with psychological scales: the subject answers a series of questions, the answers are scored according to fixed rules, and the personality is analyzed from the scores. This approach depends on professional psychological expertise and consumes considerable manpower and time. With the development of natural language processing and machine learning theory, psychology-oriented sentiment analysis based on text data has become an important research topic, but there is still little related research or invention addressing Chinese social network text.

Summary of the invention

The object of the present invention is to provide a social network user personality prediction method based on Chinese text analysis, which processes and mines the text information posted by social network users and then analyzes their personality composition, with the advantages of high accuracy, fast analysis, and automation.

To achieve the above object, the invention adopts the following technical solution:

A social network user personality prediction method based on Chinese text analysis, characterized in that it comprises the following steps:

S1. Perform preliminary processing on the Chinese social network text, dividing it into three classes: basic user status information, user interaction information, and user text information;

S2. Preprocess the user text information to obtain a data set D_word composed of words of various kinds;

S3. Extract the text features of the user text information: tag the data set D_word by part of speech based on a sentiment dictionary, calculate the frequency of each part of speech in the text, combine and optimize the above three classes of information, and, taking the user's result on an expert scale as ground truth, construct the numeric data set D_comp;

S4. Perform feature engineering on the numeric data set D_comp, that is, screen the features, to obtain the feature set D_pre used for personality prediction;

S5. Train a BP neural network model for personality prediction: the feature vectors in the feature set D_pre obtained in step S4 are the model input, and the proportions of the five personality traits (neuroticism, extraversion, openness, agreeableness, conscientiousness) are the model output; the neural network is constructed and the prediction model is trained to carry out personality prediction.

In the above social network user personality prediction method based on Chinese text analysis, in step S1:

The basic user status information includes the number of posts published, the number of accounts followed, the number of fans, the length of time the social network has been used, and the average posting frequency, reflecting the user's basic usage of the social network;

The user interaction information includes the number of emoticons, the number of topics, the number of @-mentions, and the number of reposts on the social network, reflecting the user's interaction with public topics and friends;

The user text information is the pure language content of the text, reflecting the user's language habits, manner of expression, and emotional tendency.
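The four interaction counts in step S1 can be sketched in Python as below. The marker syntax is an assumption based on common microblog conventions (bracketed emoticon codes, `#topic#`, `@name`, and `//@` for repost chains); the patent does not fix a syntax, and the function and key names are hypothetical.

```python
import re

# Assumed microblog-style markers; the source text does not specify them.
EMOTICON = re.compile(r"\[[^\[\]]{1,8}\]")   # e.g. [哈哈]
TOPIC = re.compile(r"#[^#]+#")                # field enclosed by a pair of '#'
MENTION = re.compile(r"@[\w\-]+")             # e.g. @小明
REPOST = re.compile(r"//@")                   # repost-chain marker


def interaction_counts(posts):
    """Sum the four interaction features over all of a user's posts."""
    counts = {"emoticons": 0, "topics": 0, "mentions": 0, "reposts": 0}
    for text in posts:
        counts["emoticons"] += len(EMOTICON.findall(text))
        counts["topics"] += len(TOPIC.findall(text))
        # Note: the '@' inside a repost marker is also counted as a mention here.
        counts["mentions"] += len(MENTION.findall(text))
        counts["reposts"] += len(REPOST.findall(text))
    return counts
```

Whether a repost marker should also count as an @-mention is a design choice the text leaves open; the sketch counts it in both features.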

In the above social network user personality prediction method based on Chinese text analysis, the preprocessing in step S2 means cleaning the text data, segmenting it into words, and removing stop words:

Cleaning the text data includes using regular-expression matching to filter out non-text content in the social network text, such as pictures, emoticons, location information, fields enclosed by a pair of "#", URLs, emoji, and repost markers;

Word segmentation means segmenting the user text so that the full text is converted into a set of words;

Stop-word removal means using regular expressions to remove text noise, removing stop words that occur frequently in the text but carry no real meaning; stop words include pronouns, auxiliary words, and punctuation marks.
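The cleaning and stop-word steps above can be sketched as follows. This is an illustration under stated assumptions rather than the patent's implementation: the regular expressions and the tiny stop-word list are hypothetical, and word segmentation itself is left to an external segmenter (for example a library such as jieba), so `remove_stop_words` takes tokens that have already been segmented.

```python
import re

# Illustrative noise patterns; the exact form of the noise is an assumption.
NOISE_PATTERNS = [
    re.compile(r"https?://[A-Za-z0-9./?=&_%~:\-]+"),  # URLs
    re.compile(r"#[^#]+#"),                            # fields enclosed by '#'
    re.compile(r"\[[^\[\]]{1,8}\]"),                   # bracketed emoticon codes
    re.compile(r"//@[^:：]+[:：]"),                     # repost markers
    re.compile(r"@[\w\-]+"),                           # mentions
]
PUNCTUATION = re.compile(r"[^\w\s]")                   # anything not word/space
WHITESPACE = re.compile(r"\s+")

STOP_WORDS = {"的", "了", "我", "你", "又", "是"}       # tiny hypothetical list


def clean(text):
    """Strip non-text content and punctuation, then collapse whitespace."""
    for pattern in NOISE_PATTERNS:
        text = pattern.sub(" ", text)
    text = PUNCTUATION.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


def remove_stop_words(tokens):
    """Drop empty tokens and tokens found in the stop-word list."""
    return [t for t in tokens if t.strip() and t not in STOP_WORDS]
```

The URL pattern is applied first so that later patterns do not act on fragments of a link.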

In the above social network user personality prediction method based on Chinese text analysis, in step S3:

Combining and optimizing the three classes of information means: counting the user's basic status information (number of posts published, number of accounts followed, number of fans, length of time the social network has been used, average posting frequency), and counting the interaction information in the user's text (number of emoticons, number of topics, number of @-mentions, number of reposts);

Taking the user's result on an expert scale as ground truth and constructing the numeric data set D_comp means: the user is tested with a five-factor personality expert scale and scored according to the expert rules, and the proportion of each of the five traits is calculated to form five label data; these are combined with the part-of-speech frequencies from the tagging results and with the basic status information and interaction information from the combination step to form the numeric data set D_comp.
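As a rough illustration of how one row of D_comp might be assembled, the sketch below tags tokens with a dictionary and computes per-category frequencies. The miniature dictionary, the category labels, and the choice of dividing by the total token count are all assumptions made for the example; the patent does not specify the dictionary contents or the exact frequency definition.

```python
from collections import Counter

# Hypothetical miniature sentiment dictionary: word -> category label.
SENTIMENT_DICT = {
    "担心": "anxiety",
    "害怕": "anxiety",
    "朋友": "friend",
    "闺蜜": "friend",
    "工作": "work",
}


def category_frequencies(tokens, dictionary=SENTIMENT_DICT):
    """Frequency of a category = tagged tokens in it / all tokens (assumed)."""
    total = len(tokens)
    tagged = Counter(dictionary[t] for t in tokens if t in dictionary)
    categories = sorted(set(dictionary.values()))
    return {c: (tagged[c] / total if total else 0.0) for c in categories}


def build_row(tokens, status, interaction, trait_shares):
    """Return (features, labels) for one user: one numeric row of D_comp."""
    features = {**category_frequencies(tokens), **status, **interaction}
    return features, trait_shares
```

With the real data the feature dictionary would hold 102 frequencies, 5 status values, and 4 interaction values, matching the 111 features described for D_comp.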

In the above social network user personality prediction method based on Chinese text analysis, D_comp has 111 features in total, corresponding to 102 parts of speech, 5 kinds of user status information, and 4 kinds of interaction information, and step S4 specifically comprises:

S41. Calculate the correlation between each of the five personality traits and each part of speech:

Correlation is measured by the Pearson correlation coefficient, calculated as:

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X-\bar{X})(Y-\bar{Y})\right]}{\sigma_X \sigma_Y} \qquad (1)$$

where Cov(X, Y) is the covariance of variables X and Y, σ_X and σ_Y are the standard deviations of X and Y respectively, and X̄ and Ȳ are the mean values of X and Y. Here X is the data for one part-of-speech feature W_i (i = 1…102) among the 102 part-of-speech features W, and Y is the data for one personality trait Ch_j (j = 1…5) among the 5 personality traits Ch. For each trait, the 13 parts of speech with the highest Pearson coefficients are taken, forming 5 part-of-speech sets:

Set_j = { W_1…13 | the top 13 part-of-speech features for Ch_j }, j = 1…5

The union of the five sets of 13 part-of-speech features is taken to obtain the part-of-speech set Set_W:

Set_W=Set₁∪Set₂∪…∪Set₅

S42. Calculate the correlation between each of the five personality traits and the 5 user status information features and 4 interaction information features:

Correlation is measured by the Pearson correlation coefficient, calculated as in formula (1). Here X is the data for one feature F_i (i = 1…9) among the 9 features F, and Y is the data for one personality trait Ch_j (j = 1…5) among the 5 personality traits Ch. For each trait, the 3 features with the highest Pearson coefficients are taken, forming 5 feature sets:

Set_j = { F_1…3 | the top 3 features for Ch_j }, j = 1…5

The union of the five sets of 3 features is taken to obtain the set Set_F:

Set_F=Set₁∪Set₂∪…∪Set₅
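Steps S41 and S42 apply the same selection rule with k = 13 and k = 3. A minimal Python sketch of that rule follows. The function names and the dictionary-of-lists layout are hypothetical, and ranking by the signed coefficient (rather than its absolute value) is a literal reading of "highest" that the text does not spell out.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant column has no defined correlation; treat it as 0 here.
    return cov / (dev_x * dev_y) if dev_x and dev_y else 0.0


def top_k_union(features, labels, k):
    """features, labels: {name: per-user values}.

    For each label, take the k features with the highest coefficient,
    then return the union of those selections.
    """
    selected = set()
    for label_values in labels.values():
        ranked = sorted(
            features,
            key=lambda name: pearson(features[name], label_values),
            reverse=True,
        )
        selected.update(ranked[:k])
    return selected
```

Under these assumptions, `top_k_union(word_features, trait_labels, 13)` would play the role of Set_W and `top_k_union(status_and_interaction, trait_labels, 3)` the role of Set_F.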

S43. Calculate the pairwise correlation between the parts of speech selected in step S41:

The frequency of each part of speech is recalculated over the text made up of the parts of speech in Set_W. Correlation is measured by the Pearson correlation coefficient, calculated as described in S41. Here X is the data for one part-of-speech feature in Set_W and Y is the data for another part-of-speech feature in Set_W, that is, the following is calculated:

$$\rho_{W_i,W_j} = \frac{\mathrm{Cov}(W_i,W_j)}{\sigma_{W_i}\,\sigma_{W_j}}$$

where ρ_{W_i,W_j} is the Pearson coefficient of part of speech W_i and part of speech W_j. If the Pearson coefficient of a pair of parts of speech is greater than 0.6, one of the pair is rejected, giving the set Set_Wn. Taking the union of Set_Wn and Set_F gives Set_pre = Set_Wn∪Set_F; the corresponding data is then merged with the five personality label data to obtain the multi-label, multi-target feature set D_pre.
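The redundancy filter in S43 can be sketched as a greedy pass over the candidate features. The text does not say which member of a highly correlated pair is rejected; keeping the one encountered first is an assumption made here, as are the function names. The threshold is applied to the signed coefficient, as literally stated.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y) if dev_x and dev_y else 0.0


def drop_redundant(features, threshold=0.6):
    """Keep a feature only if its coefficient with every feature kept so far
    does not exceed the threshold (first-seen member of a pair survives)."""
    kept = []
    for name, values in features.items():
        if all(pearson(values, features[other]) <= threshold for other in kept):
            kept.append(name)
    return kept


def final_feature_names(word_features, set_f, threshold=0.6):
    """Set_pre = Set_Wn ∪ Set_F, returned as a sorted list of names."""
    return sorted(set(drop_redundant(word_features, threshold)) | set(set_f))
```

Because the pass is greedy, the result depends on the iteration order of `features`; a different tie-breaking rule (for example keeping the feature more correlated with the labels) would be an equally valid reading.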

In the above social network user personality prediction method based on Chinese text analysis, step S5 specifically comprises:

S51. Before the model is trained on D_pre, the data is first normalized using logarithmic normalization, x′ = lg(x)/lg(max), where x is a feature value and max is the maximum value in the data for that feature;

S52. The tanh function is selected as the activation function of the neurons, so that the output keeps a nonlinear, monotonically rising and falling relationship with the input, which suits gradient-based solving and gives good fault tolerance;

S53. Overfitting is prevented using L2 regularization, that is, weight decay;

S54. Training and prediction use ten-fold cross-validation, and grid search is used to tune the learning rate, dropout rate, number of epochs, and number of neurons.
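Steps S51 to S53 can be illustrated with a small pure-Python sketch: logarithmic normalization followed by a one-hidden-layer back-propagation network with tanh units and L2 weight decay. Everything beyond those three ingredients is an assumption for the example: the class and function names, the single hidden layer, the linear output units, the squared-error loss, and the restriction of `log_normalize` to values of at least 1. Dropout, which the text lists among the tuned hyperparameters, is omitted.

```python
import math
import random


def log_normalize(values):
    """x' = lg(x) / lg(max); this sketch assumes values >= 1 and max > 1."""
    peak = max(values)
    if peak <= 1 or min(values) < 1:
        raise ValueError("sketch assumes all values >= 1 and max > 1")
    return [math.log10(v) / math.log10(peak) for v in values]


class TinyBP:
    """One tanh hidden layer, linear outputs, squared error, L2 weight decay."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.05, l2=1e-4, seed=0):
        rnd = random.Random(seed)
        # Each row holds the weights of one unit; the last entry is its bias.
        self.w1 = [[rnd.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rnd.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]
        self.lr, self.l2 = lr, l2

    def forward(self, x):
        xb = list(x) + [1.0]                      # append bias input
        hidden = [math.tanh(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w1]
        hb = hidden + [1.0]                       # append bias unit
        out = [sum(w * v for w, v in zip(row, hb)) for row in self.w2]
        return xb, hb, out

    def predict(self, x):
        return self.forward(x)[2]

    def train_step(self, x, target):
        """One stochastic gradient step; returns the loss before the update."""
        xb, hb, out = self.forward(x)
        err = [o - t for o, t in zip(out, target)]
        # Hidden-layer deltas use tanh'(a) = 1 - tanh(a)^2 (bias unit skipped).
        delta_h = [
            (1 - hb[j] ** 2) * sum(err[k] * self.w2[k][j] for k in range(len(err)))
            for j in range(len(hb) - 1)
        ]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                decay = self.l2 * row[j] if j < len(row) - 1 else 0.0  # no decay on bias
                row[j] -= self.lr * (err[k] * hb[j] + decay)
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                decay = self.l2 * row[i] if i < len(row) - 1 else 0.0
                row[i] -= self.lr * (delta_h[j] * xb[i] + decay)
        return sum(e * e for e in err) / 2
```

In the setting described above, the network would take the features in D_pre as input and have five outputs, one per trait proportion.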

Compared with the prior art, the present invention has the following advantages:

1. Data acquisition is very convenient: provided the tested user consents, the social network text is collected automatically by a program;

2. Unlike traditional personality analysis by expert scale testing, the invention works from social network text and calls a trained model to predict personality; it does not depend on professional psychological expertise, requires no expenditure of manpower or material resources, and has the advantages of high accuracy and low time consumption.

Brief description of the drawings

Fig. 1 is flow chart of the method for the present invention；

Fig. 2 is the Feature Engineering flow chart in the embodiment of the present invention；

Fig. 3 is the neural network structure figure in the embodiment of the present invention.

Detailed description of the embodiments

The present invention is further described below through a detailed description of a preferred embodiment with reference to the drawings.

As shown in Figs. 1 and 2, a social network user personality prediction method based on Chinese text analysis comprises the following steps:

S1. Perform preliminary processing on the Chinese social network text, dividing it into three classes: basic user status information, user interaction information, and user text information. The social network text may be the text data posted by the user in the most recent year. Social network text is short text and generally contains a great deal of noise, so preliminary processing is required;

S2. Preprocess the user text information to obtain a data set D_word composed of words of various kinds; preprocessing means cleaning the text data, segmenting it into words, and removing stop words;

S4. Because the feature dimensionality is too high, feature engineering must be performed on the numeric data set D_comp, that is, the features are screened, to obtain the feature set D_pre used for personality prediction;

In step S1: the basic user status information includes the number of posts published, the number of accounts followed, the number of fans, the length of time the social network has been used, and the average posting frequency, reflecting the user's basic usage of the social network. The user interaction information includes the number of emoticons, the number of topics, the number of @-mentions, and the number of reposts on the social network, reflecting the user's interaction with public topics and friends. The user text information is the pure language content of the text, reflecting the user's language habits, manner of expression, and emotional tendency.

In step S2: cleaning the text data includes using regular-expression matching to filter out non-text content in the social network text, such as pictures, emoticons, location information, fields enclosed by a pair of "#", URLs, emoji, and repost markers. Word segmentation means segmenting the user text so that the full text is converted into a set of words. Stop-word removal means using regular expressions to remove text noise, removing stop words that occur frequently in the text but carry no real meaning; stop words include pronouns, auxiliary words, and punctuation marks. For example, the Chinese social network post "On day 5 of the short holiday I met up with my best friend for yet another round of heavy eating and drinking. Finding a quiet spot on the crowded Nanjing Road is really not easy, but the coffee shop tucked away in the Guang Hai publishing house brought unexpected peace and quiet" becomes, after step S2, the word sequence "short holiday, day 5, meet, best friend, a round, heavy eating and drinking, crowded, Nanjing Road, a spot, quiet, place, not easy, Guang Hai publishing house, coffee shop, unexpected, quiet".

In step S3: combining and optimizing the three classes of information means counting the user's basic status information (number of posts published, number of accounts followed, number of fans, length of time the social network has been used, average posting frequency) and counting the interaction information in the user's text (number of emoticons, number of topics, number of @-mentions, number of reposts). Taking the user's result on an expert scale as ground truth and constructing the numeric data set D_comp means: the user is tested with a five-factor personality expert scale and scored according to the expert rules, the proportion of each of the five traits is calculated to form five label data, and these are combined with the part-of-speech frequencies from the tagging results and with the basic status information and interaction information from the combination step to form the numeric data set D_comp. In this embodiment, the five-factor personality expert scale used to test the user has 60 items in total, 12 per trait, of which 1/3 are negatively keyed and 2/3 are positively keyed. Each user's five traits are scored according to the scoring rules of the scale, and the share of each trait's score is then calculated to form the five label data. Together with the part-of-speech frequencies obtained by tagging D_word with the sentiment dictionary and the basic status information and interaction information obtained by combining the three classes of information, these form the numeric data set D_comp.
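The label construction described in this embodiment can be sketched as follows. The 1-to-5 answer scale, the reverse-scoring rule for negatively keyed items (scale_max + 1 - answer), the item-to-trait mapping, and all names are assumptions for illustration; the patent only states that there are 60 items, 12 per trait, with one third negatively keyed, and that each trait's share of the score is used as a label.

```python
TRAITS = ["neuroticism", "extraversion", "openness",
          "agreeableness", "conscientiousness"]


def trait_shares(answers, item_trait, reverse_keyed, scale_max=5):
    """answers: {item_id: 1..scale_max}.

    Negatively keyed items are reverse-scored, item scores are summed per
    trait, and each trait's share of the overall total is returned.
    """
    scores = {trait: 0 for trait in TRAITS}
    for item, value in answers.items():
        if item in reverse_keyed:
            value = scale_max + 1 - value
        scores[item_trait[item]] += value
    total = sum(scores.values())
    return {trait: scores[trait] / total for trait in TRAITS}


# Hypothetical layout: items 0-11 belong to the first trait, 12-23 to the
# second, and so on; the first 4 items of each block are negatively keyed.
ITEM_TRAIT = {i: TRAITS[i // 12] for i in range(60)}
REVERSE_KEYED = {i for i in range(60) if i % 12 < 4}
```

The five shares sum to 1 by construction, which is what allows them to serve jointly as the multi-target output of the prediction model.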

In the above social network user personality prediction method based on Chinese text analysis, in this embodiment D_comp is set to have 111 features in total, corresponding to 102 parts of speech, 5 kinds of user status information, and 4 kinds of interaction information. As shown in Fig. 2, step S4 specifically comprises:

Correlation is measured by the Pearson correlation coefficient, calculated as:

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X-\bar{X})(Y-\bar{Y})\right]}{\sigma_X \sigma_Y} \qquad (1)$$

where Cov(X, Y) is the covariance of variables X and Y, σ_X and σ_Y are the standard deviations of X and Y respectively, and X̄ and Ȳ are the mean values of X and Y. Here X is the data for one part-of-speech feature W_i (i = 1…102) among the 102 part-of-speech features W, and Y is the data for one personality trait Ch_j (j = 1…5) among the 5 personality traits Ch. For each trait, the 13 parts of speech with the highest Pearson coefficients are taken, forming 5 part-of-speech sets:

Set_j = { W_1…13 | the top 13 part-of-speech features for Ch_j }, j = 1…5

Set_W=Set₁∪Set₂∪…∪Set₅

For example, the set Set₁ of 13 parts of speech for the neuroticism trait is {"present-tense words", "anxiety words", "motion words", "money words", "religion words", "death words", "human words", "insight words", "causation words", "swear words", "perceptual-process words", "assent words", "body words"}

For example, the set Set₂ of 13 parts of speech for the extraversion trait is {"friend words", "health words", "sexual words", "space words", "leisure words", "cognitive-process words", "visual words", "feeling words", "work words", "love words", "swear words", "assent words", "numeral ratio"}

For example, the set Set₃ of 13 parts of speech for the openness trait is {"present-tense words", "friend words", "health words", "sexual words", "motion words", "space words", "human words", "social-process words", "cognitive-process words", "insight words", "visual words", "biological-process words", "body words"}

For example, the set Set₄ of 13 parts of speech for the agreeableness trait is {"money words", "work words", "love words", "past-tense words", "causation words", "inclusive words", "exclusive words", "hearing words", "multifunction words", "ingestion words", "perceptual-process words", "relativity words", "assent words"}

For example, the set Set₅ of 13 parts of speech for the conscientiousness trait is {"anger words", "achievement words", "money words", "specific-article words", "social-process words", "cognitive-process words", "insight words", "work words", "exclusive words", "relativity words", "numeral ratio", "biological-process words", "body words"}

Taking the union of Set₁, Set₂, Set₃, Set₄, and Set₅ gives the part-of-speech set Set_W, for example the set {"present-tense words", "friend words", "anxiety words", "anger words", "health words", "sexual words", "motion words", "space words", "achievement words", "leisure words", "money words", "religion words", "death words", "specific-article words", "human words", "social-process words", "cognitive-process words", "insight words", "visual words", "feeling words", "work words", "love words", "past-tense words", "causation words", "inclusive words", "exclusive words", "hearing words", "swear words", "multifunction words", "ingestion words", "perceptual-process words", "relativity words", "assent words", "numeral ratio", "biological-process words", "body words"};

Set_j = { F_1…3 | the top 3 features for Ch_j }, j = 1…5

Set_F=Set₁∪Set₂∪…∪Set₅

For example, the set {"number of posts", "number of fans", "number of topics", "number of @-mentions"};

$$\rho_{W_i,W_j} = \frac{\mathrm{Cov}(W_i,W_j)}{\sigma_{W_i}\,\sigma_{W_j}}$$

where ρ_{W_i,W_j} is the Pearson coefficient of part of speech W_i and part of speech W_j. If the Pearson coefficient of a pair of parts of speech is greater than 0.6, one of the pair is rejected, giving the set Set_Wn;

For example, the set {"present-tense words", "friend words", "anxiety words", "anger words", "health words", "sexual words", "motion words", "space words", "achievement words", "leisure words", "money words", "religion words", "death words", "specific-article words", "human words", "cognitive-process words", "insight words", "visual words", "feeling words", "work words", "love words", "past-tense words", "causation words", "inclusive words", "exclusive words", "hearing words", "swear words", "multifunction words", "ingestion words", "assent words", "numeral ratio", "body words"};

Taking the union of the set Set_Wn and the set Set_F gives the set Set_pre = Set_Wn∪Set_F; the corresponding data is then merged with the five personality label data to obtain the multi-label, multi-target feature set D_pre.

Step S5 specifically comprises the following, with the corresponding neural network structure shown in Fig. 3:

S52. The tanh function is selected as the activation function of the neurons, so that the output keeps a nonlinear, monotonically rising and falling relationship with the input, which suits gradient-based solving and gives good fault tolerance;

S53. Overfitting is prevented using L2 regularization, that is, weight decay: a regularization term over the weights θ_j, with j starting from 1, is added to the cost function. Taking the partial derivative of the regularized cost function gives the gradient descent update. For j = 0, the value of λ is taken to be 0. Without regularization, the coefficient of θ_j in the update is 1; with regularization, that coefficient becomes smaller than 1, so the weight decays. According to the principle of Occam's razor, smaller weights indicate lower network complexity and a better fit to the data.
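The weight-decay argument in this step can be written out in a standard form. The block below is a sketch rather than the patent's own notation: the number of training samples m, the number of weights n, the learning rate α, the network output h_θ, and the squared-error cost are assumptions; only θ_j, λ, and the convention that j starts from 1 come from the text.

```latex
\begin{align}
J(\theta) &= \frac{1}{2m}\left[\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
            + \lambda\sum_{j=1}^{n}\theta_j^2\right] \\
\frac{\partial J}{\partial \theta_j} &= \frac{1}{m}\sum_{i=1}^{m}
            \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)
            \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}
            + \frac{\lambda}{m}\theta_j \\
\theta_j &:= \theta_j - \alpha\,\frac{\partial J}{\partial \theta_j}
          = \theta_j\left(1 - \frac{\alpha\lambda}{m}\right)
            - \frac{\alpha}{m}\sum_{i=1}^{m}
            \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)
            \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}
\end{align}
```

With λ = 0 the factor multiplying θ_j is 1; for α, λ > 0 it is 1 − αλ/m < 1, so each update first shrinks θ_j and then applies the usual gradient step.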

S54. Training and prediction use ten-fold cross-validation: the data set D is divided into 10 mutually exclusive subsets of similar size, that is, D = D₁∪D₂∪…∪D₁₀ with D_i∩D_j = ∅ (i ≠ j). Each subset D_i is obtained from D by stratified sampling, so as to keep the data distribution consistent. Each time, the union of k-1 subsets (here k = 10) is used as the training set and the remaining subset as the test set. Grid search is used to tune the learning rate, dropout rate, number of epochs, and number of neurons.
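The fold construction described in S54 can be sketched in plain Python. The text does not say which variable the stratified sampling is based on; the sketch assumes each sample carries a discrete stratum key (for example, the index of the user's dominant trait), which is a hypothetical choice, as are the function names.

```python
import random
from collections import defaultdict


def stratified_folds(strata, k=10, seed=0):
    """strata: one discrete stratum key per sample.

    Returns k disjoint lists of sample indices whose union is the whole
    data set, with each stratum spread as evenly as possible over the folds.
    """
    rnd = random.Random(seed)
    by_stratum = defaultdict(list)
    for index, key in enumerate(strata):
        by_stratum[key].append(index)
    folds = [[] for _ in range(k)]
    offset = 0
    for key in sorted(by_stratum):
        members = by_stratum[key]
        rnd.shuffle(members)
        # Deal the stratum's members round-robin, continuing where the
        # previous stratum stopped so fold sizes stay balanced.
        for position, index in enumerate(members):
            folds[(position + offset) % k].append(index)
        offset += len(members)
    return folds


def cross_validation_splits(folds):
    """Yield (train_indices, test_indices): k-1 folds train, 1 fold tests."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

A grid search would wrap this loop, training one model per hyperparameter combination and per split and comparing the average test error.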

Although the content of the present invention has been described in detail through the above preferred embodiment, it should be understood that the above description is not to be considered a limitation of the invention. Various modifications and substitutions will be apparent to those skilled in the art after reading the above. The scope of protection of the invention is therefore defined by the appended claims.

Claims

1. A social network user personality prediction method based on Chinese text analysis, characterized in that it comprises the following steps:

S3. Extract the text features of the user text information: tag the data set D_word by part of speech based on a sentiment dictionary, calculate the frequency of each part of speech in the text, combine and optimize the above three classes of information, and, taking the user's result on an expert scale as ground truth, construct the numeric data set D_comp;

S4. Perform feature engineering on the numeric data set D_comp, that is, screen the features, to obtain the feature set D_pre used for personality prediction;

S5. Train a BP neural network model for personality prediction: the feature vectors in the feature set D_pre obtained in step S4 are the model input, and the proportions of the five personality traits (neuroticism, extraversion, openness, agreeableness, conscientiousness) are the model output; the neural network is constructed and the prediction model is trained to carry out personality prediction.

2. The social network user personality prediction method based on Chinese text analysis according to claim 1, characterized in that, in step S1:

The basic user status information includes the number of posts published, the number of accounts followed, the number of fans, the length of time the social network has been used, and the average posting frequency, reflecting the user's basic usage of the social network;

The user interaction information includes the number of emoticons, the number of topics, the number of @-mentions, and the number of reposts on the social network, reflecting the user's interaction with public topics and friends;

The user text information is the pure language content of the text, reflecting the user's language habits, manner of expression, and emotional tendency.

3. The social network user personality prediction method based on Chinese text analysis according to claim 2, characterized in that the preprocessing in step S2 means cleaning the text data, segmenting it into words, and removing stop words:

Cleaning the text data includes using regular-expression matching to filter out non-text content in the social network text, such as pictures, emoticons, location information, fields enclosed by a pair of "#", URLs, emoji, and repost markers;

Stop-word removal means using regular expressions to remove text noise, removing stop words that occur frequently in the text but carry no real meaning; stop words include pronouns, auxiliary words, and punctuation marks.

4. The social network user personality prediction method based on Chinese text analysis according to claim 3, characterized in that, in step S3:

Combining and optimizing the three classes of information means: counting the user's basic status information (number of posts published, number of accounts followed, number of fans, length of time the social network has been used, average posting frequency), and counting the interaction information in the user's text (number of emoticons, number of topics, number of @-mentions, number of reposts);

Taking the user's result on an expert scale as ground truth and constructing the numeric data set D_comp means: the user is tested with a five-factor personality expert scale and scored according to the expert rules, and the proportion of each of the five traits is calculated to form five label data; these are combined with the part-of-speech frequencies from the tagging results and with the basic status information and interaction information from the combination step to form the numeric data set D_comp.

5. The social network user personality prediction method based on Chinese text analysis according to claim 4, characterized in that D_comp has 111 features in total, corresponding to 102 parts of speech, 5 kinds of user status information, and 4 kinds of interaction information, and step S4 specifically comprises:

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X-\bar{X})(Y-\bar{Y})\right]}{\sigma_X \sigma_Y} \qquad (1)$$

where Cov(X, Y) is the covariance of variables X and Y, σ_X and σ_Y are the standard deviations of X and Y respectively, and X̄ and Ȳ are the mean values of X and Y. Here X is the data for one part-of-speech feature W_i (i = 1…102) among the 102 part-of-speech features W, and Y is the data for one personality trait Ch_j (j = 1…5) among the 5 personality traits Ch. For each trait, the 13 parts of speech with the highest Pearson coefficients are taken, forming 5 part-of-speech sets:

Set_W=Set₁∪Set₂∪…∪Set₅

Correlation is measured by the Pearson correlation coefficient, calculated as in formula (1). Here X is the data for one feature F_i (i = 1…9) among the 9 features F, and Y is the data for one personality trait Ch_j (j = 1…5) among the 5 personality traits Ch. For each trait, the 3 features with the highest Pearson coefficients are taken, forming 5 feature sets:

Set_F=Set₁∪Set₂∪…∪Set₅

The frequency of each part of speech is recalculated over the text made up of the parts of speech in Set_W. Correlation is measured by the Pearson correlation coefficient, calculated as described in S41. Here X is the data for one part-of-speech feature in Set_W and Y is the data for another part-of-speech feature in Set_W, that is, the following is calculated:

$$\rho_{W_i,W_j} = \frac{\mathrm{Cov}(W_i,W_j)}{\sigma_{W_i}\,\sigma_{W_j}}$$

where ρ_{W_i,W_j} is the Pearson coefficient of part of speech W_i and part of speech W_j. If the Pearson coefficient of a pair of parts of speech is greater than 0.6, one of the pair is rejected, giving the set Set_Wn. Taking the union of Set_Wn and Set_F gives Set_pre = Set_Wn∪Set_F; the corresponding data is then merged with the five personality label data to obtain the multi-label, multi-target feature set D_pre.

6. The social network user personality prediction method based on Chinese text analysis according to claim 5, characterized in that step S5 specifically comprises:

S51. Before the model is trained on D_pre, the data is first normalized using logarithmic normalization, x′ = lg(x)/lg(max), where x is a feature value and max is the maximum value in the data for that feature;

S52. The tanh function is selected as the activation function of the neurons, so that the output keeps a nonlinear, monotonically rising and falling relationship with the input, which suits gradient-based solving and gives good fault tolerance;

S54. Training and prediction use ten-fold cross-validation, and grid search is used to tune the learning rate, dropout rate, number of epochs, and number of neurons.