CN107861936A - Method and device for analyzing the polarity probability of a sentence - Google Patents

Publication number
CN107861936A
Authority
CN
China
Prior art keywords
dimension
term vector
sentence
current statement
polarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610859095.1A
Other languages
Chinese (zh)
Inventor
金戈
张�杰
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201610859095.1A
Publication of CN107861936A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for analyzing the polarity probability of a sentence. The method comprises: performing Jieba word segmentation on the current sentence to obtain its words; performing feature extraction on each word to obtain the corresponding word vectors; calculating the average of the word vectors to obtain the dimensions of the current sentence; and adding the dimensions of the current sentence to a training model, so as to determine the polarity probability of the current sentence according to the training model. The invention also discloses a device for analyzing the polarity probability of a sentence. The invention first computes the dimensions of a sentence from its word vectors and then determines the polarity probability of the sentence with the training model, which improves the accuracy of sentence polarity probability analysis.

Description

Method and device for analyzing the polarity probability of a sentence
Technical field
The present invention relates to the technical field of sentence analysis, and in particular to a method and device for analyzing the polarity probability of a sentence.
Background
With the development of communication technology, people's lives increasingly depend on the network. At present, when people go out or have leisure time, they can book entertainment venues, dining, recreational activities and the like online, and after consuming they typically leave a review in the comment area. Merchants can then make corresponding improvements based on consumers' comments.
However, at present the polarity of a sentence (positive or negative) is generally analyzed from the keywords of the sentence: the polarity of the sentence is determined by whether its keywords are positive or negative. This way of analyzing sentence polarity is clearly too general and simplistic, so the analysis of sentence polarity is not accurate enough.
Summary of the invention
The main object of the present invention is to provide a method and device for analyzing the polarity probability of a sentence, aiming to solve the technical problem that existing approaches to sentence polarity analysis are too general and simplistic and therefore not accurate enough.
To achieve the above object, the present invention provides a method for analyzing the polarity probability of a sentence, the method comprising:
performing Jieba word segmentation on the current sentence to obtain its words;
performing feature extraction on each word to obtain the corresponding word vectors;
calculating the average of the word vectors to obtain the dimensions of the current sentence;
adding the dimensions of the current sentence to a training model, so as to determine the polarity probability of the current sentence according to the training model.
Preferably, the step of calculating the average of the word vectors to obtain the dimensions of the current sentence comprises:
obtaining each dimension value of each word vector in the current sentence;
adding the dimension values at the same position across the word vectors to obtain the sum for each dimension;
dividing the sum for each dimension by the number of word vectors to obtain the dimensions of the current sentence.
Preferably, before the step of calculating the average of the word vectors to obtain the dimensions of the current sentence, the method further comprises:
reducing the dimensionality of each word vector in the current sentence using principal component analysis (PCA).
Preferably, before the step of reducing the dimensionality of each word vector in the current sentence using principal component analysis, the method further comprises:
standardizing each word vector in the current sentence.
In addition, to achieve the above object, the present invention also provides a device for analyzing the polarity probability of a sentence, the device comprising:
a word segmentation module, configured to perform Jieba word segmentation on the current sentence to obtain its words;
a feature extraction module, configured to perform feature extraction on each word to obtain the corresponding word vectors;
a computing module, configured to calculate the average of the word vectors to obtain the dimensions of the current sentence;
a determining module, configured to add the dimensions of the current sentence to a training model, so as to determine the polarity probability of the current sentence according to the training model.
Preferably, the computing module comprises:
an acquiring unit, configured to obtain each dimension value of each word vector in the current sentence;
an addition unit, configured to add the dimension values at the same position across the word vectors to obtain the sum for each dimension;
a division unit, configured to divide the sum for each dimension by the number of word vectors to obtain the dimensions of the current sentence.
Preferably, the device further comprises:
a reduction module, configured to reduce the dimensionality of each word vector in the current sentence using principal component analysis (PCA).
Preferably, the device further comprises:
a processing module, configured to standardize each word vector in the current sentence.
In the method and device proposed by the present invention, Jieba word segmentation is first performed on the current sentence to obtain its words; feature extraction is then performed on each word to obtain the corresponding word vectors; the average of the word vectors is calculated to obtain the dimensions of the current sentence; and finally the dimensions of the current sentence are added to a training model, so as to determine the polarity probability of the current sentence according to the training model. The invention first computes the dimensions of a sentence from its word vectors and then determines the polarity probability of the sentence with the training model. This gives sentence polarity probability analysis a more detailed basis than determining the polarity of the sentence solely from whether its keywords are positive or negative, and so improves the accuracy of the analysis.
Brief description of the drawings
Fig. 1 is a flowchart of the first embodiment of the sentence polarity probability analysis method of the present invention;
Fig. 2 is a detailed flowchart of calculating the average of the word vectors to obtain the dimensions of the current sentence;
Fig. 3 is a flowchart of the second embodiment of the sentence polarity probability analysis method of the present invention;
Fig. 4 is a flowchart of the third embodiment of the sentence polarity probability analysis method of the present invention;
Fig. 5 is a schematic diagram of standardizing word vectors in the present invention;
Fig. 6 is a functional block diagram of the first embodiment of the sentence polarity probability analysis device of the present invention;
Fig. 7 is a detailed functional block diagram of the computing module in Fig. 6;
Fig. 8 is a functional block diagram of the second embodiment of the sentence polarity probability analysis device of the present invention;
Fig. 9 is a functional block diagram of the third embodiment of the sentence polarity probability analysis device of the present invention.
The realization of the object, the functional characteristics and the advantages of the present invention are further described with reference to the drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein merely illustrate the present invention and are not intended to limit it.
The present invention provides a method for analyzing the polarity probability of a sentence.
Referring to Fig. 1, Fig. 1 is a flowchart of the first embodiment of the sentence polarity probability analysis method of the present invention.
In this embodiment, the method comprises:
Step S10: perform Jieba word segmentation on the current sentence to obtain its words.
Step S20: perform feature extraction on each word to obtain the corresponding word vectors.
In this embodiment, a sentence to be analyzed is first obtained from a sentence library and taken as the current sentence. Jieba word segmentation is then performed on the current sentence to obtain its words. An example of the segmentation follows:
Suppose the current sentence is: "The standard room is too poor; the room is not even up to 3 stars, and the facilities are very outdated. It is suggested that the hotel renovate the old standard rooms."
After Jieba word segmentation, the words obtained are:
standard room / too / poor / , / room / also / not as good as / 3 / star / [particle] / , / and / facilities / very / outdated / . / suggest / hotel / [particle] / old / [particle] / standard room / re- / new / improve / .
In this embodiment, before Jieba word segmentation is performed on the current sentence, the method further comprises:
labeling the current sentence as positive or negative, where positive polarity is preferably represented by 1 and negative polarity by 0. It should be understood that labeling the current sentence makes it convenient to subsequently determine the polarity probability of the current sentence according to the training model, as described in detail below.
After the words are obtained, feature extraction is performed on each word to obtain the corresponding word vectors. In this embodiment, a preset tool such as word2vec is used to perform feature extraction on each word. It should be understood that each word vector obtained by word2vec in this embodiment has 400 dimensions.
For better understanding, an example follows:
If the word is "hello", feature extraction on the word with word2vec might yield the word vector:
[-1.12e+00, 1.08e-01, ..., 1.69e-01, 1.17e+00]
Here the values are written in scientific notation: e+00 denotes multiplication by 10^0 = 1 and e-01 denotes multiplication by 10^-1 = 0.1. Because the word vector has 400 dimensions, 396 values are omitted from the vector above. It should be understood that the values in the vector above are merely exemplary; the specific values are determined by the sentence, the number of words and the meanings of the words.
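As a quick check of the notation, the following Python lines parse the four displayed components as ordinary floating-point literals. This only illustrates how the e-notation reads; the values are the exemplary ones from the vector above, and the variable names are chosen here for illustration.

```python
# Exemplary components copied from the word vector shown above.
components = ["-1.12e+00", "1.08e-01", "1.69e-01", "1.17e+00"]

# float() reads "e-01" as "times 10 to the power -1".
values = [float(text) for text in components]
print(values)
```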
Step S30: calculate the average of the word vectors to obtain the dimensions of the current sentence.
After the word vectors are obtained, their average is calculated to obtain the dimensions of the current sentence. Specifically, referring to Fig. 2, step S30 comprises:
Step S31: obtain each dimension value of each word vector in the current sentence;
Step S32: add the dimension values at the same position across the word vectors to obtain the sum for each dimension;
Step S33: divide the sum for each dimension by the number of word vectors to obtain the dimensions of the current sentence.
First, each dimension value of each word vector in the current sentence is obtained. Taking the example above, when the word is "hello" the corresponding word vector has 400 dimensions, so the word vector has 400 dimension values.
After the dimension values of each word vector are obtained, the dimension values at the same position across the word vectors are added to obtain the sum for each dimension. That is, since every word vector produced by feature extraction has 400 dimensions, adding the values at the same position still yields 400 values. Dividing the sum for each dimension by the number of word vectors then gives the dimensions of the current sentence.
For better understanding, an example follows. Suppose the current sentence contains two words, A and B, where
the word vector obtained for word A after feature extraction is [-0.3e+00, 1.2e-01, ..., 5.1e-01, 3.1e+00];
the word vector obtained for word B after feature extraction is [-0.5e+00, 0.2e-01, ..., 3.2e-01, 0.1e+00].
Adding the dimension values at the same position in the word vectors of A and B gives the sum for each dimension:
(-0.3e+00) + (-0.5e+00) = -0.8e+00
1.2e-01 + 0.2e-01 = 1.4e-01
5.1e-01 + 3.2e-01 = 8.3e-01
3.1e+00 + 0.1e+00 = 3.2e+00
The sum for each dimension is then divided by the number of word vectors. Since there are 2 word vectors, the dimensions of the current sentence are:
[-0.4e+00, 0.7e-01, ..., 4.15e-01, 1.6e+00].
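The averaging in steps S31 to S33 can be sketched in a few lines of plain Python. The function name is hypothetical, and only the four components displayed for words A and B are used, since the remaining 396 values are not given in the text.

```python
def average_word_vectors(word_vectors):
    """Return the element-wise mean of equally sized word vectors."""
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    count = len(word_vectors)
    # zip(*...) groups the values that share the same position (steps S31/S32).
    sums = [sum(position) for position in zip(*word_vectors)]
    # Divide every per-dimension sum by the number of word vectors (step S33).
    return [total / count for total in sums]

# Only the four components shown in the text; the omitted ones are left out.
word_a = [-0.3, 0.12, 0.51, 3.1]
word_b = [-0.5, 0.02, 0.32, 0.1]
sentence_dimensions = average_word_vectors([word_a, word_b])
print(sentence_dimensions)
```

The second component averages to 0.07, that is 0.7e-01, which is half of the per-dimension sum 1.4e-01.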
Step S40: add the dimensions of the current sentence to a training model, so as to determine the polarity probability of the current sentence according to the training model.
After the dimensions of the current sentence are obtained, the training model is obtained. The training model in this embodiment is a machine-learning training model, for example an SVM (Support Vector Machine) training model. After the training model is obtained, the dimensions of the current sentence are added to it, so as to determine the polarity probability of the current sentence according to the training model. Specifically, the polarity probability of the current sentence is determined according to the training model as follows:
The dimensions of the current sentence are mapped into the high-dimensional space of the training model, and a hyperplane that separates the polarities is sought in that space. The hyperplane found is then cut into sub-planes, each of which corresponds to a probability of positive polarity (1) and of negative polarity (0). Finally, the positive probabilities of the sub-planes are added to give the positive probability of the hyperplane, which is also the positive probability of the current sentence. Subtracting the positive probability from 100% then gives the negative probability of the current sentence. Of course, the negative probability may instead be calculated first and the positive probability derived from it; no limitation is imposed here.
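The text gives no formulas for turning the hyperplane into probabilities. One common way to obtain a probability from an SVM is Platt scaling, which passes the signed decision value through a sigmoid. The sketch below uses that swapped-in technique and is not claimed to be the procedure described above; the function name and the parameters `a` and `b` are hypothetical, and in practice `a` and `b` would be fitted on labeled sentences.

```python
import math

def polarity_probabilities(decision_value, a=-1.0, b=0.0):
    """Map an SVM decision value to (positive, negative) probabilities.

    a and b are hypothetical Platt-scaling parameters; real values are
    fitted on labeled sentences.
    """
    positive = 1.0 / (1.0 + math.exp(a * decision_value + b))
    # The negative probability is the complement (100% minus positive).
    return positive, 1.0 - positive

positive, negative = polarity_probabilities(0.8)
print(positive, negative)
```

A decision value of 0 lies on the separating hyperplane and maps to a positive probability of 0.5 with these placeholder parameters.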
In this embodiment, to improve the accuracy of the training model, the data for the training model preferably include a training set and a test set, the test set being used to test the training model. The ratio of training set to test set can be set according to the specific situation; in this embodiment it is preferably 8:2.
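An 8:2 split can be sketched as follows. The helper name, the fixed seed and the toy labeled samples are assumptions made for illustration; the text does not say how the samples are shuffled or selected.

```python
import random

def split_train_test(samples, train_ratio=0.8, seed=0):
    """Shuffle labeled samples and split them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical labeled data: (sentence id, polarity label 1 or 0).
samples = [(i, i % 2) for i in range(10)]
train_set, test_set = split_train_test(samples)
print(len(train_set), len(test_set))
```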
In the sentence polarity probability analysis method proposed by this embodiment, Jieba word segmentation is first performed on the current sentence to obtain its words; feature extraction is then performed on each word to obtain the corresponding word vectors; the average of the word vectors is calculated to obtain the dimensions of the current sentence; and finally the dimensions of the current sentence are added to the training model, so as to determine the polarity probability of the current sentence according to the training model. The invention first computes the dimensions of a sentence from its word vectors and then determines the polarity probability of the sentence with the training model. This gives sentence polarity probability analysis a more detailed basis than determining the polarity of the sentence solely from whether its keywords are positive or negative, and so improves the accuracy of the analysis.
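The four steps can be read as a simple pipeline. The sketch below wires them together with caller-supplied callables; `segment`, `embed` and `model` are hypothetical stand-ins (the text names Jieba, word2vec and an SVM for these roles), so only the data flow is illustrated, not any real library API.

```python
def analyze_polarity(sentence, segment, embed, model):
    """Run steps S10 to S40 with caller-supplied components."""
    words = segment(sentence)                      # S10: word segmentation
    vectors = [embed(word) for word in words]      # S20: feature extraction
    count = len(vectors)
    # S30: element-wise average of the word vectors.
    dimensions = [sum(col) / count for col in zip(*vectors)]
    return model(dimensions)                       # S40: polarity probability

# Toy stand-ins so the sketch runs without external libraries.
toy_vectors = {"good": [1.0, 0.0], "room": [0.0, 1.0]}
result = analyze_polarity(
    "good room",
    segment=str.split,
    embed=toy_vectors.__getitem__,
    model=lambda dims: dims[0],   # pretends the first value is P(positive)
)
print(result)
```

Passing the components as arguments keeps the sketch independent of any particular segmenter, embedding tool or classifier.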
Further, to improve the efficiency of sentence polarity probability analysis, a second embodiment of the method is proposed on the basis of the first embodiment. In this embodiment, referring to Fig. 3, before step S30 the method further comprises:
Step S50: reduce the dimensionality of each word vector in the current sentence using principal component analysis (PCA).
It should be understood that in the first embodiment every word vector obtained by feature extraction has 400 dimensions. Processing 400-dimensional word vectors yields a sentence that also has 400 dimensions, and when this is added to the training model for calculation, the high dimensionality makes the computation very expensive. The polarity probability analysis therefore takes more time, which lowers computational efficiency.
Therefore, in this embodiment, principal component analysis (PCA) is used to reduce the dimensionality of each word vector in the current sentence, reducing each 400-dimensional word vector to a 100-dimensional word vector. The sentence computed subsequently then has only 100 dimensions, the amount of computation falls, and the efficiency of the polarity probability analysis rises accordingly. It should be understood that although PCA reduces the dimensionality of each word vector, each reduced word vector retains almost all of the content of the original word vector, with the proportion retained as high as 99%. Reducing the dimensionality of the word vectors with PCA in this embodiment therefore does not affect the subsequent calculation results.
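How many components to keep can be decided from the share of variance retained. The sketch below assumes that the 99% figure refers to the share of total variance kept by the leading principal components, which is an interpretation rather than something the text states. The function name and the eigenvalues are made up for illustration.

```python
def components_for_variance(eigenvalues, target=0.99):
    """Return the smallest number of leading components whose variance share reaches target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)

# Hypothetical covariance eigenvalues, not taken from the source.
eigenvalues = [50.0, 30.0, 15.0, 4.5, 0.3, 0.2]
kept = components_for_variance(eigenvalues)
print(kept)
```

With these made-up eigenvalues the first three components keep 95% of the variance and the first four keep 99.5%, so four components are enough for a 99% target.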
In this embodiment, reducing the dimensionality of each word vector in the current sentence with principal component analysis both preserves the accuracy of sentence polarity probability analysis and improves its efficiency.
Further, to improve the accuracy of sentence polarity probability analysis, a third embodiment of the method is proposed on the basis of the second embodiment. In this embodiment, referring to Fig. 4, before step S50 the method further comprises:
Step S60: standardize each word vector in the current sentence.
As can be seen from the above, the dimension values within each word vector differ, and some dimension values may differ greatly, which may make the final calculation result inaccurate.
Therefore, in this embodiment, before the dimensionality of the word vectors is reduced with principal component analysis, each word vector is first standardized. The standardization is in effect a normalization of the word vectors (see Fig. 5), so that after processing each dimension has a mean of 0 and a standard deviation of 1. After standardization, the dimensions of the word vectors no longer differ too much, and the polarity probability subsequently computed for the sentence is more accurate.
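Standardizing each dimension to mean 0 and standard deviation 1 can be sketched with the standard library. The function name and the toy vectors are hypothetical, and the population standard deviation is used as an assumption, since the text does not say which estimator is meant.

```python
from statistics import fmean, pstdev

def standardize(word_vectors):
    """Scale each dimension (column) to mean 0 and standard deviation 1."""
    columns = list(zip(*word_vectors))
    means = [fmean(col) for col in columns]
    # A constant column has deviation 0, so fall back to 1.0 to avoid
    # dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(vector, means, stds)]
        for vector in word_vectors
    ]

# Hypothetical two-dimensional vectors with very different scales.
vectors = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled = standardize(vectors)
print(scaled)
```

After scaling, both columns hold the same values even though the raw second column was a hundred times larger, which is the effect the standardization step is after.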
The present invention further provides a device for analyzing the polarity probability of a sentence.
Referring to Fig. 6, Fig. 6 is a functional block diagram of the first embodiment of the sentence polarity probability analysis device 100 of the present invention.
It should be emphasized that, to those skilled in the art, the functional block diagram shown in Fig. 6 is merely an example of a preferred embodiment, and those skilled in the art can easily add new functional modules around the functional modules of the sentence polarity probability analysis device 100 shown in Fig. 6. The names of the functional modules are custom names used only to aid understanding of the program function blocks of the device 100; they do not limit the technical solution of the present invention, whose core lies in the functions to be achieved by the functional modules bearing those custom names.
In this embodiment, the sentence polarity probability analysis device 100 comprises:
a word segmentation module 10, configured to perform Jieba word segmentation on the current sentence to obtain its words;
a feature extraction module 20, configured to perform feature extraction on each word to obtain the corresponding word vectors.
In this embodiment, a sentence to be analyzed is first obtained from a sentence library and taken as the current sentence. The word segmentation module 10 then performs Jieba word segmentation on the current sentence to obtain its words. An example of the segmentation follows:
Suppose the current sentence is: "The standard room is too poor; the room is not even up to 3 stars, and the facilities are very outdated. It is suggested that the hotel renovate the old standard rooms."
After Jieba word segmentation, the words obtained are:
standard room / too / poor / , / room / also / not as good as / 3 / star / [particle] / , / and / facilities / very / outdated / . / suggest / hotel / [particle] / old / [particle] / standard room / re- / new / improve / .
In this embodiment, before Jieba word segmentation is performed on the current sentence, the following is also performed:
labeling the current sentence as positive or negative, where positive polarity is preferably represented by 1 and negative polarity by 0. It should be understood that labeling the current sentence makes it convenient to subsequently determine the polarity probability of the current sentence according to the training model, as described in detail below.
After the words are obtained, the feature extraction module 20 performs feature extraction on each word to obtain the corresponding word vectors. In this embodiment, a preset tool such as word2vec is used to perform feature extraction on each word. It should be understood that each word vector obtained by word2vec in this embodiment has 400 dimensions.
For better understanding, an example follows:
If the word is "hello", feature extraction on the word with word2vec might yield the word vector:
[-1.12e+00, 1.08e-01, ..., 1.69e-01, 1.17e+00]
Here the values are written in scientific notation: e+00 denotes multiplication by 10^0 = 1 and e-01 denotes multiplication by 10^-1 = 0.1. Because the word vector has 400 dimensions, 396 values are omitted from the vector above. It should be understood that the values in the vector above are merely exemplary; the specific values are determined by the sentence, the number of words and the meanings of the words.
a computing module 30, configured to calculate the average of the word vectors to obtain the dimensions of the current sentence.
After the word vectors are obtained, the computing module 30 calculates their average to obtain the dimensions of the current sentence. Specifically, referring to Fig. 7, the computing module 30 comprises:
an acquiring unit 31, configured to obtain each dimension value of each word vector in the current sentence;
an addition unit 32, configured to add the dimension values at the same position across the word vectors to obtain the sum for each dimension;
a division unit 33, configured to divide the sum for each dimension by the number of word vectors to obtain the dimensions of the current sentence.
First, the acquiring unit 31 obtains each dimension value of each word vector in the current sentence. Taking the example above, when the word is "hello" the corresponding word vector has 400 dimensions, so the word vector has 400 dimension values.
After the acquiring unit 31 obtains the dimension values of each word vector, the addition unit 32 adds the dimension values at the same position across the word vectors to obtain the sum for each dimension. That is, since every word vector produced by feature extraction has 400 dimensions, adding the values at the same position still yields 400 values. The division unit 33 then divides the sum for each dimension by the number of word vectors to obtain the dimensions of the current sentence.
For better understanding, an example follows. Suppose the current sentence contains two words, A and B, where
the word vector obtained for word A after feature extraction is [-0.3e+00, 1.2e-01, ..., 5.1e-01, 3.1e+00];
the word vector obtained for word B after feature extraction is [-0.5e+00, 0.2e-01, ..., 3.2e-01, 0.1e+00].
Adding the dimension values at the same position in the word vectors of A and B gives the sum for each dimension:
(-0.3e+00) + (-0.5e+00) = -0.8e+00
1.2e-01 + 0.2e-01 = 1.4e-01
5.1e-01 + 3.2e-01 = 8.3e-01
3.1e+00 + 0.1e+00 = 3.2e+00
The sum for each dimension is then divided by the number of word vectors. Since there are 2 word vectors, the dimensions of the current sentence are:
[-0.4e+00, 0.7e-01, ..., 4.15e-01, 1.6e+00].
a determining module 40, configured to add the dimensions of the current sentence to a training model, so as to determine the polarity probability of the current sentence according to the training model.
After the dimensions of the current sentence are obtained, the training model is obtained. The training model in this embodiment is a machine-learning training model, for example an SVM (Support Vector Machine) training model. After the training model is obtained, the determining module 40 adds the dimensions of the current sentence to it, so as to determine the polarity probability of the current sentence according to the training model. Specifically, the polarity probability of the current sentence is determined according to the training model as follows:
The dimensions of the current sentence are mapped into the high-dimensional space of the training model, and a hyperplane that separates the polarities is sought in that space. The hyperplane found is then cut into sub-planes, each of which corresponds to a probability of positive polarity (1) and of negative polarity (0). Finally, the positive probabilities of the sub-planes are added to give the positive probability of the hyperplane, which is also the positive probability of the current sentence. Subtracting the positive probability from 100% then gives the negative probability of the current sentence. Of course, the negative probability may instead be calculated first and the positive probability derived from it; no limitation is imposed here.
In this embodiment, to improve the accuracy of the training model, the data for the training model preferably include a training set and a test set, the test set being used to test the training model. The ratio of training set to test set can be set according to the specific situation; in this embodiment it is preferably 8:2.
The sentence polarity probability analysis device 100 proposed by this embodiment first performs Jieba word segmentation on the current sentence to obtain its words, then performs feature extraction on each word to obtain the corresponding word vectors, calculates the average of the word vectors to obtain the dimensions of the current sentence, and finally adds the dimensions of the current sentence to the training model, so as to determine the polarity probability of the current sentence according to the training model. The invention first computes the dimensions of a sentence from its word vectors and then determines the polarity probability of the sentence with the training model. This gives sentence polarity probability analysis a more detailed basis than determining the polarity of the sentence solely from whether its keywords are positive or negative, and so improves the accuracy of the analysis.
Further, in order to improve the efficiency of the polarity probability analysis device of sentence, this hair is proposed based on first embodiment The second embodiment of the polarity probability analysis device 100 of plain language sentence, in the present embodiment, reference picture 8, the polarity of the sentence is general Rate analytical equipment 100 also includes:
Module 50 is reduced, for reducing the dimension of each term vector in the current statement using PCA.
It should be understood that in the first embodiment, feature extraction on each word yields term vectors of 400 dimensions. Processing these 400-dimensional term vectors produces a sentence representation that also has 400 dimensions. When it is added to the training model for calculation, the high dimensionality makes the amount of computation very large, so the sentence polarity probability analysis takes more time and computational efficiency drops.
Therefore, in this embodiment, the reduction module 50 uses principal component analysis (PCA) to reduce the dimension of each term vector in the current sentence, reducing every 400-dimensional term vector to a 100-dimensional term vector. The sentence representation calculated subsequently then has only 100 dimensions, the amount of computation decreases, and the efficiency of the polarity probability analysis improves accordingly. It should be understood that although PCA lowers the dimension of each term vector, each reduced term vector still retains almost all of the content of the original term vector, with a retention ratio of up to 99%. Reducing the dimension of each term vector with PCA therefore does not affect the subsequent calculation results.
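In PCA, how much of the original content survives a reduction is commonly measured by the explained variance ratio, that is, the share of the total variance carried by the retained components. The sketch below picks the smallest number of components that reaches a target ratio. It assumes that the 99% figure in the text corresponds to this ratio, and the eigenvalues are hypothetical toy values rather than values from the embodiment.

```python
def components_for_ratio(eigenvalues, target=0.99):
    """Smallest k such that the k largest eigenvalues hold >= target of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total variance must be positive")
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)

# Hypothetical covariance-matrix eigenvalues of a small set of term vectors.
eigs = [50.0, 30.0, 15.0, 4.5, 0.3, 0.2]
k = components_for_ratio(eigs)
```

Here the four largest components already carry 99.5% of the variance, so the remaining two could be dropped while staying above the 99% target.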
In this embodiment, reducing the dimension of each term vector in the current sentence with principal component analysis preserves the accuracy of the sentence polarity probability analysis while improving its efficiency.
Further, to improve the accuracy of the sentence polarity probability analysis device, a third embodiment of the sentence polarity probability analysis device 100 of the present invention is proposed on the basis of the second embodiment. In this embodiment, referring to Fig. 9, the sentence polarity probability analysis device 100 further includes:
A processing module 60, configured to standardize each term vector in the current sentence.
As can be seen from the above, when the term vector of each word is calculated, the values of the individual dimensions in each term vector differ, and some dimension values may differ greatly from others, which may make the final calculation result inaccurate.
Therefore, in this embodiment, before the dimension of each term vector is reduced using principal component analysis, the processing module 60 first standardizes each term vector. The standardization in effect normalizes each term vector toward a standard normal distribution (see Fig. 5), so that after processing every dimension has a mean of 0 and a standard deviation of 1. After standardization the dimensions of the term vectors no longer differ too greatly, and the polarity probability subsequently calculated for the sentence is more accurate.
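The standardization described above can be sketched as a per-dimension z-score. The function name `standardize`, the use of the population standard deviation, and the handling of a constant dimension are assumptions; the text only states the resulting mean of 0 and standard deviation of 1.

```python
import math

def standardize(term_vectors):
    """Rescale every dimension to mean 0 and standard deviation 1."""
    n = len(term_vectors)
    dim = len(term_vectors[0])
    means = [sum(v[i] for v in term_vectors) / n for i in range(dim)]
    stds = [
        math.sqrt(sum((v[i] - means[i]) ** 2 for v in term_vectors) / n)
        for i in range(dim)
    ]
    # A constant dimension has zero spread; set it to 0 instead of dividing by 0.
    return [
        [(v[i] - means[i]) / stds[i] if stds[i] > 0 else 0.0 for i in range(dim)]
        for v in term_vectors
    ]

# Hypothetical term vectors whose two dimensions live on very different scales.
raw = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
scaled = standardize(raw)
```

After this step both dimensions are on the same scale, so the large raw values in the second dimension no longer dominate the later averaging and dimension reduction.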
It should be noted that, herein, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The numbering of the above embodiments of the present invention is for description only and does not represent the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (8)

1. A sentence polarity probability analysis method, characterized in that the sentence polarity probability analysis method includes:
Performing Jieba word segmentation on a current sentence to obtain each word;
Performing feature extraction on each word to obtain each corresponding term vector;
Calculating the average value corresponding to the term vectors to obtain the dimension values of the current sentence;
Adding the dimension values of the current sentence to a training model, so as to determine the polarity probability of the current sentence according to the training model.
2. The sentence polarity probability analysis method according to claim 1, characterized in that the step of calculating the average value corresponding to the term vectors to obtain the dimension values of the current sentence includes:
Obtaining each dimension value of each term vector in the current sentence;
Adding the dimension values at the same position in the term vectors to obtain each dimension sum;
Dividing each dimension sum by the number of term vectors to obtain the dimension values of the current sentence.
3. The sentence polarity probability analysis method according to claim 1 or 2, characterized in that before the step of calculating the average value corresponding to the term vectors to obtain the dimension values of the current sentence, the sentence polarity probability analysis method further includes:
Reducing the dimension of each term vector in the current sentence using principal component analysis.
4. The sentence polarity probability analysis method according to claim 3, characterized in that before the step of reducing the dimension of each term vector in the current sentence using principal component analysis, the polarity probability analysis method further includes:
Standardizing each term vector in the current sentence.
5. A sentence polarity probability analysis device, characterized in that the sentence polarity probability analysis device includes:
A word segmentation module, configured to perform Jieba word segmentation on a current sentence to obtain each word;
A feature extraction module, configured to perform feature extraction on each word to obtain each corresponding term vector;
A calculation module, configured to calculate the average value corresponding to the term vectors to obtain the dimension values of the current sentence;
A determining module, configured to add the dimension values of the current sentence to a training model, so as to determine the polarity probability of the current sentence according to the training model.
6. The sentence polarity probability analysis device according to claim 5, characterized in that the calculation module includes:
An acquiring unit, configured to obtain each dimension value of each term vector in the current sentence;
An addition unit, configured to add the dimension values at the same position in the term vectors to obtain each dimension sum;
A division unit, configured to divide each dimension sum by the number of term vectors to obtain the dimension values of the current sentence.
7. The sentence polarity probability analysis device according to claim 5 or 6, characterized in that the sentence polarity probability analysis device further includes:
A reduction module, configured to reduce the dimension of each term vector in the current sentence using principal component analysis.
8. The sentence polarity probability analysis device according to claim 7, characterized in that the sentence polarity probability analysis device further includes:
A processing module, configured to standardize each term vector in the current sentence.
CN201610859095.1A 2016-09-28 2016-09-28 The polarity probability analysis method and device of sentence Pending CN107861936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610859095.1A CN107861936A (en) 2016-09-28 2016-09-28 The polarity probability analysis method and device of sentence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610859095.1A CN107861936A (en) 2016-09-28 2016-09-28 The polarity probability analysis method and device of sentence

Publications (1)

Publication Number Publication Date
CN107861936A true CN107861936A (en) 2018-03-30

Family

ID=61698844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610859095.1A Pending CN107861936A (en) 2016-09-28 2016-09-28 The polarity probability analysis method and device of sentence

Country Status (1)

Country Link
CN (1) CN107861936A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408809A (en) * 2018-09-25 2019-03-01 天津大学 A kind of sentiment analysis method for automobile product comment based on term vector

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573046A (en) * 2015-01-20 2015-04-29 成都品果科技有限公司 Comment analyzing method and system based on term vector
CN104794208A (en) * 2015-04-24 2015-07-22 清华大学 Sentiment classification method and system based on contextual information of microblog text
CN105512687A (en) * 2015-12-15 2016-04-20 北京锐安科技有限公司 Emotion classification model training and textual emotion polarity analysis method and system
WO2016105803A1 (en) * 2014-12-24 2016-06-30 Intel Corporation Hybrid technique for sentiment analysis

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016105803A1 (en) * 2014-12-24 2016-06-30 Intel Corporation Hybrid technique for sentiment analysis
CN104573046A (en) * 2015-01-20 2015-04-29 成都品果科技有限公司 Comment analyzing method and system based on term vector
CN104794208A (en) * 2015-04-24 2015-07-22 清华大学 Sentiment classification method and system based on contextual information of microblog text
CN105512687A (en) * 2015-12-15 2016-04-20 北京锐安科技有限公司 Emotion classification model training and textual emotion polarity analysis method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
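Ding Shifei (ed.): "Support Vector Machine (SVM)", Advanced Artificial Intelligence *
Zhang Dongwen et al.: "Research on Chinese comment sentiment classification based on word2vec and SVMperf", Computer Science *
Jin Xiaoqiang: "Research on error correction methods for English articles", China Masters' Theses Full-text Database, Information Science and Technology *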

Similar Documents

Publication Publication Date Title
US10657969B2 (en) Identity verification method and apparatus based on voiceprint
CN108399409A (en) Image classification method, device and terminal
US20150169942A1 (en) Terminal configuration method and terminal
CN107544982A (en) Text message processing method, device and terminal
CN112100431B (en) Evaluation method, device and equipment of OCR system and readable storage medium
CN111813910B (en) Customer service problem updating method, customer service problem updating system, terminal equipment and computer storage medium
CN109598414A (en) Risk evaluation model training, methods of risk assessment, device and electronic equipment
CN108269575A (en) Update audio recognition method, terminal installation and the storage medium of voice print database
CN108932646B (en) User tag verification method and device based on operator and electronic equipment
CN109003607B (en) Voice recognition method, voice recognition device, storage medium and electronic equipment
CN108509458A (en) A kind of business object recognition methods and device
CN108922520B (en) Voice recognition method, voice recognition device, storage medium and electronic equipment
CN108634926A (en) Vision testing method, device, system based on VR technologies and storage medium
CN107291775A (en) The reparation language material generation method and device of error sample
CN111209354A (en) Method and device for judging repetition of map interest points and electronic equipment
CN111507250A (en) Image recognition method, device and storage medium
CN108170785A (en) Bootstrap technique, device and the computer readable storage medium of terminal searching operation
CN107679887A (en) A kind for the treatment of method and apparatus of trade company's scoring
CN107861936A (en) The polarity probability analysis method and device of sentence
CN107015647A (en) User's gender identification method based on smart mobile phone posture behavior big data
CN108595141A (en) Pronunciation inputting method and device, computer installation and computer readable storage medium
CN117332062A (en) Data processing method and related device
CN110443291A (en) A kind of model training method, device and equipment
CN104050168B (en) Information processing method, electronic equipment and dictionary server
CN113555037B (en) Method and device for detecting tampered area of tampered audio and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180330