CN107861936A - Method and device for analyzing the polarity probability of a sentence
- Publication number: CN107861936A
- Application number: CN201610859095.1A
- Authority: CN (China)
- Keywords: dimension, word vector, sentence, current sentence, polarity
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for analyzing the polarity probability of a sentence. The method includes: performing jieba word segmentation on the current sentence to obtain its words; performing feature extraction on each word to obtain the corresponding word vectors; calculating the average of the word vectors to obtain the dimension values of the current sentence; and adding the dimension values of the current sentence to a training model, so as to determine the polarity probability of the current sentence according to the training model. The invention also discloses a device for analyzing the polarity probability of a sentence. The invention first calculates the dimension values of a sentence from its word vectors and then determines the polarity probability of the sentence in combination with the training model, which improves the accuracy of sentence polarity probability analysis.
Description
Technical field
The present invention relates to the technical field of sentence analysis, and in particular to a method and device for analyzing the polarity probability of a sentence.
Background
With the development of communication technology, people's lives increasingly depend on the network. At present, when going out or at leisure, people order entertainment venues, food and drink, entertainment options and the like online, and after consuming they typically leave comments in a comment area. Merchants can then make corresponding improvements according to the consumers' comments.
However, the polarity of a sentence (positive or negative) is currently analyzed mainly from the keywords of the sentence: the polarity of the sentence is determined by whether its keywords are positive or negative. This way of analyzing sentence polarity is clearly too general and simple, so the analysis of sentence polarity is not accurate enough.
Summary of the invention
The main object of the present invention is to provide a method and device for analyzing the polarity probability of a sentence, intended to solve the technical problem that existing sentence polarity analysis is too general and simple and therefore not accurate enough.
To achieve the above object, the present invention provides a method for analyzing the polarity probability of a sentence, the method including:
performing jieba word segmentation on the current sentence to obtain its words;
performing feature extraction on each word to obtain the corresponding word vectors;
calculating the average of the word vectors to obtain the dimension values of the current sentence;
adding the dimension values of the current sentence to a training model, so as to determine the polarity probability of the current sentence according to the training model.
Preferably, the step of calculating the average of the word vectors to obtain the dimension values of the current sentence includes:
obtaining each dimension value of each word vector in the current sentence;
adding the dimension values at the same position in the word vectors to obtain each dimension sum;
dividing each dimension sum by the number of word vectors to obtain the dimension values of the current sentence.
Preferably, before the step of calculating the average of the word vectors to obtain the dimension values of the current sentence, the method further includes:
reducing the dimension of each word vector in the current sentence using principal component analysis (PCA).
Preferably, before the step of reducing the dimension of each word vector in the current sentence using principal component analysis, the method further includes:
standardizing each word vector in the current sentence.
In addition, to achieve the above object, the present invention also provides a device for analyzing the polarity probability of a sentence, the device including:
a word segmentation module, configured to perform jieba word segmentation on the current sentence to obtain its words;
a feature extraction module, configured to perform feature extraction on each word to obtain the corresponding word vectors;
a calculation module, configured to calculate the average of the word vectors to obtain the dimension values of the current sentence;
a determination module, configured to add the dimension values of the current sentence to a training model, so as to determine the polarity probability of the current sentence according to the training model.
Preferably, the calculation module includes:
an acquisition unit, configured to obtain each dimension value of each word vector in the current sentence;
an addition unit, configured to add the dimension values at the same position in the word vectors to obtain each dimension sum;
a division unit, configured to divide each dimension sum by the number of word vectors to obtain the dimension values of the current sentence.
Preferably, the device further includes:
a reduction module, configured to reduce the dimension of each word vector in the current sentence using principal component analysis.
Preferably, the device further includes:
a processing module, configured to standardize each word vector in the current sentence.
The method and device proposed by the present invention first perform jieba word segmentation on the current sentence to obtain its words, then perform feature extraction on each word to obtain the corresponding word vectors, then calculate the average of the word vectors to obtain the dimension values of the current sentence, and finally add the dimension values of the current sentence to a training model so as to determine the polarity probability of the current sentence according to the training model. Because the invention first calculates the dimension values of the sentence from its word vectors and then determines the polarity probability in combination with the training model, it gives sentence polarity probability analysis a more detailed basis than merely judging whether the keywords of the sentence are positive or negative, and so improves the accuracy of sentence polarity probability analysis.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a first embodiment of the sentence polarity probability analysis method of the present invention;
Fig. 2 is a detailed schematic flowchart of calculating the average of the word vectors to obtain the dimension values of the current sentence;
Fig. 3 is a schematic flowchart of a second embodiment of the sentence polarity probability analysis method of the present invention;
Fig. 4 is a schematic flowchart of a third embodiment of the sentence polarity probability analysis method of the present invention;
Fig. 5 is a schematic diagram of standardizing the word vectors in the present invention;
Fig. 6 is a functional block diagram of a first embodiment of the sentence polarity probability analysis device of the present invention;
Fig. 7 is a detailed functional block diagram of the calculation module in Fig. 6;
Fig. 8 is a functional block diagram of a second embodiment of the sentence polarity probability analysis device of the present invention;
Fig. 9 is a functional block diagram of a third embodiment of the sentence polarity probability analysis device of the present invention.
The realization of the object, the functional characteristics and the advantages of the present invention are further described below with reference to the drawings and in conjunction with the embodiments.
Detailed description
It should be understood that the specific embodiments described herein merely illustrate the present invention and are not intended to limit it.
The present invention provides a method for analyzing the polarity probability of a sentence.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of the first embodiment of the sentence polarity probability analysis method of the present invention.
In this embodiment, the method includes:
Step S10: perform jieba word segmentation on the current sentence to obtain its words.
Step S20: perform feature extraction on each word to obtain the corresponding word vectors.
In this embodiment, a sentence to be analyzed is first obtained from a sentence library and taken as the current sentence. Jieba word segmentation is then performed on the current sentence to obtain its words. The segmentation is illustrated as follows:
Suppose the current sentence is: "The standard room is too poor; the room is not even as good as a 3-star one, and the facilities are very outdated. It is suggested that the hotel renovate its old standard rooms."
After jieba word segmentation, the words obtained are (English glosses of the original tokens):
standard room/too/poor//room/also/not as good as/3/star///and/facilities/very/outdated//suggest/hotel//old//standard room/re-/new/improve/.
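To make the idea of dictionary-based segmentation concrete, the sketch below uses greedy forward maximum matching. This is a deliberately simpler stand-in and not jieba's own algorithm; the function name `segment`, the toy vocabulary, and the unspaced Latin-letter input are all illustrative assumptions.

```python
def segment(text, vocabulary, max_len=8):
    """Greedy forward maximum matching (a simple stand-in, not jieba).

    At each position, take the longest vocabulary entry that matches;
    if nothing matches, emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabulary and unspaced input, chosen only for illustration.
toy_vocabulary = {"standard", "room", "too", "poor"}
tokens = segment("standardroomtoopoor", toy_vocabulary)
```

A production system would call a real segmenter instead; the point here is only that segmentation turns an unspaced string into a list of words that later steps can look up.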
In this embodiment, before jieba word segmentation is performed on the current sentence, the method further includes:
labeling the current sentence as positive or negative, where positive polarity is preferably denoted by 1 and negative polarity by 0. It should be understood that labeling the current sentence makes it convenient to determine its polarity probability later according to the training model, as described in detail below.
After the words are obtained, feature extraction is performed on each word to obtain the corresponding word vectors. In this embodiment, a preset tool such as word2vec is used to perform feature extraction on each word. It should be understood that each word vector obtained through word2vec has 400 dimensions.
For better understanding, an example follows:
If the word is "hello", feature extraction on this word through word2vec may yield the word vector:
[-1.12e+00, 1.08e-01, ..., 1.69e-01, 1.17e+00]
where e+00 = e^0 = 1 and e-01 = e^(-1) = 1/e. Because the word vector has 400 dimensions, 396 values are omitted from the vector above. It should be understood that the values in this word vector are merely exemplary; the specific values are determined by the sentence, the number of words, and the meaning of the word.
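A minimal sketch of the lookup from words to word vectors, assuming the trained vectors are already available as a plain mapping. The table `toy_embeddings`, its 4-dimensional values, and the choice to skip unknown words are illustrative assumptions; the embodiment uses 400-dimensional word2vec vectors and does not say how unknown words are handled.

```python
def embed(tokens, embeddings):
    """Return the vector of every token that has an entry in `embeddings`.

    Skipping unknown tokens is an assumption of this sketch.
    """
    return [embeddings[token] for token in tokens if token in embeddings]


# Hypothetical 4-dimensional table; real word2vec vectors would have 400 values.
toy_embeddings = {
    "room": [0.10, -0.20, 0.05, 0.30],
    "poor": [-0.90, 0.40, -0.10, 0.00],
}

vectors = embed(["room", "too", "poor"], toy_embeddings)
```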
Step S30: calculate the average of the word vectors to obtain the dimension values of the current sentence.
After the word vectors are obtained, their average is calculated to obtain the dimension values of the current sentence. Specifically, referring to Fig. 2, step S30 includes:
Step S31: obtain each dimension value of each word vector in the current sentence;
Step S32: add the dimension values at the same position in the word vectors to obtain each dimension sum;
Step S33: divide each dimension sum by the number of word vectors to obtain the dimension values of the current sentence.
First, each dimension value of each word vector in the current sentence is obtained. Taking the example above, when the word is "hello" the corresponding word vector has 400 dimensions, so the word vector has 400 dimension values.
After each dimension value of each word vector is obtained, the dimension values at the same position in the word vectors are added to obtain each dimension sum. That is, since every word vector produced by feature extraction has 400 dimensions, adding the values at the same position still yields 400 values. Each dimension sum is then divided by the number of word vectors to obtain the dimension values of the current sentence.
For better understanding, an example follows. Suppose the current sentence contains two words, A and B, where
the word vector obtained for word A after feature extraction is [-0.3e+00, 1.2e-01, ..., 5.1e-01, 3.1e+00];
the word vector obtained for word B after feature extraction is [-0.5e+00, 0.2e-01, ..., 3.2e-01, 0.1e+00].
Adding the dimension values at the same position in the word vectors of A and B gives the dimension sums:
(-0.3e+00) + (-0.5e+00) = -0.8e+00;
1.2e-01 + 0.2e-01 = 1.4e-01;
5.1e-01 + 3.2e-01 = 8.3e-01;
3.1e+00 + 0.1e+00 = 3.2e+00.
Each dimension sum is then divided by the number of word vectors. Since there are 2 word vectors, the dimension values of the current sentence are:
[-0.4e+00, 0.7e-01, ..., 4.15e-01, 1.6e+00].
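Steps S31 to S33 amount to an element-wise mean, which can be sketched as follows. The function name `sentence_vector` is hypothetical, and only the four components shown for words A and B are used, so the vectors here are 4-dimensional rather than 400-dimensional. Python reads the literals as ordinary scientific notation; because averaging is linear, the procedure is the same whatever scale the exponent denotes.

```python
def sentence_vector(word_vectors):
    """Average word vectors position by position (steps S31 to S33)."""
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    dim = len(word_vectors[0])
    if any(len(vector) != dim for vector in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    # S32: sum the values found at the same position in every vector.
    sums = [sum(column) for column in zip(*word_vectors)]
    # S33: divide each sum by the number of word vectors.
    return [total / len(word_vectors) for total in sums]


# Only the four components shown in the example for words A and B.
word_a = [-0.3e+00, 1.2e-01, 5.1e-01, 3.1e+00]
word_b = [-0.5e+00, 0.2e-01, 3.2e-01, 0.1e+00]
average = sentence_vector([word_a, word_b])
```

The result has the same length as each input vector, which is why a sentence built from 400-dimensional word vectors is itself described by 400 values.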
Step S40: add the dimension values of the current sentence to a training model, so as to determine the polarity probability of the current sentence according to the training model.
After the dimension values of the current sentence are obtained, the training model is obtained. In this embodiment the training model is a neural network training model, for example an SVM (Support Vector Machine) training model. After the training model is obtained, the current sentence is added to the training model so that the polarity probability of the current sentence is determined according to the training model. Specifically, the polarity probability of the current sentence is determined according to the training model as follows:
The dimension values of the current sentence are mapped into the high-dimensional space of the training model, and a hyperplane that separates the polarities is sought in that space. The hyperplane found is then cut into sub-planes, each of which corresponds to a probability of positive polarity (1) and of negative polarity (0). Finally, the positive probabilities of the sub-planes are added to obtain the positive probability of the hyperplane, which is the positive probability of the current sentence. Subtracting the positive probability from 100% then gives the negative probability of the current sentence. Of course, the negative probability may be calculated first and the positive probability derived from it; no limitation is imposed here.
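The relationship between the two probabilities can be illustrated with a common way of turning an SVM decision value into a probability, namely a sigmoid in the style of Platt scaling. This is a swapped-in technique and not the sub-plane procedure described above; the linear decision function, the names, and the parameter values `a` and `b` are all assumptions made for illustration.

```python
import math


def decision_value(weights, bias, features):
    """Signed score of a linear separator: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def polarity_probabilities(weights, bias, features, a=-1.0, b=0.0):
    """Return (positive, negative) probabilities for one sentence.

    Uses P(positive) = 1 / (1 + exp(a * f + b)), where f is the decision
    value; a and b would normally be fitted on held-out data.
    """
    f = decision_value(weights, bias, features)
    positive = 1.0 / (1.0 + math.exp(a * f + b))
    return positive, 1.0 - positive


# Hypothetical separator and sentence values, 4-dimensional for brevity.
positive, negative = polarity_probabilities(
    weights=[0.5, -0.2, 0.1, 0.3],
    bias=0.0,
    features=[-0.4, 0.07, 0.415, 1.6],
)
```

Whatever method produces the positive probability, the negative probability follows as its complement, which matches the "subtract from 100%" step in the text.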
In this embodiment, to improve the accuracy of the training model, the training model preferably includes a training set and a test set, where the test set is used to test the training model. The ratio of training set to test set can be set as the case requires; in this embodiment the preferred ratio of training set to test set is 8:2.
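An 8:2 split of labeled sentences can be sketched as below. The function name `split_dataset`, the fixed seed, and the placeholder data are assumptions; the text specifies only the ratio, not how sentences are assigned to each set.

```python
import random


def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle a copy of the samples and split it into train and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Placeholder (sentence, label) pairs; label 1 is positive, 0 is negative.
labelled = [(f"sentence {i}", i % 2) for i in range(10)]
train_set, test_set = split_dataset(labelled)
```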
The method proposed in this embodiment first performs jieba word segmentation on the current sentence to obtain its words, then performs feature extraction on each word to obtain the corresponding word vectors, then calculates the average of the word vectors to obtain the dimension values of the current sentence, and finally adds the dimension values of the current sentence to the training model so as to determine the polarity probability of the current sentence according to the training model. Because the invention first calculates the dimension values of the sentence from its word vectors and then determines the polarity probability in combination with the training model, it gives sentence polarity probability analysis a more detailed basis than merely judging whether the keywords of the sentence are positive or negative, and so improves the accuracy of sentence polarity probability analysis.
Further, to improve the efficiency of sentence polarity probability analysis, a second embodiment of the method is proposed based on the first embodiment. In this embodiment, referring to Fig. 3, before step S30 the method further includes:
Step S50: reduce the dimension of each word vector in the current sentence using principal component analysis.
It should be understood that in the first embodiment every word vector obtained by feature extraction has 400 dimensions. Processing 400-dimensional word vectors yields a sentence that also has 400 dimensions, and when these are added to the training model for calculation the high dimensionality makes the amount of computation very large. The sentence polarity probability analysis therefore takes more time, which reduces computational efficiency.
Therefore, in this embodiment principal component analysis (PCA) is used to reduce the dimension of each word vector in the current sentence from 400 to 100. The sentence calculated subsequently then has only 100 dimensions, the amount of computation decreases, and the efficiency of the polarity probability analysis improves accordingly. It should be understood that although PCA reduces the dimension of each word vector, each reduced vector retains almost all of the content of the original vector, with a retained proportion of up to 99%. Reducing the dimension of each word vector with PCA therefore does not affect the subsequent calculation results.
In this embodiment, using principal component analysis to reduce the dimension of each word vector in the current sentence both preserves the accuracy of sentence polarity probability analysis and improves its efficiency.
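One way to check a retained proportion such as 99% is to look at the cumulative share of variance carried by the leading principal components. The sketch below assumes the per-component variances (the eigenvalues of the covariance matrix) have already been computed; the function name `components_needed` and the sample values are hypothetical, and the embodiment itself simply fixes the target at 100 dimensions.

```python
def components_needed(eigenvalues, target=0.99):
    """Smallest number of leading components whose variance share >= target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count
    return len(ordered)


# Hypothetical variances of six principal components.
variances = [50.0, 30.0, 15.0, 4.5, 0.4, 0.1]
k = components_needed(variances)
```

With these sample values the first three components carry 95% of the variance and the first four carry 99.5%, so four components are enough to meet a 99% target.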
Further, to improve the accuracy of sentence polarity probability analysis, a third embodiment of the method is proposed based on the second embodiment. In this embodiment, referring to Fig. 4, before step S50 the method further includes:
Step S60: standardize each word vector in the current sentence.
As described above, when the word vector of each word is calculated, the dimension values within each word vector differ, and some dimension values may well differ greatly from one another, which can make the final calculation result inaccurate.
Therefore, in this embodiment each word vector is standardized before its dimension is reduced using principal component analysis. The standardization is in effect a normalization of each word vector (see Fig. 5), so that after processing each dimension has a mean of 0 and a standard deviation of 1. After standardization the dimensions of the word vectors no longer differ too greatly, and the polarity probability calculated subsequently for the sentence is more accurate.
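Standardizing so that each dimension has mean 0 and standard deviation 1 can be sketched as a per-dimension z-score. Computing the statistics over the word vectors passed in, using the population standard deviation, and mapping constant dimensions to 0 are assumptions of this sketch; in practice the statistics might instead come from a training corpus.

```python
from statistics import fmean, pstdev


def standardize(word_vectors):
    """Z-score each dimension across the given word vectors."""
    columns = list(zip(*word_vectors))
    means = [fmean(column) for column in columns]
    stds = [pstdev(column) for column in columns]
    return [
        # A dimension with zero spread carries no information; map it to 0.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(vector, means, stds)]
        for vector in word_vectors
    ]


# Two toy 3-dimensional vectors whose dimensions are on very different scales.
raw = [[1.0, 10.0, 5.0], [3.0, 30.0, 5.0]]
scaled = standardize(raw)
```

After this step the first two dimensions both take the values -1 and 1 despite their original tenfold difference in scale, which is the effect the embodiment relies on before applying PCA.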
The present invention further provides a device for analyzing the polarity probability of a sentence.
Referring to Fig. 6, Fig. 6 is a functional block diagram of the first embodiment of the sentence polarity probability analysis device 100 of the present invention.
It should be emphasized that, as will be apparent to those skilled in the art, the functional block diagram shown in Fig. 6 is merely an example of a preferred embodiment, and those skilled in the art can easily add new functional modules around the functional modules of the sentence polarity probability analysis device 100 shown in Fig. 6. The names of the functional modules are custom names used only to aid understanding of the program function blocks of the device 100 and do not limit the technical solution of the present invention; the core of the technical solution is the function that each named module is to achieve.
In this embodiment, the sentence polarity probability analysis device 100 includes:
a word segmentation module 10, configured to perform jieba word segmentation on the current sentence to obtain its words;
a feature extraction module 20, configured to perform feature extraction on each word to obtain the corresponding word vectors.
In this embodiment, a sentence to be analyzed is first obtained from a sentence library and taken as the current sentence. The word segmentation module 10 then performs jieba word segmentation on the current sentence to obtain its words. The segmentation is illustrated as follows:
Suppose the current sentence is: "The standard room is too poor; the room is not even as good as a 3-star one, and the facilities are very outdated. It is suggested that the hotel renovate its old standard rooms."
After jieba word segmentation, the words obtained are (English glosses of the original tokens):
standard room/too/poor//room/also/not as good as/3/star///and/facilities/very/outdated//suggest/hotel//old//standard room/re-/new/improve/.
In this embodiment, before jieba word segmentation is performed on the current sentence, the following is also performed:
labeling the current sentence as positive or negative, where positive polarity is preferably denoted by 1 and negative polarity by 0. It should be understood that labeling the current sentence makes it convenient to determine its polarity probability later according to the training model, as described in detail below.
After the words are obtained, the feature extraction module 20 performs feature extraction on each word to obtain the corresponding word vectors. In this embodiment, a preset tool such as word2vec is used to perform feature extraction on each word. It should be understood that each word vector obtained through word2vec has 400 dimensions.
For better understanding, an example follows:
If the word is "hello", feature extraction on this word through word2vec may yield the word vector:
[-1.12e+00, 1.08e-01, ..., 1.69e-01, 1.17e+00]
where e+00 = e^0 = 1 and e-01 = e^(-1) = 1/e. Because the word vector has 400 dimensions, 396 values are omitted from the vector above. It should be understood that the values in this word vector are merely exemplary; the specific values are determined by the sentence, the number of words, and the meaning of the word.
A calculation module 30, configured to calculate the average of the word vectors to obtain the dimension values of the current sentence.
After the word vectors are obtained, the calculation module 30 calculates their average to obtain the dimension values of the current sentence. Specifically, referring to Fig. 7, the calculation module 30 includes:
an acquisition unit 31, configured to obtain each dimension value of each word vector in the current sentence;
an addition unit 32, configured to add the dimension values at the same position in the word vectors to obtain each dimension sum;
a division unit 33, configured to divide each dimension sum by the number of word vectors to obtain the dimension values of the current sentence.
First, the acquisition unit 31 obtains each dimension value of each word vector in the current sentence. Taking the example above, when the word is "hello" the corresponding word vector has 400 dimensions, so the word vector has 400 dimension values.
After the acquisition unit 31 obtains each dimension value of each word vector, the addition unit 32 adds the dimension values at the same position in the word vectors to obtain each dimension sum. That is, since every word vector produced by feature extraction has 400 dimensions, adding the values at the same position still yields 400 values. The division unit 33 then divides each dimension sum by the number of word vectors to obtain the dimension values of the current sentence.
For better understanding, an example follows. Suppose the current sentence contains two words, A and B, where
the word vector obtained for word A after feature extraction is [-0.3e+00, 1.2e-01, ..., 5.1e-01, 3.1e+00];
the word vector obtained for word B after feature extraction is [-0.5e+00, 0.2e-01, ..., 3.2e-01, 0.1e+00].
Adding the dimension values at the same position in the word vectors of A and B gives the dimension sums:
(-0.3e+00) + (-0.5e+00) = -0.8e+00;
1.2e-01 + 0.2e-01 = 1.4e-01;
5.1e-01 + 3.2e-01 = 8.3e-01;
3.1e+00 + 0.1e+00 = 3.2e+00.
Each dimension sum is then divided by the number of word vectors. Since there are 2 word vectors, the dimension values of the current sentence are:
[-0.4e+00, 0.7e-01, ..., 4.15e-01, 1.6e+00].
A determination module 40, configured to add the dimension values of the current sentence to a training model, so as to determine the polarity probability of the current sentence according to the training model.
After the dimension values of the current sentence are obtained, the training model is obtained. In this embodiment the training model is a neural network training model, for example an SVM (Support Vector Machine) training model. After the training model is obtained, the determination module 40 adds the current sentence to the training model so that the polarity probability of the current sentence is determined according to the training model. Specifically, the polarity probability of the current sentence is determined according to the training model as follows:
The dimension values of the current sentence are mapped into the high-dimensional space of the training model, and a hyperplane that separates the polarities is sought in that space. The hyperplane found is then cut into sub-planes, each of which corresponds to a probability of positive polarity (1) and of negative polarity (0). Finally, the positive probabilities of the sub-planes are added to obtain the positive probability of the hyperplane, which is the positive probability of the current sentence. Subtracting the positive probability from 100% then gives the negative probability of the current sentence. Of course, the negative probability may be calculated first and the positive probability derived from it; no limitation is imposed here.
In this embodiment, to improve the accuracy of the training model, the training model preferably includes a training set and a test set, where the test set is used to test the training model. The ratio of training set to test set can be set as the case requires; in this embodiment the preferred ratio of training set to test set is 8:2.
The sentence polarity probability analysis device 100 proposed in this embodiment first performs jieba word segmentation on the current sentence to obtain its words, then performs feature extraction on each word to obtain the corresponding word vectors, then calculates the average of the word vectors to obtain the dimension values of the current sentence, and finally adds the dimension values of the current sentence to the training model so as to determine the polarity probability of the current sentence according to the training model. Because the invention first calculates the dimension values of the sentence from its word vectors and then determines the polarity probability in combination with the training model, it gives sentence polarity probability analysis a more detailed basis than merely judging whether the keywords of the sentence are positive or negative, and so improves the accuracy of sentence polarity probability analysis.
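The way modules 10 to 40 hand results to one another can be sketched as a small pipeline object. The class name `PolarityAnalyzer`, the callable-based design, and every toy stand-in below (whitespace splitting, a two-word table, a plain sigmoid in place of a trained model) are hypothetical and serve only to show the data flow.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class PolarityAnalyzer:
    segment: Callable[[str], List[str]]           # word segmentation module 10
    extract: Callable[[List[str]], List[Vector]]  # feature extraction module 20
    average: Callable[[List[Vector]], Vector]     # calculation module 30
    determine: Callable[[Vector], float]          # determination module 40

    def analyze(self, sentence: str) -> float:
        """Return the positive-polarity probability of one sentence."""
        words = self.segment(sentence)
        vectors = self.extract(words)
        return self.determine(self.average(vectors))


# Toy stand-ins for each module, for illustration only.
TOY_TABLE: Dict[str, Vector] = {"good": [1.0, 1.0], "room": [0.0, 0.0]}

analyzer = PolarityAnalyzer(
    segment=str.split,
    extract=lambda words: [TOY_TABLE[w] for w in words if w in TOY_TABLE],
    average=lambda vs: [sum(col) / len(vs) for col in zip(*vs)],
    determine=lambda v: 1.0 / (1.0 + math.exp(-sum(v))),
)
probability = analyzer.analyze("good room")
```

Keeping each module as a separate callable mirrors the block diagram: a reduction or standardization module could be slotted in between extraction and averaging without changing the others.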
Further, in order to improve the efficiency of the polarity probability analysis device of sentence, this hair is proposed based on first embodiment
The second embodiment of the polarity probability analysis device 100 of plain language sentence, in the present embodiment, reference picture 8, the polarity of the sentence is general
Rate analytical equipment 100 also includes:
Module 50 is reduced, for reducing the dimension of each term vector in the current statement using PCA.
It should be appreciated that in the first embodiment, feature extraction is carried out to each word, obtained each term vector is all
It is 400 dimensions, is handled according to each term vector of 400 dimensions, the dimension of the sentence finally given is also 400 dimensions, then is added
It is added to when being calculated in the training pattern, because dimension is too high, causes amount of calculation very big so that sentence polarity probability
The time that analysis process is consumed is also more, so as to reduce computational efficiency.
Therefore, in this embodiment, the reduction module 50 uses principal component analysis (PCA) to reduce the dimension of each word vector in the current sentence, so that each 400-dimensional word vector becomes a 100-dimensional word vector. The dimension of the sentence calculated subsequently is then only 100-dimensional, the amount of computation decreases, and the efficiency of the polarity probability analysis improves accordingly. It should be understood that, although PCA reduces the dimension of each word vector, each reduced word vector retains almost all of the content of the original word vector, with an inclusion ratio of up to 99%. Therefore, reducing the dimension of each word vector with PCA in this embodiment does not affect the subsequent calculation results.
In this embodiment, reducing the dimension of each word vector in the current sentence with PCA both preserves the accuracy of the sentence polarity probability analysis and improves its efficiency.
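As a rough illustration of the dimensionality reduction described above, the following Python sketch projects a set of vectors onto their leading principal components. It is a minimal sketch under stated assumptions: the principal components are found by power iteration with deflation, which is one textbook way to obtain them and is not stated in the embodiments; the function name `pca_reduce`, its parameters, and the toy 3-dimensional data are hypothetical. The embodiments reduce 400 dimensions to 100; small dimensions are used here so that the pure-Python code runs quickly.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def pca_reduce(vectors: Matrix, k: int, iters: int = 200, seed: int = 0) -> Matrix:
    """Project the rows of `vectors` onto their top-k principal components."""
    n, d = len(vectors), len(vectors[0])
    if n < 2 or not 1 <= k <= d:
        raise ValueError("need at least two vectors and 1 <= k <= dimension")

    # Center every dimension on its mean.
    means = [sum(row[j] for row in vectors) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in vectors]

    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]

    def matvec(m: Matrix, v: List[float]) -> List[float]:
        return [sum(m[a][b] * v[b] for b in range(d)) for a in range(d)]

    rng = random.Random(seed)
    components: Matrix = []
    for _ in range(k):
        # Start from a random unit vector and apply power iteration.
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining variance is numerically zero
                break
            v = [x / norm for x in w]
        eigval = sum(x * y for x, y in zip(v, matvec(cov, v)))
        components.append(v)
        # Deflation: remove the component just found from the covariance.
        for a in range(d):
            for b in range(d):
                cov[a][b] -= eigval * v[a] * v[b]

    # Coordinates of each centered vector along the k components.
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]


# Hypothetical toy input: four 3-dimensional vectors reduced to 2 dimensions.
toy = [[2.0, 0.0, 0.1], [-2.0, 0.0, -0.1], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
reduced = pca_reduce(toy, k=2)
print(len(reduced), len(reduced[0]))  # 4 2
```

In practice an optimized library routine would normally be used instead of this loop-based version, and the projection would typically be fitted once on a larger set of word vectors and then reused for each sentence; both points are design assumptions rather than details given in the embodiments.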
Further, in order to improve the accuracy of the sentence polarity probability analysis device, a third embodiment of the sentence polarity probability analysis device 100 of the present invention is proposed on the basis of the second embodiment. In this embodiment, referring to Fig. 9, the sentence polarity probability analysis device 100 further includes:
a processing module 60, configured to standardize each word vector in the current sentence.
As can be seen from the above, when the word vector of each word is calculated, the values of the individual dimensions in each word vector differ, and some dimension values may well differ greatly, which may make the final calculation result inaccurate.
Therefore, in this embodiment, before the dimension of each word vector is reduced using PCA, the processing module 60 first standardizes each word vector. The standardization in effect normalizes each word vector (for details, refer to Fig. 5), so that in the processed word vectors each dimension has a mean of 0 and a standard deviation of 1. After standardization, the dimensions of the word vectors no longer differ too much, and the subsequently calculated sentence polarity probability is more accurate.
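For illustration only, a standardization that gives each dimension a mean of 0 and a standard deviation of 1 might be sketched in Python as follows. The sketch assumes that the statistics are computed per dimension across the word vectors being processed and that the population standard deviation is used; the embodiments do not spell out either choice. The function name `standardize` and the toy data are hypothetical.

```python
import math
from typing import List


def standardize(vectors: List[List[float]]) -> List[List[float]]:
    """Rescale every dimension to mean 0 and standard deviation 1."""
    n, d = len(vectors), len(vectors[0])

    # Mean and population standard deviation of each dimension.
    means = [sum(row[j] for row in vectors) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in vectors) / n)
            for j in range(d)]

    # A dimension with no spread carries no information; map it to 0
    # (an assumption of this sketch, chosen to avoid division by zero).
    return [[(row[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
             for j in range(d)]
            for row in vectors]


# Hypothetical toy input whose two dimensions differ greatly in scale.
toy = [[1.0, 100.0], [3.0, 300.0]]
print(standardize(toy))  # [[-1.0, -1.0], [1.0, 1.0]]
```

After this step the two dimensions are on the same scale, which is the effect the third embodiment relies on before the PCA reduction is applied.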
It should be noted that, herein, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (8)
1. A sentence polarity probability analysis method, characterized in that the sentence polarity probability analysis method comprises:
performing jieba word segmentation on a current sentence to obtain individual words;
performing feature extraction on each word to obtain a corresponding word vector;
calculating the average value of the word vectors to obtain the dimension of the current sentence;
adding the dimension of the current sentence to a training model, so as to determine the polarity probability of the current sentence according to the training model.
2. The sentence polarity probability analysis method according to claim 1, characterized in that the step of calculating the average value of the word vectors to obtain the dimension of the current sentence comprises:
obtaining each dimension value of each word vector in the current sentence;
adding the dimension values at the same position in the word vectors to obtain each dimension sum;
dividing each dimension sum by the number of word vectors to obtain the dimension of the current sentence.
3. The sentence polarity probability analysis method according to claim 1 or 2, characterized in that, before the step of calculating the average value of the word vectors to obtain the dimension of the current sentence, the sentence polarity probability analysis method further comprises:
reducing the dimension of each word vector in the current sentence using principal component analysis (PCA).
4. The sentence polarity probability analysis method according to claim 3, characterized in that, before the step of reducing the dimension of each word vector in the current sentence using PCA, the sentence polarity probability analysis method further comprises:
standardizing each word vector in the current sentence.
5. A sentence polarity probability analysis device, characterized in that the sentence polarity probability analysis device comprises:
a word segmentation module, configured to perform jieba word segmentation on a current sentence to obtain individual words;
a feature extraction module, configured to perform feature extraction on each word to obtain a corresponding word vector;
a calculation module, configured to calculate the average value of the word vectors to obtain the dimension of the current sentence;
a determination module, configured to add the dimension of the current sentence to a training model, so as to determine the polarity probability of the current sentence according to the training model.
6. The sentence polarity probability analysis device according to claim 5, characterized in that the calculation module comprises:
an acquisition unit, configured to obtain each dimension value of each word vector in the current sentence;
an addition unit, configured to add the dimension values at the same position in the word vectors to obtain each dimension sum;
a division unit, configured to divide each dimension sum by the number of word vectors to obtain the dimension of the current sentence.
7. The sentence polarity probability analysis device according to claim 5 or 6, characterized in that the sentence polarity probability analysis device further comprises:
a reduction module, configured to reduce the dimension of each word vector in the current sentence using principal component analysis (PCA).
8. The sentence polarity probability analysis device according to claim 7, characterized in that the sentence polarity probability analysis device further comprises:
a processing module, configured to standardize each word vector in the current sentence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610859095.1A CN107861936A (en) | 2016-09-28 | 2016-09-28 | The polarity probability analysis method and device of sentence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107861936A true CN107861936A (en) | 2018-03-30 |
Family
ID=61698844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610859095.1A Pending CN107861936A (en) | 2016-09-28 | 2016-09-28 | The polarity probability analysis method and device of sentence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107861936A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408809A (en) * | 2018-09-25 | 2019-03-01 | 天津大学 | A kind of sentiment analysis method for automobile product comment based on term vector |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104573046A (en) * | 2015-01-20 | 2015-04-29 | 成都品果科技有限公司 | Comment analyzing method and system based on term vector |
CN104794208A (en) * | 2015-04-24 | 2015-07-22 | 清华大学 | Sentiment classification method and system based on contextual information of microblog text |
CN105512687A (en) * | 2015-12-15 | 2016-04-20 | 北京锐安科技有限公司 | Emotion classification model training and textual emotion polarity analysis method and system |
WO2016105803A1 (en) * | 2014-12-24 | 2016-06-30 | Intel Corporation | Hybrid technique for sentiment analysis |
Non-Patent Citations (3)
Title |
---|
丁世飞编著: "支持向量机(SVM)", 《高级人工智能》 * |
张冬雯 等: "基于word2vec和SVMperf的中文评论情感分类研究", 《计算机科学》 * |
靳晓强: "英文冠词纠错方法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180330 |