CN110390005A - Data processing method and device
- Publication number
- CN110390005A CN201910666576.4A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The present invention provides a data processing method and device. The method comprises: processing a question and the answer to the question to obtain a word embedding representation of the question and a word embedding representation of the answer; compressing the word embedding representation of the question to obtain a compressed word embedding representation of the question; and, according to the compressed word embedding representation of the question and the word embedding representation of the answer, calculating the matching value between the answer and the question and ranking the answers by the resulting matching values. With the data processing method and device provided by the embodiments of the present invention, ranking the answers requires no manual involvement, which saves time and labor and makes ranking efficient.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a data processing method and device.
Background
With the development of Web 2.0 technology, internet products whose content is generated by users are flourishing. In forums and online communities, people can freely ask all kinds of questions and answer questions posed by others.
Because the number of questions and answers keeps growing and answer quality is uneven, the quality of answers has to be checked manually, and the multiple answers to a question have to be ranked according to that quality.
Manually checking the quality of every answer is time-consuming, laborious, and inefficient.
Summary of the invention
To solve the above problem, embodiments of the present invention aim to provide a data processing method and device.
In a first aspect, an embodiment of the present invention provides a data processing method, comprising:
processing a question and the answer to the question to obtain a word embedding representation of the question and a word embedding representation of the answer;
compressing the word embedding representation of the question to obtain a compressed word embedding representation of the question;
according to the compressed word embedding representation of the question and the word embedding representation of the answer, calculating the matching value between the answer and the question, and ranking the answers according to the resulting matching values.
In a second aspect, an embodiment of the present invention further provides a data processing device, comprising:
a first processing module, configured to process a question and the answer to the question to obtain a word embedding representation of the question and a word embedding representation of the answer;
a second processing module, configured to compress the word embedding representation of the question to obtain a compressed word embedding representation of the question;
a ranking module, configured to calculate the matching value between the answer and the question according to the compressed word embedding representation of the question and the word embedding representation of the answer, and to rank the answers according to the resulting matching values.
In the schemes provided by the first and second aspects of the embodiments of the present invention, the word embedding representation of the question is compressed to obtain the compressed word embedding representation of the question; the matching value between the answer and the question is then calculated from the compressed word embedding representation of the question and the word embedding representation of the answer, and the answers are ranked by the resulting matching values. Compared with the related art, in which answer quality is checked manually and the multiple answers to a question are ranked by that quality, the answers can be ranked simply by calculating the matching value between each answer and the question. The entire ranking process requires no manual involvement, which saves time and labor and makes ranking efficient.
To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a flow chart of a data processing method provided by Embodiment 1 of the present invention;
Fig. 2 shows a schematic structural diagram of a data processing device provided by Embodiment 2 of the present invention.
Detailed description of the embodiments
In the description of the present invention, it should be understood that terms indicating orientation or positional relationships, such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise" and "counterclockwise", are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they therefore shall not be construed as limiting the present invention.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. A feature defined by "first" or "second" may therefore explicitly or implicitly include one or more such features. In the description of the present invention, "a plurality of" means two or more, unless otherwise specifically defined.
In the present invention, unless otherwise expressly specified and limited, terms such as "mounted", "connected", "coupled" and "fixed" shall be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediate medium, or internal between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.
This scheme was originally designed to solve the problem of re-ranking answers in community question answering.
With the development of Web 2.0 technology, internet products whose content is generated by users (such as Zhihu and Baidu Zhidao) are flourishing. In forums and online communities, people can freely ask all kinds of questions and answer questions posed by others. Because the number of questions and answers keeps growing and answer quality is uneven, manually checking answer quality and ranking the multiple answers to a question by that quality is time-consuming and laborious.
Community question answering has two features that ordinary question answering lacks. First, a question contains both a subject that gives a brief overview of the question and a body that describes the question in detail. Askers usually convey their main concern and key information in the subject. They then give more details about the subject in the content part of the question, ask for help, or express gratitude to the answerers. Second, redundancy and noise are very common in community question answering. Both questions and answers may contain auxiliary sentences that carry no semantic information.
Previous studies usually treat every word in the question and answer representations equally. However, because of redundancy and noise, only part of the text of the question and the answer is useful for determining answer quality. Worse, previous studies ignore the difference between the subject and the content part of a question and simply concatenate them into the question representation. Given the subject-content relationship described above, this simple concatenation may aggravate the redundancy in the question.
On this basis, this scheme proposes a data processing method and device in which the answers can be ranked simply by calculating the matching value between each answer and the question. The entire ranking process requires no manual involvement, which saves time and labor and makes ranking efficient.
In the data processing method and device proposed in this application, the content of the following steps is executed by a deep learning network. Accordingly, in the following embodiments, the parameters of the deep learning network are all stored in the server and are retrieved by the server from itself when they are needed.
Embodiment 1
This embodiment proposes a data processing method whose executing subject is a server.
The server may be any prior-art computing device capable of processing the text of a question and its answers to obtain the matching value between each answer and the question; such devices are not described one by one here.
Referring to the flow chart of the data processing method shown in Fig. 1, the method may include the following specific steps:
Step 100: process the question and the answer to the question to obtain the word embedding representation of the question and the word embedding representation of the answer.
In step 100 above, the word embedding representation of the question and that of the answer may be obtained by performing the following steps (1) to (2):
(1) input the text of the question and the text of the answer to the question into dictionaries to obtain, respectively, the word vectors and character vectors of the question and the word vectors and character vectors of the answer;
(2) concatenate the word vectors and character vectors of the question to obtain the word embedding representation of the question, and concatenate the word vectors and character vectors of the answer to obtain the word embedding representation of the answer.
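Steps (1) and (2) can be illustrated with a minimal sketch that looks up a word vector and a character-level vector for each token and concatenates them. The toy dictionaries `WORD_VECS` and `CHAR_VECS`, the averaging of character vectors (a stand-in for the convolutional network), and the zero-vector fallback for unknown words are all illustrative assumptions and are not taken from the patent.

```python
from typing import Dict, List

# Hypothetical toy dictionaries; a real system would load GloVe vectors and a
# trained character-level model instead.
WORD_VECS: Dict[str, List[float]] = {"good": [0.1, 0.2], "answer": [0.3, 0.4]}
CHAR_VECS: Dict[str, List[float]] = {
    c: [float(ord(c) % 7), 1.0] for c in "abcdefghijklmnopqrstuvwxyz"
}
WORD_DIM, CHAR_DIM = 2, 2


def char_level_vector(token: str) -> List[float]:
    """Average the character vectors of a token (stand-in for a char-CNN)."""
    vecs = [CHAR_VECS[c] for c in token if c in CHAR_VECS]
    if not vecs:
        return [0.0] * CHAR_DIM
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(CHAR_DIM)]


def embed(tokens: List[str]) -> List[List[float]]:
    """Concatenate the word vector and character-level vector of each token."""
    out = []
    for tok in tokens:
        word_vec = WORD_VECS.get(tok, [0.0] * WORD_DIM)  # zeros for unknown words
        out.append(word_vec + char_level_vector(tok))
    return out
```

An out-of-vocabulary token still receives a non-trivial representation through its character-level half, which is the property the description relies on for noisy forum text.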
In step (1) above, the dictionaries include, but are not limited to, a GloVe word-vector dictionary trained on an unlabeled corpus and a character-vector dictionary based on a convolutional neural network.
Because the web text in community question-answering forums differs greatly from standardized text in spelling and grammar, specially trained GloVe vectors can model word interactions more accurately. Character embeddings have proven very useful for out-of-vocabulary words and are therefore especially suitable for the noisy web text in community question-answering forums.
Inputting the text of the question into the GloVe word-vector dictionary trained on an unlabeled corpus yields the word vectors of the question; inputting the text of the question into the character-vector dictionary based on a convolutional neural network yields the character vectors of the question.
Similarly, inputting the text of the answer into the GloVe word-vector dictionary trained on an unlabeled corpus yields the word vectors of the answer; inputting the text of the answer into the character-vector dictionary based on a convolutional neural network yields the character vectors of the answer.
Step 102: compress the word embedding representation of the question to obtain the compressed word embedding representation of the question.
Step 102 may specifically include the following steps (1) to (2):
(1) perform orthogonal decomposition on the word embedding representation of the question to obtain the parallel component and the orthogonal component of the question's word embedding;
(2) splice the parallel component and the orthogonal component of the question's word embedding to obtain the compressed word embedding representation of the question.
In step (1) above, the parallel component of the question's word embedding is obtained by a formula (not reproduced in this text) whose symbols denote, respectively: the parallel component of the question's word embedding; the word embedding representation of the body of the question; and the word embedding representation of the i-th word in the title of the question.
The orthogonal component of the question's word embedding is obtained by a second formula (likewise not reproduced in this text), whose result denotes the orthogonal component of the question's word embedding.
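Because the patent's own formulas are missing here, the following sketch shows only the standard orthogonal decomposition of one vector with respect to another, on the assumption that a body word embedding is projected onto a title word embedding. The function names and the handling of a zero title vector are hypothetical.

```python
from typing import List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_decompose(
    body_vec: List[float], title_vec: List[float]
) -> Tuple[List[float], List[float]]:
    """Split body_vec into a part parallel to title_vec and a part orthogonal to it."""
    denom = dot(title_vec, title_vec)
    if denom == 0.0:
        # A zero title vector defines no direction; treat everything as orthogonal.
        return [0.0] * len(body_vec), list(body_vec)
    scale = dot(body_vec, title_vec) / denom
    parallel = [scale * t for t in title_vec]
    orthogonal = [b - p for b, p in zip(body_vec, parallel)]
    return parallel, orthogonal
```

By construction the two parts sum back to the body vector and their inner product is zero, so the parallel part carries what the body shares with the title and the orthogonal part carries what the body adds.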
In step (2) above, a fusion gate may be used to splice the parallel component and the orthogonal component of the question's word embedding. The fusion gate is prior art and is not described further in this embodiment.
To obtain the compressed word embedding representation of the question, the following steps (21) to (23) may be performed:
(21) calculate the alignment score of the parallel component based on the parallel component of the question's word embedding;
(22) based on the alignment score of the parallel component and the word embedding representation of the body of the question, calculate the summary representation of the question body obtained from the question title;
(23) according to the summary representation of the question body obtained from the question title, splice the parallel component and the orthogonal component of the question's word embedding to obtain the compressed word embedding representation of the question.
In step (21) above, the alignment score of the parallel component is calculated by a formula (not reproduced in this text) in which $c$ denotes the alignment parameter and $W_{p1}$ and $b_{p1}$ are parameters of the deep learning network.
The alignment parameter is preset in the server.
In step (22) above, the summary representation of the question body obtained from the question title is calculated by a formula (not reproduced in this text) whose result denotes that summary representation.
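Since the formula for step (22) is missing, the sketch below illustrates only one common way to turn alignment scores into a summary: normalize the scores with a softmax and take the weighted sum of the body word embeddings. Both the softmax normalization and the use of one scalar score per body word are assumptions made for illustration; the patent may define the computation differently.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def summarize_body(
    alignment_scores: List[float], body_embs: List[List[float]]
) -> List[float]:
    """Weighted sum of body word embeddings, with weights softmax(alignment_scores)."""
    weights = softmax(alignment_scores)
    dim = len(body_embs[0])
    return [sum(w * emb[d] for w, emb in zip(weights, body_embs)) for d in range(dim)]
```

With equal scores the summary reduces to the plain average of the body embeddings; a strongly dominant score makes the summary approach the embedding of that single body word.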
In step (23) above, the parallel component and the orthogonal component of the question's word embedding are spliced by the following formulas to obtain the compressed word embedding representation of the question:
$F_{para} = \sigma(W_{p2} S_{emb} + W_{p3} S_{ap} + b_{p2})$
$S_{para} = F_{para} \odot S_{emb} + (1 - F_{para}) \odot S_{ap}$
where $W_{p2}$, $W_{p3}$ and $b_{p2}$ denote the parameters of the fusion gate in the deep learning network, $F_{para}$ denotes the value of the fusion gate, $S_{para}$ denotes the word embedding representation of the parallel component of the compressed question, $S_{emb}$ denotes the word embedding representation of the title of the question, and $S_{ap}$ denotes the summary representation of the question body obtained from the question title.
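The two fusion-gate formulas can be traced with a small pure-Python sketch. The helper names (`matvec`, `sigmoid`, `fusion_gate`) and the toy parameter values are hypothetical; in the patent these parameters are learned by the deep learning network.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fusion_gate(s_emb: Vector, s_ap: Vector, w_p2: Matrix, w_p3: Matrix,
                b_p2: Vector) -> Vector:
    """S_para = F * S_emb + (1 - F) * S_ap, with F = sigmoid(W_p2 S_emb + W_p3 S_ap + b_p2)."""
    from_emb = matvec(w_p2, s_emb)
    from_ap = matvec(w_p3, s_ap)
    gate = [sigmoid(x + y + b) for x, y, b in zip(from_emb, from_ap, b_p2)]
    return [g * e + (1.0 - g) * p for g, e, p in zip(gate, s_emb, s_ap)]
```

When all gate parameters are zero the gate equals 0.5 in every dimension and the output is the plain average of the two inputs; a large positive bias pushes the output toward the title embedding $S_{emb}$.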
Step 104: according to the compressed word embedding representation of the question and the word embedding representation of the answer, calculate the matching value between the answer and the question, and rank the answers according to the resulting matching values.
To calculate the matching value between the answer and the question and rank the answers accordingly, the following steps (1) to (9) may be performed:
(1) map the words in the answer from the word-vector space to an interaction space with the same dimension as the question representation to obtain the compressed word embedding representation of the answer;
(2) according to the compressed word embedding representation of the question and the compressed word embedding representation of the answer, calculate the similarity between the question subject and the question content in the compressed word embedding representation of the question;
(3) according to the calculated similarity between the question subject and the question content, calculate the similarity between the question and the answer;
(4) based on the calculated similarity between the question and the answer and the compressed word embedding representation of the answer, calculate the first similarity between the question and the answer with respect to the question;
(5) based on the calculated similarity between the question and the answer and the compressed word embedding representation of the question, calculate the second similarity between the question and the answer with respect to the answer;
(6) splice the first similarity with the word embedding representation of the question to obtain the summary representation of the question obtained from the answer;
(7) splice the second similarity with the word embedding representation of the answer to obtain the summary representation of the answer obtained from the question;
(8) based on the summary representation of the question obtained from the answer and the summary representation of the answer obtained from the question, calculate the matching value between the answer and the question;
(9) rank the answers to the question according to the resulting matching values.
In step (1) above, the compressed word embedding representation of the answer is obtained by the following formula:
$C_{rep} = \sigma(W_{c1} C_{emb} + b_{c1}) \odot \tanh(W_{c2} C_{emb} + b_{c2})$
where $C_{rep}$ denotes the compressed word embedding representation of the answer, $W_{c1}$, $W_{c2}$, $b_{c1}$ and $b_{c2}$ are parameters of the deep learning network, and $C_{emb}$ denotes the word embedding representation of the answer.
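The mapping formula for $C_{rep}$ can be sketched for a single answer word as follows. The function name `map_answer_word` and the toy shapes (a 3-dimensional embedding mapped to a 2-dimensional interaction space) are hypothetical; the real parameters are learned.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def map_answer_word(c_emb: Vector, w_c1: Matrix, b_c1: Vector,
                    w_c2: Matrix, b_c2: Vector) -> Vector:
    """C_rep = sigmoid(W_c1 C_emb + b_c1) * tanh(W_c2 C_emb + b_c2), element-wise."""
    gate = [1.0 / (1.0 + math.exp(-(x + b)))
            for x, b in zip(matvec(w_c1, c_emb), b_c1)]
    content = [math.tanh(x + b) for x, b in zip(matvec(w_c2, c_emb), b_c2)]
    return [g * t for g, t in zip(gate, content)]
```

The number of rows of `w_c1` and `w_c2` sets the output dimension, which is how the answer words end up in a space of the same dimension as the question representation.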
In step (2) above, the similarity between the question subject and the question content in the compressed word embedding representation of the question is calculated by a formula (not reproduced in this text). Its result denotes the similarity between the question subject and the question content; $W_{a1}$, $W_{a2}$ and $b_a$ are parameters of the deep learning network; and its two inputs denote the compressed word embedding representation of the question and the mapped representation of the answer.
In step (3) above, the similarity between the question and the answer is calculated by a formula (not reproduced in this text) in which $c$ denotes the alignment parameter and whose result denotes the similarity between the question and the answer.
In step (4) above, the first similarity between the question and the answer is calculated by a formula (not reproduced in this text) whose result denotes the first similarity and whose input denotes the mapped representation of the answer.
In step (5) above, the second similarity between the question and the answer is calculated by a formula (not reproduced in this text) whose result denotes the second similarity and whose input denotes the compressed word embedding representation of the question.
In step (8) above, the matching value between the answer and the question may be calculated by performing the following steps (81) to (83):
(81) calculate the question representation based on the summary representation of the question obtained from the answer;
(82) calculate the answer representation based on the summary representation of the answer obtained from the question;
(83) calculate the matching value between the answer and the question from the question representation and the answer representation.
In step (81) above, the question representation is calculated by the following formula:
$A_{s1} = W_{s2} \tanh(W_{s1} S_{att} + b_{s1}) + b_{s2}$
where $s_{sum}$ denotes the question representation; $A_{s1}$ denotes the attention matching result used when calculating the question representation; $S_{att}$ denotes the summary representation of the question obtained from the answer; and $W_{s1}$, $W_{s2}$, $b_{s1}$ and $b_{s2}$ are parameters of the deep learning network.
In step (82) above, the answer representation is calculated by the following formula:
$A_{s2} = W_{s2} \tanh(W_{s1} C_{att} + b_{s1}) + b_{s2}$
where $c_{sum}$ denotes the answer representation; $A_{s2}$ denotes the attention matching result used when calculating the answer representation; $C_{att}$ denotes the summary representation of the answer obtained from the question; and $W_{s1}$, $W_{s2}$, $b_{s1}$ and $b_{s2}$ are parameters of the deep learning network.
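The text gives the attention matching results $A_{s1}$ and $A_{s2}$ but not the step that turns them into $s_{sum}$ and $c_{sum}$. The sketch below assumes a softmax over words followed by a weighted sum, and simplifies $W_{s2}$ to a vector so that each word receives one scalar score. Both choices are illustrative assumptions, and `attention_pool` is a hypothetical name. The same parameters are applied to the question and to the answer, as in the formulas.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def attention_pool(words: List[Vector], w_s1: Matrix, b_s1: Vector,
                   w_s2: Vector, b_s2: float) -> Vector:
    """Score each word with w_s2 . tanh(W_s1 x + b_s1) + b_s2, then softmax-pool."""
    scores = []
    for x in words:
        hidden = [math.tanh(sum(a * b for a, b in zip(row, x)) + bias)
                  for row, bias in zip(w_s1, b_s1)]
        scores.append(sum(w * h for w, h in zip(w_s2, hidden)) + b_s2)
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(words[0])
    return [sum(w * x[d] for w, x in zip(weights, words)) for d in range(dim)]
```

Calling the function once with the rows of $S_{att}$ and once with the rows of $C_{att}$ would give fixed-length vectors playing the roles of $s_{sum}$ and $c_{sum}$ under these assumptions.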
In step (83) above, the matching value between the answer and the question is calculated by the following formula, which gives the probability that the answer is a "good answer, medium answer or poor answer":
$\Pr(y \mid S, B, C) = \mathrm{softmax}(W_2 \tanh(W_1 [s_{sum}; c_{sum}] + b_1) + b_2)$
where $\Pr(y \mid S, B, C)$ denotes the matching value between the answer and the question; $s_{sum}$ denotes the question representation; $c_{sum}$ denotes the answer representation; and $W_1$, $W_2$, $b_1$ and $b_2$ are parameters of the deep learning network.
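The classification formula can be sketched in pure Python as below. The function names and the toy parameter shapes are hypothetical, and the order of the three output classes (good, medium, poor) is an assumption, since the text does not state which index corresponds to which label.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(m: Matrix, v: Vector, b: Vector) -> Vector:
    """Compute m @ v + b for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) + bias for row, bias in zip(m, b)]


def softmax(z: Vector) -> Vector:
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def answer_class_probs(s_sum: Vector, c_sum: Vector, w1: Matrix, b1: Vector,
                       w2: Matrix, b2: Vector) -> Vector:
    """softmax(W2 tanh(W1 [s_sum; c_sum] + b1) + b2)."""
    joint = s_sum + c_sum  # list concatenation implements [s_sum; c_sum]
    hidden = [math.tanh(h) for h in affine(w1, joint, b1)]
    return softmax(affine(w2, hidden, b2))
```

The output always sums to one, so it can be read as a probability distribution over the three answer-quality classes.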
In step (9) above, the answers to the question may be ranked in descending order of the calculated matching values.
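Ranking in descending order of matching value is a plain sort. The sketch below assumes each answer has already been reduced to a single number (for example, the probability of the "good answer" class, which is an assumption here); `rank_answers` is a hypothetical helper name.

```python
from typing import List, Sequence


def rank_answers(answers: Sequence[str], matching_values: Sequence[float]) -> List[str]:
    """Return the answers sorted from highest to lowest matching value."""
    if len(answers) != len(matching_values):
        raise ValueError("each answer needs exactly one matching value")
    # sorted() is stable, so answers with equal values keep their original order.
    order = sorted(range(len(answers)), key=lambda i: matching_values[i], reverse=True)
    return [answers[i] for i in order]
```

Sorting indices rather than the answers themselves keeps the answer texts and their matching values paired without building intermediate tuples.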
In summary, the data processing method proposed in this embodiment compresses the word embedding representation of the question to obtain the compressed word embedding representation of the question, then calculates the matching value between the answer and the question from the compressed word embedding representation of the question and the word embedding representation of the answer, and ranks the answers by the resulting matching values. Compared with the related art, in which answer quality is checked manually and the multiple answers to a question are ranked by that quality, the answers can be ranked simply by calculating the matching value between each answer and the question. The entire ranking process requires no manual involvement, which saves time and labor and makes ranking efficient.
Embodiment 2
This embodiment proposes a data processing device for executing the data processing method of Embodiment 1 above.
Referring to the schematic structural diagram of the data processing device shown in Fig. 2, the data processing device proposed in this embodiment includes:
a first processing module 200, configured to process a question and the answer to the question to obtain a word embedding representation of the question and a word embedding representation of the answer;
a second processing module 202, configured to compress the word embedding representation of the question to obtain a compressed word embedding representation of the question;
a ranking module 204, configured to calculate the matching value between the answer and the question according to the compressed word embedding representation of the question and the word embedding representation of the answer, and to rank the answers according to the resulting matching values.
In summary, the data processing device proposed in this embodiment compresses the word embedding representation of the question to obtain the compressed word embedding representation of the question, then calculates the matching value between the answer and the question from the compressed word embedding representation of the question and the word embedding representation of the answer, and ranks the answers by the resulting matching values. Compared with the related art, in which answer quality is checked manually and the multiple answers to a question are ranked by that quality, the answers can be ranked simply by calculating the matching value between each answer and the question. The entire ranking process requires no manual involvement, which saves time and labor and makes ranking efficient.
The above is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (20)
1. A data processing method, characterized by comprising:
processing a question and the answer to the question to obtain a word embedding representation of the question and a word embedding representation of the answer;
compressing the word embedding representation of the question to obtain a compressed word embedding representation of the question;
according to the compressed word embedding representation of the question and the word embedding representation of the answer, calculating the matching value between the answer and the question, and ranking the answers according to the resulting matching values.
2. The method according to claim 1, characterized in that processing the question and the answer to the question to obtain the word embedding representation of the question and the word embedding representation of the answer comprises:
inputting the text of the question and the text of the answer to the question into dictionaries to obtain, respectively, the word vectors and character vectors of the question and the word vectors and character vectors of the answer;
concatenating the word vectors and character vectors of the question to obtain the word embedding representation of the question, and concatenating the word vectors and character vectors of the answer to obtain the word embedding representation of the answer.
3. The method according to claim 1, characterized in that compressing the word embedding representation of the question to obtain the compressed word embedding representation of the question comprises:
performing orthogonal decomposition on the word embedding representation of the question to obtain the parallel component and the orthogonal component of the question's word embedding;
splicing the parallel component and the orthogonal component of the question's word embedding to obtain the compressed word embedding representation of the question.
4. The method according to claim 3, characterized in that performing orthogonal decomposition on the word embedding representation of the question to obtain the parallel component of the question's word embedding comprises:
obtaining the parallel component of the question's word embedding by a formula (not reproduced in this text) whose symbols denote, respectively: the parallel component of the question's word embedding; the word embedding representation of the body of the question; and the word embedding representation of the i-th word in the title of the question.
5. The method according to claim 4, characterized in that performing orthogonal decomposition on the word embedding representation of the question to obtain the orthogonal component of the question's word embedding comprises:
obtaining the orthogonal component of the question's word embedding by a formula (not reproduced in this text) whose result denotes the orthogonal component of the question's word embedding.
6. The method according to claim 4, characterized in that splicing the parallel component and the orthogonal component of the question's word embedding to obtain the compressed word embedding representation of the question comprises:
calculating the alignment score of the parallel component based on the parallel component of the question's word embedding;
based on the alignment score of the parallel component and the word embedding representation of the body of the question, calculating the summary representation of the question body obtained from the question title;
according to the summary representation of the question body obtained from the question title, splicing the parallel component and the orthogonal component of the question's word embedding to obtain the compressed word embedding representation of the question.
7. The method according to claim 6, characterized in that calculating the alignment score of the parallel component based on the parallel component of the question's word embedding comprises:
calculating the alignment score of the parallel component by a formula (not reproduced in this text) in which $c$ denotes the alignment parameter and $W_{p1}$ and $b_{p1}$ are parameters of the deep learning network.
8. The method according to claim 7, characterized in that calculating, based on the alignment score of the parallel component and the word embedding representation of the body of the question, the summary representation of the question body obtained from the question title comprises:
calculating the summary representation of the question body obtained from the question title by a formula (not reproduced in this text) whose result denotes that summary representation.
9. The method according to claim 7, characterized in that splicing the parallel component and the orthogonal component of the word embedding of the question according to the summary representation of the question body obtained according to the question title, to obtain the compressed word embedding representation of the question, comprises:
splicing the parallel component and the orthogonal component of the word embedding of the question by the following formulas to obtain the compressed word embedding representation of the question:
$$F_{para} = \sigma(W_{p2} S_{emb} + W_{p3} S_{ap} + b_{p2})$$
$$S_{para} = F_{para} \odot S_{emb} + (1 - F_{para}) \odot S_{ap}$$
wherein $W_{p2}$, $W_{p3}$ and $b_{p2}$ denote the parameters of the fusion gate in the deep learning network, $F_{para}$ denotes the magnitude of the fusion gate, $S_{para}$ denotes the word embedding representation of the parallel component of the compressed question, $S_{emb}$ denotes the word embedding representation of the question title, and $S_{ap}$ denotes the summary representation of the question body obtained according to the question title.
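The two fusion-gate formulas of claim 9 can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the array shapes (one column per title word), the random parameter values and the function name `fuse_gate` are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_gate(S_emb, S_ap, W_p2, W_p3, b_p2):
    """Gate-weighted mix of the title embedding S_emb and the body summary S_ap.

    ASSUMED shapes: S_emb and S_ap are (d, n), W_p2 and W_p3 are (d, d),
    b_p2 is (d, 1) and is broadcast over the n columns.
    """
    F_para = sigmoid(W_p2 @ S_emb + W_p3 @ S_ap + b_p2)   # gate values in [0, 1]
    S_para = F_para * S_emb + (1.0 - F_para) * S_ap        # element-wise blend
    return F_para, S_para

rng = np.random.default_rng(0)
d, n = 4, 3                       # hypothetical embedding size and title length
S_emb = rng.normal(size=(d, n))
S_ap = rng.normal(size=(d, n))
W_p2 = rng.normal(size=(d, d))
W_p3 = rng.normal(size=(d, d))
b_p2 = np.zeros((d, 1))

F_para, S_para = fuse_gate(S_emb, S_ap, W_p2, W_p3, b_p2)
```

Because the gate lies between 0 and 1, every entry of `S_para` falls between the corresponding entries of `S_emb` and `S_ap`.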
10. The method according to claim 1, characterized in that calculating the matching value of the answer and the question according to the compressed word embedding representation of the question and the word embedding representation of the answer, and ranking the answers according to the obtained matching value, comprises:
mapping the words in the answer from the word vector space to an interaction space having the same dimension as the question representation, to obtain the compressed word embedding representation of the answer;
calculating, according to the compressed word embedding representation of the question and the compressed word embedding representation of the answer, the similarity between the question subject and the question content in the compressed word embedding representation of the question;
calculating the similarity between the question and the answer according to the calculated similarity between the question subject and the question content;
calculating a first similarity between the question and the answer from the question side, based on the calculated similarity between the question and the answer and the compressed word embedding representation of the answer;
calculating a second similarity between the question and the answer from the answer side, based on the calculated similarity between the question and the answer and the compressed word embedding representation of the question;
splicing the first similarity with the word embedding representation of the question to obtain a summary representation of the question obtained according to the answer;
splicing the second similarity with the word embedding representation of the answer to obtain a summary representation of the answer obtained according to the question;
calculating the matching value of the answer and the question based on the summary representation of the question obtained according to the answer and the summary representation of the answer obtained according to the question;
ranking the answers to the question according to the obtained matching value.
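The final ranking step of claim 10 can be illustrated with a short sketch. It assumes that each candidate answer already has a scalar matching value and that a higher value means a better match, so answers are sorted in descending order; the claim text only states that answers are ranked by matching value. The function name `rank_answers` and the sample data are hypothetical.

```python
def rank_answers(answers, matching_values):
    """Return the answers ordered by matching value.

    ASSUMPTION: descending order, i.e. the best-matching answer comes first.
    """
    order = sorted(range(len(answers)),
                   key=lambda i: matching_values[i],
                   reverse=True)
    return [answers[i] for i in order]

candidates = ["answer A", "answer B", "answer C"]   # hypothetical answers
values = [0.2, 0.9, 0.5]                            # hypothetical matching values
ranked = rank_answers(candidates, values)
```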
11. The method according to claim 10, characterized in that mapping the words in the answer from the word vector space to the interaction space having the same dimension as the question representation, to obtain the compressed word embedding representation of the answer, comprises:
obtaining the compressed word embedding representation of the answer by the following formula:
$$C_{rep} = \sigma(W_{c1} C_{emb} + b_{c1}) \odot \tanh(W_{c2} C_{emb} + b_{c2})$$
wherein $C_{rep}$ denotes the compressed word embedding representation of the answer, $W_{c1}$, $W_{c2}$, $b_{c1}$ and $b_{c2}$ are parameters of the deep learning network, and $C_{emb}$ denotes the word embedding representation of the answer.
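The gated mapping of claim 11 can be sketched as below. This is an illustrative reading of the formula under assumed shapes: `C_emb` holds one column per answer word, and the weight matrices project from the word-vector dimension to the question dimension. The dimensions, random values and the function name `map_answer` are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def map_answer(C_emb, W_c1, b_c1, W_c2, b_c2):
    """C_rep = sigmoid(W_c1 C_emb + b_c1) * tanh(W_c2 C_emb + b_c2)."""
    gate = sigmoid(W_c1 @ C_emb + b_c1)       # how much of each feature to keep
    content = np.tanh(W_c2 @ C_emb + b_c2)    # projected answer features
    return gate * content                     # element-wise product

rng = np.random.default_rng(0)
d_word, d_q, m = 5, 4, 7          # hypothetical word dim, question dim, answer length
C_emb = rng.normal(size=(d_word, m))
W_c1, b_c1 = rng.normal(size=(d_q, d_word)), np.zeros((d_q, 1))
W_c2, b_c2 = rng.normal(size=(d_q, d_word)), np.zeros((d_q, 1))

C_rep = map_answer(C_emb, W_c1, b_c1, W_c2, b_c2)
```

The output has the question dimension `d_q` in its first axis, which is what allows later steps to compare answer words with the question representation.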
12. The method according to claim 11, characterized in that calculating, according to the compressed word embedding representation of the question and the compressed word embedding representation of the answer, the similarity between the question subject and the question content in the compressed word embedding representation of the question comprises:
calculating the similarity between the question subject and the question content in the compressed word embedding representation of the question by the following formula:
wherein [symbol missing] denotes the similarity between the question subject and the question content, $W_{a1}$, $W_{a2}$ and $b_a$ are parameters of the deep learning network, [symbol missing] denotes the compressed word embedding representation of the question, and [symbol missing] denotes the mapped representation of the answer.
13. The method according to claim 12, characterized in that calculating the similarity between the question and the answer according to the calculated similarity between the question subject and the question content comprises:
calculating the similarity between the question and the answer by the following formula:
wherein c denotes an alignment parameter, and [symbol missing] denotes the similarity between the question and the answer.
14. The method according to claim 13, characterized in that calculating the first similarity between the question and the answer from the question side, based on the calculated similarity between the question and the answer and the compressed word embedding representation of the answer, comprises:
calculating the first similarity between the question and the answer by the following formula:
wherein [symbol missing] denotes the first similarity between the question and the answer, and [symbol missing] denotes the mapped representation of the answer.
15. The method according to claim 13, characterized in that calculating the second similarity between the question and the answer from the answer side, based on the calculated similarity between the question and the answer and the compressed word embedding representation of the question, comprises:
calculating the second similarity between the question and the answer by the following formula:
wherein [symbol missing] denotes the second similarity between the question and the answer, and [symbol missing] denotes the compressed word embedding representation of the question.
16. The method according to claim 13, characterized in that calculating the matching value of the answer and the question based on the summary representation of the question obtained according to the answer and the summary representation of the answer obtained according to the question comprises:
calculating a question representation based on the summary representation of the question obtained according to the answer;
calculating an answer representation based on the summary representation of the answer obtained according to the question;
calculating the matching value of the answer and the question from the question representation and the answer representation.
17. The method according to claim 16, characterized in that calculating the question representation based on the summary representation of the question obtained according to the answer comprises:
obtaining the question representation by the following formula:
$$A_{s1} = W_{s2} \tanh(W_{s1} S_{att} + b_{s1}) + b_{s2}$$
wherein $s_{sum}$ denotes the question representation; $A_{s1}$ denotes the attention matching result when calculating the question representation; $S_{att}$ denotes the summary representation of the question obtained according to the answer; and $W_{s1}$, $W_{s2}$, $b_{s1}$ and $b_{s2}$ are parameters of the deep learning network.
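The attention score formula of claim 17 can be sketched as follows. Only the expression for $A_{s1}$ appears in this text; the step that turns the scores into the single vector $s_{sum}$ is not given, so the softmax-weighted sum in `softmax_pool` is an assumption added for illustration, as are the shapes, the random values and both function names. Since claim 18 names the same parameters, the same functions could be applied to $C_{att}$ in the same way.

```python
import numpy as np

def attention_scores(X_att, W_s1, b_s1, W_s2, b_s2):
    """A = W_s2 tanh(W_s1 X_att + b_s1) + b_s2, one score per column of X_att."""
    return W_s2 @ np.tanh(W_s1 @ X_att + b_s1) + b_s2

def softmax_pool(X_att, A):
    """ASSUMPTION (not in the claim text): softmax the scores over word
    positions and take the weighted sum of the columns of X_att."""
    w = np.exp(A - A.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)       # weights of shape (1, n)
    return (X_att * w).sum(axis=1), w          # pooled vector of shape (d,)

rng = np.random.default_rng(0)
d, h, n = 6, 4, 5                 # hypothetical feature dim, hidden dim, length
S_att = rng.normal(size=(d, n))
W_s1, b_s1 = rng.normal(size=(h, d)), np.zeros((h, 1))
W_s2, b_s2 = rng.normal(size=(1, h)), np.zeros((1, 1))

A_s1 = attention_scores(S_att, W_s1, b_s1, W_s2, b_s2)   # shape (1, n)
s_sum, weights = softmax_pool(S_att, A_s1)
```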
18. The method according to claim 16, characterized in that calculating the answer representation based on the summary representation of the answer obtained according to the question comprises:
obtaining the answer representation by the following formula:
$$A_{s2} = W_{s2} \tanh(W_{s1} C_{att} + b_{s1}) + b_{s2}$$
wherein $c_{sum}$ denotes the answer representation; $A_{s2}$ denotes the attention matching result when calculating the answer representation; $C_{att}$ denotes the summary representation of the answer obtained according to the question; and $W_{s1}$, $W_{s2}$, $b_{s1}$ and $b_{s2}$ are parameters of the deep learning network.
19. The method according to claim 16, characterized in that calculating the matching value of the answer and the question from the question representation and the answer representation comprises:
calculating the matching value of the answer and the question by the following formula:
$$\Pr(y \mid S, B, C) = \mathrm{softmax}\left(W_2 \tanh\left(W_1 [s_{sum}; c_{sum}] + b_1\right) + b_2\right)$$
wherein $\Pr(y \mid S, B, C)$ denotes the matching value of the answer and the question; $s_{sum}$ denotes the question representation; $c_{sum}$ denotes the answer representation; and $W_1$, $W_2$, $b_1$ and $b_2$ are parameters of the deep learning network.
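The matching formula of claim 19 can be sketched as a small two-layer classifier over the concatenated question and answer representations. This is a hedged illustration: the vector sizes, the random parameters, the choice of two output classes and the function name `matching_value` are all assumptions, since the claim does not state them.

```python
import numpy as np

def matching_value(s_sum, c_sum, W_1, b_1, W_2, b_2):
    """softmax(W_2 tanh(W_1 [s_sum; c_sum] + b_1) + b_2)."""
    joint = np.concatenate([s_sum, c_sum])            # [s_sum; c_sum]
    logits = W_2 @ np.tanh(W_1 @ joint + b_1) + b_2
    exp = np.exp(logits - logits.max())               # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
d, h, k = 6, 8, 2     # k = 2 output classes is an assumption
s_sum, c_sum = rng.normal(size=d), rng.normal(size=d)
W_1, b_1 = rng.normal(size=(h, 2 * d)), np.zeros(h)
W_2, b_2 = rng.normal(size=(k, h)), np.zeros(k)

probs = matching_value(s_sum, c_sum, W_1, b_1, W_2, b_2)
```

Under the two-class assumption, one entry of `probs` could serve as the scalar matching value used to rank candidate answers.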
20. A data processing device, characterized by comprising:
a first processing module, configured to process a question and the answers to the question to obtain the word embedding representation of the question and the word embedding representation of the answer;
a second processing module, configured to compress the word embedding representation of the question to obtain the compressed word embedding representation of the question;
a ranking module, configured to calculate the matching value of the answer and the question according to the compressed word embedding representation of the question and the word embedding representation of the answer, and to rank the answers according to the obtained matching value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910666576.4A CN110390005A (en) | 2019-07-23 | 2019-07-23 | A kind of data processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110390005A (en) | 2019-10-29 |
Family
ID=68287149
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910666576.4A (pending), published as CN110390005A (en) | A kind of data processing method and device | 2019-07-23 | 2019-07-23 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110390005A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108132931A (en) * | 2018-01-12 | 2018-06-08 | 北京神州泰岳软件股份有限公司 | A kind of matched method and device of text semantic |
CN108829818A (en) * | 2018-06-12 | 2018-11-16 | 中国科学院计算技术研究所 | A kind of file classification method |
CN109271505A (en) * | 2018-11-12 | 2019-01-25 | 深圳智能思创科技有限公司 | A kind of question answering system implementation method based on problem answers pair |
US20190065576A1 (en) * | 2017-08-23 | 2019-02-28 | Rsvp Technologies Inc. | Single-entity-single-relation question answering systems, and methods |
US20190079921A1 (en) * | 2015-01-23 | 2019-03-14 | Conversica, Inc. | Systems and methods for automated question response |
CN109656952A (en) * | 2018-10-31 | 2019-04-19 | 北京百度网讯科技有限公司 | Inquiry processing method, device and electronic equipment |
CN109726396A (en) * | 2018-12-20 | 2019-05-07 | 泰康保险集团股份有限公司 | Semantic matching method, device, medium and the electronic equipment of question and answer text |
Non-Patent Citations (1)
Title |
---|
Huo Huan: "An answer block extraction model based on keyword expansion", Journal of Chinese Computer Systems (《小型微型计算机系统》) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191029 |