CN109299400A - Viewpoint extraction method, apparatus, and device - Google Patents


Info

Publication number
CN109299400A
Authority
CN
China
Prior art keywords
viewpoint
text
analyzed
word
lstm model
Legal status
Pending
Application number
CN201811037185.8A
Other languages
Chinese (zh)
Inventor
谢忠玉
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201811037185.8A
Publication of CN109299400A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the invention provide a viewpoint extraction method, apparatus, and device. The method includes: obtaining a text to be analyzed; determining the viewpoint entities in the text to be analyzed based on a pre-trained long short-term memory (LSTM) model, where the viewpoint entities include evaluation objects and evaluation words, and the LSTM model is trained on multiple training samples carrying viewpoint entity labels; and establishing associations between the evaluation objects and the evaluation words according to preset heuristic rules, so as to determine the viewpoints corresponding to the text to be analyzed. The viewpoint extraction method, apparatus, and device provided by the embodiments of the invention can reduce the computational complexity of the viewpoint extraction process.

Description

Viewpoint extraction method, apparatus, and device
Technical field
The present invention relates to the field of Internet technology, and in particular to a viewpoint extraction method, apparatus, and device.
Background
As social networks and the mobile Internet continue to spread, the cost of publishing information keeps falling, and more and more users like to share their viewpoints and comments on people, events, products, and so on over the Internet. To understand users' feedback on their products, Internet companies and others need to be able to obtain or mine the user viewpoints contained in the information users publish.
In the prior art, user viewpoints are mined by first building a word dependency relationship library and then performing viewpoint extraction based on that library.
However, in the course of implementing the present invention, the inventor found that the prior art has at least the following problem:
To build the word dependency relationship library, the prior art must segment the text into words to obtain each word and its part of speech, split the text into clauses, syntactically parse each clause, and analyze its syntactic structure; it must then determine candidate evaluation words, candidate evaluation objects, and word dependency paths before the library can be built. Building the word dependency relationship library is therefore relatively complicated, which in turn makes the viewpoint extraction process complicated.
Summary of the invention
The embodiments of the present invention aim to provide a viewpoint extraction method, apparatus, and device that reduce the computational complexity of the viewpoint extraction process. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present invention provides a viewpoint extraction method, including:
Obtaining a text to be analyzed;
Determining the viewpoint entities in the text to be analyzed based on a pre-trained long short-term memory (LSTM) model, where the viewpoint entities include evaluation objects and evaluation words, and the LSTM model is trained on multiple training samples carrying viewpoint entity labels;
Establishing associations between the evaluation objects and the evaluation words according to preset heuristic rules, so as to determine the viewpoints corresponding to the text to be analyzed.
Optionally, establishing the associations between the evaluation objects and the evaluation words according to the preset heuristic rules includes:
For each evaluation object, calculating the distance between the evaluation object and each evaluation word;
Determining the minimum of all the distances;
Determining the evaluation word corresponding to the minimum distance as the target evaluation word;
Establishing an association between the evaluation object and the target evaluation word.
Optionally, the step of training the LSTM model in advance includes:
Obtaining multiple training samples;
For each training sample, inputting the training sample into a preset LSTM model and training the preset LSTM model to obtain the trained LSTM model, where the words in the training samples carry viewpoint entity labels.
Optionally, the viewpoint entity labels include labels annotated using a sequence labeling scheme.
Optionally, after the text to be analyzed is obtained, the method further includes:
Filtering out viewpoint-irrelevant content from the text to be analyzed to obtain a filtered text to be analyzed;
The step of determining the viewpoint entities in the text to be analyzed based on the pre-trained LSTM model includes:
Determining, based on the LSTM model, the viewpoint entities in the filtered text to be analyzed.
In a second aspect, an embodiment of the present invention provides a viewpoint extraction apparatus, including:
A first obtaining module, configured to obtain a text to be analyzed;
A determining module, configured to determine the viewpoint entities in the text to be analyzed based on a pre-trained long short-term memory (LSTM) model, where the viewpoint entities include evaluation objects and evaluation words, and the LSTM model is trained on multiple training samples carrying viewpoint entity labels;
An establishing module, configured to establish associations between the evaluation objects and the evaluation words according to preset heuristic rules, so as to determine the viewpoints corresponding to the text to be analyzed.
Optionally, the establishing module includes:
A calculating submodule, configured to calculate, for each evaluation object, the distance between the evaluation object and each evaluation word;
A first determining submodule, configured to determine the minimum of all the distances;
A second determining submodule, configured to determine the evaluation word corresponding to the minimum distance as the target evaluation word;
An establishing submodule, configured to establish an association between the evaluation object and the target evaluation word.
Optionally, the apparatus further includes:
A second obtaining module, configured to obtain multiple training samples;
A training module, configured to input, for each training sample, the training sample into a preset LSTM model and train the preset LSTM model to obtain the trained LSTM model, where the words in the training samples carry viewpoint entity labels.
Optionally, the viewpoint entity labels include labels annotated using a sequence labeling scheme.
Optionally, the apparatus further includes a filtering module, configured to filter out, after the text to be analyzed is obtained, viewpoint-irrelevant content from the text to be analyzed to obtain a filtered text to be analyzed;
The determining module is specifically configured to determine, based on the LSTM model, the viewpoint entities in the filtered text to be analyzed.
In a third aspect, an embodiment of the present invention provides a viewpoint extraction device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
The memory is configured to store a computer program;
The processor is configured to implement the method steps of the first aspect when executing the program stored in the memory.
In another aspect of the present invention, a computer-readable storage medium is further provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the method steps of the first aspect.
In yet another aspect of the present invention, an embodiment further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method steps of the first aspect.
With the viewpoint extraction method, apparatus, and device provided by the embodiments of the present invention, a text to be analyzed can be obtained; the viewpoint entities in the text are determined based on a pre-trained long short-term memory (LSTM) model, where the viewpoint entities include evaluation objects and evaluation words and the LSTM model is trained on multiple training samples carrying viewpoint entity labels; and associations between the evaluation objects and the evaluation words are established according to preset heuristic rules to determine the viewpoints corresponding to the text. Because the LSTM model is trained on samples carrying viewpoint entity labels, the viewpoint extraction process no longer needs to build a word dependency relationship library, nor to syntactically parse the sentences of the text and analyze their syntactic structure for that purpose, which reduces the computational complexity of viewpoint extraction.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below.
Fig. 1 is a flow chart of a viewpoint extraction method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of establishing associations between evaluation objects and evaluation words in an embodiment of the present invention;
Fig. 3 is a flow chart of LSTM model training in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the network structure of the LSTM model in an embodiment of the present invention;
Fig. 5 is another flow chart of the viewpoint extraction method provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a viewpoint extraction apparatus provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a viewpoint extraction device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings.
For video content and the like, content production teams, operations teams, and others want to know how viewers respond: what viewers focus on, and what comments they make about the various aspects of the work as a whole. Opinion mining on user-generated content (UGC) comment texts can answer these questions and thus effectively support the work of content production teams, operations teams, and others.
The goal is to extract user viewpoints from the comments and other content generated by large numbers of users. Current approaches to extracting viewpoint information fall into two broad categories: document sentiment classification and information extraction. Document sentiment classification focuses on classifying the sentiment of a text, for example as positive, negative, or neutral, whereas information extraction focuses on mining the individual components of a user viewpoint, such as the opinion holder, the evaluation object, and the evaluation word. At present, the more common approach to viewpoint extraction is information extraction.
One existing approach first builds a word dependency relationship library and then performs viewpoint extraction based on it. Building the library requires segmenting the text into words to obtain each word and its part of speech, splitting the text into clauses, syntactically parsing each clause, and analyzing its syntactic structure; candidate evaluation words, candidate evaluation objects, and word dependency paths must then be determined before the library can be built. This makes building the word dependency relationship library relatively complicated, which in turn makes the viewpoint extraction process complicated.
The viewpoint extraction method provided by the embodiments of the present invention trains, in advance, an LSTM model for determining the viewpoint entities in a text, and uses this model to determine the viewpoint entities, i.e. the evaluation objects and evaluation words, in the text to be analyzed; it then associates the viewpoint entities through heuristic rules to determine the user viewpoints in the text.
The method therefore does not rely on a word dependency relationship library, and so does not need to syntactically parse the sentences of the text and analyze their syntactic structure to build one; instead, the LSTM model in the embodiments of the present invention is trained on training samples carrying viewpoint entity labels. In this way, the viewpoint extraction method provided by the embodiments of the present invention reduces the computational complexity of viewpoint extraction. In addition, the embodiments of the present invention have no prerequisite dependencies; for example, they do not rely on a seed set of evaluation objects or on a grammatical dependency tree.
The viewpoint extraction method provided by the embodiments of the present invention can be applied to an electronic device, which may include a desktop computer, a portable computer, an intelligent mobile terminal, and so on, for example the viewpoint extraction device described below. To make the solutions of the embodiments easier to understand, the viewpoint extraction method provided by the embodiments of the present invention is described in detail below.
An embodiment of the present invention provides a viewpoint extraction method, which, as shown in Fig. 1, includes:
S101: Obtain a text to be analyzed.
The text to be analyzed may take many forms, for example a text in Word or TXT format, a microblog text, or comments on videos posted by users in a video client.
Viewpoint extraction generally targets users' comments on events, products, and so on, so the text to be analyzed can also simply be called a comment corpus.
Specifically, the electronic device may collect users' comments from the comment section of a microblog or of a video client; alternatively, the microblog platform, video client, or the like may save comments into a preset text as they are generated, and when the comments need to be analyzed, the electronic device obtains the text to be analyzed directly from that preset text. The embodiments of the present invention impose no restriction on this.
S102: Determine the viewpoint entities in the text to be analyzed based on the pre-trained LSTM model.
The viewpoint entities include evaluation objects and evaluation words.
The LSTM model is trained on multiple training samples carrying viewpoint entity labels.
The viewpoint entity labels may take various forms. For example, a label may simply indicate whether each word is an evaluation object or an evaluation word, or the labels may be annotated using a sequence labeling scheme, and so on.
An LSTM model for extracting viewpoint entities is trained in advance, so for a text to be analyzed, the viewpoint entities can be determined based on this pre-trained LSTM model. Specifically, the text to be analyzed is input into the LSTM model, and the LSTM model yields the viewpoint entities in the text.
Viewpoint entity may include evaluation object and evaluating word.The object that evaluation object, that is, User Perspective is acted on, evaluation It is specifically evaluated in word, that is, viewpoint.
In a kind of achievable mode, viewpoint entity can also include Appraising subject, that is, the user etc. to make an appraisal.Work as needs When including Appraising subject in the viewpoint entity of extraction, then in the LSTM model process that viewpoint entity is extracted in training, it can mark The Appraising subject of training sample, and then by training obtained LSTM model, it include Appraising subject in determining viewpoint entity.
S103 establishes being associated between evaluation object and evaluating word, by presetting heuristic rule with determination text to be analyzed This corresponding viewpoint.
After extracting the viewpoint entity in text to be analyzed, it is associated operation, with the corresponding sight of determination text to be analyzed Point.
Specifically operation associated may include: to establish being associated between evaluation object and evaluating word, that is, determine text to be analyzed Which evaluating word modifies which evaluation object in this.It is simple to understand, that is, establish the modification between evaluating word and evaluation object Relationship.
Default heuristic rule can be to be obtained after a large amount of texts by analyzing.It specifically may include having modification to close The characteristics of evaluating word and evaluation object of system, etc..Such as, evaluation object and evaluating word are separated by sentence termination punctuation mark, then Think there can not be modified relationship between the evaluation object and evaluating word;As evaluation object with and the evaluation object distance it is most short Evaluating word between there are modified relationships, etc..
In this way, search the evaluation object and evaluating word for meeting the default heuristic rule, and establish satisfaction preset it is heuristic Association between the evaluation object and evaluating word of rule, completion is operation associated, determines the viewpoint in text to be analyzed.
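The two example rules above can be combined in a short sketch. It assumes entities are given as half-open token spans over a tokenized text, uses a hypothetical set of Chinese and ASCII sentence-ending marks as the punctuation barrier, and pairs each evaluation object with the nearest evaluation word that is not cut off by such a mark; an evaluation object with no eligible evaluation word is left unpaired. The punctuation set and function names are illustrative assumptions, not taken from the patent.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # [start, end) token indices

# Assumed set of sentence-ending marks; the description does not enumerate them.
SENTENCE_END = set("。！？!?")


def gap(a: Span, b: Span) -> int:
    """Number of tokens strictly between two non-overlapping spans."""
    left, right = sorted([a, b])
    return max(0, right[0] - left[1])


def crosses_sentence_end(tokens: List[str], a: Span, b: Span) -> bool:
    """Rule 1: a sentence-ending mark between the spans rules out a modification relation."""
    left, right = sorted([a, b])
    return any(tok in SENTENCE_END for tok in tokens[left[1]:right[0]])


def associate(tokens: List[str], targets: List[Span], opinions: List[Span]) -> List[Tuple[Span, Span]]:
    """Rule 2: pair each evaluation object with its nearest admissible evaluation word."""
    pairs: List[Tuple[Span, Span]] = []
    for t in targets:
        candidates = [o for o in opinions if not crosses_sentence_end(tokens, t, o)]
        if candidates:
            pairs.append((t, min(candidates, key=lambda o: gap(t, o))))
    return pairs


tokens = list("画面很美。剧情拖沓")  # "The visuals are beautiful. The plot drags."
print(associate(tokens, [(0, 2), (5, 7)], [(3, 4), (7, 9)]))
# -> [((0, 2), (3, 4)), ((5, 7), (7, 9))]
```

Here "画面" (visuals) pairs with "美" and "剧情" (plot) with "拖沓"; the full stop keeps "美" from being considered for "剧情" at all.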
In the embodiments of the present invention, the viewpoint entities in the text to be analyzed are determined by the pre-trained LSTM model, and associations between evaluation objects and evaluation words are then established according to preset heuristic rules to determine the viewpoints corresponding to the text. Because the LSTM model is trained on training samples carrying viewpoint entity labels, the viewpoint extraction process does not need to build a word dependency relationship library, nor to syntactically parse the sentences of the text and analyze their syntactic structure for that purpose, which reduces the computational complexity of viewpoint extraction.
On the basis of the embodiment shown in Fig. 1, as shown in Fig. 2, step S103 may include:
S1031: For each evaluation object, calculate the distance between the evaluation object and each evaluation word.
A text to be analyzed may contain one, two, or more evaluation objects, and likewise one, two, or more evaluation words.
Each evaluation object is processed in turn, and the distance between it and each evaluation word is calculated. The distance here may be the number of words or characters separating the evaluation object from the evaluation word, or it may be the number of steps taken from the evaluation object to the evaluation word (or from the evaluation word to the evaluation object) with one word or one character as the step size, and so on.
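The two distance definitions above can be made concrete with a small sketch. It assumes each entity is a half-open span of token indices, where a token is a word or a character depending on the tokenization. `gap_distance` counts the tokens separating the two entities, and `step_distance` counts one-token steps from the start of one entity to the start of the other; reading "from the evaluation object to the evaluation word" as start-to-start is an assumption.

```python
from typing import Tuple

Span = Tuple[int, int]  # [start, end) token indices; a token may be a word or a character


def gap_distance(a: Span, b: Span) -> int:
    """Number of tokens lying strictly between two non-overlapping spans."""
    left, right = sorted([a, b])
    return max(0, right[0] - left[1])


def step_distance(a: Span, b: Span) -> int:
    """Number of one-token steps from the start of one span to the start of the other."""
    return abs(a[0] - b[0])


# "我爱中国" at character level: evaluation word "爱" = (1, 2), evaluation object "中国" = (2, 4).
print(gap_distance((2, 4), (1, 2)), step_distance((2, 4), (1, 2)))  # -> 0 1
```

Both measures are symmetric, so it does not matter whether the count starts from the evaluation object or from the evaluation word.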
S1032: Determine the minimum of all the distances.
S1033: Determine the evaluation word corresponding to the minimum distance as the target evaluation word.
When there are multiple evaluation words, multiple distances are calculated between one evaluation object and those evaluation words, and the smallest of them is selected as the minimum distance. For a given evaluation object, each distance corresponds to one evaluation word, so once the minimum distance is determined, the corresponding evaluation word can be identified and determined as the target evaluation word.
In the simplest case, where the text to be analyzed contains only one evaluation object and one evaluation word, the distance between them can be taken directly as the minimum distance, and that evaluation word is the target evaluation word.
S1034: Establish an association between the evaluation object and the target evaluation word.
For each evaluation object, its target evaluation word can be determined, and an association is established between each evaluation object and its target evaluation word, completing the association operation.
Specifically, establishing an association may mean establishing the correspondence between each evaluation object and its target evaluation word, for example by saving that correspondence, or by forming, for each evaluation object, an association pair consisting of the evaluation object and its target evaluation word.
In one specific implementation, an evaluation object queue targetList can be built for all evaluation objects, and an evaluation word queue opinionList for all evaluation words. The elements of targetList are traversed, and each element of targetList is associated with the evaluation word nearest to it. In one case, if targetList contains fewer elements than opinionList, the elements of targetList are traversed.
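A minimal sketch of this queue-based traversal, covering steps S1031 to S1034, assuming both queues hold half-open token spans and the distance is the number of tokens between two spans. The Python names `target_list` and `opinion_list` stand in for targetList and opinionList; breaking ties in favor of the earlier evaluation word is an assumption the description does not make. Only the traversal over targetList is shown.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # [start, end) token indices


def distance(a: Span, b: Span) -> int:
    """Tokens separating two non-overlapping spans."""
    left, right = sorted([a, b])
    return max(0, right[0] - left[1])


def associate_nearest(target_list: List[Span], opinion_list: List[Span]) -> List[Tuple[Span, Span]]:
    pairs: List[Tuple[Span, Span]] = []
    if not opinion_list:
        return pairs  # nothing to associate
    for target in target_list:  # traverse the elements of targetList
        distances = [distance(target, op) for op in opinion_list]  # S1031
        min_d = min(distances)  # S1032
        best = opinion_list[distances.index(min_d)]  # S1033: first minimum wins ties
        pairs.append((target, best))  # S1034: record the association pair
    return pairs


print(associate_nearest([(0, 2), (5, 7)], [(3, 4), (7, 9)]))
# -> [((0, 2), (3, 4)), ((5, 7), (7, 9))]
```

Returning a list of (evaluation object, target evaluation word) pairs corresponds to the "association pair" form of storing the result described above.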
In the embodiments of the present invention, after the evaluation words and evaluation objects in the text to be analyzed are determined, associations between them are established through heuristic rules. Specifically, by calculating the distance between each evaluation object and each evaluation word, the target evaluation word of each evaluation object is determined, and an association is established between each evaluation object and its target evaluation word. Compared with relation extraction, this approach achieves a higher recall.
On the basis of the above embodiments, the embodiments of the present invention may further include a step of training the LSTM model in advance. Specifically, as shown in Fig. 3, this may include:
S104: Obtain multiple training samples.
Like the text to be analyzed, the training samples may take many forms, for example texts in Word or TXT format, microblog texts, or comments on videos posted by users in a video client.
To improve training accuracy, the electronic device obtains multiple training samples, for example 500, 1,000, or 2,000.
S105: For each training sample, input the training sample into a preset LSTM model and train the preset LSTM model to obtain the trained LSTM model.
The words in the training samples carry viewpoint entity labels.
After the training samples are obtained, they can be labeled. In one possible implementation, the labeling can be done manually.
Specifically, the words in a training sample that belong to an evaluation object or an evaluation word can be labeled, and these labels are the viewpoint entity labels. In an optional embodiment of the present invention, labeling can be done with a sequence labeling scheme, for example the BIO (begin, inside, others) tagging scheme, as follows:
B_T (begin of the target): the token is the first token of an evaluation object (target) in the training text;
I_T (inside the target): the token belongs to an evaluation object and follows its first token;
B_O (begin of the opinion): the token is the first token of an evaluation word (opinion) in the training text;
I_O (inside the opinion): the token belongs to an evaluation word and follows its first token;
O (others): the token belongs to neither a target nor an opinion.
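To illustrate the labeling scheme, the sketch below converts manually annotated spans into one tag per token. It assumes annotations are given as non-overlapping half-open token spans; `to_bio` is a hypothetical helper, and the example reuses the sentence "我爱中国" ("I love China") with character-level tokens.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # [start, end) token indices


def to_bio(n_tokens: int, targets: List[Span], opinions: List[Span]) -> List[str]:
    """Label every token with B_T / I_T / B_O / I_O / O from annotated spans."""
    tags = ["O"] * n_tokens  # default: neither target nor opinion
    for spans, kind in ((targets, "T"), (opinions, "O")):
        for start, end in spans:
            if not 0 <= start < end <= n_tokens:
                raise ValueError(f"invalid span {(start, end)}")
            tags[start] = f"B_{kind}"  # first token of the entity
            for i in range(start + 1, end):
                tags[i] = f"I_{kind}"  # remaining tokens of the entity
    return tags


print(to_bio(4, targets=[(2, 4)], opinions=[(1, 2)]))  # -> ['O', 'B_O', 'B_T', 'I_T']
```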
To verify the training results and ensure the accuracy of the trained LSTM model, the embodiments of the present invention may divide the training samples into a training set and a test set. The samples in the training set are used for training, and the samples in the test set are used to verify the training results.
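A minimal sketch of such a split, assuming a reproducible random shuffle and a hypothetical 20% test share; the description states neither a ratio nor a splitting method.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(samples: Sequence[T], test_ratio: float = 0.2, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the samples reproducibly and return (training set, test set)."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must lie strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(1000))
print(len(train), len(test))  # -> 800 200
```

Fixing the seed keeps the test set stable across runs, so results from different training attempts remain comparable.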
Because LSTM models perform well on sequence labeling tasks, the embodiments of the present invention input the labeled training samples into the preset LSTM model and train it to obtain the LSTM model.
Specifically, the preset LSTM model may include parameters to be determined. The training samples are input into the preset LSTM model and these parameters are adjusted so that the model's output approaches the pre-annotated viewpoint entity labels as closely as possible. For example, when the cost function between the model's output and the viewpoint entity labels converges, the parameters are fixed, and the preset LSTM model with the determined parameters is the trained LSTM model. The parameters to be determined may include the number of hidden layers, the number of neurons per hidden layer, the batch size, the learning rate, and/or the number of iterations, and so on.
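The description names the quantities to be tuned and a convergence criterion but gives no concrete values or cost function. The sketch below assumes a per-token cross-entropy cost over the five BIO tags and treats "converged" as the change in cost between consecutive iterations falling below a tolerance. `TrainingConfig`, its default values, and the `step` callback (standing in for one parameter update of the model) are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

TAGS = ["B_T", "I_T", "B_O", "I_O", "O"]
TAG_INDEX = {tag: i for i, tag in enumerate(TAGS)}


@dataclass
class TrainingConfig:
    # Placeholder values for illustration only; the patent gives none.
    num_hidden_layers: int = 1
    hidden_size: int = 128
    batch_size: int = 32
    learning_rate: float = 1e-3
    max_iterations: int = 50
    tolerance: float = 1e-4  # hypothetical convergence threshold


def sequence_cross_entropy(probs: Sequence[Sequence[float]], gold: Sequence[str]) -> float:
    """Mean negative log-probability the model assigns to the gold tag of each token."""
    if len(probs) != len(gold) or not gold:
        raise ValueError("need one probability distribution per gold tag")
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(max(p[TAG_INDEX[t]], eps)) for p, t in zip(probs, gold)) / len(gold)


def has_converged(losses: List[float], tol: float) -> bool:
    return len(losses) >= 2 and abs(losses[-1] - losses[-2]) < tol


def train(step: Callable[[int], float], config: TrainingConfig) -> List[float]:
    """Call `step` (one update, returning the cost) until convergence or max_iterations."""
    losses: List[float] = []
    for iteration in range(config.max_iterations):
        losses.append(step(iteration))
        if has_converged(losses, config.tolerance):
            break
    return losses


uniform = [[0.2] * len(TAGS)] * 3
print(round(sequence_cross_entropy(uniform, ["O", "B_T", "I_T"]), 4))  # -> 1.6094 (= ln 5)
print(len(train(lambda it: 0.5 ** it, TrainingConfig())))  # -> 15
```

In a real setup, `step` would run one batch of backpropagation through the LSTM and return the resulting cost; a deep-learning framework would normally provide both the model and the optimizer.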
In addition, to avoid noise in the training samples, such as emoticons and "@..." mentions in microblog texts, in one possible implementation the training samples are first preprocessed before being labeled; specifically, the noise in the training samples can be filtered out.
Specifically, the network structure of the trained LSTM model is shown in Fig. 4. Once the LSTM model is trained, the text to be analyzed is input into it to extract the viewpoint entities, such as evaluation objects and evaluation words. For example, suppose the text to be analyzed is "我爱中国" ("I love China"). The text passes through the LSTM hidden-layer encoder in the LSTM model, where the hidden layer includes multiple hidden units with parameter L, and then through the classification layer, which can also be regarded as a softmax regression classification layer and includes multiple classification units with parameters c1, c2, c3, and c4. Finally, the model outputs the labels of the evaluation objects and evaluation words in the text in BIO form: "我" ("I") is labeled O, "爱" ("love") is labeled B_O, "中" is labeled B_T, and "国" is labeled I_T, so "中国" ("China") is the evaluation object and "爱" is the evaluation word.
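As an illustration of the hidden-layer-plus-softmax structure described for Fig. 4, the sketch below runs a single-layer, unidirectional LSTM over a sequence of token vectors and applies a softmax classification layer at every step to obtain a distribution over the five BIO tags. The weights are random and untrained, the token embeddings are made up, and the description does not state whether the model is bidirectional or stacked. `TinyLSTMTagger` is a hypothetical name; in practice a deep-learning framework would replace this pure-Python version.

```python
import math
import random
from typing import List

TAGS = ["B_T", "I_T", "B_O", "I_O", "O"]
Vector = List[float]
Matrix = List[Vector]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vadd(*vs: Vector) -> Vector:
    return [sum(xs) for xs in zip(*vs)]


def softmax(z: Vector) -> Vector:
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class TinyLSTMTagger:
    """Single-layer LSTM encoder followed by a per-token softmax classification layer."""

    def __init__(self, input_size: int, hidden_size: int, num_tags: int = len(TAGS), seed: int = 0):
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        gates = "ifoc"  # input, forget, output gates and candidate cell state
        self.W = {k: rand(hidden_size, input_size) for k in gates}   # input-to-hidden weights
        self.U = {k: rand(hidden_size, hidden_size) for k in gates}  # hidden-to-hidden weights
        self.b = {k: [0.0] * hidden_size for k in gates}
        self.V = rand(num_tags, hidden_size)  # classification-layer weights
        self.bias_out = [0.0] * num_tags

    def forward(self, xs: List[Vector]) -> List[Vector]:
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in xs:
            pre = {k: vadd(matvec(self.W[k], x), matvec(self.U[k], h), self.b[k]) for k in self.W}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            g = [math.tanh(v) for v in pre["c"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]  # new cell state
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]  # new hidden state
            outputs.append(softmax(vadd(matvec(self.V, h), self.bias_out)))
        return outputs

    def predict(self, xs: List[Vector]) -> List[str]:
        return [TAGS[p.index(max(p))] for p in self.forward(xs)]


rng = random.Random(1)
embeddings = [[rng.uniform(-1, 1) for _ in range(8)] for _ in "我爱中国"]  # made-up vectors
tagger = TinyLSTMTagger(input_size=8, hidden_size=16)
print(tagger.predict(embeddings))  # four tags; meaningless until the weights are trained
```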
In the embodiments of the present invention, an LSTM model for extracting viewpoint entities is trained in advance, and the viewpoint entities in the text to be analyzed are then determined based on this model. This allows the viewpoint entities to be determined quickly and improves the accuracy with which they are determined. Moreover, extracting evaluation objects through information mining accurately locates the object a user's viewpoint is directed at, while extracting evaluation words identifies the viewpoint more specifically and in a more targeted way.
In an optional embodiment of the present invention, as shown in Fig. 5, the method may further include, after step S101:
S106: Filter out the viewpoint-irrelevant content in the text to be analyzed to obtain a filtered text to be analyzed.
This can be understood as preprocessing the text to be analyzed.
Viewpoint-irrelevant content can be understood as content that does not affect the expression of the viewpoint, specifically content that does not affect viewpoint extraction, such as emoticons, uniform resource locator (URL) links, and/or special characters. For example, in microblog texts, text such as "@***" that mentions another user is filtered out, because such text interferes with viewpoint extraction.
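A minimal regex-based sketch of this filtering step. The patterns are illustrative assumptions rather than the patent's rules: URLs starting with http(s):// or www., Weibo-style "@username" mentions ending at whitespace or common punctuation, a few common emoji code-point ranges, bracketed emoticon codes such as "[哈哈]", and a small set of special characters. `filter_irrelevant` is a hypothetical name.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@[^\s@:：,，]+")  # "@user" up to whitespace or common punctuation
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # common emoji blocks, not exhaustive
BRACKET_EMOTICON_RE = re.compile(r"\[[^\[\]]{1,8}\]")  # short bracketed emoticon codes
SPECIAL_RE = re.compile(r"[#*~^|<>]+")
WHITESPACE_RE = re.compile(r"\s+")


def filter_irrelevant(text: str) -> str:
    """Replace viewpoint-irrelevant fragments with spaces, then normalize whitespace."""
    for pattern in (URL_RE, MENTION_RE, EMOJI_RE, BRACKET_EMOTICON_RE, SPECIAL_RE):
        text = pattern.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


print(filter_irrelevant("@小明 剧情很好看😀 http://t.cn/abc"))  # -> 剧情很好看
```

A production filter would need broader emoji coverage, and these URL and mention patterns would swallow comment text that follows a URL or username without a separating space.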
Step S102, determining the viewpoint entities in the text to be analyzed based on the pre-trained long short-term memory (LSTM) model, includes:
S1020: Determine, based on the LSTM model, the viewpoint entities in the filtered text to be analyzed.
Step S103, establishing associations between the evaluation objects and the evaluation words according to the preset heuristic rules to determine the viewpoints corresponding to the text to be analyzed, may include:
S1030: Establish associations between the evaluation objects and the evaluation words according to the preset heuristic rules, so as to determine the viewpoints corresponding to the filtered text to be analyzed.
Specifically, step S1020 is similar to step S102, and step S1030 is similar to step S103, in the embodiment above, so they are not described again here.
In the embodiments of the present invention, noise that contributes nothing to viewpoint extraction, such as viewpoint-irrelevant content, is filtered out before viewpoint extraction, which improves the precision of viewpoint extraction.
An embodiment of the present invention further provides a viewpoint extraction apparatus which, as shown in Figure 6, includes:
a first obtaining module 601, configured to obtain the text to be analyzed;
a determining module 602, configured to determine the viewpoint entities in the text to be analyzed based on a pre-trained long short-term memory (LSTM) network model, where the viewpoint entities include evaluation objects and evaluation words, and the LSTM model is obtained by training on multiple training samples carrying viewpoint entity labels;
an establishing module 603, configured to establish, by a preset heuristic rule, the association between the evaluation objects and the evaluation words, so as to determine the viewpoint corresponding to the text to be analyzed.
In this embodiment of the present invention, the viewpoint entities in the text to be analyzed are determined by the pre-trained LSTM model, and the association between the evaluation objects and the evaluation words is then established by the preset heuristic rule to determine the viewpoint corresponding to the text. Because the LSTM model is trained on samples carrying viewpoint entity labels, the extraction process does not need to build a word dependency relationship library or to perform syntactic parsing and syntactic-structure analysis of the sentences in the text, which reduces the computational complexity of viewpoint extraction.
Optionally, the establishing module 603 includes:
a calculating submodule, configured to calculate, for each evaluation object, the distance between the evaluation object and each evaluation word;
a first determining submodule, configured to determine the minimum of all the distances;
a second determining submodule, configured to determine the evaluation word corresponding to the minimum distance as the target evaluation word;
a setting-up submodule, configured to establish the association between the evaluation object and the target evaluation word.
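The nearest-distance heuristic carried out by these submodules can be sketched as follows. The patent does not define the distance measure; this sketch assumes that entities are token spans and measures the number of tokens separating two spans, and it breaks ties in favor of the first evaluation word in the list. `Span`, `span_distance` and `associate` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    text: str
    start: int  # index of the first token
    end: int    # index one past the last token


def span_distance(a: Span, b: Span) -> int:
    """Number of tokens separating two spans (0 if adjacent or overlapping)."""
    return max(0, b.start - a.end, a.start - b.end)


def associate(objects: List[Span], words: List[Span]) -> List[Tuple[str, str]]:
    """Pair every evaluation object with its nearest evaluation word."""
    pairs: List[Tuple[str, str]] = []
    if not words:
        return pairs
    for obj in objects:
        # min() returns the first word among equally distant candidates
        target = min(words, key=lambda w: span_distance(obj, w))
        pairs.append((obj.text, target.text))
    return pairs


# Tokens: 画面 很 精美 ， 但是 剧情 拖沓 ("the picture is exquisite, but the plot drags")
objects = [Span("画面", 0, 1), Span("剧情", 5, 6)]
words = [Span("精美", 2, 3), Span("拖沓", 6, 7)]
print(associate(objects, words))  # [('画面', '精美'), ('剧情', '拖沓')]
```

Each evaluation object is paired with the closest evaluation word, so "picture" is linked to "exquisite" and "plot" to "drags" rather than the other way round.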
Optionally, the apparatus further includes:
a second obtaining module, configured to obtain multiple training samples;
a training module, configured to input each training sample into a preset LSTM model and train the preset LSTM model to obtain the trained LSTM model, where the words in the training samples carry viewpoint entity labels.
Optionally, the viewpoint entity labels include labels annotated by means of sequence labeling.
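Sequence-labeled training samples can be produced by converting annotated entity spans into one tag per token. The sketch below assumes the BIO scheme with the T/O type suffixes used in the "I love China" (我爱中国) example given earlier, and rejects overlapping annotations. The function name `spans_to_bio` and the (start, end, type) span format are assumptions for illustration only.

```python
from typing import List, Tuple


def spans_to_bio(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, type) spans, end exclusive, into BIO tags."""
    tags = ["O"] * n_tokens  # bare "O": token outside any viewpoint entity
    for start, end, entity_type in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span ({start}, {end}) out of range")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span ({start}, {end}) overlaps another span")
        tags[start] = "B-" + entity_type
        for i in range(start + 1, end):
            tags[i] = "I-" + entity_type
    return tags


# Character-level sample for 我爱中国, labeled as in the earlier example
print(spans_to_bio(4, [(0, 1, "O"), (1, 2, "O"), (2, 4, "T")]))
# ['B-O', 'B-O', 'B-T', 'I-T']
```

Rejecting overlaps keeps each token to exactly one label, which is what a per-token softmax classification layer expects.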
Optionally, the apparatus further includes a filtering module, configured to filter out, after the text to be analyzed is obtained, the viewpoint-irrelevant content in the text to be analyzed to obtain the filtered text to be analyzed;
the determining module 602 is specifically configured to determine, based on the LSTM model, the viewpoint entities in the filtered text to be analyzed.
It should be noted that the viewpoint extraction apparatus provided by the embodiments of the present invention is an apparatus that applies the above viewpoint extraction method; therefore, all embodiments of the method apply to the apparatus and can achieve the same or similar beneficial effects.
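The modules described above can be wired together as a simple pipeline: filtering module, then determining module, then establishing module. In the sketch below the trained LSTM is replaced by a toy lexicon lookup (`stub_tagger`) purely so that the example runs without a model. The class `ViewpointExtractor`, its parameters, the T/O tag suffixes and the token-gap distance are all assumptions, not details given in the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Span = Tuple[str, int, int]  # (entity type, start token, end token exclusive)


def bio_spans(tags: List[str]) -> List[Span]:
    """Collect typed token spans from a BIO tag sequence."""
    spans: List[Span] = []
    cur: Optional[list] = None  # [type, start, end] of the open span
    for i, tag in enumerate(tags):
        if cur is not None and tag == "I-" + cur[0]:
            cur[2] = i + 1
            continue
        if cur is not None:
            spans.append((cur[0], cur[1], cur[2]))
            cur = None
        if tag.startswith("B-"):
            cur = [tag[2:], i, i + 1]
    if cur is not None:
        spans.append((cur[0], cur[1], cur[2]))
    return spans


@dataclass
class ViewpointExtractor:
    """Hypothetical wiring: filtering -> determining -> establishing modules."""
    filter_text: Callable[[str], str]       # filtering module
    tokenize: Callable[[str], List[str]]
    tag: Callable[[List[str]], List[str]]   # stands in for the trained LSTM
    object_type: str = "T"                  # assumed suffix for evaluation objects
    word_type: str = "O"                    # assumed suffix for evaluation words

    def extract(self, raw_text: str) -> List[Tuple[str, str]]:
        tokens = self.tokenize(self.filter_text(raw_text))
        spans = bio_spans(self.tag(tokens))
        objects = [s for s in spans if s[0] == self.object_type]
        words = [s for s in spans if s[0] == self.word_type]
        pairs: List[Tuple[str, str]] = []
        if not words:
            return pairs
        for _, o_start, o_end in objects:
            # nearest evaluation word by token gap (0 when adjacent)
            _, w_start, w_end = min(
                words, key=lambda w: max(0, w[1] - o_end, o_start - w[2]))
            pairs.append(("".join(tokens[o_start:o_end]),
                          "".join(tokens[w_start:w_end])))
        return pairs


def stub_tagger(tokens: List[str]) -> List[str]:
    """Toy lexicon lookup used only so the example runs without a model."""
    lexicon = {"画面": "T", "剧情": "T", "精美": "O", "拖沓": "O"}
    return ["B-" + lexicon[t] if t in lexicon else "O" for t in tokens]


extractor = ViewpointExtractor(filter_text=str.strip, tokenize=str.split, tag=stub_tagger)
print(extractor.extract("画面 很 精美 ， 但是 剧情 拖沓"))
# [('画面', '精美'), ('剧情', '拖沓')]
```

Passing the modules in as callables keeps the pipeline independent of any particular model, so a real trained tagger and filter could be substituted without changing `extract`.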
An embodiment of the present invention further provides a viewpoint extraction device which, as shown in Figure 7, includes a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702 and the memory 703 communicate with one another through the communication bus 704;
the memory 703 is configured to store a computer program;
the processor 701 is configured to implement the steps of the above viewpoint extraction method when executing the program stored in the memory 703.
The communication bus of the above viewpoint extraction device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus and so on. For ease of illustration it is represented by only one thick line in the figure, which does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above viewpoint extraction device and other devices.
The memory may include random access memory (RAM) and may also include non-volatile memory, for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment provided by the present invention, a computer-readable storage medium is further provided. Instructions are stored in the computer-readable storage medium, and when they are run on a computer, the computer is caused to perform the steps of the above viewpoint extraction method.
In yet another embodiment provided by the present invention, a computer program product containing instructions is further provided; when it is run on a computer, the computer is caused to perform the steps of the above viewpoint extraction method.
The above embodiments may be implemented entirely or partly by software, hardware, firmware or any combination thereof. When software is used, they may be implemented entirely or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced entirely or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or wireless means (such as infrared, radio or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD) or a semiconductor medium (for example, a solid state disk (SSD)), among others.
It should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for the same or similar parts the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, device, computer-readable storage medium and computer program product embodiments are basically similar to the method embodiments and are therefore described relatively briefly; for relevant details, refer to the description of the method embodiments.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit its scope. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (11)

1. a kind of viewpoint abstracting method characterized by comprising
Obtain text to be analyzed;
Based on shot and long term memory network LSTM model trained in advance, the viewpoint entity in the text to be analyzed is determined;Wherein, The viewpoint entity includes evaluation object and evaluating word, and the LSTM model is according to the multiple training for having viewpoint entity indicia What sample training obtained;
By presetting heuristic rule, being associated between the evaluation object and the evaluating word is established, with determining described wait divide Analyse the corresponding viewpoint of text.
2. The method according to claim 1, characterized in that establishing, by the preset heuristic rule, the association between the evaluation objects and the evaluation words comprises:
for each evaluation object, calculating the distance between the evaluation object and each evaluation word;
determining the minimum distance among all the distances;
determining the evaluation word corresponding to the minimum distance as a target evaluation word;
establishing an association between the evaluation object and the target evaluation word.
3. The method according to claim 1, characterized in that the step of training the LSTM model in advance comprises:
obtaining multiple training samples;
for each training sample, inputting the training sample into a preset LSTM model and training the preset LSTM model to obtain the trained LSTM model, wherein the words in the training sample carry viewpoint entity labels.
4. The method according to any one of claims 1 to 3, characterized in that the viewpoint entity labels include labels annotated by means of sequence labeling.
5. The method according to any one of claims 1 to 3, characterized in that, after obtaining the text to be analyzed, the method further comprises:
filtering out viewpoint-irrelevant content in the text to be analyzed to obtain filtered text to be analyzed;
and the step of determining, based on the pre-trained long short-term memory (LSTM) network model, the viewpoint entities in the text to be analyzed comprises:
determining, based on the LSTM model, the viewpoint entities in the filtered text to be analyzed.
6. a kind of viewpoint draw-out device characterized by comprising
First obtains module, for obtaining text to be analyzed;
Determining module, for determining in the text to be analyzed based on shot and long term memory network LSTM model trained in advance Viewpoint entity;Wherein, the viewpoint entity includes evaluation object and evaluating word, and the LSTM model is according to viewpoint entity What multiple training samples training of label obtained;
Module is established, for establishing being associated between the evaluation object and the evaluating word by default heuristic rule, with Determine the corresponding viewpoint of the text to be analyzed.
7. The apparatus according to claim 6, characterized in that the establishing module comprises:
a calculating submodule, configured to calculate, for each evaluation object, the distance between the evaluation object and each evaluation word;
a first determining submodule, configured to determine the minimum distance among all the distances;
a second determining submodule, configured to determine the evaluation word corresponding to the minimum distance as a target evaluation word;
a setting-up submodule, configured to establish an association between the evaluation object and the target evaluation word.
8. The apparatus according to claim 6, characterized in that the apparatus further comprises:
a second obtaining module, configured to obtain multiple training samples;
a training module, configured to input, for each training sample, the training sample into a preset LSTM model and train the preset LSTM model to obtain the trained LSTM model, wherein the words in the training sample carry viewpoint entity labels.
9. The apparatus according to any one of claims 6 to 8, characterized in that the viewpoint entity labels include labels annotated by means of sequence labeling.
10. The apparatus according to any one of claims 6 to 8, characterized in that the apparatus further comprises a filtering module, configured to filter out, after the text to be analyzed is obtained, viewpoint-irrelevant content in the text to be analyzed to obtain filtered text to be analyzed;
the determining module is specifically configured to determine, based on the LSTM model, the viewpoint entities in the filtered text to be analyzed.
11. A viewpoint extraction device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1 to 5 when executing the program stored in the memory.
CN201811037185.8A 2018-09-06 2018-09-06 A kind of viewpoint abstracting method, device and equipment Pending CN109299400A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811037185.8A CN109299400A (en) 2018-09-06 2018-09-06 A kind of viewpoint abstracting method, device and equipment

Publications (1)

Publication Number Publication Date
CN109299400A true CN109299400A (en) 2019-02-01

Family

ID=65166107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811037185.8A Pending CN109299400A (en) 2018-09-06 2018-09-06 A kind of viewpoint abstracting method, device and equipment

Country Status (1)

Country Link
CN (1) CN109299400A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111966832A (en) * 2020-08-21 2020-11-20 网易(杭州)网络有限公司 Evaluation object extraction method and device and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1853180A (en) * 2003-02-14 2006-10-25 尼维纳公司 System and method for semantic knowledge retrieval, management, capture, sharing, discovery, delivery and presentation
CN102890707A (en) * 2012-08-28 2013-01-23 华南理工大学 System for mining emotional tendencies of brief network comments based on conditional random field
CN103631961A (en) * 2013-12-17 2014-03-12 苏州大学张家港工业技术研究院 Method for identifying relationship between sentiment words and evaluation objects
CN105447036A (en) * 2014-08-29 2016-03-30 华为技术有限公司 Opinion mining-based social media information credibility evaluation method and apparatus
CN106547740A (en) * 2016-11-24 2017-03-29 四川无声信息技术有限公司 Text message processing method and device
CN106570179A (en) * 2016-11-10 2017-04-19 中国科学院信息工程研究所 Evaluative text-oriented kernel entity identification method and apparatus
CN107862087A (en) * 2017-12-01 2018-03-30 广州简亦迅信息科技有限公司 Sentiment analysis method, apparatus and storage medium based on big data and deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190201
