CN110032736A - Text analysis method, apparatus and storage medium - Google Patents

Text analysis method, apparatus and storage medium

Info

Publication number
CN110032736A
Authority
CN
China
Prior art keywords
text
network model
sample
emotional value
analyzed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910220954.6A
Other languages
Chinese (zh)
Inventor
陈海波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Deep Blue Technology Shanghai Co Ltd
Original Assignee
Deep Blue Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Deep Blue Technology Shanghai Co Ltd filed Critical Deep Blue Technology Shanghai Co Ltd
Priority to CN201910220954.6A priority Critical patent/CN110032736A/en
Publication of CN110032736A publication Critical patent/CN110032736A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a text analysis method, apparatus and storage medium, relating to the field of text classification, to address the prior-art problem that sentiment classification is treated as an ordinary text classification task, ignoring the emotional factors contained in the text. In the method, a first vector of the text to be analyzed, composed of emotional values, is obtained from a sentiment dictionary, and a second vector of the text to be analyzed, composed of attention weights, is obtained from a long short-term memory network model with an attention mechanism. If the computed distance between the first vector and the second vector is less than a first preset threshold, the emotion expressed by the text to be analyzed is obtained. By combining the sentiment dictionary, the attention mechanism and the long short-term memory network model in this way, the emotional factors of the text to be analyzed are mined.

Description

Text analysis method, apparatus and storage medium
Technical field
This application relates to the field of text classification, and in particular to a text analysis method, apparatus and storage medium.
Background art
The rapidly developing Internet has become an inseparable part of people's daily lives. According to the latest report of the China Internet Network Information Center (CNNIC), the number of Chinese netizens has reached 772 million and continues to grow steadily. The greatest driving force behind this growth is the emergence and prosperity of countless new instant media on the network, which in turn produce massive amounts of text information. How to mine and analyze this text information has become an important task of big data analysis. Current mining and analysis of text information is relatively simple and cannot uncover deeper meaning, so a new text analysis method is needed.
Summary of the invention
The embodiments of the application provide a text analysis method, apparatus and storage medium to solve the prior-art problem that the mining and analysis of text information is relatively simple and cannot uncover deeper meaning.
In a first aspect, an embodiment of the application provides a text analysis method, the method comprising:
obtaining a text to be analyzed;
analyzing the text to be analyzed through a pre-trained long short-term memory network model with an attention mechanism, and obtaining the emotion expressed by the text to be analyzed; wherein the network model is trained according to the following method:
reading a sample text; obtaining, according to a sentiment dictionary, a first vector of the sample text composed of emotional values; and inputting the read sample text into the network model to be trained to obtain a second vector of the sample text composed of attention weights;
calculating a distance between the first vector and the second vector, and adjusting parameters of the network model so that the distance is less than a first preset threshold.
In a second aspect, an embodiment of the application provides a text analysis device, the device comprising:
a text obtaining module, configured to obtain a text to be analyzed;
an analysis module, configured to analyze the text to be analyzed through a pre-trained long short-term memory network model with an attention mechanism and obtain the emotion expressed by the text to be analyzed; wherein the network model is trained according to the following method:
a vector obtaining module, configured to read a sample text, obtain, according to a sentiment dictionary, a first vector of the sample text composed of emotional values, and input the read sample text into the network model to be trained to obtain a second vector of the sample text composed of attention weights;
a computing module, configured to calculate the distance between the first vector and the second vector and adjust the parameters of the network model so that the distance is less than a first preset threshold.
In a third aspect, another embodiment of the application further provides a computing device, comprising at least one processor and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the text analysis method provided by the embodiments of the application.
In a fourth aspect, another embodiment of the application further provides a computer storage medium, wherein the computer storage medium stores computer-executable instructions, and the computer-executable instructions are used to cause a computer to perform one of the text analysis methods of the embodiments of the application.
In the text analysis method, apparatus and storage medium provided by the embodiments of the application, a first vector of the text to be analyzed, composed of emotional values, is obtained from a sentiment dictionary, and a second vector of the text to be analyzed, composed of attention weights, is obtained from a long short-term memory network model with an attention mechanism. If the computed distance between the first vector and the second vector is less than a first preset threshold, the emotion expressed by the text to be analyzed is obtained. By combining the sentiment dictionary, the attention mechanism and the long short-term memory network model in this way, the emotional factors of the text to be analyzed are mined.
Other features and advantages of the application will be set forth in the following description and will in part become apparent from the description or be understood by practising the application. The objectives and other advantages of the application can be realized and obtained by the structures particularly pointed out in the written description, the claims and the accompanying drawings.
Brief description of the drawings
The accompanying drawings described here are used to provide a further understanding of the application and constitute a part of the application; the illustrative embodiments of the application and their descriptions are used to explain the application and do not constitute an undue limitation on the application. In the accompanying drawings:
Fig. 1 is a schematic flowchart of training the long short-term memory network model with the attention mechanism in an embodiment of the application;
Fig. 2 is a schematic flowchart of obtaining the first vector in an embodiment of the application;
Fig. 3 is a schematic flowchart of obtaining the second vector in an embodiment of the application;
Fig. 4 is a structural schematic diagram of the long short-term memory network model with the attention mechanism in an embodiment of the application;
Fig. 5 is a structural schematic diagram of the long short-term memory network model in an embodiment of the application;
Fig. 6 is a schematic flowchart of the loss function adjustment in an embodiment of the application;
Fig. 7 is a schematic flowchart of the text analysis in an embodiment of the application;
Fig. 8 is a structural schematic diagram of the text analysis device in an embodiment of the application;
Fig. 9 is a structural schematic diagram of a computing device according to an embodiment of the application.
Detailed description of the embodiments
In order to solve the prior-art problem that sentiment classification is treated as an ordinary text classification task, ignoring the emotional factors contained in the text, the embodiments of the application provide a text analysis method, apparatus and storage medium. To better understand the technical solution provided by the embodiments of the application, the basic principle of the solution is briefly described here:
In the text analysis method, apparatus and storage medium provided by the embodiments of the application, a first vector of the text to be analyzed, composed of emotional values, is obtained from a sentiment dictionary, and a second vector of the text to be analyzed, composed of attention weights, is obtained from a long short-term memory network model with an attention mechanism. If the computed distance between the first vector and the second vector is less than a first preset threshold, the emotion expressed by the text to be analyzed is obtained. By combining the sentiment dictionary, the attention mechanism and the long short-term memory network model in this way, the emotional factors of the text to be analyzed are mined. The mined emotional factors can better characterize the true intention the text is meant to express.
Text analysis refers to the representation of a text and the selection of its feature items; it is a basic problem of text mining and information retrieval, quantifying the feature words extracted from a text to represent the text information. The semantics of a text reflect specific positions, viewpoints, values and interests. Such information generally contains rich emotional elements and has research value; therefore, how to make effective use of this information for text sentiment analysis has increasingly become a hot topic in natural language processing and artificial intelligence.
If sentiment classification is treated as an ordinary text classification task, the emotional factors contained in the text are ignored. How to train the long short-term memory network model with the attention mechanism is described in detail below. As shown in Fig. 1, the training comprises the following steps:
Step 101: read a sample text.
Step 102: obtain, according to a sentiment dictionary, a first vector of the sample text composed of emotional values.
In the sentiment dictionary, each word or phrase is assigned an emotion polarity or emotional intensity by experts; researchers combine the sentiment dictionary data with manually constructed rules to determine the emotional values of the sample text.
Step 103: input the read sample text into the network model to be trained to obtain a second vector of the sample text composed of attention weights.
Step 104: calculate the distance between the first vector and the second vector, and adjust the parameters of the network model so that the distance is less than a first preset threshold.
In this way, by combining the sentiment dictionary, the attention mechanism and the long short-term memory network model, the trained model performs text analysis that better matches the emotions perceived by humans.
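For illustration only, the four training steps above can be outlined as the following minimal Python/PyTorch sketch. It assumes a hypothetical first_vector() helper implementing the sentiment-dictionary lookup of steps 201-203 below, a hypothetical model that returns one attention weight per word, and a mean absolute difference as the distance; none of these names or choices are prescribed by the specification.

    import torch

    def train_step(sample_words, first_vector, model, optimizer, first_threshold=0.1):
        # Steps 101-102: read the sample text and build the first vector from the sentiment dictionary.
        first_vec = torch.tensor(first_vector(sample_words))
        # Step 103: the network model to be trained yields the second vector (one attention weight per word).
        second_vec = model(sample_words)
        # Step 104: the distance between the two vectors drives the parameter update.
        distance = torch.mean(torch.abs(first_vec - second_vec))
        optimizer.zero_grad()
        distance.backward()
        optimizer.step()
        # Training continues until the distance falls below the first preset threshold.
        return distance.item() < first_threshold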
In the embodiments of the application, as shown in Fig. 2, obtaining the emotional values of the sample text from the sentiment dictionary requires segmenting the sample text and obtaining the emotional value of each word. Step 102 can therefore specifically be implemented as the following steps:
Step 201: label the words in the sample text with a part-of-speech tagging tool to obtain the part of speech of each word.
Here, a "word" refers to a unit produced by word segmentation. For Chinese, the segmented words are the individual word units; for example, "中国梦" (the Chinese dream) contains three units, and after segmentation the resulting units are "中", "国" and "梦". For a foreign language such as English, each unit is an English word; for example, after "I have a dream" is segmented, the resulting words are "I", "have", "a" and "dream".
Part-of-speech tagging refers to assigning a correct part of speech to each word in the segmentation result, that is, the process of determining whether each word is a noun, a verb, an adjective or another part of speech.
Step 202: query the sentiment dictionary to obtain the emotional value of each word under each of its parts of speech.
Step 203: compose the first vector of the sample text from the emotional values of the words.
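As an illustration of steps 201-203, the following minimal sketch uses jieba's part-of-speech tagger as one possible tagging tool and a hypothetical SENTIMENT_DICT mapping (word, part of speech) pairs to emotional values; the specification does not prescribe a particular tagger or dictionary format.

    import jieba.posseg as pseg  # one possible part-of-speech tagging tool

    # Hypothetical sentiment dictionary: (word, part-of-speech tag) -> emotional value.
    SENTIMENT_DICT = {("梦", "n"): 8.0, ("中国", "ns"): 6.0}

    def first_vector(sample_text):
        # Steps 201-203: tag each word, look up its emotional value, and collect the values.
        emotional_values = []
        for word, pos in pseg.cut(sample_text):      # Step 201: segment and tag the sample text
            value = SENTIMENT_DICT.get((word, pos))  # Step 202: emotional value under this part of speech
            if value is not None:
                emotional_values.append(value)       # Step 203: the values form the first vector
        return emotional_values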
In one embodiment, a word may have several senses (paraphrases). For example, "power" can mean electric power, strength, to energize, to fast-forward, and so on, and each sense can correspond to an emotional value. If a word has several senses, the method of determining the emotional value of the word can specifically be implemented as:
Step A1: for each word, determine the senses of the word.
Step A2: determine the part of speech corresponding to each sense of the word.
Step A3: look up, in the sentiment dictionary, the emotional value corresponding to each part of speech of the word.
Step A4: take the ratio of the sum of the word's emotional values to the number of its senses as the final emotional value of the word.
For ease of understanding, for a word with several senses, the emotional value of the word described in steps A1-A4 can be expressed by formula (1):
S = (e1 + e2 + … + en) / n    (1)
In formula (1), S denotes the emotional value, n denotes the number of senses of the word, and ei denotes the emotional value under each sense.
In one embodiment, if a word has 3 senses, the emotional value of the word is calculated after the emotional value corresponding to each sense is determined. For example, if the emotional values corresponding to the 3 senses are 2, 4 and 6, the emotional value of the word is (2 + 4 + 6) / 3 = 4; the emotional value of the word is therefore 4.
In this way, when the emotional value of each word in the text is obtained, the emotional value of any word with several senses is calculated by the above method, so that the emotional value can be determined quickly when such a word is encountered.
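The averaging of steps A1-A4 and formula (1) can be sketched in a few lines of Python; the sense values in the example are those from the paragraph above.

    def final_emotional_value(sense_values):
        # Formula (1): S = (e1 + ... + en) / n, the mean over the n senses of the word.
        return sum(sense_values) / len(sense_values)

    # Example from the text: three senses with emotional values 2, 4 and 6 give (2 + 4 + 6) / 3 = 4.
    assert final_emotional_value([2, 4, 6]) == 4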
In another embodiment, if a word has several senses, the most probable sense can be determined from the context of the word, and the emotional value of that sense is used as the emotional value of the word.
In the embodiments of the application, to further improve the accuracy of text analysis, words with low emotional values can be filtered out of the sample text. This can specifically be implemented as steps B1-B2:
Step B1: compare the emotional value of each word with a second preset threshold.
Step B2: filter out the words whose emotional values are less than the second preset threshold, together with their corresponding emotional values.
In one embodiment, a sentence in the text is "I have a dream", and the emotional value of each word in the sentence is obtained, for example: "I" has an emotional value of 6, "have" of 8, "a" of 2 and "dream" of 10. If the second preset threshold is set to 5, "a" is filtered out of the sentence, and the filtered text is "I have dream". When the first vector of the text is obtained, the emotional value of each word in the filtered text is used as an element of the first vector. It should be noted that the second preset threshold can be set according to the actual situation, which is not limited in this application. Filtering out the words below the second preset threshold highlights the emotional factors of the text, further improving the accuracy of text analysis.
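A small sketch of steps B1-B2 with the example above; the second preset threshold of 5 and the per-word emotional values are taken from that example.

    def filter_low_emotion(word_values, second_threshold=5):
        # Steps B1-B2: keep only the words whose emotional value reaches the second preset threshold.
        return {word: value for word, value in word_values.items() if value >= second_threshold}

    values = {"I": 6, "have": 8, "a": 2, "dream": 10}
    print(filter_low_emotion(values))  # {'I': 6, 'have': 8, 'dream': 10}, so "a" is filtered out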
After the first vector has been obtained from the sentiment dictionary, the sample text also needs to be fed into the network model to be trained to obtain the second vector of the sample text. As shown in Fig. 3, this can specifically be implemented as:
Step 301: input each word of the sample text into the network model to be trained to obtain the attention weight of each word in the sample text.
As described above, in the embodiments of the application, if words with low emotional values were filtered out of the sample text when the emotional values were obtained, the filtered sample text is what is input into the network model to be trained.
Step 302: compose the second vector of the sample text from the attention weights of the words in the sample text.
Fig. 4 shows the training flow of the network model to be trained. W1, W2, ..., Wm are the input words; each word of the sample text is input into the LSTM (long short-term memory network model) to obtain the attention weight (H1, H2, ..., Hm) of each word in the sample text, where W and H correspond one to one. The attention weights of the words in the sample text serve as the second vector of the sample text.
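The following is a minimal PyTorch sketch of steps 301-302, assuming the words W1, ..., Wm arrive as embeddings and that the attention weights H1, ..., Hm are softmax-normalized scores over the LSTM hidden states; the exact attention formulation is not spelled out in this extract.

    import torch
    import torch.nn as nn

    class AttentionLSTM(nn.Module):
        # Sketch: one attention weight per input word, together forming the second vector.
        def __init__(self, embed_dim=100, hidden_dim=128):
            super().__init__()
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.score = nn.Linear(hidden_dim, 1)    # scores each hidden state H1..Hm

        def forward(self, word_embeddings):           # shape (batch, m, embed_dim)
            hidden, _ = self.lstm(word_embeddings)    # H1..Hm, shape (batch, m, hidden_dim)
            scores = self.score(hidden).squeeze(-1)   # shape (batch, m)
            return torch.softmax(scores, dim=-1)      # attention weight of each word

    # Usage: m = 4 words, each represented by a 100-dimensional embedding.
    weights = AttentionLSTM()(torch.randn(1, 4, 100))
    print(weights.shape)                              # torch.Size([1, 4]); each row sums to 1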
The network model to be trained in the embodiments of the application has been described above; the LSTM inside the network model is further explained below. It can specifically be implemented as steps C1-C3:
Step C1: in the forget gate layer, determine the state of the activation function according to the input word, and selectively discard the information already present in the model according to the state of the activation function to obtain the important elements.
Step C2: in the input gate layer, update the important elements according to the gating function and the input word.
Step C3: in the output gate layer, output the updated elements as the attention weight of the word according to the gating function and the activation function.
In this way, by training on the sample text with the LSTM, the attention weight of each word in the sample text can be obtained. Fig. 5 is a structural schematic diagram of the LSTM, in which σ is the activation function and tanh is the gating function.
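For reference, one time step of a standard LSTM cell corresponding to steps C1-C3 is sketched below, with sigmoid (σ) gates and tanh non-linearities; this is the textbook formulation rather than the internal implementation of the patented model.

    import torch

    def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
        # W, U and b hold the stacked parameters of the four gate pre-activations.
        gates = x_t @ W + h_prev @ U + b
        f, i, o, g = gates.chunk(4, dim=-1)
        f, i, o = torch.sigmoid(f), torch.sigmoid(i), torch.sigmoid(o)
        # Step C1: the forget gate selectively discards the existing cell state.
        # Step C2: the input gate updates it with new content from the input word.
        c_t = f * c_prev + i * torch.tanh(g)
        # Step C3: the output gate decides what the cell exposes downstream.
        h_t = o * torch.tanh(c_t)
        return h_t, c_t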
In the embodiments of the application, after the first vector and the second vector have been obtained, the network model to be trained is adjusted by calculating the distance between the first vector and the second vector. The distance can be added to the loss function of the network model for calculation. Fig. 6 is a schematic flowchart of this method, which may include:
Step 601: add the distance to the loss function of the network model.
Step 602: adjust the parameters in the loss function so that the distance is less than the first preset threshold.
The distance can be calculated as shown in formula (2):
In formula (2), Cvi denotes the i-th element of the first vector, atti denotes the i-th element of the second vector, and Lδ denotes the distance between the mutually corresponding i-th elements of the first vector and the second vector.
The above distance is added to the loss function as a parameter, and the resulting loss function can be as shown in formula (3):
In formula (3), Loss is the value of the loss function, i and j are indices of sentences in the training set (i and j index different labels), y is the true label distribution of the text, ŷ is the label distribution predicted by the model, and β‖θ‖ is the L2 regularization penalty term, which prevents over-fitting of the network model to be trained.
In this way, the distance and the L2 regularization penalty term are added to the loss function of the model, and adjusting the parameters in the loss function makes the trained model more accurate.
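The following sketch illustrates only the ingredients of formulas (2) and (3) named in the text: an element-wise distance between the two vectors, a term comparing the true label distribution y with the predicted distribution ŷ (a cross-entropy is assumed here), and an L2 regularization penalty β‖θ‖; the weighting of the distance term and the exact form of the formulas are assumptions.

    import torch

    def training_loss(y_true, y_pred, first_vec, second_vec, params, beta=1e-4, lam=1.0):
        # Formula (2) (assumed form): mean absolute difference between corresponding elements.
        distance = torch.mean(torch.abs(first_vec - second_vec))
        # Classification term comparing y and y-hat (cross-entropy assumed).
        cross_entropy = -(y_true * torch.log(y_pred + 1e-9)).sum(dim=-1).mean()
        # L2 regularization penalty, guarding against over-fitting of the model.
        l2_penalty = beta * sum(p.pow(2).sum() for p in params)
        return cross_entropy + lam * distance + l2_penalty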
The training of the long short-term memory network model with the attention mechanism has been described in detail above; how the trained network model performs text analysis on a text to be analyzed is described in detail below through a specific embodiment. Fig. 7 is a schematic flowchart of the text analysis method, comprising the following steps:
Step 701: obtain the text to be analyzed.
Step 702: analyze the text to be analyzed through the pre-trained long short-term memory network model with the attention mechanism, and obtain the emotion expressed by the text to be analyzed.
In this way, the emotion expressed by the text to be analyzed can be obtained through the trained network model, mining the emotional factors of the text to be analyzed.
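Steps 701-702 in use can be sketched as follows, assuming a pre-trained model that maps the text to be analyzed to scores over a set of emotion labels; how the model output is mapped to the final emotion is not detailed in this extract, so the argmax below is purely illustrative.

    import torch

    def analyze_text(word_embeddings, trained_model, emotion_labels):
        # Step 701: the text to be analyzed arrives as word embeddings (hypothetical preprocessing).
        trained_model.eval()
        with torch.no_grad():
            # Step 702: the pre-trained model with the attention mechanism produces emotion scores.
            scores = trained_model(word_embeddings)
            return emotion_labels[int(scores.argmax())]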
Based on the same inventive concept, an embodiment of the application also provides a text analysis device. As shown in Fig. 8, the device comprises:
a text obtaining module 801, configured to obtain a text to be analyzed;
an analysis module 802, configured to analyze the text to be analyzed through a pre-trained long short-term memory network model with an attention mechanism and obtain the emotion expressed by the text to be analyzed; wherein the network model is trained according to the following method:
a vector obtaining module 803, configured to read a sample text, obtain, according to a sentiment dictionary, a first vector of the sample text composed of emotional values, and input the read sample text into the network model to be trained to obtain a second vector of the sample text composed of attention weights;
a computing module 804, configured to calculate the distance between the first vector and the second vector and adjust the parameters of the network model so that the distance is less than a first preset threshold.
Further, the vector obtaining module 803 comprises:
a part-of-speech tagging unit, configured to label the words in the sample text with a part-of-speech tagging tool and obtain the part of speech of each word;
a query unit, configured to query the sentiment dictionary and obtain the emotional value of each word under each of its parts of speech;
a first vector unit, configured to compose the first vector of the sample text from the emotional values of the words.
Further, the query unit comprises:
a sense determining subunit, configured to determine, for each word, the senses of the word;
a part-of-speech determining subunit, configured to determine the part of speech corresponding to each sense of the word;
a lookup subunit, configured to look up, in the sentiment dictionary, the emotional value corresponding to each part of speech of the word;
an emotional value subunit, configured to take the ratio of the sum of the word's emotional values to the number of its senses as the final emotional value of the word.
Further, the vector obtaining module 803 comprises:
an attention weight unit, configured to input each word of the sample text into the network model to be trained and obtain the attention weight of each word in the sample text;
a second vector unit, configured to compose the second vector of the sample text from the attention weights of the words in the sample text.
Further, the device also comprises:
a comparison module, configured to compare the emotional value of each word with a second preset threshold before the vector obtaining module 803 obtains, according to the sentiment dictionary, the first vector of the sample text composed of emotional values;
a filtering module, configured to filter out the words whose emotional values are less than the second preset threshold, together with their corresponding emotional values.
Further, the computing module 804 comprises:
an adding unit, configured to add the distance to the loss function of the network model;
an adjusting unit, configured to adjust the parameters in the loss function so that the distance is less than the first preset threshold.
Further, the loss function includes an L2 regularization penalty term.
Having described the text analysis method and device of the exemplary embodiments of the application, a computing device according to another exemplary embodiment of the application is introduced next.
Those skilled in the art will understand that the various aspects of the application can be implemented as a system, a method or a program product. Therefore, the various aspects of the application can be embodied in the following forms: a complete hardware embodiment, a complete software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to here as a "circuit", "module" or "system".
In some possible embodiments, a computing device according to the application may include at least one processor and at least one memory. The memory stores program code which, when executed by the processor, causes the processor to perform steps 701-702 of the text analysis method according to the various exemplary embodiments of the application described earlier in this specification.
The computing device 90 according to this embodiment of the application is described below with reference to Fig. 9. The computing device 90 shown in Fig. 9 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the application. The computing device can be, for example, a mobile phone, a tablet computer, and the like.
As shown in Fig. 9, the computing device 90 takes the form of a general-purpose computing device. The components of the computing device 90 may include, but are not limited to: the above-mentioned at least one processor 91, the above-mentioned at least one memory 92, and a bus 93 connecting the different system components (including the memory 92 and the processor 91).
The bus 93 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus structures.
The memory 92 may include a readable medium in the form of volatile memory, such as a random access memory (RAM) 921 and/or a cache memory 922, and may further include a read-only memory (ROM) 923.
The memory 92 may also include a program/utility 925 having a set (at least one) of program modules 924. Such program modules 924 include, but are not limited to: an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The computing device 90 may also communicate with one or more external devices 94 (such as sensing devices), with one or more devices that enable a user to interact with the computing device 90, and/or with any device (such as a router or modem) that enables the computing device 90 to communicate with one or more other computing devices. Such communication can be carried out through an input/output (I/O) interface 95. The computing device 90 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 96. As shown, the network adapter 96 communicates with the other modules of the computing device 90 through the bus 93. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computing device 90, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
In some possible embodiments, the various aspects of the text analysis method provided by the application can also be implemented in the form of a program product comprising program code. When the program product runs on a computing device, the program code causes the computing device to perform the steps of the text analysis method according to the various exemplary embodiments of the application described earlier in this specification, for example steps 701-702 shown in Fig. 7.
The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more conductors, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The text analysis method of the embodiments of the application may employ a portable compact disc read-only memory (CD-ROM) containing program code and may run on a computing device. However, the program product of the application is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus or device.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium can send, propagate or transmit a program used by or in combination with an instruction execution system, apparatus or device.
The program code contained on a readable medium can be transmitted with any suitable medium, including, but not limited to, wireless, wired, optical cable, RF, and the like, or any suitable combination of the above.
The program code for carrying out the operations of the application can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
It should be noted that although several units or subunits of the device are mentioned in the above detailed description, this division is merely exemplary and not mandatory. In fact, according to the embodiments of the application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided into multiple units.
In addition, although the operations of the method of the application are described in a particular order in the accompanying drawings, this does not require or imply that these operations must be performed in that order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
Those skilled in the art will understand that the embodiments of the application may be provided as a method, a system or a computer program product. Therefore, the application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The application is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to the embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the application.
Obviously, those skilled in the art can make various modifications and variations to the application without departing from the spirit and scope of the application. Thus, if these modifications and variations of the application fall within the scope of the claims of the application and their technical equivalents, the application is also intended to include these modifications and variations.

Claims (10)

1. A text analysis method, characterized in that the method comprises:
obtaining a text to be analyzed;
analyzing the text to be analyzed through a pre-trained long short-term memory network model with an attention mechanism, and obtaining the emotion expressed by the text to be analyzed; wherein the network model is trained according to the following method:
reading a sample text; obtaining, according to a sentiment dictionary, a first vector of the sample text composed of emotional values; and inputting the read sample text into the network model to be trained to obtain a second vector of the sample text composed of attention weights;
calculating a distance between the first vector and the second vector, and adjusting parameters of the network model so that the distance is less than a first preset threshold.
2. The method according to claim 1, characterized in that obtaining, according to the sentiment dictionary, the first vector of the sample text composed of emotional values specifically comprises:
labelling the words in the sample text with a part-of-speech tagging tool to obtain the part of speech of each word;
querying the sentiment dictionary to obtain the emotional value of each word under each of its parts of speech;
composing the first vector of the sample text from the emotional values of the words.
3. The method according to claim 2, characterized in that querying the sentiment dictionary to obtain the emotional value of each word under each of its parts of speech specifically comprises:
for each word, determining the senses of the word;
determining the part of speech corresponding to each sense of the word;
looking up, in the sentiment dictionary, the emotional value corresponding to each part of speech of the word;
taking the ratio of the sum of the word's emotional values to the number of its senses as the final emotional value of the word.
4. The method according to claim 1, characterized in that inputting the read sample text into the network model to be trained to obtain the second vector of the sample text composed of attention weights specifically comprises:
inputting each word of the sample text into the network model to be trained to obtain the attention weight of each word in the sample text;
composing the second vector of the sample text from the attention weights of the words in the sample text.
5. The method according to claim 2, characterized in that before obtaining, according to the sentiment dictionary, the first vector of the sample text composed of emotional values, the method further comprises:
comparing the emotional value of each word with a second preset threshold;
filtering out the words whose emotional values are less than the second preset threshold, together with their corresponding emotional values.
6. The method according to claim 5, characterized in that calculating the distance between the first vector and the second vector and adjusting the parameters of the network model so that the distance is less than the first preset threshold specifically comprises:
adding the distance to the loss function of the network model;
adjusting the parameters in the loss function so that the distance is less than the first preset threshold.
7. The method according to claim 6, characterized in that the loss function includes an L2 regularization penalty term.
8. A text analysis device, characterized in that the device comprises:
a text obtaining module, configured to obtain a text to be analyzed;
an analysis module, configured to analyze the text to be analyzed through a pre-trained long short-term memory network model with an attention mechanism and obtain the emotion expressed by the text to be analyzed; wherein the network model is trained according to the following method:
a vector obtaining module, configured to read a sample text, obtain, according to a sentiment dictionary, a first vector of the sample text composed of emotional values, and input the read sample text into the network model to be trained to obtain a second vector of the sample text composed of attention weights;
a computing module, configured to calculate the distance between the first vector and the second vector and adjust the parameters of the network model so that the distance is less than a first preset threshold.
9. A computer-readable medium storing computer-executable instructions, characterized in that the computer-executable instructions are used to perform the method according to any one of claims 1-7.
10. A computing device, characterized by comprising:
at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method according to any one of claims 1-7.
CN201910220954.6A 2019-03-22 2019-03-22 Text analysis method, apparatus and storage medium Pending CN110032736A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910220954.6A CN110032736A (en) 2019-03-22 2019-03-22 Text analysis method, apparatus and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910220954.6A CN110032736A (en) 2019-03-22 2019-03-22 Text analysis method, apparatus and storage medium

Publications (1)

Publication Number Publication Date
CN110032736A true CN110032736A (en) 2019-07-19

Family

ID=67236423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910220954.6A Pending CN110032736A (en) Text analysis method, apparatus and storage medium

Country Status (1)

Country Link
CN (1) CN110032736A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427330A (en) * 2019-08-13 2019-11-08 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of code analysis
CN110991163A (en) * 2019-11-29 2020-04-10 达而观信息科技(上海)有限公司 Document comparison analysis method and device, electronic equipment and storage medium
CN111291187A (en) * 2020-01-22 2020-06-16 北京芯盾时代科技有限公司 Emotion analysis method and device, electronic equipment and storage medium
CN116738298A (en) * 2023-08-16 2023-09-12 杭州同花顺数据开发有限公司 Text classification method, system and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105138506A (en) * 2015-07-09 2015-12-09 天云融创数据科技(北京)有限公司 Financial text sentiment analysis method
WO2017101342A1 (en) * 2015-12-15 2017-06-22 乐视控股(北京)有限公司 Sentiment classification method and apparatus
CN107077486A (en) * 2014-09-02 2017-08-18 菲特尔销售工具有限公司 Affective Evaluation system and method
WO2017149540A1 (en) * 2016-03-02 2017-09-08 Feelter Sales Tools Ltd Sentiment rating system and method
CN108170681A (en) * 2018-01-15 2018-06-15 中南大学 Text emotion analysis method, system and computer readable storage medium
CN108460009A (en) * 2017-12-14 2018-08-28 中山大学 The attention mechanism Recognition with Recurrent Neural Network text emotion analytic approach of embedded sentiment dictionary
CN108932227A (en) * 2018-06-05 2018-12-04 天津大学 A kind of short text emotion value calculating method based on sentence structure and context
CN109271493A (en) * 2018-11-26 2019-01-25 腾讯科技(深圳)有限公司 A kind of language text processing method, device and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107077486A (en) * 2014-09-02 2017-08-18 菲特尔销售工具有限公司 Affective Evaluation system and method
CN105138506A (en) * 2015-07-09 2015-12-09 天云融创数据科技(北京)有限公司 Financial text sentiment analysis method
WO2017101342A1 (en) * 2015-12-15 2017-06-22 乐视控股(北京)有限公司 Sentiment classification method and apparatus
WO2017149540A1 (en) * 2016-03-02 2017-09-08 Feelter Sales Tools Ltd Sentiment rating system and method
CN108460009A (en) * 2017-12-14 2018-08-28 中山大学 The attention mechanism Recognition with Recurrent Neural Network text emotion analytic approach of embedded sentiment dictionary
CN108170681A (en) * 2018-01-15 2018-06-15 中南大学 Text emotion analysis method, system and computer readable storage medium
CN108932227A (en) * 2018-06-05 2018-12-04 天津大学 A kind of short text emotion value calculating method based on sentence structure and context
CN109271493A (en) * 2018-11-26 2019-01-25 腾讯科技(深圳)有限公司 A kind of language text processing method, device and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIAOFENG CAI et al.: "Multi-view and Attention-Based BI-LSTM for Weibo Emotion Recognition", NCCE 2018 *
於雯; 周武能: "基于LSTM的商品评论情感分析" [Sentiment analysis of product reviews based on LSTM], 计算机系统应用 [Computer Systems & Applications], no. 08
易顺明; 周洪斌; 周国栋: "Twitter推文与情感词典SentiWordNet匹配算法研究" [Research on matching algorithms between Twitter tweets and the sentiment dictionary SentiWordNet], 南京师范大学学报(工程技术版) [Journal of Nanjing Normal University (Engineering and Technology Edition)], no. 03
易顺明; 易昊; 周国栋: "采用情感特征向量的Twitter情感分类方法研究" [Research on Twitter sentiment classification using sentiment feature vectors], 小型微型计算机系统 [Journal of Chinese Computer Systems], no. 11

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427330A (en) * 2019-08-13 2019-11-08 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of code analysis
CN110427330B (en) * 2019-08-13 2023-09-26 腾讯科技(深圳)有限公司 Code analysis method and related device
CN110991163A (en) * 2019-11-29 2020-04-10 达而观信息科技(上海)有限公司 Document comparison analysis method and device, electronic equipment and storage medium
CN110991163B (en) * 2019-11-29 2023-09-19 达观数据有限公司 Document comparison and analysis method and device, electronic equipment and storage medium
CN111291187A (en) * 2020-01-22 2020-06-16 北京芯盾时代科技有限公司 Emotion analysis method and device, electronic equipment and storage medium
CN111291187B (en) * 2020-01-22 2023-08-08 北京芯盾时代科技有限公司 Emotion analysis method and device, electronic equipment and storage medium
CN116738298A (en) * 2023-08-16 2023-09-12 杭州同花顺数据开发有限公司 Text classification method, system and storage medium
CN116738298B (en) * 2023-08-16 2023-11-24 杭州同花顺数据开发有限公司 Text classification method, system and storage medium

Similar Documents

Publication Publication Date Title
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
KR102577514B1 (en) Method, apparatus for text generation, device and storage medium
CN109241524B (en) Semantic analysis method and device, computer-readable storage medium and electronic equipment
CN110032736A (en) A kind of text analyzing method, apparatus and storage medium
CN110717339A (en) Semantic representation model processing method and device, electronic equipment and storage medium
CN102866989B (en) Viewpoint abstracting method based on word dependence relationship
CN111241237B (en) Intelligent question-answer data processing method and device based on operation and maintenance service
CN111738016B (en) Multi-intention recognition method and related equipment
CN109271493A (en) A kind of language text processing method, device and storage medium
CN114694076A (en) Multi-modal emotion analysis method based on multi-task learning and stacked cross-modal fusion
CN107220235A (en) Speech recognition error correction method, device and storage medium based on artificial intelligence
CN113239169B (en) Answer generation method, device, equipment and storage medium based on artificial intelligence
Wang et al. Response selection for multi-party conversations with dynamic topic tracking
CN108228576B (en) Text translation method and device
CN111144120A (en) Training sentence acquisition method and device, storage medium and electronic equipment
CN108536670A (en) Output statement generating means, methods and procedures
CN110377905A (en) Semantic expressiveness processing method and processing device, computer equipment and the readable medium of sentence
WO2020206913A1 (en) Method and apparatus for neural network-based word segmentation and part-of-speech tagging, device and storage medium
CN110851601A (en) Cross-domain emotion classification system and method based on layered attention mechanism
CN112860871B (en) Natural language understanding model training method, natural language understanding method and device
US20230094730A1 (en) Model training method and method for human-machine interaction
CN113705315A (en) Video processing method, device, equipment and storage medium
CN115357719A (en) Power audit text classification method and device based on improved BERT model
CN116010581A (en) Knowledge graph question-answering method and system based on power grid hidden trouble shooting scene
CN116127060A (en) Text classification method and system based on prompt words

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned
AD01 Patent right deemed abandoned

Effective date of abandoning: 20240322