CN108959265A - Cross-domain text sentiment classification method, device, computer equipment and storage medium - Google Patents
Cross-domain text sentiment classification method, device, computer equipment and storage medium
- Publication number
- CN108959265A CN108959265A CN201810770172.5A CN201810770172A CN108959265A CN 108959265 A CN108959265 A CN 108959265A CN 201810770172 A CN201810770172 A CN 201810770172A CN 108959265 A CN108959265 A CN 108959265A
- Authority
- CN
- China
- Prior art keywords
- training set
- training
- feature
- classifier
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The embodiments of the invention provide a cross-domain text sentiment classification method, device, computer equipment and storage medium. The method includes: merging a first training set and a second training set into a third training set, and performing word-vector training on the third training set with a word-vector tool; dividing the first training set into multiple sub-training sets, and merging each sub-training set with the second training set to obtain multiple feature training sets; training one classifier on each feature training set; establishing an integrated classifier from all the trained classifiers; and applying the integrated classifier to the unlabeled training set in the target domain. This solves the prior-art problems that labeled samples must be collected manually and that each domain needs its own labeled samples, greatly shortening processing time and model-construction time. In addition, the invention is simple to implement, fast, accurate, and suitable for large-scale deployment.
Description
Technical field
The present invention relates to the technical field of sentiment classification, and in particular to a cross-domain text sentiment classification method, device, computer equipment and storage medium.
Background technique
Sentiment classification is one of the main tasks of natural language processing. Existing sentiment classification methods mostly focus on a single domain. Commonly used methods include machine-learning classification based on the vector space model, machine-learning classification based on word-vector models, and deep-learning methods such as RNN (Recurrent Neural Network) and CNN (Convolutional Neural Network).
When performing sentiment classification with the above methods, a large number of labeled samples is required. Labeled samples are mainly collected manually, and each domain needs its own labeled samples, which is time-consuming, laborious, and a clear disadvantage.
Summary of the invention
The embodiments of the invention provide a cross-domain text sentiment classification method, device, computer equipment and storage medium, intended to solve the problems that, when performing sentiment classification, labeled samples must be collected manually and each domain needs its own labeled samples.
In a first aspect, an embodiment of the invention provides a cross-domain text sentiment classification method, comprising:
merging a first training set and a second training set into a third training set, and performing word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, wherein the first training set is a labeled training set in the source domain, the second training set is a labeled training set in the target domain, and the target domain further contains an unlabeled training set;
dividing the first training set into multiple sub-training sets, and merging each sub-training set with the second training set to obtain multiple corresponding feature training sets;
training one classifier on each feature training set according to the text vector and sentiment label of each of its samples;
establishing an integrated classifier from all the trained classifiers;
applying the integrated classifier to the unlabeled training set in the target domain.
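The claimed steps can be sketched end to end. Everything in the following sketch is a hypothetical illustration, not the patent's implementation: the helper names, the stand-in majority-vote "classifier", and the toy data are all invented solely to show the data flow from the two labeled sets to labels for the unlabeled target samples.

```python
# End-to-end sketch of the claimed pipeline (illustrative stand-ins throughout).

def split_evenly(data, n):
    """Divide data into n near-equal consecutive parts."""
    k, r = divmod(len(data), n)
    parts, start = [], 0
    for j in range(n):
        size = k + (1 if j < r else 0)   # spread any remainder
        parts.append(data[start:start + size])
        start += size
    return parts

def train_classifier(feature_set):
    # Stand-in "classifier": predicts the majority label of its training set.
    majority = 1 if sum(y for _, y in feature_set) >= 0 else -1
    return lambda x: majority

def cross_domain_label(first_set, second_set, unlabeled, n_splits=3):
    # Split the labeled source set; merge each part with the labeled target set.
    feature_sets = [sub + second_set for sub in split_evenly(first_set, n_splits)]
    # One classifier per feature training set.
    clfs = [train_classifier(fs) for fs in feature_sets]
    # Integrated classifier: majority vote over the unlabeled target samples.
    return [1 if sum(c(x) for c in clfs) >= 0 else -1 for x in unlabeled]

first_set = [("source sample %d" % i, +1) for i in range(6)]   # labeled, source domain
second_set = [("target sample", +1)]                           # small labeled, target domain
labels = cross_domain_label(first_set, second_set, ["u0", "u1"])
```

A real implementation would replace `train_classifier` with the boosted training procedure described below in the detailed steps.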
In a further technical solution, dividing the first training set into multiple sub-training sets comprises:
dividing the first training set evenly into multiple sub-training sets.
In a further technical solution, training one classifier on each feature training set comprises:
initializing the initial weight w_i^{j,1} of each training sample in the feature training set;
performing multiple training iterations on the feature training set, each iteration comprising:
training a sub-classifier f_j^t on the feature training set to obtain its classification function f_j^t(x);
calculating the classification error rate of the sub-classifier according to formula (1): e_j^t = Σ_{i=1}^{M} w_i^{j,t}·I(f_j^t(x_i) ≠ y_i), where I(·) is 1 when its argument holds and 0 otherwise;
calculating the weight of the sub-classifier according to formula (2): α_j^t = (1/2)·ln((1 − e_j^t)/e_j^t);
obtaining, according to formulas (3) and (4), the weight of each training sample for the next training iteration: w_i^{j,t+1} = w_i^{j,t}·exp(−α_j^t·y_i·f_j^t(x_i)) / Z_j^t, where the normalization factor Z_j^t = Σ_{i=1}^{M} w_i^{j,t}·exp(−α_j^t·y_i·f_j^t(x_i));
determining the classifier L_j(x) from the above results and formula (5): L_j(x) = sign(Σ_{t=1}^{T} α_j^t·f_j^t(x));
wherein x is the text vector of a training sample in the feature training set, y is its sentiment label, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the index of the current iteration, T is the total number of iterations, j is the index of the feature training set, and w_i^{j,t} is the weight of the i-th training sample in the current iteration.
In a further technical solution, establishing an integrated classifier from the classifiers trained on the feature training sets comprises:
determining the integrated classifier according to the formula L(x) = sign(Σ_{j=1}^{m} L_j(x)), wherein m is the number of feature training sets.
In a further technical solution, before merging the first training set and the second training set into the third training set and performing word-vector training on the third training set with the word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, the method further comprises:
performing word segmentation on the first training set and the second training set;
removing the stop-words from the first training set and the second training set.
In a second aspect, an embodiment of the invention further provides a cross-domain text sentiment classification device, comprising:
an acquiring unit for merging a first training set and a second training set into a third training set, and performing word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, wherein the first training set is a labeled training set in the source domain, the second training set is a labeled training set in the target domain, and the target domain further contains an unlabeled training set;
a combining unit for dividing the first training set into multiple sub-training sets and merging each sub-training set with the second training set, obtaining multiple corresponding feature training sets;
a training unit for training one classifier on each feature training set;
an establishing unit for establishing an integrated classifier from the classifiers trained on the feature training sets;
an action unit for applying the integrated classifier to the unlabeled training set in the target domain.
In a further technical solution, the training unit includes:
an initialization unit for initializing the initial weight w_i^{j,1} of each training sample in the feature training set;
an iterative training unit for performing multiple training iterations on the feature training set, each iteration comprising: training a sub-classifier f_j^t on the feature training set to obtain its classification function f_j^t(x); calculating the classification error rate of the sub-classifier according to formula (1): e_j^t = Σ_{i=1}^{M} w_i^{j,t}·I(f_j^t(x_i) ≠ y_i), where I(·) is 1 when its argument holds and 0 otherwise; calculating the weight of the sub-classifier according to formula (2): α_j^t = (1/2)·ln((1 − e_j^t)/e_j^t); and obtaining, according to formulas (3) and (4), the weight of each training sample for the next training iteration: w_i^{j,t+1} = w_i^{j,t}·exp(−α_j^t·y_i·f_j^t(x_i)) / Z_j^t, with normalization factor Z_j^t = Σ_{i=1}^{M} w_i^{j,t}·exp(−α_j^t·y_i·f_j^t(x_i));
a first determination unit for determining the classifier L_j(x) from the above results and formula (5): L_j(x) = sign(Σ_{t=1}^{T} α_j^t·f_j^t(x));
wherein x is the text vector of a training sample in the feature training set, y is its sentiment label, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the index of the current iteration, T is the total number of iterations, j is the index of the feature training set, and w_i^{j,t} is the weight of the i-th training sample in the current iteration.
In a further technical solution, the establishing unit includes:
a second determination unit for determining the integrated classifier according to the formula L(x) = sign(Σ_{j=1}^{m} L_j(x)), wherein m is the number of feature training sets.
In a third aspect, an embodiment of the invention further provides a computer device comprising a memory and a processor, the memory storing a computer program; the processor implements the above method when executing the computer program.
In a fourth aspect, an embodiment of the invention further provides a computer-readable storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, implement the above method.
With the technical solutions of the embodiments of the invention, the labeled data set in the source domain and a small labeled data set in the target domain can be used to label the large unlabeled data set in the target domain, solving the prior-art problems that labeled samples must be collected manually and that each domain needs its own labeled samples, and greatly shortening processing time and model-construction time. In the method of the invention, multiple classifiers are combined into an integrated classifier, which avoids over-fitting of a single classifier and improves generalization ability. In addition, the invention is simple to implement, fast, accurate, and suitable for large-scale deployment.
Detailed description of the invention
To explain the technical solutions of the embodiments of the invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a cross-domain text sentiment classification method provided by an embodiment of the invention;
Fig. 2 is a schematic sub-flowchart of a cross-domain text sentiment classification method provided by an embodiment of the invention;
Fig. 3 is a schematic flowchart of a cross-domain text sentiment classification method provided by another embodiment of the invention;
Fig. 4 is a schematic block diagram of a cross-domain text sentiment classification device provided by an embodiment of the invention;
Fig. 5 is a schematic block diagram of the combining unit of a cross-domain text sentiment classification device provided by an embodiment of the invention;
Fig. 6 is a schematic block diagram of the training unit of a cross-domain text sentiment classification device provided by an embodiment of the invention;
Fig. 7 is a schematic block diagram of the establishing unit of a cross-domain text sentiment classification device provided by an embodiment of the invention;
Fig. 8 is a schematic block diagram of a cross-domain text sentiment classification device provided by another embodiment of the invention; and
Fig. 9 is a schematic block diagram of a computer device provided by an embodiment of the invention.
Specific embodiment
The technical solutions in the embodiments of the invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, not all, of the embodiments of the invention. Based on the embodiments of the invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the invention.
It should be understood that, when used in this specification and the appended claims, the terms "comprise" and "include" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or sets thereof.
It should also be understood that the terminology used in this description of the invention is for the purpose of describing specific embodiments only and is not intended to limit the invention. As used in the description of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in the description of the invention and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when" or "once" or "in response to determining" or "in response to detecting". Similarly, the phrases "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined" or "in response to determining" or "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".
Fig. 1 is a schematic flowchart of the cross-domain text sentiment classification method provided by an embodiment of the invention. As shown, the method includes the following steps S1-S5.
S1: merge a first training set and a second training set into a third training set, and perform word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set. Here the first training set is a labeled training set in the source domain, the second training set is a labeled training set in the target domain, and the target domain further contains an unlabeled training set.
In the embodiments of the invention, the first training set is the labeled training set in the source domain, and the second training set is the labeled training set in the target domain, which further contains an unlabeled training set. The purpose of the embodiments is to label the large unlabeled training set in the target domain by means of the labeled training set in the source domain and the small labeled training set in the target domain.
In a specific implementation, the first and second training sets are merged into a third training set, and word-vector training is performed on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set. Merging the first training set with the second training set and then training word vectors on the third training set creates an association between the first and second training sets, making the resulting labels more accurate.
In one embodiment, the word-vector tool used is word2vec. word2vec is an efficient tool for representing words as real-valued vectors. Drawing on ideas from deep learning, it reduces the processing of text content, through training, to vector operations in a K-dimensional vector space, and similarity in that vector space can represent semantic similarity of the text.
It should be noted that those skilled in the art may also use other kinds of word-vector tools to perform word-vector training on the third training set; the invention is not specifically limited in this regard.
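The step above can be sketched as follows. In practice word2vec (for example, gensim's Word2Vec) would learn the embeddings from the merged corpus; here a small random lookup table stands in so the sketch has no external dependencies, and the sample texts, the `DIM` size, and every function name are illustrative assumptions, not the patent's.

```python
import random

# Sketch of step S1: merge the two labeled sets, learn word vectors on the
# merged corpus (a toy random table stands in for word2vec here), and
# represent each sample as the mean of its word vectors.
random.seed(0)
DIM = 4  # illustrative embedding dimension

def build_toy_embeddings(corpus):
    """Stand-in for word2vec training: one random vector per vocabulary word."""
    vocab = {w for doc in corpus for w in doc}
    return {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in vocab}

def text_vector(tokens, emb):
    """Text vector of a sample: element-wise mean of its word vectors."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]

first_set = [(["good", "movie"], +1)]      # labeled source-domain sample
second_set = [(["bad", "service"], -1)]    # labeled target-domain sample
third_set = first_set + second_set         # merged training corpus
emb = build_toy_embeddings([doc for doc, _ in third_set])
vectors = [(text_vector(doc, emb), label) for doc, label in third_set]
```

Training the embeddings on the merged corpus, rather than per domain, is what lets source and target vocabulary share one vector space.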
S2: divide the first training set into multiple sub-training sets, and merge each sub-training set with the second training set to obtain multiple corresponding feature training sets.
In a specific implementation, the first training set is divided into multiple sub-training sets, and each sub-training set is merged with the second training set, yielding multiple corresponding feature training sets. For example, in one embodiment the first training set is A and the second training set is B. A is divided into sub-training sets A1, A2 and A3, and each of A1, A2 and A3 is merged with the second training set B, yielding three feature training sets.
In one embodiment, the first training set is divided evenly into the sub-training sets, i.e. each sub-training set contains the same number of samples.
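Step S2 can be sketched directly; the function and variable names are illustrative, not from the patent:

```python
# Sketch of step S2: split the labeled source set evenly into sub-training
# sets and merge each with the small labeled target set.

def make_feature_training_sets(first_set, second_set, n_splits):
    """Split first_set into n_splits near-equal parts; append second_set to each."""
    k, r = divmod(len(first_set), n_splits)
    feature_sets, start = [], 0
    for j in range(n_splits):
        size = k + (1 if j < r else 0)   # spread any remainder evenly
        sub = first_set[start:start + size]
        start += size
        feature_sets.append(sub + second_set)
    return feature_sets

first_set = [("source text %d" % i, +1 if i % 2 else -1) for i in range(9)]
second_set = [("target text a", +1), ("target text b", -1)]
feature_sets = make_feature_training_sets(first_set, second_set, 3)
```

Because the second (target-domain) set is repeated in every feature training set, each of the resulting classifiers sees the target-domain labels.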
S3: train one classifier on each feature training set according to the text vector and sentiment label of each of its samples.
In a specific implementation, one classifier is trained for each feature training set using the text vectors and sentiment labels of its samples obtained in step S1 above.
In one embodiment, the specific training process includes the following steps S310-S330.
S310: initialize the initial weight w_i^{j,1} of each training sample in the feature training set.
In a specific implementation, the initial weight of each training sample in the feature training set is first set, where i is the index of a training sample in the feature training set and j is the index of the feature training set. In one embodiment, the initial weight of each training sample in the feature training set is set to w_i^{j,1} = 1/M, where M is the number of training samples in the feature training set.
S320: perform multiple training iterations on the feature training set.
In a specific implementation, multiple training iterations are performed on the feature training set, where each iteration includes the following steps 1-4.
Step 1: train a sub-classifier f_j^t on the feature training set to obtain its classification function f_j^t(x).
Step 2: calculate the classification error rate of the sub-classifier according to formula (1): e_j^t = Σ_{i=1}^{M} w_i^{j,t}·I(f_j^t(x_i) ≠ y_i), where I(·) is 1 when its argument holds and 0 otherwise.
Step 3: calculate the weight of the sub-classifier according to formula (2): α_j^t = (1/2)·ln((1 − e_j^t)/e_j^t).
Step 4: obtain, according to formulas (3) and (4), the weight of each training sample for the next training iteration: w_i^{j,t+1} = w_i^{j,t}·exp(−α_j^t·y_i·f_j^t(x_i)) / Z_j^t, where the normalization factor Z_j^t = Σ_{i=1}^{M} w_i^{j,t}·exp(−α_j^t·y_i·f_j^t(x_i)).
The meaning of each symbol in the above formulas is as follows: x_i is the text vector of the i-th training sample of the feature training set, y_i is its sentiment label, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the index of the current iteration, T is the total number of iterations, j is the index of the feature training set, and w_i^{j,t} is the weight of the i-th training sample in the current iteration. It should be noted that the weight of a training sample in the first iteration is its initial weight w_i^{j,1}.
S330: determine the classifier L_j(x) from the above results and formula (5): L_j(x) = sign(Σ_{t=1}^{T} α_j^t·f_j^t(x)).
In a specific implementation, L_j(x) is determined according to formula (5); L_j(x) is the classifier obtained by training on the j-th feature training set. By applying steps S310-S330 to each of the multiple feature training sets, multiple classifiers L_j(x) are obtained accordingly.
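Steps S310-S330 can be sketched as a single boosted classifier, assuming the formulas follow the standard AdaBoost recipe the surrounding text describes (weighted error rate, half-log classifier weight, exponential sample re-weighting, sign of the weighted vote). The decision-stump weak learner and all names here are illustrative choices, not the patent's.

```python
import math

# Hedged sketch of S310-S330: train one boosted classifier L_j on a feature
# training set of (text vector, label in {-1,+1}) pairs.

def train_stump(X, y, w):
    """Weak learner: best threshold/polarity per feature under weights w."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (+1, -1):
                pred = [pol if x[f] <= thr else -pol for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    err, f, thr, pol = best
    return (lambda x, f=f, thr=thr, pol=pol: pol if x[f] <= thr else -pol), err

def train_boosted_classifier(X, y, T=5):
    M = len(X)
    w = [1.0 / M] * M                         # S310: initial weights 1/M
    weighted_stumps = []
    for _ in range(T):                        # S320: T iterations
        clf, err = train_stump(X, y, w)       # step 1 + formula (1)
        err = max(err, 1e-10)                 # guard against log(0)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)   # formula (2)
        weighted_stumps.append((alpha, clf))
        # formulas (3)-(4): re-weight samples and normalize
        w = [wi * math.exp(-alpha * yi * clf(xi))
             for wi, xi, yi in zip(w, X, y)]
        Z = sum(w)
        w = [wi / Z for wi in w]
    # S330 / formula (5): L_j(x) = sign(sum_t alpha_t f_t(x))
    return lambda x: 1 if sum(a * c(x) for a, c in weighted_stumps) >= 0 else -1

X = [[0.0], [1.0], [2.0], [3.0]]   # toy one-dimensional "text vectors"
y = [-1, -1, +1, +1]
L = train_boosted_classifier(X, y)
```

Running this per feature training set yields the multiple classifiers L_j(x) that the next step combines.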
S4: establish an integrated classifier from all the trained classifiers.
In a specific implementation, the integrated classifier is established from all the trained classifiers.
In one embodiment, the integrated classifier is determined according to the formula L(x) = sign(Σ_{j=1}^{m} L_j(x)).
Here m is the number of feature training sets, j is the index of a feature training set, and L_j(x) is the classifier obtained by training on the j-th feature training set.
Combining multiple classifiers L_j(x) into the integrated classifier L(x) avoids over-fitting of a single classifier and improves generalization ability.
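The combination step is a plain majority vote, which can be sketched as follows; the three toy voter functions are illustrative stand-ins for trained classifiers L_j, not anything from the patent.

```python
# Sketch of step S4: combine m classifiers by majority vote,
# L(x) = sign(sum_j L_j(x)).

def integrated_classifier(classifiers):
    def L(x):
        vote = sum(clf(x) for clf in classifiers)
        return 1 if vote >= 0 else -1
    return L

# Two classifiers vote positive for x > 0; one always votes negative.
clfs = [lambda x: 1 if x > 0 else -1,
        lambda x: 1 if x > 0 else -1,
        lambda x: -1]
L = integrated_classifier(clfs)
```

With an odd number of classifiers the vote is never tied, which is one practical reason to choose an odd m.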
S5: apply the integrated classifier to the unlabeled training set in the target domain.
In a specific implementation, applying the integrated classifier to the unlabeled training set in the target domain yields the sentiment classification label of each sample in that set.
With the technical solutions of the embodiments of the invention, the labeled data set in the source domain and a small labeled data set in the target domain can be used to label the large unlabeled data set in the target domain, solving the prior-art problems that labeled samples must be collected manually and that each domain needs its own labeled samples, and greatly shortening processing time and model-construction time. In the method of the invention, multiple classifiers are combined into an integrated classifier, which avoids over-fitting of a single classifier and improves generalization ability. In addition, the invention is simple to implement, fast, accurate, and suitable for large-scale deployment.
Fig. 3 is a schematic flowchart of a cross-domain text sentiment classification method provided by another embodiment of the invention. As shown in Fig. 3, the method of this embodiment includes steps S31-S37, of which steps S33-S37 are similar to steps S1-S5 in the above embodiment and are not repeated here. The steps S31-S32 added in this embodiment are described in detail below.
S31: perform word segmentation on the first and second training sets.
Word segmentation is a basic step of text processing, i.e. extracting words from the first and second training sets as samples. In a specific implementation, a segmentation tool is used to segment the first and second training sets.
S32: remove the stop-words from the first and second training sets.
In a specific implementation, the stop-words in the first and second training sets are removed. Stop-words are often prepositions, adverbs or conjunctions; function words such as "inside", "also", "it" and "being" are all stop-words. Because these words occur with very high frequency, they need to be removed.
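Steps S31-S32 can be sketched together. In practice a Chinese segmenter such as jieba would produce the tokens; plain whitespace splitting stands in here so the sketch runs anywhere, and the stop-word list is illustrative, not the patent's.

```python
# Sketch of steps S31-S32: tokenize each sample, then drop stop-words.

STOP_WORDS = {"the", "a", "of", "is", "it", "and"}  # illustrative list

def preprocess(samples):
    cleaned = []
    for text in samples:
        tokens = text.lower().split()   # jieba.lcut(text) for Chinese text
        cleaned.append([t for t in tokens if t not in STOP_WORDS])
    return cleaned

docs = preprocess(["The movie is great", "It is a waste of time"])
```

The surviving tokens are what the word-vector training of step S1 would then consume.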
Fig. 4 is a schematic block diagram of a cross-domain text sentiment classification device 40 provided by an embodiment of the invention. As shown in Fig. 4, corresponding to the above cross-domain text sentiment classification method, the invention also provides a cross-domain text sentiment classification device 40. The device 40 includes units for executing the above method and can be configured in a terminal such as a desktop computer, tablet computer or laptop computer. Specifically, referring to Fig. 4, the device 40 includes an acquiring unit 41, a combining unit 42, a training unit 43, an establishing unit 44 and an action unit 45.
The acquiring unit 41 is configured to merge a first training set and a second training set into a third training set, and to perform word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, wherein the first training set is a labeled training set in the source domain, the second training set is a labeled training set in the target domain, and the target domain further contains an unlabeled training set.
The combining unit 42 is configured to divide the first training set into multiple sub-training sets and to merge each sub-training set with the second training set, obtaining multiple corresponding feature training sets.
The training unit 43 is configured to train one classifier on each feature training set according to the text vector and sentiment label of each of its samples.
The establishing unit 44 is configured to establish an integrated classifier from all the trained classifiers.
The action unit 45 is configured to apply the integrated classifier to the unlabeled training set in the target domain.
In one embodiment, as shown in Fig. 5, the combining unit 42 includes an equal-division subunit 421, which is configured to divide the first training set evenly into multiple sub-training sets.
In one embodiment, as shown in Fig. 6, the training unit 43 includes an initialization unit 431, an iterative training unit 432 and a first determination unit 433.
The initialization unit 431 is configured to initialize the initial weight w_i^{j,1} of each training sample in the feature training set.
The iterative training unit 432 is configured to perform multiple training iterations on the feature training set, each iteration comprising: training a sub-classifier f_j^t on the feature training set to obtain its classification function f_j^t(x); calculating the classification error rate of the sub-classifier according to formula (1): e_j^t = Σ_{i=1}^{M} w_i^{j,t}·I(f_j^t(x_i) ≠ y_i), where I(·) is 1 when its argument holds and 0 otherwise; calculating the weight of the sub-classifier according to formula (2): α_j^t = (1/2)·ln((1 − e_j^t)/e_j^t); and obtaining, according to formulas (3) and (4), the weight of each training sample for the next training iteration: w_i^{j,t+1} = w_i^{j,t}·exp(−α_j^t·y_i·f_j^t(x_i)) / Z_j^t, with normalization factor Z_j^t = Σ_{i=1}^{M} w_i^{j,t}·exp(−α_j^t·y_i·f_j^t(x_i)).
The first determination unit 433 is configured to determine the classifier L_j(x) from the above results and formula (5): L_j(x) = sign(Σ_{t=1}^{T} α_j^t·f_j^t(x)).
Here x is the text vector of a training sample in the feature training set, y is its sentiment label, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the index of the current iteration, T is the total number of iterations, j is the index of the feature training set, and w_i^{j,t} is the weight of the i-th training sample in the current iteration.
In one embodiment, as shown in Fig. 7, the establishing unit 44 includes a second determination unit 441, which is configured to determine the integrated classifier according to the formula L(x) = sign(Σ_{j=1}^{m} L_j(x)), wherein m is the number of feature training sets.
Fig. 8 is a schematic block diagram of a cross-domain text sentiment classification device provided by another embodiment of the invention. As shown in Fig. 8, the device of this embodiment adds a segmentation unit 46 and a removal unit 47 on the basis of the above embodiment.
The segmentation unit 46 is configured to perform word segmentation on the first and second training sets.
The removal unit 47 is configured to remove the stop-words from the first and second training sets.
It should be noted that, as is clear to those skilled in the art, for the specific implementation of the above cross-domain text sentiment classification device 40 and of each of its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity of description, details are not repeated here.
The above cross-domain text sentiment classification device can be implemented in the form of a computer program that can run on a computer device as shown in Fig. 9.
Referring to Fig. 9, Fig. 9 is a schematic block diagram of a computer device provided by an embodiment of the application. The computer device 500 can be a terminal or a server, where the terminal can be an electronic device with communication functions such as a smartphone, tablet computer, laptop, desktop computer, personal digital assistant or wearable device, and the server can be an independent server or a server cluster composed of multiple servers.
Referring to Fig. 9, the computer device 500 includes a processor 502, a memory and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions which, when executed, cause the processor 502 to execute a cross-domain text sentiment classification method.
The processor 502 provides computing and control capability to support the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 can execute a cross-domain text sentiment classification method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will understand that the structure shown in Fig. 9 is only a block diagram of the part of the structure relevant to the solution of the application and does not limit the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the following steps:
merging a first training set and a second training set into a third training set, and performing word-vector training on the third training set by a word-vector tool to obtain a text vector and a sentiment label of each sample in the third training set, wherein the first training set is a labeled training set in a source domain, the second training set is a labeled training set in a target domain, and the target domain further includes an unlabeled training set;
dividing the first training set into multiple sub-training sets, and merging each sub-training set with the second training set respectively, to correspondingly obtain multiple feature training sets;
training one classifier for each feature training set, according to the text vector and sentiment label of each sample;
establishing an integrated classifier according to all the trained classifiers;
applying the integrated classifier to the unlabeled training set in the target domain.
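As an illustration of the first step above, one common way to turn trained word vectors into a per-sample text vector is to average the vectors of the sample's words. The sketch below assumes this averaging scheme; the toy embedding table and function name are illustrative stand-ins for the output of a word-vector tool, not part of the original disclosure.

```python
# Hypothetical sketch: average trained word vectors into one text vector
# per sample. EMBEDDINGS is a toy stand-in for vectors learned by a
# word-vector tool on the merged third training set.
EMBEDDINGS = {
    "good":  [0.9, 0.1],
    "bad":   [-0.8, 0.2],
    "movie": [0.0, 0.5],
}

def text_vector(tokens, embeddings, dim=2):
    """Average the word vectors of the tokens present in the table."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim  # no known words: zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]

v = text_vector(["good", "movie"], EMBEDDINGS)
```

Each labeled sample would then be represented as the pair (text vector, sentiment label) for the classifier training that follows.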
In one embodiment, when implementing the step of dividing the first training set into multiple sub-training sets, the processor 502 specifically implements the following step:
dividing the first training set evenly into multiple sub-training sets.
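A minimal sketch of this even split and the subsequent merge with the second training set, under the assumption that each training set is simply a Python list of samples (the function names are illustrative, not from the original):

```python
def even_split(samples, m):
    """Divide a training set into m near-equal sub-training sets."""
    k, r = divmod(len(samples), m)
    subsets, start = [], 0
    for i in range(m):
        size = k + (1 if i < r else 0)  # spread any remainder
        subsets.append(samples[start:start + size])
        start += size
    return subsets

def feature_training_sets(first_set, second_set, m):
    """Merge each source-domain sub-training set with the labeled
    target-domain set to form the m feature training sets."""
    return [sub + second_set for sub in even_split(first_set, m)]

sets_ = feature_training_sets(list(range(10)), ["t1", "t2"], 3)
```

Every feature training set thus contains a different slice of the source domain but the full labeled portion of the target domain.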
In one embodiment, when implementing the step of training one classifier for each feature training set, the processor 502 specifically implements the following steps:
initializing the initial weight w1,i = 1/M of each training sample in the feature training set;
performing multiple iterations of training on the feature training set, where each iteration of training includes:
training a sub-classifier ht on the feature training set to obtain a classification function ht(x);
calculating the classification error rate εt of the sub-classifier ht according to formula (1): εt = Σ(i=1..M) wt,i · I(ht(xi) ≠ yi);
calculating the weight αt of the sub-classifier ht according to formula (2): αt = (1/2) · ln((1 − εt)/εt);
obtaining, according to formula (3) wt+1,i = (wt,i/Zt) · exp(−αt · yi · ht(xi)) and formula (4) Zt = Σ(i=1..M) wt,i · exp(−αt · yi · ht(xi)), the weight of each training sample in the feature training set for the next iteration of training;
determining the classifier Lj(x) from the above results and formula (5): Lj(x) = sign(Σ(t=1..T) αt · ht(x));
where x is the text vector of a training sample in the feature training set, y is the sentiment label of a training sample in the feature training set, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the index of the current iteration of training, T is the total number of iterations of training, j is the index of the feature training set, and wt,i is the weight of a training sample in the current iteration of training.
In one embodiment, when implementing the step of establishing an integrated classifier according to the classifiers obtained by training on each feature training set, the processor 502 specifically implements the following step:
determining the integrated classifier L(x) according to the formula L(x) = sign(Σ(j=1..m) Lj(x)), where m is the number of feature training sets.
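One common way to realize such an integrated classifier is a simple majority vote over the m classifiers; the sketch below assumes that scheme and {-1, +1} outputs (names are illustrative):

```python
def integrated_classifier(classifiers):
    """Combine m classifiers: sign of the sum of their {-1, +1} votes."""
    def L(x):
        s = sum(clf(x) for clf in classifiers)
        return 1 if s >= 0 else -1
    return L

# three toy classifiers with different decision thresholds
clfs = [lambda x: 1 if x > 0 else -1,
        lambda x: 1 if x > 1 else -1,
        lambda x: 1 if x > -1 else -1]
L = integrated_classifier(clfs)
```

Applying L to each sample of the unlabeled target-domain training set would yield its predicted sentiment labels.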
In one embodiment, before implementing the step of merging the first training set and the second training set into a third training set and performing word-vector training on the third training set by a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, the processor 502 also implements the following steps:
performing word segmentation on the first training set and the second training set;
removing stop words from the first training set and the second training set.
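A minimal sketch of this preprocessing, assuming samples have already been segmented into token lists; the stop-word set here is a toy stand-in for a full stop-word dictionary, and a real system would first run a word segmenter over the raw text:

```python
STOP_WORDS = {"the", "a", "is"}  # toy stand-in for a stop-word dictionary

def remove_stop_words(segmented_samples, stop_words):
    """Drop stop words from each segmented (tokenised) sample."""
    return [[tok for tok in sample if tok not in stop_words]
            for sample in segmented_samples]

cleaned = remove_stop_words([["the", "movie", "is", "good"]], STOP_WORDS)
```

The cleaned token lists are what the word-vector tool is then trained on.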
It should be understood that in the embodiments of the present application, the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may also be any conventional processor, or the like.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be completed by instructing relevant hardware through a computer program. The computer program includes program instructions, and the computer program may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the process steps of the embodiments of the above methods.
Therefore, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, where the computer program includes program instructions. When executed by a processor, the program instructions cause the processor to perform the following steps:
merging a first training set and a second training set into a third training set, and performing word-vector training on the third training set by a word-vector tool to obtain a text vector and a sentiment label of each sample in the third training set, wherein the first training set is a labeled training set in a source domain, the second training set is a labeled training set in a target domain, and the target domain further includes an unlabeled training set;
dividing the first training set into multiple sub-training sets, and merging each sub-training set with the second training set respectively, to correspondingly obtain multiple feature training sets;
training one classifier for each feature training set, according to the text vector and sentiment label of each sample;
establishing an integrated classifier according to all the trained classifiers;
applying the integrated classifier to the unlabeled training set in the target domain.
In one embodiment, when executing the program instructions to implement the step of dividing the first training set into multiple sub-training sets, the processor specifically implements the following step:
dividing the first training set evenly into multiple sub-training sets.
In one embodiment, when executing the program instructions to implement the step of training one classifier for each feature training set, the processor specifically implements the following steps:
initializing the initial weight w1,i = 1/M of each training sample in the feature training set;
performing multiple iterations of training on the feature training set, where each iteration of training includes:
training a sub-classifier ht on the feature training set to obtain a classification function ht(x);
calculating the classification error rate εt of the sub-classifier ht according to formula (1): εt = Σ(i=1..M) wt,i · I(ht(xi) ≠ yi);
calculating the weight αt of the sub-classifier ht according to formula (2): αt = (1/2) · ln((1 − εt)/εt);
obtaining, according to formula (3) wt+1,i = (wt,i/Zt) · exp(−αt · yi · ht(xi)) and formula (4) Zt = Σ(i=1..M) wt,i · exp(−αt · yi · ht(xi)), the weight of each training sample in the feature training set for the next iteration of training;
determining the classifier Lj(x) from the above results and formula (5): Lj(x) = sign(Σ(t=1..T) αt · ht(x));
where x is the text vector of a training sample in the feature training set, y is the sentiment label of a training sample in the feature training set, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the index of the current iteration of training, T is the total number of iterations of training, j is the index of the feature training set, and wt,i is the weight of a training sample in the current iteration of training.
In one embodiment, when executing the program instructions to implement the step of establishing an integrated classifier according to the classifiers obtained by training on each feature training set, the processor specifically implements the following step:
determining the integrated classifier L(x) according to the formula L(x) = sign(Σ(j=1..m) Lj(x)), where m is the number of feature training sets.
In one embodiment, before executing the program instructions to implement the step of merging the first training set and the second training set into a third training set and performing word-vector training on the third training set by a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, the processor also implements the following steps:
performing word segmentation on the first training set and the second training set;
removing stop words from the first training set and the second training set.
The storage medium may be a USB flash disk, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, an optical disc, or any of various other computer-readable storage media that can store program code.
Those of ordinary skill in the art may realize that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function; there may be other division manners in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
The steps in the embodiments of the present invention may be adjusted in order, combined, or deleted according to actual needs. The units in the devices of the embodiments of the present invention may be combined, divided, or deleted according to actual needs. In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a terminal, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
Obviously, those skilled in the art can make various modifications and variations to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include these modifications and variations.
The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the technical field can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A cross-domain text sentiment classification method, characterized by comprising:
merging a first training set and a second training set into a third training set, and performing word-vector training on the third training set by a word-vector tool to obtain a text vector and a sentiment label of each sample in the third training set, wherein the first training set is a labeled training set in a source domain, the second training set is a labeled training set in a target domain, and the target domain further includes an unlabeled training set;
dividing the first training set into multiple sub-training sets, and merging each sub-training set with the second training set respectively, to correspondingly obtain multiple feature training sets;
training one classifier for each feature training set, according to the text vector and sentiment label of each sample;
establishing an integrated classifier according to all the trained classifiers;
applying the integrated classifier to the unlabeled training set in the target domain.
2. The cross-domain text sentiment classification method according to claim 1, characterized in that the dividing the first training set into multiple sub-training sets comprises:
dividing the first training set evenly into multiple sub-training sets.
3. The cross-domain text sentiment classification method according to claim 1, characterized in that the training one classifier for each feature training set comprises:
initializing the initial weight w1,i = 1/M of each training sample in the feature training set;
performing multiple iterations of training on the feature training set, wherein each iteration of training includes:
training a sub-classifier ht on the feature training set to obtain a classification function ht(x);
calculating the classification error rate εt of the sub-classifier ht according to formula (1): εt = Σ(i=1..M) wt,i · I(ht(xi) ≠ yi);
calculating the weight αt of the sub-classifier ht according to formula (2): αt = (1/2) · ln((1 − εt)/εt);
obtaining, according to formula (3) wt+1,i = (wt,i/Zt) · exp(−αt · yi · ht(xi)) and formula (4) Zt = Σ(i=1..M) wt,i · exp(−αt · yi · ht(xi)), the weight of each training sample in the feature training set for the next iteration of training;
determining the classifier Lj(x) from the above results and formula (5): Lj(x) = sign(Σ(t=1..T) αt · ht(x));
wherein x is the text vector of a training sample in the feature training set, y is the sentiment label of a training sample in the feature training set, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the index of the current iteration of training, T is the total number of iterations of training, j is the index of the feature training set, and wt,i is the weight of a training sample in the current iteration of training.
4. The cross-domain text sentiment classification method according to claim 3, characterized in that the establishing an integrated classifier according to the classifiers obtained by training on each feature training set comprises:
determining the integrated classifier L(x) according to the formula L(x) = sign(Σ(j=1..m) Lj(x)), wherein m is the number of feature training sets.
5. The cross-domain text sentiment classification method according to claim 1, characterized in that before the merging a first training set and a second training set into a third training set and performing word-vector training on the third training set by a word-vector tool to obtain a text vector and a sentiment label of each sample in the third training set, the method further comprises:
performing word segmentation on the first training set and the second training set;
removing stop words from the first training set and the second training set.
6. A cross-domain text sentiment classification device, characterized by comprising:
an acquiring unit, configured to merge a first training set and a second training set into a third training set, and perform word-vector training on the third training set by a word-vector tool to obtain a text vector and a sentiment label of each sample in the third training set, wherein the first training set is a labeled training set in a source domain, the second training set is a labeled training set in a target domain, and the target domain further includes an unlabeled training set;
a combining unit, configured to divide the first training set into multiple sub-training sets, and merge each sub-training set with the second training set respectively, to correspondingly obtain multiple feature training sets;
a training unit, configured to train one classifier for each feature training set, according to the text vector and sentiment label of each sample;
an establishing unit, configured to establish an integrated classifier according to all the trained classifiers;
an applying unit, configured to apply the integrated classifier to the unlabeled training set in the target domain.
7. The cross-domain text sentiment classification device according to claim 6, characterized in that the training unit comprises:
an initialization unit, configured to initialize the initial weight w1,i = 1/M of each training sample in the feature training set;
an iterative training unit, configured to perform multiple iterations of training on the feature training set, wherein each iteration of training includes: training a sub-classifier ht on the feature training set to obtain a classification function ht(x); calculating the classification error rate εt of the sub-classifier ht according to formula (1): εt = Σ(i=1..M) wt,i · I(ht(xi) ≠ yi); calculating the weight αt of the sub-classifier ht according to formula (2): αt = (1/2) · ln((1 − εt)/εt); and obtaining, according to formula (3) wt+1,i = (wt,i/Zt) · exp(−αt · yi · ht(xi)) and formula (4) Zt = Σ(i=1..M) wt,i · exp(−αt · yi · ht(xi)), the weight of each training sample in the feature training set for the next iteration of training; and
a first determination unit, configured to determine the classifier Lj(x) from the above results and formula (5): Lj(x) = sign(Σ(t=1..T) αt · ht(x));
wherein x is the text vector of a training sample in the feature training set, y is the sentiment label of a training sample in the feature training set, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the index of the current iteration of training, T is the total number of iterations of training, j is the index of the feature training set, and wt,i is the weight of a training sample in the current iteration of training.
8. The cross-domain text sentiment classification device according to claim 7, characterized in that the establishing unit comprises:
a second determination unit, configured to determine the integrated classifier L(x) according to the formula L(x) = sign(Σ(j=1..m) Lj(x)), wherein m is the number of feature training sets.
9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory stores a computer program, and the processor, when executing the computer program, implements the method according to any one of claims 1-5.
10. A storage medium, characterized in that the storage medium stores a computer program, and the computer program, when executed by a processor, implements the method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810770172.5A CN108959265A (en) | 2018-07-13 | 2018-07-13 | Cross-domain texts sensibility classification method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810770172.5A CN108959265A (en) | 2018-07-13 | 2018-07-13 | Cross-domain texts sensibility classification method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108959265A true CN108959265A (en) | 2018-12-07 |
Family
ID=64483990
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810770172.5A Withdrawn CN108959265A (en) | 2018-07-13 | 2018-07-13 | Cross-domain texts sensibility classification method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108959265A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109857861A (en) * | 2019-01-04 | 2019-06-07 | 平安科技(深圳)有限公司 | File classification method, device, server and medium based on convolutional neural networks |
CN110009038A (en) * | 2019-04-04 | 2019-07-12 | 北京百度网讯科技有限公司 | Training method, device and the storage medium of screening model |
CN110378389A (en) * | 2019-06-24 | 2019-10-25 | 苏州浪潮智能科技有限公司 | A kind of Adaboost classifier calculated machine creating device |
CN111078876A (en) * | 2019-12-04 | 2020-04-28 | 国家计算机网络与信息安全管理中心 | Short text classification method and system based on multi-model integration |
WO2020143303A1 (en) * | 2019-01-10 | 2020-07-16 | 平安科技(深圳)有限公司 | Method and device for training deep learning model, computer apparatus, and storage medium |
CN111831826A (en) * | 2020-07-24 | 2020-10-27 | 腾讯科技(深圳)有限公司 | Training method, classification method and device of cross-domain text classification model |
US11423333B2 (en) | 2020-03-25 | 2022-08-23 | International Business Machines Corporation | Mechanisms for continuous improvement of automated machine learning |
- 2018-07-13: CN application CN201810770172.5A filed; published as CN108959265A; status: not active (withdrawn)
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109857861A (en) * | 2019-01-04 | 2019-06-07 | 平安科技(深圳)有限公司 | File classification method, device, server and medium based on convolutional neural networks |
WO2020143303A1 (en) * | 2019-01-10 | 2020-07-16 | 平安科技(深圳)有限公司 | Method and device for training deep learning model, computer apparatus, and storage medium |
CN110009038A (en) * | 2019-04-04 | 2019-07-12 | 北京百度网讯科技有限公司 | Training method, device and the storage medium of screening model |
CN110378389A (en) * | 2019-06-24 | 2019-10-25 | 苏州浪潮智能科技有限公司 | A kind of Adaboost classifier calculated machine creating device |
CN111078876A (en) * | 2019-12-04 | 2020-04-28 | 国家计算机网络与信息安全管理中心 | Short text classification method and system based on multi-model integration |
US11423333B2 (en) | 2020-03-25 | 2022-08-23 | International Business Machines Corporation | Mechanisms for continuous improvement of automated machine learning |
CN111831826A (en) * | 2020-07-24 | 2020-10-27 | 腾讯科技(深圳)有限公司 | Training method, classification method and device of cross-domain text classification model |
CN111831826B (en) * | 2020-07-24 | 2022-10-18 | 腾讯科技(深圳)有限公司 | Training method, classification method and device of cross-domain text classification model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108959265A (en) | Cross-domain texts sensibility classification method, device, computer equipment and storage medium | |
CN106611052B (en) | The determination method and device of text label | |
CN106484139B (en) | Emoticon recommended method and device | |
CN108399228A (en) | Article sorting technique, device, computer equipment and storage medium | |
CN107038480A (en) | A kind of text sentiment classification method based on convolutional neural networks | |
CN106445919A (en) | Sentiment classifying method and device | |
CN109376240A (en) | A kind of text analyzing method and terminal | |
CN109299264A (en) | File classification method, device, computer equipment and storage medium | |
CN110532563A (en) | The detection method and device of crucial paragraph in text | |
CN105095179B (en) | The method and device that user's evaluation is handled | |
CN108733675B (en) | Emotion evaluation method and device based on large amount of sample data | |
CN111046886A (en) | Automatic identification method, device and equipment for number plate and computer readable storage medium | |
CN105956083A (en) | Application software classification system, application software classification method and server | |
Otte et al. | Local feature based online mode detection with recurrent neural networks | |
CN105117740A (en) | Font identification method and device | |
CN106778878A (en) | A kind of character relation sorting technique and device | |
CN106537387B (en) | Retrieval/storage image associated with event | |
Nihal et al. | Bangla sign alphabet recognition with zero-shot and transfer learning | |
CN109471932A (en) | Rumour detection method, system and storage medium based on learning model | |
CN102708164A (en) | Method and system for calculating movie expectation | |
CN109446300A (en) | A kind of corpus preprocess method, the pre- mask method of corpus and electronic equipment | |
CN108009248A (en) | A kind of data classification method and system | |
CN109359198A (en) | A kind of file classification method and device | |
CN107330009A (en) | Descriptor disaggregated model creation method, creating device and storage medium | |
CN109597987A (en) | A kind of text restoring method, device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WW01 | Invention patent application withdrawn after publication | Application publication date: 20181207 |