CN108959265A - Cross-domain text sentiment classification method, device, computer equipment and storage medium - Google Patents

Cross-domain text sentiment classification method, device, computer equipment and storage medium

Info

Publication number
CN108959265A
Authority
CN
China
Prior art keywords
training set
training
feature
classifier
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810770172.5A
Other languages
Chinese (zh)
Inventor
秦兴德
刘奕慧
郭玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Dingfeng Cattle Technology Co Ltd
Original Assignee
Shenzhen Dingfeng Cattle Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Dingfeng Cattle Technology Co Ltd
Priority to CN201810770172.5A
Publication of CN108959265A publication Critical patent/CN108959265A/en
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a cross-domain text sentiment classification method, device, computer equipment and storage medium. The method includes: merging a first training set and a second training set into a third training set, and performing word-vector training on the third training set with a word-vector tool; dividing the first training set into multiple sub-training sets, and merging each sub-training set with the second training set to obtain multiple feature training sets; training one classifier on each feature training set; building an integrated classifier from all trained classifiers; and applying the integrated classifier to the unlabeled training set in the target domain. This solves the prior-art problems that labeled samples must be acquired manually and that each domain needs its own labeled samples, and greatly shortens processing time and model construction time. In addition, the invention is simple to implement, fast and accurate, and can be deployed at large scale.

Description

Cross-domain text sentiment classification method, device, computer equipment and storage medium
Technical field
The present invention relates to the technical field of sentiment classification, and more particularly to a cross-domain text sentiment classification method, device, computer equipment and storage medium.
Background art
Sentiment classification is one of the main tasks of natural language processing. Existing methods mostly focus on sentiment classification within a single domain. Commonly used methods include machine learning classification based on the vector space model, machine learning classification based on word-vector models, and deep learning methods such as RNNs (Recurrent Neural Networks) and CNNs (Convolutional Neural Networks).
When performing sentiment classification with the above methods, a large number of labeled samples is required. Labeled samples are mainly acquired manually, and each domain needs its own labeled samples, which is time-consuming, laborious and clearly disadvantageous.
Summary of the invention
Embodiments of the invention provide a cross-domain text sentiment classification method, device, computer equipment and storage medium, intended to solve the problems that, when performing sentiment classification, labeled samples must be acquired manually and each domain needs its own labeled samples.
In a first aspect, an embodiment of the invention provides a cross-domain text sentiment classification method, comprising:
merging a first training set and a second training set into a third training set, and performing word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, wherein the first training set is a labeled training set in a source domain, the second training set is a labeled training set in a target domain, and the target domain further contains an unlabeled training set;
dividing the first training set into multiple sub-training sets, and merging each sub-training set with the second training set to correspondingly obtain multiple feature training sets;
training one classifier on each feature training set according to the text vector and sentiment label of each sample;
building an integrated classifier from all trained classifiers; and
applying the integrated classifier to the unlabeled training set in the target domain.
In a further technical solution, dividing the first training set into multiple sub-training sets comprises:
dividing the first training set evenly into multiple sub-training sets.
In a further technical solution, training one classifier on each feature training set comprises:
initializing the weight of each training sample in the feature training set to the initial value $w_i^1$;
performing multiple rounds of iterative training on the feature training set, each iteration comprising:
training a sub-classifier $G_t$ on the feature training set to obtain a classification function $G_t(x)$;
computing the classification error rate of the sub-classifier $G_t$ according to formula (1): $e_t = \sum_{i=1}^{M} w_i^t \, I\big(G_t(x_i) \neq y_i\big)$;
computing the weight of the sub-classifier $G_t$ according to formula (2): $\alpha_t = \frac{1}{2}\ln\frac{1-e_t}{e_t}$;
obtaining the weight of each training sample for the next iteration according to formula (3), $w_i^{t+1} = \frac{w_i^t}{Z_t}\exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$, and formula (4), $Z_t = \sum_{i=1}^{M} w_i^t \exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$; and
determining the classifier $L_j(x)$ from the above results according to formula (5): $L_j(x) = \operatorname{sign}\big(\sum_{t=1}^{T} \alpha_t\, G_t(x)\big)$;
wherein x is the text vector of a training sample in the feature training set, y is the sentiment label of a training sample in the feature training set, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the number of the current iteration, T is the total number of iterations, j is the index of the feature training set, and $w_i^t$ is the weight of a training sample of the feature training set in the current iteration.
In a further technical solution, building an integrated classifier from the classifiers trained on the feature training sets comprises:
determining the integrated classifier L(x) according to the formula $L(x) = \operatorname{sign}\big(\frac{1}{m}\sum_{j=1}^{m} L_j(x)\big)$, where m is the number of feature training sets.
In a further technical solution, before merging the first training set and the second training set into a third training set and performing word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, the method further comprises:
performing word segmentation on the first training set and the second training set; and
removing the stop words from the first training set and the second training set.
In a second aspect, an embodiment of the invention further provides a cross-domain text sentiment classification device, comprising:
an acquiring unit for merging a first training set and a second training set into a third training set, and performing word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, wherein the first training set is a labeled training set in a source domain, the second training set is a labeled training set in a target domain, and the target domain further contains an unlabeled training set;
a combining unit for dividing the first training set into multiple sub-training sets, and merging each sub-training set with the second training set to correspondingly obtain multiple feature training sets;
a training unit for training one classifier on each feature training set;
an establishing unit for building an integrated classifier from the classifiers trained on the feature training sets; and
an applying unit for applying the integrated classifier to the unlabeled training set in the target domain.
In a further technical solution, the training unit comprises:
an initialization unit for initializing the weight of each training sample in the feature training set to the initial value $w_i^1$;
an iterative training unit for performing multiple rounds of iterative training on the feature training set, each iteration comprising:
training a sub-classifier $G_t$ on the feature training set to obtain a classification function $G_t(x)$;
computing the classification error rate of the sub-classifier $G_t$ according to formula (1): $e_t = \sum_{i=1}^{M} w_i^t \, I\big(G_t(x_i) \neq y_i\big)$;
computing the weight of the sub-classifier $G_t$ according to formula (2): $\alpha_t = \frac{1}{2}\ln\frac{1-e_t}{e_t}$;
obtaining the weight of each training sample for the next iteration according to formula (3), $w_i^{t+1} = \frac{w_i^t}{Z_t}\exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$, and formula (4), $Z_t = \sum_{i=1}^{M} w_i^t \exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$; and
a first determination unit for determining the classifier $L_j(x)$ from the above results according to formula (5): $L_j(x) = \operatorname{sign}\big(\sum_{t=1}^{T} \alpha_t\, G_t(x)\big)$;
wherein x is the text vector of a training sample in the feature training set, y is the sentiment label of a training sample in the feature training set, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the number of the current iteration, T is the total number of iterations, j is the index of the feature training set, and $w_i^t$ is the weight of a training sample of the feature training set in the current iteration.
In a further technical solution, the establishing unit comprises:
a second determination unit for determining the integrated classifier L(x) according to the formula $L(x) = \operatorname{sign}\big(\frac{1}{m}\sum_{j=1}^{m} L_j(x)\big)$, where m is the number of feature training sets.
In a third aspect, an embodiment of the invention further provides a computer equipment comprising a memory and a processor, the memory storing a computer program, and the processor implementing the above method when executing the computer program.
In a fourth aspect, an embodiment of the invention further provides a computer-readable storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, implement the above method.
With the technical solution of the embodiments of the invention, the labeled data set in the source domain and a small labeled data set in the target domain can be used to label the large unlabeled data set in the target domain, thereby solving the prior-art problems that labeled samples must be acquired manually and that each domain needs its own labeled samples, and greatly shortening processing time and model construction time. In the method of the invention, multiple classifiers are combined into an integrated classifier, which avoids overfitting of a single classifier and improves generalization ability. In addition, the invention is simple to implement, fast and accurate, and can be deployed at large scale.
Description of the drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flow chart of a cross-domain text sentiment classification method provided by an embodiment of the invention;
Fig. 2 is a schematic sub-flow chart of a cross-domain text sentiment classification method provided by an embodiment of the invention;
Fig. 3 is a schematic flow chart of a cross-domain text sentiment classification method provided by another embodiment of the invention;
Fig. 4 is a schematic block diagram of a cross-domain text sentiment classification device provided by an embodiment of the invention;
Fig. 5 is a schematic block diagram of the combining unit of a cross-domain text sentiment classification device provided by an embodiment of the invention;
Fig. 6 is a schematic block diagram of the training unit of a cross-domain text sentiment classification device provided by an embodiment of the invention;
Fig. 7 is a schematic block diagram of the establishing unit of a cross-domain text sentiment classification device provided by an embodiment of the invention;
Fig. 8 is a schematic block diagram of a cross-domain text sentiment classification device provided by another embodiment of the invention; and
Fig. 9 is a schematic block diagram of a computer equipment provided by an embodiment of the invention.
Specific embodiment
The technical solutions in the embodiments of the invention will be described clearly and completely below with reference to the drawings in the embodiments of the invention. Obviously, the described embodiments are some, rather than all, of the embodiments of the invention. Based on the embodiments of the invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the invention.
It should be understood that the terms "include" and "comprise", when used in this specification and the appended claims, indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or sets thereof.
It should also be understood that the terminology used in this description of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the description of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in the description of the invention and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".
Fig. 1 is a schematic flow chart of a cross-domain text sentiment classification method provided by an embodiment of the invention. As shown, the method includes the following steps S1-S5.
S1: merge a first training set and a second training set into a third training set, and perform word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set. The first training set is a labeled training set in a source domain; the second training set is a labeled training set in a target domain, which further contains an unlabeled training set.
In embodiments of the invention, the first training set is the labeled training set in the source domain, the second training set is the labeled training set in the target domain, and the target domain further contains an unlabeled training set. The purpose of the embodiments of the invention is to use the labeled training set in the source domain and the small labeled training set in the target domain to label the large unlabeled training set in the target domain.
In a specific implementation, the first training set and the second training set are merged into the third training set, and word-vector training is performed on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set. Merging the first training set with the second training set and then performing word-vector training on the third training set creates an association between the first training set and the second training set, so that the labeling result is more accurate.
In one embodiment, the word-vector tool used is word2vec. Word2vec is an efficient tool for representing words as real-valued vectors. Drawing on ideas from deep learning, it reduces the processing of text content, through training, to vector operations in a K-dimensional vector space, and similarity in that vector space can be used to represent semantic similarity of the text.
It should be noted that those skilled in the art may also use other kinds of word-vector tools to perform word-vector training on the third training set; the invention is not specifically limited in this respect.
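Purely as an illustration, a minimal sketch of step S1 might use the gensim library (an assumption; the patent does not name a concrete implementation). Here first_set and second_set are hypothetical lists of (tokens, label) pairs, and each sample's text vector is taken as the mean of its word vectors, which is one simple choice among many:

```python
# Minimal sketch of step S1, assuming gensim's word2vec and tokenized inputs.
from gensim.models import Word2Vec
import numpy as np

def train_word_vectors(first_set, second_set, dim=100):
    third_set = first_set + second_set                 # merge into the third training set
    sentences = [tokens for tokens, _ in third_set]
    model = Word2Vec(sentences, vector_size=dim, min_count=1)  # word-vector training
    # Represent each sample's text vector as the mean of its word vectors.
    vectors = np.array([np.mean([model.wv[w] for w in tokens], axis=0)
                        for tokens, _ in third_set])
    labels = np.array([label for _, label in third_set])       # sentiment labels
    return vectors, labels, model
```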
S2: divide the first training set into multiple sub-training sets, and merge each sub-training set with the second training set to correspondingly obtain multiple feature training sets.
In a specific implementation, the first training set is divided into multiple sub-training sets, and each sub-training set is merged with the second training set, correspondingly obtaining multiple feature training sets. For example, in one embodiment the first training set is A and the second training set is B. The first training set A is divided into sub-training sets A1, A2 and A3, and each of A1, A2 and A3 is merged with the second training set B, correspondingly obtaining three feature training sets.
In one embodiment, the first training set is divided evenly into multiple sub-training sets, i.e. each sub-training set contains the same number of samples.
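A minimal sketch of this splitting-and-merging step, assuming the NumPy arrays produced by the previous sketch, might be:

```python
# Sketch of step S2: split the source-domain set evenly, then merge each
# sub-training set with the labeled target-domain set.
import numpy as np

def build_feature_training_sets(X_src, y_src, X_tgt, y_tgt, n_splits=3):
    feature_sets = []
    for X_sub, y_sub in zip(np.array_split(X_src, n_splits),
                            np.array_split(y_src, n_splits)):
        X_feat = np.concatenate([X_sub, X_tgt])   # merge with the second training set
        y_feat = np.concatenate([y_sub, y_tgt])
        feature_sets.append((X_feat, y_feat))
    return feature_sets                           # one feature training set per classifier
```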
S3: train one classifier on each feature training set according to the text vector and sentiment label of each sample.
In a specific implementation, one classifier is trained on each feature training set according to the text vectors and sentiment labels of its samples obtained in step S1 above.
In one embodiment, the specific training process includes the following steps S310-S330.
S310: initialize the weight of each training sample in the feature training set.
In a specific implementation, the initial weight $w_i^1$ of each training sample in the feature training set is set first, where i is the index of a training sample in the feature training set and j is the index of the feature training set.
In one embodiment, the initial weight of each training sample in the feature training set is set to $w_i^1 = 1/M$, where M is the number of training samples in the feature training set.
S320: perform multiple rounds of iterative training on the feature training set.
In a specific implementation, iterative training is performed on the feature training set, where each iteration includes the following steps 1-4.
Step 1: train a sub-classifier $G_t$ on the feature training set to obtain the classification function $G_t(x)$.
Step 2: compute the classification error rate of the sub-classifier according to formula (1): $e_t = \sum_{i=1}^{M} w_i^t \, I\big(G_t(x_i) \neq y_i\big)$.
Step 3: compute the weight of the sub-classifier according to formula (2): $\alpha_t = \frac{1}{2}\ln\frac{1-e_t}{e_t}$.
Step 4: obtain the weight of each training sample for the next iteration according to formula (3), $w_i^{t+1} = \frac{w_i^t}{Z_t}\exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$, with the normalization factor of formula (4), $Z_t = \sum_{i=1}^{M} w_i^t \exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$.
The meaning of each symbol in the above formulas is as follows: x is the text vector of a training sample in the feature training set, y is the sentiment label of a training sample in the feature training set, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the number of the current iteration, T is the total number of iterations, j is the index of the feature training set, and $w_i^t$ is the weight of a training sample of the feature training set in the current iteration. It should be noted that the weight used in the first iteration is the initial weight $w_i^1$.
S330: determine the classifier $L_j(x)$ from the above results according to formula (5): $L_j(x) = \operatorname{sign}\big(\sum_{t=1}^{T} \alpha_t\, G_t(x)\big)$.
In a specific implementation, $L_j(x)$ is the classifier obtained by training on feature training set j. By training each of the multiple feature training sets with steps S310-S330, multiple classifiers $L_j(x)$ are correspondingly obtained.
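The iteration above is the standard AdaBoost update. A minimal sketch, under the assumption that scikit-learn decision stumps serve as the sub-classifiers (the patent does not prescribe a sub-classifier type), could be:

```python
# Sketch of steps S310-S330: boosted training on one feature training set.
# Sentiment labels y are assumed to be +1 / -1.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_boosted_classifier(X, y, T=10):
    M = len(X)
    w = np.full(M, 1.0 / M)                      # S310: initial weights w_i = 1/M
    stumps, alphas = [], []
    for t in range(T):
        g = DecisionTreeClassifier(max_depth=1)  # sub-classifier G_t
        g.fit(X, y, sample_weight=w)
        pred = g.predict(X)
        e = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)  # formula (1)
        alpha = 0.5 * np.log((1 - e) / e)        # formula (2): sub-classifier weight
        w = w * np.exp(-alpha * y * pred)        # formula (3): sample-weight update
        w = w / w.sum()                          # formula (4): normalization
        stumps.append(g)
        alphas.append(alpha)

    def L(X_new):                                # S330, formula (5)
        scores = sum(a * g.predict(X_new) for a, g in zip(alphas, stumps))
        return np.sign(scores)
    return L
```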
S4: build an integrated classifier from all trained classifiers.
In a specific implementation, an integrated classifier is built from all trained classifiers.
In one embodiment, the integrated classifier L(x) is determined according to the formula $L(x) = \operatorname{sign}\big(\frac{1}{m}\sum_{j=1}^{m} L_j(x)\big)$.
Here m is the number of feature training sets, j is the index of a feature training set, and $L_j(x)$ is the classifier obtained by training on feature training set j.
Combining multiple classifiers $L_j(x)$ into the integrated classifier L(x) avoids overfitting of a single classifier and improves generalization ability.
S5: apply the integrated classifier to the unlabeled training set in the target domain.
In a specific implementation, the integrated classifier is applied to the unlabeled training set in the target domain, which yields a sentiment classification label for each sample of the unlabeled training set in the target domain.
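A sketch of steps S4-S5 that builds on the previous function, again only illustrating a majority-vote reading of the formula:

```python
# Sketch of steps S4-S5: majority-vote ensemble applied to unlabeled target data.
import numpy as np

def build_and_apply_ensemble(feature_sets, X_target_unlabeled, T=10):
    classifiers = [train_boosted_classifier(X, y, T) for X, y in feature_sets]
    votes = sum(L(X_target_unlabeled) for L in classifiers)  # sum of L_j(x)
    return np.sign(votes)      # integrated classifier L(x): labels in {+1, -1}
```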
With the technical solution of the embodiments of the invention, the labeled data set in the source domain and a small labeled data set in the target domain can be used to label the large unlabeled data set in the target domain, thereby solving the prior-art problems that labeled samples must be acquired manually and that each domain needs its own labeled samples, and greatly shortening processing time and model construction time. In the method of the invention, multiple classifiers are combined into an integrated classifier, which avoids overfitting of a single classifier and improves generalization ability. In addition, the invention is simple to implement, fast and accurate, and can be deployed at large scale.
Fig. 3 is a schematic flow chart of a cross-domain text sentiment classification method provided by another embodiment of the invention. As shown in Fig. 3, the cross-domain text sentiment classification method of this embodiment includes steps S31-S37, where steps S33-S37 are similar to steps S1-S5 in the above embodiment and are not repeated here. The steps S31-S32 added in this embodiment are described in detail below.
S31: perform word segmentation on the first training set and the second training set.
Word segmentation is a basic step of text processing, i.e. extracting words from the first training set and the second training set as samples. In a specific implementation, a word segmentation tool is used to perform word segmentation on the first training set and the second training set.
S32: remove the stop words from the first training set and the second training set.
In a specific implementation, the stop words in the first training set and the second training set are removed. Stop words are usually prepositions, adverbs or conjunctions; for example, words such as "inside", "also", "it" and "is" are all stop words. Because these words occur with excessive frequency, they need to be removed.
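A minimal sketch of steps S31-S32, assuming the jieba segmenter and an illustrative stop-word list (the patent names neither):

```python
# Sketch of steps S31-S32: Chinese word segmentation and stop-word removal.
import jieba

STOP_WORDS = {"的", "里面", "也", "它", "是"}  # illustrative; normally loaded from a file

def segment_and_filter(texts):
    """Turn raw texts into token lists with stop words removed."""
    return [[w for w in jieba.lcut(text) if w.strip() and w not in STOP_WORDS]
            for text in texts]
```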
Fig. 4 is a schematic block diagram of a cross-domain text sentiment classification device 40 provided by an embodiment of the invention. As shown in Fig. 4, corresponding to the above cross-domain text sentiment classification method, the invention further provides a cross-domain text sentiment classification device 40. The cross-domain text sentiment classification device 40 includes units for executing the above cross-domain text sentiment classification method, and may be configured in a terminal such as a desktop computer, tablet computer or laptop computer. Specifically, referring to Fig. 4, the cross-domain text sentiment classification device 40 includes an acquiring unit 41, a combining unit 42, a training unit 43, an establishing unit 44 and an applying unit 45.
The acquiring unit 41 is configured to merge a first training set and a second training set into a third training set, and to perform word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, wherein the first training set is a labeled training set in a source domain, the second training set is a labeled training set in a target domain, and the target domain further contains an unlabeled training set.
The combining unit 42 is configured to divide the first training set into multiple sub-training sets, and to merge each sub-training set with the second training set to correspondingly obtain multiple feature training sets.
The training unit 43 is configured to train one classifier on each feature training set according to the text vector and sentiment label of each sample.
The establishing unit 44 is configured to build an integrated classifier from all trained classifiers.
The applying unit 45 is configured to apply the integrated classifier to the unlabeled training set in the target domain.
In one embodiment, as shown in Fig. 5, the combining unit 42 includes an even-division subunit 421.
The even-division subunit 421 is configured to divide the first training set evenly into multiple sub-training sets.
In one embodiment, as shown in Fig. 6, the training unit 43 includes an initialization unit 431, an iterative training unit 432 and a first determination unit 433.
The initialization unit 431 is configured to initialize the weight of each training sample in the feature training set to the initial value $w_i^1$.
The iterative training unit 432 is configured to perform multiple rounds of iterative training on the feature training set, where each iteration includes:
training a sub-classifier $G_t$ on the feature training set to obtain the classification function $G_t(x)$;
computing the classification error rate of the sub-classifier according to formula (1): $e_t = \sum_{i=1}^{M} w_i^t \, I\big(G_t(x_i) \neq y_i\big)$;
computing the weight of the sub-classifier according to formula (2): $\alpha_t = \frac{1}{2}\ln\frac{1-e_t}{e_t}$;
obtaining the weight of each training sample for the next iteration according to formula (3), $w_i^{t+1} = \frac{w_i^t}{Z_t}\exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$, and formula (4), $Z_t = \sum_{i=1}^{M} w_i^t \exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$.
The first determination unit 433 is configured to determine the classifier $L_j(x)$ from the above results according to formula (5): $L_j(x) = \operatorname{sign}\big(\sum_{t=1}^{T} \alpha_t\, G_t(x)\big)$;
wherein x is the text vector of a training sample in the feature training set, y is the sentiment label of a training sample in the feature training set, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the number of the current iteration, T is the total number of iterations, j is the index of the feature training set, and $w_i^t$ is the weight of a training sample of the feature training set in the current iteration.
In one embodiment, as shown in Fig. 7, the establishing unit 44 includes a second determination unit 441.
The second determination unit 441 is configured to determine the integrated classifier L(x) according to the formula $L(x) = \operatorname{sign}\big(\frac{1}{m}\sum_{j=1}^{m} L_j(x)\big)$, where m is the number of feature training sets.
Fig. 8 is a schematic block diagram of a cross-domain text sentiment classification device provided by another embodiment of the invention. As shown in Fig. 8, the cross-domain text sentiment classification device of this embodiment adds a segmentation unit 46 and a removal unit 47 to the above embodiment.
The segmentation unit 46 is configured to perform word segmentation on the first training set and the second training set.
The removal unit 47 is configured to remove the stop words from the first training set and the second training set.
It should be noted that, as is clear to those skilled in the art, for the specific implementation process of the above cross-domain text sentiment classification device 40 and its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity of description, it is not repeated here.
The above cross-domain text sentiment classification device may be implemented in the form of a computer program that can run on a computer equipment as shown in Fig. 9.
Referring to Fig. 9, Fig. 9 is a schematic block diagram of a computer equipment provided by an embodiment of the application. The computer equipment 500 may be a terminal or a server, where the terminal may be an electronic device with a communication function such as a smartphone, tablet computer, laptop computer, desktop computer, personal digital assistant or wearable device, and the server may be an independent server or a server cluster composed of multiple servers.
Referring to Fig. 9, the computer equipment 500 includes a processor 502, a memory and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions which, when executed, can cause the processor 502 to perform a cross-domain text sentiment classification method.
The processor 502 provides computing and control capability to support the operation of the entire computer equipment 500.
The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 can be caused to perform a cross-domain text sentiment classification method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will understand that the structure shown in Fig. 9 is only a block diagram of the part of the structure relevant to the solution of the application and does not constitute a limitation on the computer equipment 500 to which the solution of the application is applied; a specific computer equipment 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the following steps:
merging a first training set and a second training set into a third training set, and performing word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, wherein the first training set is a labeled training set in a source domain, the second training set is a labeled training set in a target domain, and the target domain further contains an unlabeled training set;
dividing the first training set into multiple sub-training sets, and merging each sub-training set with the second training set to correspondingly obtain multiple feature training sets;
training one classifier on each feature training set according to the text vector and sentiment label of each sample;
building an integrated classifier from all trained classifiers; and
applying the integrated classifier to the unlabeled training set in the target domain.
In one embodiment, when implementing the step of dividing the first training set into multiple sub-training sets, the processor 502 specifically implements the following step:
dividing the first training set evenly into multiple sub-training sets.
In one embodiment, when implementing the step of training one classifier on each feature training set, the processor 502 specifically implements the following steps:
initializing the weight of each training sample in the feature training set to the initial value $w_i^1$;
performing multiple rounds of iterative training on the feature training set, each iteration comprising:
training a sub-classifier $G_t$ on the feature training set to obtain a classification function $G_t(x)$;
computing the classification error rate of the sub-classifier $G_t$ according to formula (1): $e_t = \sum_{i=1}^{M} w_i^t \, I\big(G_t(x_i) \neq y_i\big)$;
computing the weight of the sub-classifier $G_t$ according to formula (2): $\alpha_t = \frac{1}{2}\ln\frac{1-e_t}{e_t}$;
obtaining the weight of each training sample for the next iteration according to formula (3), $w_i^{t+1} = \frac{w_i^t}{Z_t}\exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$, and formula (4), $Z_t = \sum_{i=1}^{M} w_i^t \exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$; and
determining the classifier $L_j(x)$ from the above results according to formula (5): $L_j(x) = \operatorname{sign}\big(\sum_{t=1}^{T} \alpha_t\, G_t(x)\big)$;
wherein x is the text vector of a training sample in the feature training set, y is the sentiment label of a training sample in the feature training set, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the number of the current iteration, T is the total number of iterations, j is the index of the feature training set, and $w_i^t$ is the weight of a training sample of the feature training set in the current iteration.
In one embodiment, when implementing the step of building an integrated classifier from the classifiers trained on the feature training sets, the processor 502 specifically implements the following step:
determining the integrated classifier L(x) according to the formula $L(x) = \operatorname{sign}\big(\frac{1}{m}\sum_{j=1}^{m} L_j(x)\big)$, where m is the number of feature training sets.
In one embodiment, before implementing the step of merging the first training set and the second training set into a third training set and performing word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, the processor 502 further implements the following steps:
performing word segmentation on the first training set and the second training set; and
removing the stop words from the first training set and the second training set.
It should be understood that, in the embodiments of the application, the processor 502 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program. The computer program includes program instructions and can be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the process steps of the above method embodiments.
Therefore, the invention further provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, where the computer program includes program instructions. The program instructions, when executed by a processor, cause the processor to perform the following steps:
merging a first training set and a second training set into a third training set, and performing word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, wherein the first training set is a labeled training set in a source domain, the second training set is a labeled training set in a target domain, and the target domain further contains an unlabeled training set;
dividing the first training set into multiple sub-training sets, and merging each sub-training set with the second training set to correspondingly obtain multiple feature training sets;
training one classifier on each feature training set according to the text vector and sentiment label of each sample;
building an integrated classifier from all trained classifiers; and
applying the integrated classifier to the unlabeled training set in the target domain.
In one embodiment, when the program instructions are executed by the processor to implement the step of dividing the first training set into multiple sub-training sets, the following step is specifically implemented:
dividing the first training set evenly into multiple sub-training sets.
In one embodiment, when the program instructions are executed by the processor to implement the step of training one classifier on each feature training set, the following steps are specifically implemented:
initializing the weight of each training sample in the feature training set to the initial value $w_i^1$;
performing multiple rounds of iterative training on the feature training set, each iteration comprising:
training a sub-classifier $G_t$ on the feature training set to obtain a classification function $G_t(x)$;
computing the classification error rate of the sub-classifier $G_t$ according to formula (1): $e_t = \sum_{i=1}^{M} w_i^t \, I\big(G_t(x_i) \neq y_i\big)$;
computing the weight of the sub-classifier $G_t$ according to formula (2): $\alpha_t = \frac{1}{2}\ln\frac{1-e_t}{e_t}$;
obtaining the weight of each training sample for the next iteration according to formula (3), $w_i^{t+1} = \frac{w_i^t}{Z_t}\exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$, and formula (4), $Z_t = \sum_{i=1}^{M} w_i^t \exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$; and
determining the classifier $L_j(x)$ from the above results according to formula (5): $L_j(x) = \operatorname{sign}\big(\sum_{t=1}^{T} \alpha_t\, G_t(x)\big)$;
wherein x is the text vector of a training sample in the feature training set, y is the sentiment label of a training sample in the feature training set, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the number of the current iteration, T is the total number of iterations, j is the index of the feature training set, and $w_i^t$ is the weight of a training sample of the feature training set in the current iteration.
In one embodiment, when the program instructions are executed by the processor to implement the step of building an integrated classifier from the classifiers trained on the feature training sets, the following step is specifically implemented:
determining the integrated classifier L(x) according to the formula $L(x) = \operatorname{sign}\big(\frac{1}{m}\sum_{j=1}^{m} L_j(x)\big)$, where m is the number of feature training sets.
In one embodiment, before the program instructions are executed by the processor to implement the step of merging the first training set and the second training set into a third training set and performing word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, the following steps are also implemented:
performing word segmentation on the first training set and the second training set; and
removing the stop words from the first training set and the second training set.
The storage medium may be any of various computer-readable storage media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk or an optical disc.
Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally by function. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the invention.
In the several embodiments provided by the invention, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division of the units is only a logical functional division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.
The steps in the methods of the embodiments of the invention may be reordered, merged or deleted according to actual needs. The units in the devices of the embodiments of the invention may be combined, divided or deleted according to actual needs. In addition, the functional units in the embodiments of the invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solution of the invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer equipment (which may be a personal computer, a terminal, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from the spirit and scope of the invention. Thus, if these modifications and variations of the invention fall within the scope of the claims of the invention and their equivalent technologies, the invention is also intended to include them.
The above are only specific embodiments of the invention, but the protection scope of the invention is not limited thereto. Any person familiar with the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the invention, and these modifications or replacements should be covered by the protection scope of the invention. Therefore, the protection scope of the invention shall be subject to the protection scope of the claims.

Claims (10)

1. A cross-domain text sentiment classification method, characterized by comprising:
merging a first training set and a second training set into a third training set, and performing word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, wherein the first training set is a labeled training set in a source domain, the second training set is a labeled training set in a target domain, and the target domain further contains an unlabeled training set;
dividing the first training set into multiple sub-training sets, and merging each sub-training set with the second training set to correspondingly obtain multiple feature training sets;
training one classifier on each feature training set according to the text vector and sentiment label of each sample;
building an integrated classifier from all trained classifiers; and
applying the integrated classifier to the unlabeled training set in the target domain.
2. The cross-domain text sentiment classification method according to claim 1, characterized in that dividing the first training set into multiple sub-training sets comprises:
dividing the first training set evenly into multiple sub-training sets.
3. The cross-domain text sentiment classification method according to claim 1, characterized in that training one classifier on each feature training set comprises:
initializing the weight of each training sample in the feature training set to the initial value $w_i^1$;
performing multiple rounds of iterative training on the feature training set, each iteration comprising:
training a sub-classifier $G_t$ on the feature training set to obtain a classification function $G_t(x)$;
computing the classification error rate of the sub-classifier $G_t$ according to formula (1): $e_t = \sum_{i=1}^{M} w_i^t \, I\big(G_t(x_i) \neq y_i\big)$;
computing the weight of the sub-classifier $G_t$ according to formula (2): $\alpha_t = \frac{1}{2}\ln\frac{1-e_t}{e_t}$;
obtaining the weight of each training sample for the next iteration according to formula (3), $w_i^{t+1} = \frac{w_i^t}{Z_t}\exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$, and formula (4), $Z_t = \sum_{i=1}^{M} w_i^t \exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$; and
determining the classifier $L_j(x)$ from the above results according to formula (5): $L_j(x) = \operatorname{sign}\big(\sum_{t=1}^{T} \alpha_t\, G_t(x)\big)$;
wherein x is the text vector of a training sample in the feature training set, y is the sentiment label of a training sample in the feature training set, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the number of the current iteration, T is the total number of iterations, j is the index of the feature training set, and $w_i^t$ is the weight of a training sample of the feature training set in the current iteration.
4. The cross-domain text sentiment classification method according to claim 3, characterized in that building an integrated classifier from the classifiers trained on the feature training sets comprises:
determining the integrated classifier L(x) according to the formula $L(x) = \operatorname{sign}\big(\frac{1}{m}\sum_{j=1}^{m} L_j(x)\big)$, where m is the number of feature training sets.
5. The cross-domain text sentiment classification method according to claim 1, characterized in that before merging the first training set and the second training set into a third training set and performing word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, the method further comprises:
performing word segmentation on the first training set and the second training set; and
removing the stop words from the first training set and the second training set.
6. A cross-domain text sentiment classification device, characterized by comprising:
an acquiring unit for merging a first training set and a second training set into a third training set, and performing word-vector training on the third training set with a word-vector tool to obtain the text vector and sentiment label of each sample in the third training set, wherein the first training set is a labeled training set in a source domain, the second training set is a labeled training set in a target domain, and the target domain further contains an unlabeled training set;
a combining unit for dividing the first training set into multiple sub-training sets, and merging each sub-training set with the second training set to correspondingly obtain multiple feature training sets;
a training unit for training one classifier on each feature training set according to the text vector and sentiment label of each sample;
an establishing unit for building an integrated classifier from all trained classifiers; and
an applying unit for applying the integrated classifier to the unlabeled training set in the target domain.
7. The cross-domain text sentiment classification device according to claim 6, characterized in that the training unit comprises:
an initialization unit for initializing the weight of each training sample in the feature training set to the initial value $w_i^1$;
an iterative training unit for performing multiple rounds of iterative training on the feature training set, each iteration comprising:
training a sub-classifier $G_t$ on the feature training set to obtain a classification function $G_t(x)$;
computing the classification error rate of the sub-classifier $G_t$ according to formula (1): $e_t = \sum_{i=1}^{M} w_i^t \, I\big(G_t(x_i) \neq y_i\big)$;
computing the weight of the sub-classifier $G_t$ according to formula (2): $\alpha_t = \frac{1}{2}\ln\frac{1-e_t}{e_t}$;
obtaining the weight of each training sample for the next iteration according to formula (3), $w_i^{t+1} = \frac{w_i^t}{Z_t}\exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$, and formula (4), $Z_t = \sum_{i=1}^{M} w_i^t \exp\big(-\alpha_t\, y_i\, G_t(x_i)\big)$; and
a first determination unit for determining the classifier $L_j(x)$ from the above results according to formula (5): $L_j(x) = \operatorname{sign}\big(\sum_{t=1}^{T} \alpha_t\, G_t(x)\big)$;
wherein x is the text vector of a training sample in the feature training set, y is the sentiment label of a training sample in the feature training set, i is the index of a training sample in the feature training set, M is the number of training samples in the feature training set, t is the number of the current iteration, T is the total number of iterations, j is the index of the feature training set, and $w_i^t$ is the weight of a training sample of the feature training set in the current iteration.
8. The cross-domain text sentiment classification device according to claim 7, characterized in that the establishing unit comprises:
a second determination unit for determining the integrated classifier L(x) according to the formula $L(x) = \operatorname{sign}\big(\frac{1}{m}\sum_{j=1}^{m} L_j(x)\big)$, where m is the number of feature training sets.
9. A computer equipment, characterized in that the computer equipment comprises a memory and a processor, the memory storing a computer program, and the processor implementing the method according to any one of claims 1 to 5 when executing the computer program.
10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 5.
CN201810770172.5A 2018-07-13 2018-07-13 Cross-domain text sentiment classification method, device, computer equipment and storage medium Withdrawn CN108959265A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810770172.5A CN108959265A (en) 2018-07-13 2018-07-13 Cross-domain text sentiment classification method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810770172.5A CN108959265A (en) 2018-07-13 2018-07-13 Cross-domain text sentiment classification method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN108959265A (en) 2018-12-07

Family

ID=64483990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810770172.5A Withdrawn CN108959265A (en) 2018-07-13 2018-07-13 Cross-domain texts sensibility classification method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108959265A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109857861A (en) * 2019-01-04 2019-06-07 平安科技(深圳)有限公司 File classification method, device, server and medium based on convolutional neural networks
WO2020143303A1 (en) * 2019-01-10 2020-07-16 平安科技(深圳)有限公司 Method and device for training deep learning model, computer apparatus, and storage medium
CN110009038A (en) * 2019-04-04 2019-07-12 北京百度网讯科技有限公司 Training method, device and the storage medium of screening model
CN110378389A (en) * 2019-06-24 2019-10-25 苏州浪潮智能科技有限公司 A kind of Adaboost classifier calculated machine creating device
CN111078876A (en) * 2019-12-04 2020-04-28 国家计算机网络与信息安全管理中心 Short text classification method and system based on multi-model integration
US11423333B2 (en) 2020-03-25 2022-08-23 International Business Machines Corporation Mechanisms for continuous improvement of automated machine learning
CN111831826A (en) * 2020-07-24 2020-10-27 腾讯科技(深圳)有限公司 Training method, classification method and device of cross-domain text classification model
CN111831826B (en) * 2020-07-24 2022-10-18 腾讯科技(深圳)有限公司 Training method, classification method and device of cross-domain text classification model

Similar Documents

Publication Publication Date Title
CN108959265A (en) Cross-domain text sentiment classification method, device, computer equipment and storage medium
CN106611052B (en) The determination method and device of text label
CN106484139B (en) Emoticon recommended method and device
CN108399228A (en) Article sorting technique, device, computer equipment and storage medium
CN107038480A (en) A kind of text sentiment classification method based on convolutional neural networks
CN106445919A (en) Sentiment classifying method and device
CN109376240A (en) A kind of text analyzing method and terminal
CN109299264A (en) File classification method, device, computer equipment and storage medium
CN110532563A (en) The detection method and device of crucial paragraph in text
CN105095179B (en) The method and device that user's evaluation is handled
CN108733675B (en) Emotion evaluation method and device based on large amount of sample data
CN111046886A (en) Automatic identification method, device and equipment for number plate and computer readable storage medium
CN105956083A (en) Application software classification system, application software classification method and server
Otte et al. Local feature based online mode detection with recurrent neural networks
CN105117740A (en) Font identification method and device
CN106778878A (en) A kind of character relation sorting technique and device
CN106537387B (en) Retrieval/storage image associated with event
Nihal et al. Bangla sign alphabet recognition with zero-shot and transfer learning
CN109471932A (en) Rumour detection method, system and storage medium based on learning model
CN102708164A (en) Method and system for calculating movie expectation
CN109446300A (en) A kind of corpus preprocess method, the pre- mask method of corpus and electronic equipment
CN108009248A (en) A kind of data classification method and system
CN109359198A (en) A kind of file classification method and device
CN107330009A (en) Descriptor disaggregated model creation method, creating device and storage medium
CN109597987A (en) A kind of text restoring method, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20181207)