CN108985358A - Emotion recognition method, device, equipment and storage medium - Google Patents

Emotion recognition method, device, equipment and storage medium

Info

Publication number
CN108985358A
CN108985358A (application CN201810694899.XA); granted as CN108985358B
Authority
CN
China
Prior art keywords
session
modal
session information
fusion
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810694899.XA
Other languages
Chinese (zh)
Other versions
CN108985358B (en)
Inventor
林英展
陈炳金
梁川
梁一川
凌光
周超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810694899.XA priority Critical patent/CN108985358B/en
Publication of CN108985358A publication Critical patent/CN108985358A/en
Application granted granted Critical
Publication of CN108985358B publication Critical patent/CN108985358B/en
Legal status: Active (current)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L 25/63 - Speech or voice analysis techniques specially adapted for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Psychiatry (AREA)
  • Hospice & Palliative Care (AREA)
  • Child & Adolescent Psychology (AREA)
  • Acoustics & Sound (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention disclose an emotion recognition method, device, equipment and storage medium. The method comprises: determining fused session features of multi-modal session information; and inputting the fused session features of the multi-modal session information into a pre-constructed multi-modal emotion recognition model to obtain emotional features of the multi-modal session information. In the technical solution provided by the embodiments of the present invention, the session features of each modality in the multi-modal session information are merged to obtain fused session features, and the fused session features are input into a single unified multi-modal emotion recognition model for training, so that the final emotion result can be predicted directly, without training a separate recognition model for each modality and then fusing the results of the different models. The sample training process is simplified, and the accuracy of the emotion recognition result is improved.

Description

Emotion recognition method, device, equipment and storage medium
Technical field
Embodiments of the present invention relate to the field of artificial intelligence, and in particular to an emotion recognition method, device, equipment and storage medium.
Background art
With the development of artificial intelligence, intelligent interaction plays an increasingly important role in a growing number of fields. Within intelligent interaction, one important problem is how to identify the user's current emotional state during a multi-modal interaction, so as to provide emotion-level feedback to the entire intelligent interactive system, which can then adjust in time to respond to users in different emotional states and improve the quality of service of the whole interaction.
At present, the mainstream emotion recognition method is shown in Fig. 1, and the overall process is as follows: each modality, such as speech, text and facial-expression images, is modeled independently, and the results of the individual models are finally fused together; according to rules or a machine learning model, the results of the multiple modalities are combined by a fusion decision into a single overall multi-modal emotion recognition result.
Since the same word carries different meanings, and thus expresses different emotional states, in different scenarios, the above method generalizes poorly. In addition, it requires large amounts of data to be collected, is costly, and, because it depends on manual operations, gives poorly controllable results.
Summary of the invention
Embodiments of the present invention provide an emotion recognition method, device, equipment and storage medium, which simplify the sample training process and improve the accuracy of the emotion recognition result.
In a first aspect, an embodiment of the present invention provides an emotion recognition method, the method comprising:
determining fused session features of multi-modal session information;
inputting the fused session features of the multi-modal session information into a pre-constructed multi-modal emotion recognition model to obtain emotional features of the multi-modal session information.
In a second aspect, an embodiment of the present invention further provides an emotion recognition apparatus, the apparatus comprising:
a fusion feature determining module, configured to determine fused session features of multi-modal session information;
an emotional feature determining module, configured to input the fused session features of the multi-modal session information into a pre-constructed multi-modal emotion recognition model to obtain emotional features of the multi-modal session information.
In a third aspect, an embodiment of the present invention further provides a device, the device comprising:
one or more processors; and
a storage apparatus configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement any emotion recognition method of the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements any emotion recognition method of the first aspect.
In the technical solution provided by the embodiments of the present invention, the session features of each modality in the multi-modal session information are merged to obtain fused session features, and the fused session features are input into a single unified multi-modal emotion recognition model for training, so that the final emotion result can be predicted directly, without training a separate recognition model for each modality and then fusing the results of the different models. The sample training process is thereby simplified, and the accuracy of the emotion recognition result is improved.
Brief description of the drawings
Fig. 1 is a schematic diagram of a prior-art multi-modal emotion recognition approach based on training each modality independently;
Fig. 2A is a flowchart of an emotion recognition method provided in Embodiment 1 of the present invention;
Fig. 2B is a schematic diagram of a learning model based on multi-modal feature fusion to which embodiments of the present invention are applicable;
Fig. 3 is a flowchart of an emotion recognition method provided in Embodiment 2 of the present invention;
Fig. 4 is a structural block diagram of an emotion recognition apparatus provided in Embodiment 3 of the present invention;
Fig. 5 is a structural schematic diagram of a device provided in Embodiment 4 of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described here serve only to explain the embodiments of the present invention and do not limit the invention. It should further be noted that, for ease of description, the accompanying drawings show only the parts related to the embodiments of the present invention rather than the entire structure.
Embodiment 1
Fig. 2A is a flowchart of an emotion recognition method provided by Embodiment 1 of the present invention, and Fig. 2B is a schematic diagram of a learning model based on multi-modal feature fusion to which embodiments of the present invention are applicable. This embodiment is suitable for cases where a user's emotion needs to be recognized accurately during a multi-modal interaction. The method may be executed by the emotion recognition apparatus provided by the embodiments of the present invention; the apparatus may be implemented in software and/or hardware, and may be integrated in a computing device. Referring to Figs. 2A and 2B, the method specifically comprises:
S210: determine fused session features of multi-modal session information.
Here, a modality is a term for a manner of interaction, and multi-modal refers to interacting through the integrated use of several means and symbol carriers such as text, images, video, speech and gestures. Correspondingly, multi-modal session information is session information that contains at least two modalities at the same time, for example session information that simultaneously contains the three modalities of speech, text and images.
Fused session features are obtained by merging the session features of the different modalities contained in one piece of session information. Optionally, a deep learning model may be used to determine the fused session features of the multi-modal session information while jointly considering the multiple modal features contained in one piece of session information.
S220: input the fused session features of the multi-modal session information into a pre-constructed multi-modal emotion recognition model to obtain the emotional features of the multi-modal session information.
Here, the multi-modal emotion recognition model is a model established on the basis of artificial intelligence technologies such as speech recognition, intelligent knowledge graphs and text recognition; specifically, it may be obtained in advance by training an initial machine learning model, such as a neural network model, on a sample data set. The emotional features are the multi-modal emotion recognition result and characterize an individual's state toward external things; they may include an emotion type and an emotion intensity. The emotion type may include happiness, anger, sadness, joy and the like, and the emotion intensity characterizes how strong a certain emotion is.
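For concreteness, such a recognition result could be represented as a simple record; the following Python sketch is purely illustrative (the field names and the [0.0, 1.0] intensity range are assumptions of this sketch, not part of the embodiment):

    from dataclasses import dataclass

    # Illustrative emotion types drawn from the description above.
    EMOTION_TYPES = ("happiness", "anger", "sadness", "joy")

    @dataclass
    class EmotionalFeatures:
        """Assumed output record of the multi-modal emotion recognition model."""
        emotion_type: str  # one of EMOTION_TYPES
        intensity: float   # how strong the emotion is, assumed in [0.0, 1.0]

    result = EmotionalFeatures(emotion_type="anger", intensity=0.8)
    print(result)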
Exemplarily, before the fused session features of the multi-modal session information are input into the pre-constructed multi-modal emotion recognition model, the method may further comprise: training an initial machine learning model according to the fused session features of multi-modal session sample information and the emotional features of the multi-modal session sample information, to obtain the multi-modal emotion recognition model.
Specifically, by continuously accumulating session information under various scenarios during interaction, a large number of fused session features of multi-modal session sample information, together with the emotional features of the corresponding multi-modal session sample information, are obtained as a training sample set; each sample is input into a neural network, and after training the multi-modal emotion recognition model is obtained. When the fused session features of a piece of multi-modal session information are input into the multi-modal emotion recognition model, the model can evaluate the input fused session features in combination with its learned parameters and output the corresponding emotional features.
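As a minimal sketch of this training stage, assuming PyTorch and assuming the fused session features arrive as fixed-length vectors paired with annotated emotion-type labels (the dimensions, class count and network shape below are illustrative, not prescribed by the embodiment):

    import torch
    import torch.nn as nn

    FUSED_DIM = 512      # assumed length of a fused session-feature vector
    NUM_EMOTIONS = 4     # e.g. happiness, anger, sadness, joy

    # The single unified multi-modal emotion recognition model: a small
    # feed-forward classifier trained directly on fused session features.
    model = nn.Sequential(
        nn.Linear(FUSED_DIM, 256),
        nn.ReLU(),
        nn.Linear(256, NUM_EMOTIONS),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def train_step(fused_batch: torch.Tensor, labels: torch.Tensor) -> float:
        """One gradient step on a batch of (fused features, emotion labels)."""
        optimizer.zero_grad()
        loss = loss_fn(model(fused_batch), labels)
        loss.backward()
        optimizer.step()
        return loss.item()

    # Stand-in random batch; real batches would come from the accumulated,
    # jointly annotated multi-modal session samples described above.
    print(train_step(torch.randn(32, FUSED_DIM),
                     torch.randint(0, NUM_EMOTIONS, (32,))))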
It should be noted that, because the prior art establishes a separate recognition model for each modality and weights the results of the individual models to obtain the final emotion result, it requires a large number of training samples; moreover, the model learned for a single modality may be of poor quality, which ultimately degrades the overall emotion recognition performance. In this embodiment, referring to Fig. 2B, the session features of the modalities in the multi-modal session information are merged directly into fused session features, and only the fused session features need to be input into one unified multi-modal emotion recognition model for training, after which the model can output the final emotional features; the number of training samples is thus greatly reduced compared with the prior art. In addition, because the multi-modal session features are fused, the multi-modal emotion recognition model can learn not only the feature information of each modality but also the feature relations between different modalities, avoiding the prior-art problem that a poorly learned single-modality model degrades the overall emotion recognition result.
This is illustrated with bimodal text-and-speech session information. Suppose a user says the sentence "I just want to buy the Apple X now, I just want that one". If, as in the existing technology, the text modality information and the speech modality information are considered separately, the sentence cannot be confidently labeled as a negative emotion, which ultimately makes the emotion recognition result inaccurate. With the technical solution of this embodiment, however, the information of the user's speech modality is considered together with the text modality information; for example, if the user's voice fluctuates violently while saying this sentence, then by fusing the "text" + "speech" bimodal features, the emotion can finally be accurately recognized as negative.
Furthermore, it should be emphasized that the emotional features of the multi-modal session sample information used in this embodiment are annotated on the multi-modal session information while jointly considering all modalities. This ensures that the annotated emotional state is unambiguous and builds a more accurate data set for the subsequent model training, making the finally obtained multi-modal emotion recognition model more accurate. The prior art, by contrast, annotates each modality independently; because a modality is annotated in isolation, the emotional features of a sentence may not be annotated correctly, so the recognition accuracy of each modality's emotion model is poor, and the subsequent result-fusion stage ultimately suffers.
In the technical solution provided by the embodiments of the present invention, the session features of each modality in the multi-modal session information are merged to obtain fused session features, and the fused session features are input into a single unified multi-modal emotion recognition model for training, so that the final emotion result can be predicted directly, without training a separate recognition model for each modality and then fusing the results of the different models. The sample training process is thereby simplified, and the accuracy of the emotion recognition result is improved.
Embodiment 2
Fig. 3 is a flowchart of an emotion recognition method provided by Embodiment 2 of the present invention. On the basis of Embodiment 1 above, this embodiment further refines the determination of the fused session features of the multi-modal session information. Referring to Fig. 3, the method specifically comprises:
S310: respectively determine vector representations of at least two modalities of session information among speech session information, text session information and image session information.
Exemplarily, the multi-modal session information may include speech session information, text session information and image session information. The vector representation of session information refers to the expression of the session information in a vector space and can be obtained by modeling.
Specifically, characteristic parameters that characterize emotional change may be extracted from the speech session information; the text session information may be segmented into sentences and words to extract keywords; and effective dynamic or static expression features may be extracted from the image session information. These are input into a vector extraction model to obtain the vector representations of the speech session information, the image session information and the text session information. The vector extraction model may be a single collective model that converts speech features, text keywords, image features and the like into corresponding vector representations, or it may be composed of separate sub-models.
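The per-modality extraction might look like the following sketch; the three extractors are placeholders standing in for real acoustic-feature, keyword and expression-feature models (every function and statistic here is an assumption of the sketch, not the embodiment's actual extraction):

    import numpy as np

    def speech_vector(waveform: np.ndarray) -> np.ndarray:
        """Placeholder acoustic features characterizing emotional change:
        mean energy, energy spread and average frame-to-frame fluctuation."""
        return np.array([waveform.mean(), waveform.std(),
                         np.abs(np.diff(waveform)).mean()])

    def text_vector(sentence: str, vocab: dict) -> np.ndarray:
        """Placeholder text features: a bag-of-keywords vector after word
        cutting (here, naive whitespace splitting)."""
        vec = np.zeros(len(vocab))
        for word in sentence.split():
            if word in vocab:
                vec[vocab[word]] += 1.0
        return vec

    def face_vector(frame: np.ndarray) -> np.ndarray:
        """Placeholder static expression features: coarse intensity
        statistics of a face image."""
        return np.array([frame.mean(), frame.std()])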
S320: fuse the vector representations of the at least two modalities of session information to obtain a vector representation of the fused session features of the multi-modal session information.
Specifically, the vector representations of the modalities of session information may be directly spliced, according to certain rules, into one long unified vector that serves as the vector representation of the fused session features of the multi-modal session information, thereby fusing the vector representations of the multiple modalities of session information. Alternatively, the vector representations of the key information parts within each modality's vector representation may be extracted and spliced to obtain the vector representation of the fused session features of the multi-modal session information.
Exemplarily, fusing the vector representations of the at least two modalities of session information may include: sequentially concatenating the vector representations of the at least two modalities of session information according to a preset modality order.
Here, the preset modality order may be a pre-set ordering of the modality inputs and can be modified according to the actual situation; for example, a modality may be added, deleted or inserted, so that the ordering of the modality inputs can be adjusted dynamically.
Specifically, after the vector representations of the modalities of session information corresponding to the input multi-modal session information have been determined, the vector representations of the modalities are directly connected according to the input order of the modalities, thereby fusing the vector representations of the multiple modalities of session information.
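A minimal sketch of this sequential concatenation, assuming each modality's vector is a NumPy array and assuming the order list below (both are illustrative):

    import numpy as np

    # Assumed preset modality order; modalities can be added, deleted or
    # reordered here to adjust the input ordering dynamically.
    MODALITY_ORDER = ["speech", "text", "image"]

    def fuse_by_order(vectors: dict) -> np.ndarray:
        """Concatenate per-modality vectors in the preset order, skipping
        modalities absent from this piece of session information."""
        return np.concatenate([vectors[m] for m in MODALITY_ORDER if m in vectors])

    fused = fuse_by_order({
        "speech": np.array([0.2, 0.7]),
        "text": np.array([1.0, 0.0, 1.0]),
    })
    print(fused)  # [0.2 0.7 1.  0.  1. ]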
Exemplarily, fusing the vector representations of the at least two modalities of session information may also include: respectively extracting nonlinear features of the vector representations of the at least two modalities of session information, and fusing the extracted nonlinear features of the at least two modalities of session information.
Here, the nonlinear features of a vector representation characterize the distinctive part of a vector and may be the non-zero parts of the vector representation. The nonlinear features of the vector representation of one modality of session information refer to the vector representation of the words in that modality from which an emotion can be identified. For example, if the vector representation of one modality of session information is [0, 1, 1, 0, 0], the nonlinear features of that vector representation may be [1, 1].
Specifically, referring to Fig. 2B, in the multi-modal feature fusion layer, the vector representation of each modality of session information may be input into a deep learning model and first passed through one fully connected layer (FCL) operation to extract the nonlinear features of that modality's vector representation, yielding the corresponding hidden-layer vector; the output hidden-layer vectors are then spliced together, fusing the vector representations of the multiple modalities of session information.
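A sketch of such a fusion layer in PyTorch (the hidden size and the ReLU nonlinearity are assumptions of the sketch; the embodiment only specifies one fully connected operation per modality followed by splicing):

    import torch
    import torch.nn as nn

    class MultiModalFusionLayer(nn.Module):
        """One fully connected layer per modality extracts nonlinear features;
        the resulting hidden-layer vectors are spliced into one fused vector."""

        def __init__(self, input_dims: dict, hidden_dim: int = 64):
            super().__init__()
            self.fcls = nn.ModuleDict(
                {name: nn.Linear(dim, hidden_dim) for name, dim in input_dims.items()}
            )

        def forward(self, inputs: dict) -> torch.Tensor:
            # Extract each modality's nonlinear features, then splice the
            # hidden-layer vectors together in a fixed modality order.
            hidden = [torch.relu(self.fcls[name](inputs[name])) for name in self.fcls]
            return torch.cat(hidden, dim=-1)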
S330: input the fused session features of the multi-modal session information into the pre-constructed multi-modal emotion recognition model to obtain the emotional features of the multi-modal session information.
Specifically, the vector representation of the fused session features of the multi-modal session information is input into the pre-constructed multi-modal emotion recognition model; the model can evaluate the input fused session features in combination with its learned parameters and output the corresponding emotional features.
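Putting S320 and S330 together, inference might look like the following continuation of the fusion-layer sketch above (the classifier stands in for the pre-constructed multi-modal emotion recognition model; all dimensions and modality names are assumed):

    import torch
    import torch.nn as nn

    # Reuses MultiModalFusionLayer from the sketch above.
    fusion = MultiModalFusionLayer({"speech": 3, "text": 100, "image": 2}, hidden_dim=64)
    classifier = nn.Sequential(nn.Linear(3 * 64, 256), nn.ReLU(), nn.Linear(256, 4))

    inputs = {
        "speech": torch.randn(1, 3),    # acoustic feature vector
        "text": torch.randn(1, 100),    # keyword feature vector
        "image": torch.randn(1, 2),     # expression feature vector
    }
    fused = fusion(inputs)                     # vector of the fused session features
    logits = classifier(fused)                 # scores over emotion types
    predicted_emotion = logits.argmax(dim=-1)  # index of the recognized emotion type
    print(predicted_emotion)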
In the technical solution provided by the embodiments of the present invention, the vector representations of the modalities of session information in the multi-modal session information are fused to obtain the vector representation of the fused session features of the multi-modal session information, and this vector representation is input into a single unified multi-modal emotion recognition model for training, so that the final emotion result can be predicted directly, without training a separate recognition model for each modality and then fusing the results of the different models. The sample training process is thereby simplified, and the accuracy of the emotion recognition result is improved.
Embodiment 3
Fig. 4 is a structural block diagram of an emotion recognition apparatus provided by Embodiment 3 of the present invention. The apparatus can execute the emotion recognition method provided by any embodiment of the present invention and has the functional modules and beneficial effects corresponding to the executed method. As shown in Fig. 4, the apparatus may comprise:
a fusion feature determining module 410, configured to determine fused session features of multi-modal session information; and
an emotional feature determining module 420, configured to input the fused session features of the multi-modal session information into a pre-constructed multi-modal emotion recognition model to obtain the emotional features of the multi-modal session information.
In the technical solution provided by the embodiments of the present invention, the session features of each modality in the multi-modal session information are merged to obtain fused session features, and the fused session features are input into a single unified multi-modal emotion recognition model for training, so that the final emotion result can be predicted directly, without training a separate recognition model for each modality and then fusing the results of the different models. The sample training process is thereby simplified, and the accuracy of the emotion recognition result is improved.
Exemplarily, the fusion feature determining module 410 may include:
a multi-modal vector determining unit, configured to respectively determine vector representations of at least two modalities of session information among speech session information, text session information and image session information; and
a fused vector determining unit, configured to fuse the vector representations of the at least two modalities of session information to obtain a vector representation of the fused session features of the multi-modal session information.
Optionally, the fused vector determining unit is specifically configured to:
sequentially concatenate the vector representations of the at least two modalities of session information according to a preset modality order.
Optionally, the fused vector determining unit is further configured to:
respectively extract the nonlinear features of the vector representations of the at least two modalities of session information, and fuse the extracted nonlinear features of the at least two modalities of session information.
Exemplarily, the above apparatus may further comprise:
a recognition model determining module, configured to train an initial machine learning model according to the fused session features of multi-modal session sample information and the emotional features of the multi-modal session sample information, to obtain the multi-modal emotion recognition model.
Embodiment 4
Fig. 5 is a structural schematic diagram of a device provided by Embodiment 4 of the present invention; it shows a block diagram of an exemplary device suitable for implementing embodiments of the present invention. The device 12 shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention. As shown in Fig. 5, the device 12 takes the form of a general-purpose computing device. The components of the device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing units 16).
The bus 18 represents one or more of several kinds of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.
The device 12 typically comprises a variety of computer-system-readable media. These media may be any available media that can be accessed by the device 12, including volatile and non-volatile media and removable and non-removable media.
The system memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The device 12 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 34 may be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 5, commonly referred to as a "hard disk drive"). Although not shown in Fig. 5, a magnetic disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading and writing a removable non-volatile optical disc (such as a CD-ROM, a DVD-ROM or another optical medium) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The system memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each or some combination of these examples may include an implementation of a network environment. The program modules 42 generally execute the functions and/or methods of the described embodiments of the present invention.
The device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device and a display 24), with one or more devices that enable a user to interact with the device 12, and/or with any device (such as a network card or a modem) that enables the device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Moreover, the device 12 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 20. As shown, the network adapter 20 communicates with the other modules of the device 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running the programs stored in the system memory 28, for example implementing the emotion recognition method provided by the embodiments of the present invention.
Embodiment 5
Embodiment 5 of the present invention further provides a computer-readable storage medium on which a computer program (also called computer-executable instructions) is stored; when the program is executed by a processor, the emotion recognition method described in any of the above embodiments can be implemented.
The computer storage medium of the embodiments of the present invention may adopt any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a medium can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.
The program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for carrying out the operations of the embodiments of the present invention may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the invention is not limited to the specific embodiments described herein; various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the embodiments of the present invention have been described in further detail through the above embodiments, the embodiments of the present invention are not limited to the above embodiments, and may also include other equivalent embodiments without departing from the concept of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims (12)

1. An emotion recognition method, characterized by comprising:
determining fused session features of multi-modal session information;
inputting the fused session features of the multi-modal session information into a pre-constructed multi-modal emotion recognition model to obtain emotional features of the multi-modal session information.
2. The method according to claim 1, characterized in that determining the fused session features of the multi-modal session information comprises:
respectively determining vector representations of at least two modalities of session information among speech session information, text session information and image session information;
fusing the vector representations of the at least two modalities of session information to obtain a vector representation of the fused session features of the multi-modal session information.
3. The method according to claim 2, characterized in that fusing the vector representations of the at least two modalities of session information comprises:
sequentially concatenating the vector representations of the at least two modalities of session information according to a preset modality order.
4. The method according to claim 2, characterized in that fusing the vector representations of the at least two modalities of session information comprises:
respectively extracting nonlinear features of the vector representations of the at least two modalities of session information;
fusing the extracted nonlinear features of the at least two modalities of session information.
5. The method according to claim 1, characterized in that, before inputting the fused session features of the multi-modal session information into the pre-constructed multi-modal emotion recognition model, the method further comprises:
training an initial machine learning model according to fused session features of multi-modal session sample information and emotional features of the multi-modal session sample information, to obtain the multi-modal emotion recognition model.
6. An emotion recognition apparatus, characterized by comprising:
a fusion feature determining module, configured to determine fused session features of multi-modal session information;
an emotional feature determining module, configured to input the fused session features of the multi-modal session information into a pre-constructed multi-modal emotion recognition model to obtain emotional features of the multi-modal session information.
7. The apparatus according to claim 6, characterized in that the fusion feature determining module comprises:
a multi-modal vector determining unit, configured to respectively determine vector representations of at least two modalities of session information among speech session information, text session information and image session information;
a fused vector determining unit, configured to fuse the vector representations of the at least two modalities of session information to obtain a vector representation of the fused session features of the multi-modal session information.
8. The apparatus according to claim 7, characterized in that the fused vector determining unit is specifically configured to:
sequentially concatenate the vector representations of the at least two modalities of session information according to a preset modality order.
9. The apparatus according to claim 7, characterized in that the fused vector determining unit is further configured to:
respectively extract nonlinear features of the vector representations of the at least two modalities of session information;
fuse the extracted nonlinear features of the at least two modalities of session information.
10. The apparatus according to claim 6, characterized by further comprising:
a recognition model determining module, configured to train an initial machine learning model according to fused session features of multi-modal session sample information and emotional features of the multi-modal session sample information, to obtain the multi-modal emotion recognition model.
11. A device, characterized in that the device comprises:
one or more processors; and
a storage apparatus configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the emotion recognition method according to any one of claims 1 to 5.
12. A storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the emotion recognition method according to any one of claims 1 to 5.
CN201810694899.XA (priority date 2018-06-29, filing date 2018-06-29): Emotion recognition method, device, equipment and storage medium - Active - granted as CN108985358B (en)

Priority Applications (1)

CN201810694899.XA (priority date 2018-06-29, filing date 2018-06-29): Emotion recognition method, device, equipment and storage medium

Publications (2)

CN108985358A - published 2018-12-11
CN108985358B - published 2021-03-02 (grant)

Family ID: 64538992

Family Applications (1)

CN201810694899.XA (Active, granted as CN108985358B): Emotion recognition method, device, equipment and storage medium

Country Status (1)

CN: CN108985358B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8781989B2 (en) * 2008-01-14 2014-07-15 Aptima, Inc. Method and system to predict a data value
CN102930298A (en) * 2012-09-02 2013-02-13 北京理工大学 Audio visual emotion recognition method based on multi-layer boosted HMM
CN104835507A (en) * 2015-03-30 2015-08-12 渤海大学 Serial-parallel combined multi-mode emotion information fusion and identification method
CN105427869A (en) * 2015-11-02 2016-03-23 北京大学 Session emotion autoanalysis method based on depth learning
CN106503805A (en) * 2016-11-14 2017-03-15 合肥工业大学 A kind of bimodal based on machine learning everybody talk with sentiment analysis system and method
CN107705807A (en) * 2017-08-24 2018-02-16 平安科技(深圳)有限公司 Voice quality detecting method, device, equipment and storage medium based on Emotion identification

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111681645A (en) * 2019-02-25 2020-09-18 北京嘀嘀无限科技发展有限公司 Emotion recognition model training method, emotion recognition device and electronic equipment
CN111681645B (en) * 2019-02-25 2023-03-31 北京嘀嘀无限科技发展有限公司 Emotion recognition model training method, emotion recognition device and electronic equipment
CN111816211A (en) * 2019-04-09 2020-10-23 Oppo广东移动通信有限公司 Emotion recognition method and device, storage medium and electronic equipment
CN110083716A (en) * 2019-05-07 2019-08-02 青海大学 Multi-modal affection computation method and system based on Tibetan language
CN110021308A (en) * 2019-05-16 2019-07-16 北京百度网讯科技有限公司 Voice mood recognition methods, device, computer equipment and storage medium
CN112347774A (en) * 2019-08-06 2021-02-09 北京搜狗科技发展有限公司 Model determination method and device for user emotion recognition
CN110390956A (en) * 2019-08-15 2019-10-29 龙马智芯(珠海横琴)科技有限公司 Emotion recognition network model, method and electronic equipment
CN110991427A (en) * 2019-12-25 2020-04-10 北京百度网讯科技有限公司 Emotion recognition method and device for video and computer equipment
CN113128284A (en) * 2019-12-31 2021-07-16 上海汽车集团股份有限公司 Multi-mode emotion recognition method and device
CN111507402A (en) * 2020-04-17 2020-08-07 北京声智科技有限公司 Method, device, medium and equipment for determining response mode
CN112148836A (en) * 2020-09-07 2020-12-29 北京字节跳动网络技术有限公司 Multi-modal information processing method, device, equipment and storage medium
CN112183022A (en) * 2020-09-25 2021-01-05 北京优全智汇信息技术有限公司 Loss assessment method and device
CN112233698A (en) * 2020-10-09 2021-01-15 中国平安人寿保险股份有限公司 Character emotion recognition method and device, terminal device and storage medium
CN112233698B (en) * 2020-10-09 2023-07-25 中国平安人寿保险股份有限公司 Character emotion recognition method, device, terminal equipment and storage medium
CN112418034A (en) * 2020-11-12 2021-02-26 元梦人文智能国际有限公司 Multi-modal emotion recognition method and device, electronic equipment and storage medium
CN114005468A (en) * 2021-09-07 2022-02-01 华院计算技术(上海)股份有限公司 Interpretable emotion recognition method and system based on global working space
WO2023226239A1 (en) * 2022-05-24 2023-11-30 网易(杭州)网络有限公司 Object emotion analysis method and apparatus and electronic device
CN115496226A (en) * 2022-09-29 2022-12-20 中国电信股份有限公司 Multi-modal emotion analysis method, device, equipment and storage based on gradient adjustment

Also Published As

CN108985358B (en) - published 2021-03-02

Similar Documents

Publication Publication Date Title
CN108985358A (en) Emotion identification method, apparatus, equipment and storage medium
JP7432556B2 (en) Methods, devices, equipment and media for man-machine interaction
Zhao et al. Exploring spatio-temporal representations by integrating attention-based bidirectional-LSTM-RNNs and FCNs for speech emotion recognition
CN107657017B (en) Method and apparatus for providing voice service
CN109003624B (en) Emotion recognition method and device, computer equipment and storage medium
CN107481720B (en) Explicit voiceprint recognition method and device
CN109036405A (en) Voice interactive method, device, equipment and storage medium
CN108922564B (en) Emotion recognition method and device, computer equipment and storage medium
CN114694076A (en) Multi-modal emotion analysis method based on multi-task learning and stacked cross-modal fusion
WO2020253509A1 (en) Situation- and emotion-oriented chinese speech synthesis method, device, and storage medium
US10956480B2 (en) System and method for generating dialogue graphs
CN111862977A (en) Voice conversation processing method and system
CN110262665A (en) Method and apparatus for output information
CN112527962A (en) Intelligent response method and device based on multi-mode fusion, machine readable medium and equipment
CN109034203A (en) Training, expression recommended method, device, equipment and the medium of expression recommended models
CN112765971B (en) Text-to-speech conversion method and device, electronic equipment and storage medium
CN112905772A (en) Semantic correlation analysis method and device and related products
CN116933051A (en) Multi-mode emotion recognition method and system for modal missing scene
Wu et al. Speaker personality recognition with multimodal explicit many2many interactions
CN108932943A (en) Command word sound detection method, device, equipment and storage medium
KR102226427B1 (en) Apparatus for determining title of user, system including the same, terminal and method for the same
CN116403601A (en) Emotion recognition model training method, emotion recognition device and storage medium
EP4064031A1 (en) Method and system for tracking in extended reality using voice commmand
US20200234181A1 (en) Implementing training of a machine learning model for embodied conversational agent
CN115640387A (en) Man-machine cooperation method and device based on multi-mode features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant