CN114078484B - Speech emotion recognition method, device and storage medium - Google Patents

Speech emotion recognition method, device and storage medium

Info

Publication number
CN114078484B
CN114078484B (application CN202010833052.2A)
Authority
CN
China
Prior art keywords
emotion
recognition
target object
network
recognition model
Prior art date
Legal status
Active
Application number
CN202010833052.2A
Other languages
Chinese (zh)
Other versions
CN114078484A (en)
Inventor
孟庆林
吴海英
蒋宁
王洪斌
赵立军
Current Assignee
Beijing Zhongkejin Finite Element Technology Co ltd
Original Assignee
Beijing Finite Element Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Finite Element Technology Co Ltd
Priority: CN202010833052.2A
Publication of CN114078484A
Application granted
Publication of CN114078484B
Legal status: Active

Classifications

    • G10L25/63 Speech or voice analysis techniques specially adapted for estimating an emotional state
    • G10L25/24 Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/30 Speech or voice analysis techniques characterised by the analysis technique, using neural networks
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application discloses a speech emotion recognition method, device and storage medium. The method comprises: acquiring voice information related to a target object whose emotion is to be recognized; and performing emotion recognition on the voice information by using a preset recognition model to determine the emotion category of the target object, wherein the recognition model comprises a residual network and a gated recurrent unit. By recognizing the speech emotion of the target object with a recognition model that combines a residual network and a gated recurrent unit, the feature mapping capability and the sequence processing capability of the recognition model are effectively improved, achieving the technical effects of improving emotion recognition accuracy and speech emotion recognition efficiency.

Description

Speech emotion recognition method, device and storage medium
Technical Field
The present application relates to the field of emotion recognition technologies, and in particular, to a method and apparatus for speech emotion recognition, and a storage medium.
Background
In consumer finance scenarios, a customer service call center handles a large volume of hotline, follow-up visit, collection and other business every day. Customer service agents represent the image of the company, so improving their service quality and effectively managing their service attitude is very important. In addition, real-time feedback on the customer's emotional state during a conversation is also key to improving service quality. The traditional way of reviewing the emotions in agent-customer conversations is manual spot checking, which is time-consuming, labor-intensive and costly. Therefore, the consumer finance field needs a system that can accurately obtain the emotional states of agents and customers in a voice conversation in real time.
In current customer service dialogue scenarios in the financial field, the emotions of customers and agents are typically divided into three categories (positive, neutral and negative) when performing emotion classification. Because positive and neutral speech emotions are acoustically similar and easily affected by factors such as telephone channel noise and dialects, the accuracy of speech emotion recognition is low and the recognition speed is slow, making it difficult to meet real-time requirements and greatly increasing the difficulty of emotion recognition in financial customer service scenarios.
No effective solution has yet been proposed for the technical problems of low accuracy and low recognition efficiency of existing speech emotion recognition methods.
Disclosure of Invention
The embodiments of the present disclosure provide a speech emotion recognition method, device and storage medium, to at least solve the technical problems of low accuracy and low recognition efficiency of existing speech emotion recognition methods.
According to one aspect of the embodiments of the present disclosure, there is provided a speech emotion recognition method, comprising: acquiring voice information related to a target object whose emotion is to be recognized; and performing emotion recognition on the voice information by using a preset recognition model to determine the emotion category of the target object, wherein the recognition model comprises a residual network and a gated recurrent unit.
According to another aspect of the embodiments of the present disclosure, there is also provided a storage medium including a stored program, wherein any one of the above methods is performed by a processor when the program runs.
According to another aspect of the embodiments of the present disclosure, there is also provided a speech emotion recognition apparatus, comprising: an acquisition module for acquiring voice information related to a target object whose emotion is to be recognized; and an emotion recognition module for performing emotion recognition on the voice information by using a preset recognition model to determine the emotion category of the target object, wherein the recognition model comprises a residual network and a gated recurrent unit.
According to another aspect of the embodiments of the present disclosure, there is also provided a speech emotion recognition apparatus, comprising: a processor; and a memory coupled to the processor and configured to provide the processor with instructions for the following processing steps: acquiring voice information related to a target object whose emotion is to be recognized; and performing emotion recognition on the voice information by using a preset recognition model to determine the emotion category of the target object, wherein the recognition model comprises a residual network and a gated recurrent unit.
In the embodiments of the present disclosure, in order to improve emotion feature classification capability and emotion recognition efficiency, a residual network (ResNet) is used to map the features corresponding to the voice information. The residual network is characterized by a small number of parameters, a deep network and strong feature mapping capability. The small number of parameters speeds up feature processing, and the deep network greatly improves the feature mapping capability of the network, which plays a key role in improving subsequent emotion recognition accuracy. A bidirectional gated recurrent unit (BiGRU) is connected after the ResNet network, the output of the residual network is fed into the BiGRU, and the temporal information is encoded, effectively incorporating the temporal information of the speech emotion while reducing the network parameters, so that speech emotion recognition efficiency is improved without affecting recognition accuracy. Therefore, by recognizing the speech emotion of the target object with a recognition model comprising a residual network and a gated recurrent unit, this embodiment effectively improves the feature mapping capability and sequence processing capability of the recognition model, achieves the technical effects of improving emotion recognition accuracy and speech emotion recognition efficiency, and thereby solves the technical problems of low accuracy and low recognition efficiency of existing speech emotion recognition methods.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and do not constitute an undue limitation on the disclosure. In the drawings:
FIG. 1 is a block diagram of a hardware architecture of a computing device for implementing a method according to embodiment 1 of the present disclosure;
FIG. 2 is a flow chart of a method of speech emotion recognition according to a first aspect of embodiment 1 of the present disclosure;
FIG. 3 is a schematic diagram of the structure of an identification model according to embodiment 1 of the present disclosure;
FIG. 4 is a schematic diagram of an apparatus for speech emotion recognition according to embodiment 2 of the present disclosure; and
fig. 5 is a schematic diagram of an apparatus for speech emotion recognition according to embodiment 3 of the present disclosure.
Detailed Description
In order to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. It is apparent that the described embodiments are merely some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this disclosure without inventive effort shall fall within the scope of protection of the present disclosure.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms or terminology appearing in the description of the embodiments of the present disclosure apply to the following explanations:
Mel spectrogram: a feature widely used in speech emotion recognition, speech recognition, voiceprint recognition and speech synthesis. The audio signal is first pre-emphasized, framed and windowed, then a short-time Fourier transform (STFT) is applied to each frame to obtain a short-time magnitude spectrum, and finally a Mel filter bank is applied to obtain the Mel spectrogram;
ResNet: residual network, one of the deep convolutional neural networks, the winning model of the 2015 ImageNet large-scale image recognition challenge. Its most notable feature is the introduction of shortcut (residual) connections, which allow much deeper networks to be trained while greatly reducing network parameters, thereby improving network performance and efficiency. The network can be designed for the specific business scenario;
BiGRU: bidirectional gated recurrent unit, an improvement on the bidirectional LSTM network that replaces the LSTM's three gates with two gates (update and reset), effectively reducing the number of parameters. With fewer parameters the network converges more easily and runs faster in application;
Attention mechanism: a method and mechanism that imitates how human vision and hearing focus on important information and appropriately ignore unimportant information. It can perform sequence alignment on the sequence information output by a recurrent neural network; during alignment, different weights can be assigned to different sequence elements, representing different degrees of attention.
Embodiment 1
According to the present embodiment, an embodiment of a speech emotion recognition method is provided. It should be noted that the steps shown in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one described herein.
The method embodiments provided herein may be performed in a server or similar computing device. Fig. 1 shows a block diagram of a hardware architecture of a computing device for implementing the speech emotion recognition method. As shown in fig. 1, the computing device may include one or more processors (which may include, but are not limited to, processing means such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory for storing data, and a transmission means for communication functions. In addition, the computing device may further include: a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply and/or a camera. It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 1 is merely illustrative and does not limit the configuration of the electronic device described above. For example, the computing device may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors and/or other data processing circuits described above may be referred to herein generally as "data processing circuits". The data processing circuit may be embodied in whole or in part as software, hardware, firmware or any other combination. Furthermore, the data processing circuit may be a single stand-alone processing module, or may be incorporated in whole or in part into any of the other elements in the computing device. As referred to in the embodiments of the present disclosure, the data processing circuit acts as a kind of processor control (for example, selection of a variable-resistance termination path to interface with).
The memory may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the method of speech emotion recognition in the embodiments of the present disclosure, and the processor executes the software programs and modules stored in the memory, thereby performing various functional applications and data processing, that is, implementing the method of speech emotion recognition of the application program. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory. In some examples, the memory may further include memory remotely located with respect to the processor, which may be connected to the computing device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communications provider of the computing device. In one example, the transmission means comprises a network adapter (Network Interface Controller, NIC) connectable to other network devices via the base station to communicate with the internet. In one example, the transmission device may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computing device.
It should be noted that, in some alternative embodiments, the computing device shown in fig. 1 above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should also be noted that fig. 1 is only one specific example and is intended to illustrate the types of components that may be present in the computing device described above.
In the above operating environment, according to the first aspect of the present embodiment, a speech emotion recognition method is provided, which may be applied to a robot customer service system for recognizing the emotion of a customer during a call. Fig. 2 shows a schematic flow chart of the method; referring to fig. 2, the method includes:
S202: acquiring voice information related to a target object whose emotion is to be recognized; and
S204: performing emotion recognition on the voice information by using a preset recognition model to determine the emotion category of the target object, wherein the recognition model comprises a residual network and a gated recurrent unit.
As described in the background, in current customer service dialogue scenarios in the financial field, the emotions of customers and agents are typically divided into three categories (positive, neutral and negative) when performing emotion classification. Because positive and neutral speech emotions are acoustically similar and easily affected by factors such as telephone channel noise and dialects, the accuracy of speech emotion recognition is low and the recognition speed is slow, making it difficult to meet real-time requirements and greatly increasing the difficulty of emotion recognition in financial customer service scenarios.
To address the technical problems described in the background, the speech emotion recognition method provided by this embodiment first acquires voice information related to a target object whose emotion is to be recognized, then performs emotion recognition on the voice information by using a preset recognition model, and determines the emotion category of the target object. The voice information may be generated in a customer service dialogue scenario in the financial field, the target object whose emotion is to be recognized may be a customer service agent or a customer, and the recognition model comprises a residual network and a gated recurrent unit.
Specifically, in the present embodiment, in order to improve emotion feature classification capability and emotion recognition efficiency, a residual network (ResNet) is used to map the features corresponding to the voice information. The residual network is characterized by a small number of parameters, a deep network and strong feature mapping capability. The small number of parameters speeds up feature processing, and the deep network greatly improves the feature mapping capability of the network, which plays a key role in improving subsequent emotion recognition accuracy. A bidirectional gated recurrent unit (BiGRU) is connected after the ResNet network, the output of the residual network is fed into the BiGRU, and the temporal information is encoded, effectively incorporating the temporal information of the speech emotion while reducing the network parameters, so that speech emotion recognition efficiency is improved without affecting recognition accuracy. Therefore, by recognizing the speech emotion of the target object with a recognition model comprising a residual network and a gated recurrent unit, this embodiment effectively improves the feature mapping capability and sequence processing capability of the recognition model, achieves the technical effects of improving emotion recognition accuracy and speech emotion recognition efficiency, and thereby solves the technical problems of low accuracy and low recognition efficiency of existing speech emotion recognition methods.
Optionally, the recognition model further comprises a feature extraction network and a classifier, and the operation of performing emotion recognition on the voice information by using the preset recognition model to determine the emotion category of the target object comprises: performing feature extraction on the voice information by using the feature extraction network to generate Mel spectrogram features, first-order difference features and second-order difference features; performing feature mapping on the Mel spectrogram features, the first-order difference features and the second-order difference features by using the residual network to generate sequence features; encoding the sequence features by using the gated recurrent unit; and inputting the encoded sequence features into the classifier and determining the emotion category of the target object according to the output result of the classifier.
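As an illustration of this feature-extraction step, the following minimal sketch shows how such a three-channel feature map (Mel spectrogram plus first-order and second-order differences) could be computed. It assumes the librosa library; the sampling rate, number of Mel bands and frame parameters are illustrative assumptions, not values given in the patent.

import numpy as np
import librosa

def extract_three_channel_features(wav_path, sr=16000, n_mels=64,
                                   n_fft=400, hop_length=160):
    """Compute a (3, n_mels, T) feature map: Mel spectrogram, delta, delta-delta.

    All parameter values here are illustrative assumptions, not values from the patent.
    """
    y, _ = librosa.load(wav_path, sr=sr)
    # Pre-emphasis, then short-time analysis via the Mel spectrogram
    y = librosa.effects.preemphasis(y)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)                  # (n_mels, T)
    delta1 = librosa.feature.delta(log_mel, order=1)    # first-order difference
    delta2 = librosa.feature.delta(log_mel, order=2)    # second-order difference
    return np.stack([log_mel, delta1, delta2], axis=0)  # three-channel feature map

Each recording then yields a three-channel feature map that can be consumed by an image-style convolutional network such as the residual network described below.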
Specifically, referring to fig. 3, the recognition model includes not only a residual network and a gated recurrent unit, but also a feature extraction network and a classifier. Because dialogues between customers and agents in financial scenarios are short and contain many finance-specific spoken expressions, in this embodiment the feature extraction network is used to extract features from the voice information, generating not only Mel spectrogram features but also first-order and second-order difference features as the input of the deep-learning classification network (the residual network). The features are then mapped by a ResNet network with strong feature extraction capability and a small number of parameters; an exemplary ResNet structure is shown in Table 1 below. The sequence features output by the residual network are then fed into a bidirectional gated recurrent unit (BiGRU) for encoding, effectively incorporating the temporal information of the speech emotion, and finally the sequence features processed by the BiGRU are sent to the classifier for emotion classification.
Therefore, by introducing first-order and second-order difference features on top of the Mel spectrogram features, the characteristics of financial scenarios (short customer-agent dialogues and many finance-specific spoken expressions) are effectively taken into account, improving accuracy in the specific financial dialogue scenario. Mapping the features with a ResNet network that has strong feature extraction capability and few parameters speeds up feature processing and greatly improves the feature mapping capability of the network, which plays a key role in improving subsequent emotion recognition accuracy. Encoding the temporal information with a gated recurrent unit (BiGRU) effectively incorporates the temporal information of the speech emotion, reduces network parameters and improves recognition efficiency.
TABLE 1
[Table 1, showing the exemplary ResNet network structure, is provided only as an image (Figure BDA0002638679230000071) in the original publication.]
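Because the layer configuration of Table 1 is only available as an image, the following PyTorch sketch shows one plausible way of chaining a small residual network with a BiGRU as described above. The block counts, channel widths and hidden size are assumptions for illustration and are not the configuration from Table 1.

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Basic residual block: two 3x3 convolutions plus a shortcut connection."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))

class ResNetBiGRU(nn.Module):
    """Maps (batch, 3, n_mels, T) feature maps to frame-level sequence features."""
    def __init__(self, n_mels=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            ResBlock(3, 32), ResBlock(32, 64, stride=2), ResBlock(64, 128, stride=2))
        # after two stride-2 blocks the Mel axis is reduced by a factor of 4
        self.gru = nn.GRU(input_size=128 * (n_mels // 4), hidden_size=hidden,
                          num_layers=1, batch_first=True, bidirectional=True)

    def forward(self, x):                       # x: (B, 3, n_mels, T)
        f = self.cnn(x)                         # (B, 128, n_mels/4, T/4)
        f = f.permute(0, 3, 1, 2).flatten(2)    # (B, T/4, 128 * n_mels/4)
        seq, _ = self.gru(f)                    # (B, T/4, 2*hidden)
        return seq

In the complete recognition model, the frame-level sequence returned here would be passed to the attention mechanism layer, fully connected layer and classifier described below.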
Optionally, the recognition model further comprises an attention mechanism layer and a fully connected layer, and before the operation of inputting the encoded sequence features into the classifier, the method further comprises: inputting the encoded sequence features into the attention mechanism layer for sequence alignment; and inputting the sequence-aligned features into the fully connected layer.
Specifically, referring to fig. 3, the recognition model also includes an attention mechanism layer and a fully connected layer. In this embodiment, the sequence features encoded by the BiGRU are further sent to the attention mechanism layer for sequence alignment, the sequence-aligned features are then input into the fully connected layer, and finally they are sent to a classifier (for example, a Softmax classifier). In addition, during sequence alignment different weights can be assigned to different sequence elements, representing different degrees of attention.
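A minimal sketch of such an attention layer followed by a fully connected layer and classifier is given below, continuing the PyTorch sketch above. The patent does not spell out the exact attention formulation, so a simple learned soft attention over the BiGRU outputs is assumed here.

import torch
import torch.nn as nn

class AttentionClassifier(nn.Module):
    """Soft attention over frame-level features, then a fully connected classifier."""
    def __init__(self, feat_dim=256, n_classes=3):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)      # one attention score per frame
        self.fc = nn.Linear(feat_dim, n_classes)

    def forward(self, seq):                      # seq: (B, T, feat_dim) from the BiGRU
        weights = torch.softmax(self.score(seq), dim=1)  # (B, T, 1), normalized over time
        pooled = (weights * seq).sum(dim=1)              # attention-weighted sum over frames
        # logits for the 3 emotion classes; Softmax is applied in the loss or at inference
        return self.fc(pooled)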
Optionally, the operation of acquiring voice information related to the target object whose emotion is to be recognized comprises: acquiring dialogue recording information between an agent and the target object; and performing channel separation on the dialogue recording information and determining the mono recording information as the voice information related to the target object whose emotion is to be recognized.
Specifically, the dialogue recording between the customer service agent and the user can be obtained and its audio channels separated, so that the agent channel and the user channel are split apart. If the target object whose emotion is to be recognized is the agent, the recording of the agent channel is determined as the voice information related to the target object; if the target object is the customer, the recording of the customer channel is determined as the voice information related to the target object. In this way, the voice information related to the target object whose emotion is to be recognized can be accurately acquired.
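A minimal sketch of this channel-separation step is given below. It assumes the dialogue recording is a two-channel WAV file with the agent on one channel and the customer on the other (which channel is which is an assumption), and uses the soundfile library.

import soundfile as sf

def split_call_recording(stereo_path, agent_path, customer_path):
    """Split a two-channel call recording into agent and customer mono files."""
    audio, sr = sf.read(stereo_path)         # audio shape: (frames, 2)
    if audio.ndim != 2 or audio.shape[1] != 2:
        raise ValueError("expected a two-channel recording")
    sf.write(agent_path, audio[:, 0], sr)     # assumption: channel 0 = agent
    sf.write(customer_path, audio[:, 1], sr)  # assumption: channel 1 = customer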
Optionally, the recognition model is trained as follows: acquiring a plurality of samples of dialogue recording data, wherein the sample dialogue recording data include agent recording data and user recording data; constructing the recognition model, wherein the recognition model comprises the feature extraction network, the residual network, the gated recurrent unit, the attention mechanism layer and the classifier; outputting, by using the recognition model, the emotion categories of the objects contained in the plurality of samples of dialogue recording data; and comparing the output emotion categories with preset labeled emotion categories corresponding to the samples and adjusting the recognition model according to the comparison result, wherein the labeled emotion categories indicate the actual emotion categories of the objects contained in the sample dialogue recording data.
Specifically, one thousand hours of financial customer-service recording data with completed emotion category labeling are augmented by adding noise, speeding up the speech, adding data perturbation and other methods to generate the sample dialogue recording data. The sample dialogue recording data are then divided into a training set and a test set at a ratio of 7:3, taking speaker information fully into account so that the speakers in the training set and the test set do not overlap. For each voice file in the training set, Mel spectrogram features and first-order and second-order difference features are extracted, each utterance forms a three-channel feature map, and the feature maps are stored in feature files.
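The following sketch illustrates the kind of data augmentation and speaker-disjoint 7:3 split described here. The noise level, speed factor and the way speakers are identified are assumptions for illustration only.

import random
import numpy as np
import librosa

def augment(y, sr):
    """Return simple augmented copies: an added-noise version and a faster-speech version."""
    noisy = y + 0.005 * np.random.randn(len(y))         # additive noise; level is an assumption
    faster = librosa.effects.time_stretch(y, rate=1.1)  # speed up speech by 10%
    return [noisy, faster]

def split_by_speaker(items, train_ratio=0.7, seed=0):
    """items: list of (wav_path, label, speaker_id). Keep each speaker in one split only."""
    speakers = sorted({spk for _, _, spk in items})
    random.Random(seed).shuffle(speakers)
    n_train = int(len(speakers) * train_ratio)
    train_spk = set(speakers[:n_train])
    train = [it for it in items if it[2] in train_spk]
    test = [it for it in items if it[2] not in train_spk]
    return train, test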
Further, during training of the recognition model, the feature files to be trained are read in batches to form data-label feature pairs. These feature batches are then fed into the designed ResNet and BiGRU networks, and the speech frame-level features are aligned through the attention mechanism. The frame-level features after attention are sent to a Softmax classifier, completing the forward propagation of the classification network. Finally, the emotion categories output by the Softmax classifier are compared with the preset labeled emotion categories corresponding to the sample dialogue recording data, and the recognition model is adjusted according to the comparison result, wherein the labeled emotion categories indicate the actual emotion categories of the objects contained in the sample dialogue recording data.
Optionally, the operation of comparing the output emotion categories with the preset labeled emotion categories corresponding to the plurality of samples of dialogue recording data comprises calculating the value of a cross-entropy loss function between the output emotion categories and the labeled emotion categories, and the operation of adjusting the recognition model according to the comparison result comprises adjusting the recognition model according to the value of the cross-entropy loss function. Back-propagation training is performed on the recognition model according to the value of the cross-entropy loss function until the loss converges, and the recognition model is then saved, completing the adjustment of the recognition model.
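A condensed sketch of such a training loop (cross-entropy loss, back-propagation until the loss converges, then saving the model) is shown below. The optimizer, learning rate and convergence check are illustrative assumptions.

import torch
import torch.nn as nn

def train(model, loader, epochs=50, lr=1e-3, patience=3, ckpt="ser_model.pt"):
    """Train until the epoch loss stops improving, then keep the best saved model."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()          # compares predicted and labeled emotion categories
    best, stale = float("inf"), 0
    for epoch in range(epochs):
        total = 0.0
        for feats, labels in loader:         # feats: (B, 3, n_mels, T), labels: (B,)
            opt.zero_grad()
            loss = loss_fn(model(feats), labels)
            loss.backward()                  # back-propagation
            opt.step()
            total += loss.item()
        if total < best - 1e-4:
            best, stale = total, 0
            torch.save(model.state_dict(), ckpt)
        else:
            stale += 1
            if stale >= patience:            # treat a plateau as "loss converged"
                break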
Further, referring to fig. 1, according to a second aspect of the present embodiment, there is provided a storage medium. The storage medium includes a stored program, wherein the method of any of the above is performed by a processor when the program is run.
In addition, the invention is mainly divided into three stages, and the overall flow is as follows:
1. Spectrogram feature extraction stage
1) One thousand hours of financial customer-service recording data with completed emotion labeling are augmented by adding noise, speeding up the speech, adding data perturbation and other methods.
2) The data are divided into a training set and a test set at a ratio of 7:3, taking speaker information fully into account so that the speakers in the training set and the test set do not overlap. Then Mel spectrogram features, first-order difference features and second-order difference features are extracted from each voice file in the training set, each utterance forms a three-channel feature map, and the feature maps are stored in feature files.
2. Training stage of the speech emotion recognition classification model
1) The feature files to be trained are read in batches to form data-label feature pairs.
2) The feature batches are fed into the designed ResNet and BiGRU sequence classification network, and the speech frame-level features are aligned through the attention mechanism. The frame-level features after attention are sent to a Softmax classifier, completing the forward propagation of the classification network. Back-propagation training is then performed according to the cross-entropy loss until the loss converges, and the model is saved.
3. Recognition stage of the speech emotion recognition classification model
1) The dialogue recording of the agent and the user is obtained and the channels are separated, splitting the agent channel from the user channel.
2) Features, including Mel spectrogram features, first-order difference features and second-order difference features, are extracted from the agent and customer channel recordings respectively, the obtained features are fed into the trained network, and the model is called to classify the emotion.
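Putting the stages together, a hypothetical inference call for one separated channel might look like the following. It reuses the helpers sketched earlier and only illustrates how the stages could be wired; it is not the patent's actual code, and the order of the emotion labels is likewise an assumption.

import torch

EMOTIONS = ["positive", "neutral", "negative"]

def classify_channel(wav_path, model):
    """Classify the emotion of one mono channel recording with the trained model."""
    feats = extract_three_channel_features(wav_path)   # (3, n_mels, T), sketched earlier
    x = torch.from_numpy(feats).unsqueeze(0).float()   # add a batch dimension
    model.eval()
    with torch.no_grad():
        logits = model(x)                              # ResNet + BiGRU + attention forward pass
    return EMOTIONS[int(logits.argmax(dim=-1))]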
The key points of the invention are as follows:
(1) Considering that financial customer-service dialogue scenarios contain many finance-specific spoken expressions, the Mel spectrogram features are combined with the first-order and second-order difference features to form a three-channel feature, which enhances the feature coverage capability and facilitates classification learning by the deep-learning classification network;
(2) A ResNet network and ResBlock blocks are designed to form a deep-learning feature mapping network suitable for the financial customer-service dialogue scenario. The network is characterized by a small number of parameters, a deep network and strong feature mapping capability. The small number of parameters speeds up feature processing, and the deep network greatly improves the feature mapping capability of the network, which plays a key role in improving subsequent emotion recognition accuracy;
(3) A bidirectional BiGRU network is connected after the designed ResNet network to encode the temporal information, effectively incorporating the temporal information of the speech emotion. Compared with connecting a BiLSTM, this effectively reduces the network parameters and speeds up network operation without significantly affecting accuracy.
Compared with the prior art, this method introduces first-order and second-order difference features on top of the Mel spectrogram features; adding them effectively takes into account the characteristics of financial scenarios, namely short customer-agent dialogues and many finance-specific spoken expressions, thereby improving accuracy in the specific financial dialogue scenario. In addition, for emotion recognition in the specific financial dialogue scenario, the invention designs a ResNet network structure for feature mapping and adds a BiGRU network after the ResNet network for sequential feature processing, effectively improving recognition accuracy and recognition efficiency in the financial dialogue scenario.
In summary, in the task of speech emotion recognition in financial dialogue scenarios, the invention can produce the following effects:
1. The designed fusion of Mel spectrogram features with first-order and second-order difference features effectively takes into account the short customer-agent dialogues and the many finance-specific spoken expressions in financial-scenario conversations, and effectively improves the feature coverage of the scenario.
2. The designed ResNet feature mapping network can perform emotion feature mapping on voice data, improving the accuracy of speech emotion recognition in financial dialogue scenarios and the running efficiency of the model. In addition, the recognition model can also be useful in voice gender recognition, voiceprint recognition and other speech classification scenarios.
3. Connecting a bidirectional BiGRU sequence network after the designed ResNet network improves the processing capability for speech sequence features; compared with connecting a BiLSTM sequence network, it effectively reduces the number of network parameters and improves the overall operating efficiency of the network.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
Embodiment 2
Fig. 4 shows a speech emotion recognition apparatus 400 according to the present embodiment, which corresponds to the method according to the first aspect of embodiment 1. Referring to fig. 4, the apparatus 400 includes: an acquisition module 410 for acquiring voice information related to a target object whose emotion is to be recognized; and an emotion recognition module 420 for performing emotion recognition on the voice information by using a preset recognition model to determine the emotion category of the target object, wherein the recognition model comprises a residual network and a gated recurrent unit.
Optionally, the recognition model further comprises a feature extraction network and a classifier, and the emotion recognition module 420 includes: a first generation sub-module for performing feature extraction on the voice information by using the feature extraction network to generate Mel spectrogram features, first-order difference features and second-order difference features; a second generation sub-module for performing feature mapping on the Mel spectrogram features, the first-order difference features and the second-order difference features by using the residual network to generate sequence features; an encoding sub-module for encoding the sequence features by using the gated recurrent unit; and a determination sub-module for inputting the encoded sequence features into the classifier and determining the emotion category of the target object according to the output result of the classifier.
Optionally, the recognition model further comprises an attention mechanism layer and a fully connected layer, and the emotion recognition module 420 further includes: a sequence alignment sub-module for inputting the encoded sequence features into the attention mechanism layer for sequence alignment before the encoded sequence features are input into the classifier; and a fully connected sub-module for inputting the sequence-aligned features into the fully connected layer.
Optionally, the acquisition module 410 includes: an acquisition sub-module for acquiring dialogue recording information between the agent and the target object; and a voice information determination sub-module for performing channel separation on the dialogue recording information and determining the mono recording information as the voice information related to the target object whose emotion is to be recognized.
Optionally, the apparatus 400 further comprises a training module for training the recognition model as follows: acquiring a plurality of samples of dialogue recording data, wherein the sample dialogue recording data include agent recording data and user recording data; constructing the recognition model, wherein the recognition model comprises the feature extraction network, the residual network, the gated recurrent unit, the attention mechanism layer and the classifier; outputting, by using the recognition model, the emotion categories of the objects contained in the plurality of samples of dialogue recording data; and comparing the output emotion categories with the preset labeled emotion categories corresponding to the samples and adjusting the recognition model according to the comparison result, wherein the labeled emotion categories indicate the actual emotion categories of the objects contained in the sample dialogue recording data.
Optionally, the operation of comparing the output emotion categories with the preset labeled emotion categories corresponding to the plurality of samples of dialogue recording data comprises: calculating the value of a cross-entropy loss function between the output emotion categories and the labeled emotion categories; and the operation of adjusting the recognition model according to the comparison result comprises: adjusting the recognition model according to the value of the cross-entropy loss function.
Thus, according to the present embodiment, in order to improve emotion feature classification capability and emotion recognition efficiency, a residual network (ResNet) is used to map the features corresponding to the voice information. The residual network is characterized by a small number of parameters, a deep network and strong feature mapping capability. The small number of parameters speeds up feature processing, and the deep network greatly improves the feature mapping capability of the network, which plays a key role in improving subsequent emotion recognition accuracy. A bidirectional gated recurrent unit (BiGRU) is connected after the ResNet network, the output of the residual network is fed into the BiGRU, and the temporal information is encoded, effectively incorporating the temporal information of the speech emotion while reducing the network parameters, so that speech emotion recognition efficiency is improved without affecting recognition accuracy. Therefore, by recognizing the speech emotion of the target object with a recognition model comprising a residual network and a gated recurrent unit, this embodiment effectively improves the feature mapping capability and sequence processing capability of the recognition model, achieves the technical effects of improving emotion recognition accuracy and speech emotion recognition efficiency, and thereby solves the technical problems of low accuracy and low recognition efficiency of existing speech emotion recognition methods.
Embodiment 3
Fig. 5 shows a speech emotion recognition apparatus 500 according to the present embodiment, which corresponds to the method according to the first aspect of embodiment 1. Referring to fig. 5, the apparatus 500 includes: a processor 510; and a memory 520 coupled to the processor 510 and configured to provide the processor 510 with instructions for the following processing steps: acquiring voice information related to a target object whose emotion is to be recognized; and performing emotion recognition on the voice information by using a preset recognition model to determine the emotion category of the target object, wherein the recognition model comprises a residual network and a gated recurrent unit.
Optionally, the recognition model further comprises a feature extraction network and a classifier, and the operation of performing emotion recognition on the voice information by using the preset recognition model to determine the emotion category of the target object comprises: performing feature extraction on the voice information by using the feature extraction network to generate Mel spectrogram features, first-order difference features and second-order difference features; performing feature mapping on the Mel spectrogram features, the first-order difference features and the second-order difference features by using the residual network to generate sequence features; encoding the sequence features by using the gated recurrent unit; and inputting the encoded sequence features into the classifier and determining the emotion category of the target object according to the output result of the classifier.
Optionally, the recognition model further comprises an attention mechanism layer and a fully connected layer, and the memory 520 is further configured to provide the processor 510 with instructions for the following processing steps: before inputting the encoded sequence features into the classifier, inputting the encoded sequence features into the attention mechanism layer for sequence alignment; and inputting the sequence-aligned features into the fully connected layer.
Optionally, the operation of acquiring voice information related to the target object whose emotion is to be recognized comprises: acquiring dialogue recording information between an agent and the target object; and performing channel separation on the dialogue recording information and determining the mono recording information as the voice information related to the target object whose emotion is to be recognized.
Optionally, the memory 520 is further configured to provide the processor 510 with instructions for the following processing steps: training the recognition model as follows: acquiring a plurality of samples of dialogue recording data, wherein the sample dialogue recording data include agent recording data and user recording data; constructing the recognition model, wherein the recognition model comprises the feature extraction network, the residual network, the gated recurrent unit, the attention mechanism layer and the classifier; outputting, by using the recognition model, the emotion categories of the objects contained in the plurality of samples of dialogue recording data; and comparing the output emotion categories with the preset labeled emotion categories corresponding to the samples and adjusting the recognition model according to the comparison result, wherein the labeled emotion categories indicate the actual emotion categories of the objects contained in the sample dialogue recording data.
Optionally, the operation of comparing the output emotion categories with the preset labeled emotion categories corresponding to the plurality of samples of dialogue recording data comprises: calculating the value of a cross-entropy loss function between the output emotion categories and the labeled emotion categories; and the operation of adjusting the recognition model according to the comparison result comprises: adjusting the recognition model according to the value of the cross-entropy loss function.
Thus, according to the present embodiment, in order to improve emotion feature classification capability and emotion recognition efficiency, a residual network (ResNet) is used to map the features corresponding to the voice information. The residual network is characterized by a small number of parameters, a deep network and strong feature mapping capability. The small number of parameters speeds up feature processing, and the deep network greatly improves the feature mapping capability of the network, which plays a key role in improving subsequent emotion recognition accuracy. A bidirectional gated recurrent unit (BiGRU) is connected after the ResNet network, the output of the residual network is fed into the BiGRU, and the temporal information is encoded, effectively incorporating the temporal information of the speech emotion while reducing the network parameters, so that speech emotion recognition efficiency is improved without affecting recognition accuracy. Therefore, by recognizing the speech emotion of the target object with a recognition model comprising a residual network and a gated recurrent unit, this embodiment effectively improves the feature mapping capability and sequence processing capability of the recognition model, achieves the technical effects of improving emotion recognition accuracy and speech emotion recognition efficiency, and thereby solves the technical problems of low accuracy and low recognition efficiency of existing speech emotion recognition methods.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk or an optical disk.
The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which are intended to be comprehended within the scope of the present invention.

Claims (8)

1. A method of speech emotion recognition, comprising:
acquiring voice information related to a target object whose emotion is to be recognized; and
performing emotion recognition on the voice information by using a preset recognition model, and determining an emotion category of the target object, wherein the recognition model comprises a residual network and a gated recurrent unit;
wherein the recognition model further comprises a feature extraction network and a classifier, and the operation of performing emotion recognition on the voice information by using the preset recognition model and determining the emotion category of the target object comprises:
performing feature extraction on the voice information by using the feature extraction network to generate Mel spectrogram features, first-order difference features and second-order difference features;
performing feature mapping on the Mel spectrogram features, the first-order difference features and the second-order difference features by using the residual network to generate sequence features;
encoding the sequence features by using the gated recurrent unit; and
inputting the encoded sequence features into the classifier, and determining the emotion category of the target object according to the output of the classifier.
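For illustration only, and not as part of the claim language, the Mel spectrogram, first-order difference and second-order difference features recited in claim 1 could be computed with librosa roughly as follows; the sampling rate, hop length and number of Mel bands here are assumptions, not values given in the patent.

```python
# Illustrative sketch only -- parameter values (sample rate, n_mels, hop length)
# are assumptions, not values taken from the patent.
import librosa
import numpy as np

def extract_features(wav_path, sr=16000, n_mels=64, hop_length=160):
    """Return the Mel spectrogram with its first- and second-order differences, stacked."""
    y, sr = librosa.load(wav_path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, hop_length=hop_length)
    log_mel = librosa.power_to_db(mel)                    # (n_mels, frames)
    delta1 = librosa.feature.delta(log_mel, order=1)      # first-order difference
    delta2 = librosa.feature.delta(log_mel, order=2)      # second-order difference
    # Stack the three feature maps as channels so a 2-D residual network can consume them.
    return np.stack([log_mel, delta1, delta2], axis=0)    # (3, n_mels, frames)
```

Stacking the three feature maps as channels matches common practice for feeding spectrogram-like inputs to a 2-D residual network, whose frame-wise outputs then form the sequence features passed to the gated recurrent unit.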
2. The method of claim 1, wherein the recognition model further comprises an attention mechanism layer and a fully connected layer, and wherein, before inputting the encoded sequence features into the classifier, the method further comprises:
inputting the encoded sequence features into the attention mechanism layer for sequence alignment; and
inputting the sequence features after sequence alignment into the fully connected layer.
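A minimal PyTorch sketch of an attention layer followed by a fully connected layer over the gated recurrent unit's outputs, as claim 2 describes; the additive-attention formulation and all layer sizes are assumptions rather than details disclosed in the patent.

```python
# Illustrative sketch only -- the attention formulation and layer sizes are
# assumptions, not details taken from the patent.
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Simple additive attention over a sequence of GRU hidden states."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, gru_out):                              # gru_out: (batch, time, hidden)
        weights = torch.softmax(self.score(gru_out), dim=1)  # attention weights over time
        return (weights * gru_out).sum(dim=1)                # (batch, hidden)

hidden_dim, num_classes = 128, 4
attention = AttentionPooling(hidden_dim)
fc = nn.Linear(hidden_dim, num_classes)             # fully connected layer feeding the classifier

gru_out = torch.randn(8, 200, hidden_dim)           # dummy encoded sequence features
logits = fc(attention(gru_out))                     # (8, num_classes)
```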
3. The method of claim 1, wherein the operation of acquiring voice information related to the target object whose emotion is to be recognized comprises:
acquiring dialogue recording information between an agent and the target object; and
performing channel separation on the dialogue recording information, and determining the mono recording information as the voice information related to the target object whose emotion is to be recognized.
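Claim 3 separates a two-channel call recording so that only the target speaker's mono track is analysed. A possible sketch with the soundfile library follows; which stereo channel carries the agent and which carries the target object is an assumption about the recording setup, not something the patent specifies.

```python
# Illustrative sketch only -- the channel-to-speaker mapping is an assumption
# about the call-recording setup, not a detail from the patent.
import soundfile as sf

def split_call_channels(wav_path):
    """Split a stereo call recording into agent and target-object mono tracks."""
    data, sr = sf.read(wav_path)        # stereo data has shape (frames, 2)
    agent_track = data[:, 0]            # assumed: channel 0 = agent
    target_track = data[:, 1]           # assumed: channel 1 = target object (customer)
    return agent_track, target_track, sr
```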
4. The method of claim 2, further comprising training the recognition model by:
acquiring a plurality of pieces of sample dialogue recording data, wherein the sample dialogue recording data comprise agent recording data and user recording data;
constructing the recognition model, wherein the recognition model comprises the feature extraction network, the residual network, the gated recurrent unit, the attention mechanism layer and the classifier;
outputting, by using the recognition model, emotion categories of the objects contained in the plurality of pieces of sample dialogue recording data respectively; and
comparing the output emotion categories with preset annotated emotion categories corresponding to the plurality of pieces of sample dialogue recording data, and adjusting the recognition model according to the comparison result, wherein the annotated emotion categories indicate the actual emotion categories of the objects contained in the sample dialogue recording data.
5. The method of claim 4, wherein:
the operation of comparing the output emotion categories with the preset annotated emotion categories corresponding to the plurality of pieces of sample dialogue recording data comprises: calculating a value of a cross entropy loss function between the output emotion categories and the annotated emotion categories; and
the operation of adjusting the recognition model according to the comparison result comprises: adjusting the recognition model according to the value of the cross entropy loss function.
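Claims 4 and 5 describe training the recognition model on annotated call recordings and adjusting it with a cross entropy loss. A hedged PyTorch sketch of such a training step follows; the stand-in model, optimizer and hyper-parameters are placeholders, not the patent's actual ResNet + GRU + attention configuration.

```python
# Illustrative training sketch only -- the model below is a stand-in, not the
# patent's actual architecture, and all hyper-parameters are assumptions.
import torch
import torch.nn as nn

num_classes, feat_dim = 4, 192          # e.g. 64 Mel bands x 3 feature maps, flattened per frame
model = nn.Sequential(                  # placeholder recognizer
    nn.Linear(feat_dim, 128),
    nn.ReLU(),
    nn.Linear(128, num_classes),
)
criterion = nn.CrossEntropyLoss()       # loss between predicted and annotated emotion categories
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Dummy batch standing in for annotated sample dialogue recordings.
features = torch.randn(8, feat_dim)
labels = torch.randint(0, num_classes, (8,))

logits = model(features)                # predicted emotion categories
loss = criterion(logits, labels)        # cross entropy against the annotated categories
optimizer.zero_grad()
loss.backward()
optimizer.step()                        # adjust the recognition model from the loss value
```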
6. A storage medium comprising a stored program, wherein, when the program runs, a processor performs the method of any one of claims 1 to 5.
7. An apparatus for speech emotion recognition, comprising:
an acquisition module, configured to acquire voice information related to a target object whose emotion is to be recognized; and
an emotion recognition module, configured to perform emotion recognition on the voice information by using a preset recognition model and determine an emotion category of the target object, wherein the recognition model comprises a residual network and a gated recurrent unit;
wherein the recognition model further comprises a feature extraction network and a classifier, and the emotion recognition module comprises:
a first generation submodule, configured to perform feature extraction on the voice information by using the feature extraction network to generate Mel spectrogram features, first-order difference features and second-order difference features;
a second generation submodule, configured to perform feature mapping on the Mel spectrogram features, the first-order difference features and the second-order difference features by using the residual network to generate sequence features;
an encoding submodule, configured to encode the sequence features by using the gated recurrent unit; and
a determination submodule, configured to input the encoded sequence features into the classifier and determine the emotion category of the target object according to the output of the classifier.
8. An apparatus for speech emotion recognition, comprising:
a processor; and
a memory coupled to the processor and configured to provide the processor with instructions for performing the following processing steps:
acquiring voice information related to a target object whose emotion is to be recognized; and
performing emotion recognition on the voice information by using a preset recognition model, and determining an emotion category of the target object, wherein the recognition model comprises a residual network and a gated recurrent unit;
wherein the recognition model further comprises a feature extraction network and a classifier, and the operation of performing emotion recognition on the voice information by using the preset recognition model and determining the emotion category of the target object comprises:
performing feature extraction on the voice information by using the feature extraction network to generate Mel spectrogram features, first-order difference features and second-order difference features;
performing feature mapping on the Mel spectrogram features, the first-order difference features and the second-order difference features by using the residual network to generate sequence features;
encoding the sequence features by using the gated recurrent unit; and
inputting the encoded sequence features into the classifier, and determining the emotion category of the target object according to the output of the classifier.
CN202010833052.2A 2020-08-18 2020-08-18 Speech emotion recognition method, device and storage medium Active CN114078484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010833052.2A CN114078484B (en) 2020-08-18 2020-08-18 Speech emotion recognition method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010833052.2A CN114078484B (en) 2020-08-18 2020-08-18 Speech emotion recognition method, device and storage medium

Publications (2)

Publication Number Publication Date
CN114078484A CN114078484A (en) 2022-02-22
CN114078484B true CN114078484B (en) 2023-06-09

Family

ID=80281579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010833052.2A Active CN114078484B (en) 2020-08-18 2020-08-18 Speech emotion recognition method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114078484B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331701A (en) * 2022-08-15 2022-11-11 中国银行股份有限公司 Speech emotion recognition method, system and related equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460737A (en) * 2018-11-13 2019-03-12 四川大学 A kind of multi-modal speech-emotion recognition method based on enhanced residual error neural network
CN109859772A (en) * 2019-03-22 2019-06-07 平安科技(深圳)有限公司 Emotion identification method, apparatus and computer readable storage medium
CN110164476A (en) * 2019-05-24 2019-08-23 广西师范大学 A kind of speech-emotion recognition method of the BLSTM based on multi output Fusion Features
CN111508530A (en) * 2020-04-13 2020-08-07 腾讯科技(深圳)有限公司 Speech emotion recognition method, device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11170761B2 (en) * 2018-12-04 2021-11-09 Sorenson Ip Holdings, Llc Training of speech recognition systems

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460737A (en) * 2018-11-13 2019-03-12 四川大学 A kind of multi-modal speech-emotion recognition method based on enhanced residual error neural network
CN109859772A (en) * 2019-03-22 2019-06-07 平安科技(深圳)有限公司 Emotion identification method, apparatus and computer readable storage medium
CN110164476A (en) * 2019-05-24 2019-08-23 广西师范大学 A kind of speech-emotion recognition method of the BLSTM based on multi output Fusion Features
CN111508530A (en) * 2020-04-13 2020-08-07 腾讯科技(深圳)有限公司 Speech emotion recognition method, device and storage medium

Also Published As

Publication number Publication date
CN114078484A (en) 2022-02-22

Similar Documents

Publication Publication Date Title
US11158324B2 (en) Speaker separation model training method, two-speaker separation method and computing device
CN109767765A (en) Talk about art matching process and device, storage medium, computer equipment
WO2022095380A1 (en) Ai-based virtual interaction model generation method and apparatus, computer device and storage medium
EP4099709A1 (en) Data processing method and apparatus, device, and readable storage medium
CN109119069B (en) Specific crowd identification method, electronic device and computer readable storage medium
CN111683285A (en) File content identification method and device, computer equipment and storage medium
Kinoshita et al. Tight integration of neural-and clustering-based diarization through deep unfolding of infinite gaussian mixture model
CN110136726A (en) A kind of estimation method, device, system and the storage medium of voice gender
CN114677634B (en) Surface label identification method and device, electronic equipment and storage medium
CN114078484B (en) Speech emotion recognition method, device and storage medium
Uhle Applause sound detection
CN114758668A (en) Training method of voice enhancement model and voice enhancement method
CN110610697B (en) Voice recognition method and device
CN115497456A (en) Speech emotion recognition method and device for financial conversation scene and storage medium
Kinoshita et al. Utterance-by-utterance overlap-aware neural diarization with Graph-PIT
CN111477212A (en) Content recognition, model training and data processing method, system and equipment
CN113763925B (en) Speech recognition method, device, computer equipment and storage medium
CN113593579A (en) Voiceprint recognition method and device and electronic equipment
CN113868415A (en) Knowledge base generation method and device, storage medium and electronic equipment
CN111785280A (en) Identity authentication method and device, storage medium and electronic equipment
CN112750448A (en) Sound scene recognition method, device, equipment and storage medium
CN115331673B (en) Voiceprint recognition household appliance control method and device in complex sound scene
CN110880326B (en) Voice interaction system and method
CN113590741A (en) Audio data evaluation method and device, electronic equipment and storage medium
CN118447855A (en) Speaker role recognition method and device based on double-recording system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 226, 2nd Floor, No. 5 Guanghua Road, Zhangjiawan Town, Tongzhou District, Beijing, 101100

Patentee after: Beijing Zhongkejin Finite Element Technology Co.,Ltd.

Address before: 100080 27-270 on the 23rd floor, block B, No. 1 Wangzhuang Road, Haidian District, Beijing

Patentee before: Beijing finite element technology Co.,Ltd.
