CN113129871A - Music emotion recognition method and system based on audio signal and lyrics - Google Patents

Music emotion recognition method and system based on audio signal and lyrics

Info

Publication number
CN113129871A
Authority
CN
China
Prior art keywords: lyric, neural network, convolutional neural, audio, audio signal
Prior art date: 2021-03-26
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110328406.2A
Other languages
Chinese (zh)
Inventor
李风环
李轶
田春晖
徐宏杰
张健炜
符善森
黎其钻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2021-03-26
Publication date: 2021-07-16
2021-03-26: Application filed by Guangdong University of Technology
2021-03-26: Priority to CN202110328406.2A
2021-07-16: Publication of CN113129871A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Child & Adolescent Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention provides a music emotion recognition method and system based on audio signals and lyrics, addressing the problems that existing music emotion recognition methods consider only a single factor and achieve low recognition accuracy.

Description

Music emotion recognition method and system based on audio signal and lyrics
Technical Field
The invention relates to the technical field of music emotion recognition, in particular to a music emotion recognition method and system based on audio signals and lyrics.
Background
With the development of music and technology, music emotion recognition systems are actively used for various purposes, including personal music collections, music recommendation systems, and music therapy for emotional disorders. Analyzing the emotional content of music is a cross-disciplinary study involving not only signal processing and machine learning but also auditory perception, psychology, cognitive science, and musicology.
Existing music emotion recognition methods first extract acoustic features, such as rhythm and tone, from the musical audio content, and then apply different machine learning algorithms to model the relationship between the extracted features and preset emotion labels.
Chinese patent CN111326178A, published on 23 June 2020, discloses a multi-modal speech emotion recognition system and method based on a convolutional neural network, in which speech signals are processed by a convolutional neural network so that emotion information in speech is recognized from extracted speech features, improving the accuracy of analysis and recognition to a certain extent. However, it ignores the role of lyrics in music emotion recognition, so the accuracy of music emotion recognition still needs to be improved, and research on music emotion recognition methods that combine audio signals and music lyrics is lacking.
Disclosure of Invention
In order to solve the problems that existing music emotion recognition methods consider only a single factor and achieve low recognition accuracy, the invention provides a music emotion recognition method and system based on audio signals and lyrics, in which emotion is recognized by combining the audio signals and the lyrics, thereby improving the accuracy of music emotion recognition.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
A music emotion recognition method based on audio signals and lyrics comprises at least the following steps:
s1, obtaining an audio data sample and a lyric data sample of music to be recognized, and respectively preprocessing the audio data sample and the lyric data sample to obtain an audio signal and a lyric vector;
s2, respectively constructing and training a first convolutional neural network for extracting audio signal characteristics and a second convolutional neural network for extracting lyric vector characteristics;
s3, inputting the preprocessed audio data sample into a first convolutional neural network, inputting the preprocessed lyric data sample into a second convolutional neural network, and serially connecting and fusing the output end of the first convolutional neural network and the output end of the second convolutional neural network by using a fusion module;
S4, the fusion module outputs the fused audio signal features and lyric vector features to the fully connected layer module for analysis and processing, and the fully connected layer module outputs the music emotion recognition result, as sketched below.
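To make the data flow concrete, the following minimal sketch traces steps S1 to S4 at inference time. It is a sketch only: every name and the stand-in callables are hypothetical, not identifiers from the invention, and the real branches are the convolutional networks described below.

import numpy as np

def recognize_music_emotion(mel_audio, lyric_vec, audio_cnn, lyric_cnn, head):
    """S3-S4: run both trained branches, fuse serially, then classify."""
    audio_feat = audio_cnn(mel_audio)                 # audio signal features
    lyric_feat = lyric_cnn(lyric_vec)                 # lyric vector features
    fused = np.concatenate([audio_feat, lyric_feat])  # serial fusion (S3)
    return head(fused)                                # fully connected module (S4)

# Toy stand-ins; S1 would supply a real mel spectrogram and lyric matrix
mel = np.random.rand(40, 128)   # placeholder 2-D mel spectrogram
lyr = np.random.rand(100, 50)   # placeholder lyric vectors (100-dim x 50 words)
label = recognize_music_emotion(
    mel, lyr,
    audio_cnn=lambda x: x.mean(axis=1),   # stand-in for the first CNN
    lyric_cnn=lambda x: x.mean(axis=1),   # stand-in for the second CNN
    head=lambda z: int(z.argmax()))       # stand-in for the FC module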
Preferably, before the audio data samples and the lyric data samples are respectively preprocessed, three additional sample segments are derived from each audio data sample and each lyric data sample using pitch shifting and lossy encoding; the number of audio data samples and lyric data samples is then expanded with this data augmentation technique, improving the richness of the samples.
Preferably, in step S1, when the audio data samples are preprocessed, each audio data sample is converted into a two-dimensional mel spectrogram audio signal, and the conversion process is as follows:
determining the number M of mel filters and the Hann window length N in audio samples;
setting a sampling frequency f;
the audio data samples are input to the mel filter bank and converted into a two-dimensional mel spectrogram audio signal.
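For reference, the M mel filters are triangular filters spaced on the mel scale, which maps a frequency f in hertz to m = 2595 log10(1 + f/700) mels; applying the filter bank to each length-N Hann-windowed frame yields one column of the two-dimensional mel spectrogram.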
Preferably, in step S1, when the lyric data samples are preprocessed, word segments are extracted from each track, and the lyric data samples are made into lyric vectors using K-dimensional vectors.
Preferably, the length-N Hann windows applied before the M mel filters do not overlap.
Preferably, the first convolutional neural network of step S2 comprises a first convolutional layer and a second convolutional layer connected in sequence; both are one-dimensional, with 32 and 16 feature maps respectively of kernel size 8 and stride 1, each followed by a pooling layer of size 4 and stride 4.
Preferably, the second convolutional neural network of step S2 includes a third convolutional layer and an LSTM layer connected in sequence, where the third convolutional layer is a one-dimensional convolutional layer structure.
Preferably, the input of the first convolutional neural network is the audio signal and its output is the audio signal features, with the network parameters of the trained first convolutional neural network obtained by gradient descent; the input of the second convolutional neural network is the lyric vector and its output is the lyric vector features, with the network parameters of the trained second convolutional neural network likewise obtained by gradient descent.
Preferably, the fully connected layer module comprises a first fully connected layer and a second fully connected layer; the input of the first fully connected layer is connected to the common output of the fusion module, and the output of the first fully connected layer is connected to the input of the second fully connected layer; the two fully connected layers jointly predict from the fused audio signal and lyric vector features, and the output of the second fully connected layer gives the music emotion recognition result.
The invention also provides a multimodal music emotion recognition system based on audio signals and lyrics, used for implementing the above music emotion recognition method based on audio signals and lyrics; the system comprises:
the music data acquisition module comprises an audio data acquisition module and a lyric data acquisition module; the audio data acquisition module is used for acquiring an audio data sample of music to be recognized, and the lyric data acquisition module is used for acquiring a lyric data sample of the music to be recognized;
the preprocessing module is used for respectively preprocessing the audio data samples and the lyric data samples to obtain audio signals and lyric vectors;
the feature extraction module is used for extracting the audio signal features and the lyric vector features;
the fusion module is used for fusing the audio signal features and the lyric vector features extracted by the feature extraction module;
and the fully connected layer module is used for receiving the fused audio signal features and lyric vector features output by the fusion module, analyzing and predicting from them, and outputting the music emotion recognition result.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention provides a music emotion recognition method and a music emotion recognition system based on audio signals and lyrics.
Drawings
Fig. 1 is a schematic flow chart of a music emotion recognition method based on audio signals and lyrics according to an embodiment of the present invention;
FIG. 2 is a block diagram of an overall neural network framework for multi-modal music emotion recognition based on audio signals and lyrics in an embodiment of the present invention;
fig. 3 is a system diagram of multimodal music emotion recognition based on audio signals and lyrics according to an embodiment of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for better illustration of the present embodiment, certain parts of the drawings may be omitted, enlarged or reduced, and do not represent actual dimensions;
it will be understood by those skilled in the art that certain well-known descriptions of the figures may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Examples
The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
fig. 1 is a schematic flow chart of a music emotion recognition method based on audio signals and lyrics, and referring to fig. 1, the method includes:
s1, obtaining an audio data sample and a lyric data sample of music to be recognized, and respectively preprocessing the audio data sample and the lyric data sample to obtain an audio signal and a lyric vector;
s2, respectively constructing and training a first convolutional neural network for extracting audio signal characteristics and a second convolutional neural network for extracting lyric vector characteristics;
s3, inputting the preprocessed audio data sample into a first convolutional neural network, inputting the preprocessed lyric data sample into a second convolutional neural network, and serially connecting and fusing the output end of the first convolutional neural network and the output end of the second convolutional neural network by using a fusion module;
S4, the fusion module outputs the fused audio signal features and lyric vector features to the fully connected layer module for analysis and processing, and the fully connected layer module outputs the music emotion recognition result.
In this embodiment, when the audio data samples and the lyric data samples are obtained in step S1, 30-second segments are extracted from the audio, seven segments per track, sampled uniformly across each song. Before the audio data samples and the lyric data samples are respectively preprocessed, pitch shifting and lossy encoding are used to derive three additional sample segments from each one; in a specific implementation, this increases the size of the whole training set by about 21 times. Finally, the number of audio data samples and lyric data samples is expanded with this data augmentation technique, as sketched below.
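A minimal sketch of the audio-side segmentation and augmentation follows. Python with librosa is an assumed toolchain, and the pitch-shift offsets of -1, +1 and +2 semitones are illustrative; the patent states only that pitch shifting and lossy encoding produce three additional segments per sample.

import librosa

def augment_track(path, sr=44100, seg_dur=30.0, n_segments=7):
    y, _ = librosa.load(path, sr=sr)
    seg_len = int(seg_dur * sr)
    # Seven 30-second segments spaced uniformly across the song
    hop = max((len(y) - seg_len) // max(n_segments - 1, 1), 1)
    segments = [y[i * hop : i * hop + seg_len] for i in range(n_segments)]
    clips = []
    for seg in segments:
        clips.append(seg)  # the original segment
        # Three additional variants per segment; a lossy re-encoding
        # round trip (e.g. via MP3) could replace any of these shifts
        for steps in (-1, 1, 2):
            clips.append(librosa.effects.pitch_shift(seg, sr=sr, n_steps=steps))
    return clips  # 7 segments x 4 variants per track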
When the audio data samples are preprocessed, each is converted into a two-dimensional mel spectrogram audio signal, and the conversion process is as follows:
determining the number M of mel filters and the Hann window length N in audio samples;
setting a sampling frequency f;
the audio data samples are input to a mel filter and converted into a two-dimensional mel-frequency spectrogram audio signal.
When the lyric data samples are preprocessed, word segments are extracted from each track and the lyric data samples are made into lyric vectors using K-dimensional vectors. Specifically, in this embodiment the input lyrics are represented by 100-dimensional vectors, and seven segments of 50 words each are extracted from each track by data expansion to produce the lyric vectors, as sketched below.
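A minimal sketch of this lyric-side preprocessing follows. The embedding table is an assumed input (the patent does not name an embedding model), and the uniform segment spacing mirrors the audio-side extraction.

import numpy as np

def lyrics_to_vectors(lyrics, embeddings, k=100, seg_words=50, n_segments=7):
    """lyrics: full lyric text; embeddings: dict mapping word -> (k,) array."""
    words = lyrics.split()
    span = max(len(words) - seg_words, 0)
    segments = []
    for s in range(n_segments):
        start = (s * span) // max(n_segments - 1, 1)   # uniform spacing
        mat = np.zeros((seg_words, k))                 # zero-pads short segments
        for i, w in enumerate(words[start:start + seg_words]):
            mat[i] = embeddings.get(w, np.zeros(k))    # unknown word -> zeros
        segments.append(mat.T)                         # shape (k, seg_words) = (100, 50)
    return segments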
The length-N Hann windows applied before the M mel filters do not overlap. In this embodiment, the number of mel filters M is 40, the Hann window length N is 1024 audio samples with no overlap between windows, and the sampling frequency f is 44.1 kHz; a sketch of the conversion follows.
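Under these parameters, the conversion can be sketched as follows. librosa is an assumed library choice (its short-time Fourier transform uses a Hann window by default), and the final log scaling is common practice rather than a step the patent states.

import librosa

def to_mel_spectrogram(path, sr=44100, n_mels=40, win=1024):
    y, _ = librosa.load(path, sr=sr)
    # hop_length == win_length makes the Hann windows non-overlapping
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=win,
                                         win_length=win, hop_length=win,
                                         n_mels=n_mels)
    return librosa.power_to_db(mel)  # shape: (40, number of frames)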
In this embodiment, the first convolutional neural network described in step S2 comprises a first convolutional layer and a second convolutional layer connected in sequence; both are one-dimensional, with 32 and 16 feature maps respectively of kernel size 8 and stride 1, each followed by a pooling layer of size 4 and stride 4. The second convolutional neural network comprises a third convolutional layer and an LSTM layer connected in sequence, the third convolutional layer being a one-dimensional convolutional layer.
During training, the audio data samples and lyric data samples of the music to be recognized obtained in step S1 are divided into a test set and a training set and then preprocessed respectively. The input of the first convolutional neural network is the audio signal and its output is the audio signal features; the network parameters of the trained first convolutional neural network are obtained by gradient descent. The input of the second convolutional neural network is the lyric vector and its output is the lyric vector features; the network parameters of the trained second convolutional neural network are likewise obtained by gradient descent. The input size of the lyric network is 100 x 50.
The fully connected layer module comprises a first fully connected layer and a second fully connected layer. The input of the first fully connected layer is connected to the common output of the fusion module, and the output of the first fully connected layer is connected to the input of the second fully connected layer; the two fully connected layers jointly predict from the fused audio signal and lyric vector features, and the output of the second fully connected layer gives the music emotion recognition result. The overall neural network framework for multimodal music emotion recognition based on audio signals and lyrics is shown in fig. 2, and a sketch of one possible realization follows.
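For concreteness, the following PyTorch sketch instantiates one reading of this architecture. PyTorch itself, the lyric-branch kernel size, the global pooling that flattens the audio branch, the LSTM and fully connected widths, and the four-class output are assumptions not fixed by the patent; the convolution and pooling sizes follow the figures given above.

import torch
import torch.nn as nn

class MusicEmotionNet(nn.Module):
    def __init__(self, n_mels=40, lyric_dim=100, n_classes=4):
        super().__init__()
        # First CNN (audio branch): two 1-D conv layers over the mel
        # spectrogram, 32 then 16 feature maps, kernel 8, stride 1,
        # each followed by max pooling of size 4, stride 4
        self.audio_branch = nn.Sequential(
            nn.Conv1d(n_mels, 32, kernel_size=8, stride=1), nn.ReLU(),
            nn.MaxPool1d(kernel_size=4, stride=4),
            nn.Conv1d(32, 16, kernel_size=8, stride=1), nn.ReLU(),
            nn.MaxPool1d(kernel_size=4, stride=4),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),   # -> (batch, 16)
        )
        # Second CNN (lyric branch): a third 1-D conv layer, then an LSTM
        self.lyric_conv = nn.Sequential(
            nn.Conv1d(lyric_dim, 32, kernel_size=3, stride=1), nn.ReLU())
        self.lyric_lstm = nn.LSTM(32, 32, batch_first=True)
        # Fusion is serial concatenation of the branch outputs, followed
        # by the first and second fully connected layers
        self.head = nn.Sequential(
            nn.Linear(16 + 32, 64), nn.ReLU(),
            nn.Linear(64, n_classes))

    def forward(self, mel, lyric):
        a = self.audio_branch(mel)                 # audio signal features
        seq = self.lyric_conv(lyric)               # (batch, 32, T')
        _, (h, _) = self.lyric_lstm(seq.transpose(1, 2))
        fused = torch.cat([a, h[-1]], dim=1)       # serial fusion
        return self.head(fused)                    # emotion prediction

# Toy usage; the lyric input size 100 x 50 follows the embodiment
model = MusicEmotionNet()
mel = torch.randn(2, 40, 1292)    # batch of 2-D mel spectrograms
lyr = torch.randn(2, 100, 50)     # batch of lyric vectors
print(model(mel, lyr).shape)      # torch.Size([2, 4])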
Referring to fig. 3, the present invention further provides a multimodal music emotion recognition system based on audio signals and lyrics, used to implement the music emotion recognition method based on audio signals and lyrics described above; the system comprises:
the music data acquisition module comprises an audio data acquisition module and a lyric data acquisition module; the audio data acquisition module is used for acquiring an audio data sample of music to be recognized, and the lyric data acquisition module is used for acquiring a lyric data sample of the music to be recognized;
the preprocessing module is used for respectively preprocessing the audio data samples and the lyric data samples to obtain audio signals and lyric vectors;
the feature extraction module is used for extracting the audio signal features and the lyric vector features;
the fusion module is used for fusing the audio signal features and the lyric vector features extracted by the feature extraction module; in a specific implementation, a fusion module compatible with the first and second convolutional neural networks is selected according to their specific components and their respective final output ends.
And the fully connected layer module is used for receiving the fused audio signal features and lyric vector features output by the fusion module, analyzing and predicting from them, and outputting the music emotion recognition result.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (10)

1. A music emotion recognition method based on audio signals and lyrics, characterized by comprising at least the following steps:
s1, obtaining an audio data sample and a lyric data sample of music to be recognized, and respectively preprocessing the audio data sample and the lyric data sample to obtain an audio signal and a lyric vector;
s2, respectively constructing and training a first convolutional neural network for extracting audio signal characteristics and a second convolutional neural network for extracting lyric vector characteristics;
s3, inputting the preprocessed audio data sample into a first convolutional neural network, inputting the preprocessed lyric data sample into a second convolutional neural network, and serially connecting and fusing the output end of the first convolutional neural network and the output end of the second convolutional neural network by using a fusion module;
S4, the fusion module outputs the fused audio signal features and lyric vector features to the fully connected layer module for analysis and processing, and the fully connected layer module outputs a music emotion recognition result.
2. The method of claim 1, wherein before the audio data samples and the lyric data samples are respectively preprocessed, three additional sample segments are derived from each audio data sample and each lyric data sample using pitch shifting and lossy encoding, and the number of audio data samples and lyric data samples is finally expanded using data augmentation techniques.
3. The method for music emotion recognition based on audio signals and lyrics of claim 2, wherein in step S1, when the audio data samples are preprocessed, each audio data sample is converted into a two-dimensional mel spectrogram audio signal, the conversion process being as follows:
determining the number M of mel filters and the Hann window length N in audio samples;
setting a sampling frequency f;
the audio data samples are input to a mel filter and converted into a two-dimensional mel-frequency spectrogram audio signal.
4. The method for music emotion recognition based on audio signals and lyrics of claim 3, wherein in step S1, when the lyric data samples are preprocessed, word segments are extracted from each track, and the lyric data samples are made into lyric vectors using K-dimensional vectors.
5. The method of claim 4, wherein the length-N Hann windows applied before the M mel filters do not overlap.
6. The method of claim 5, wherein in step S2 the first convolutional neural network comprises a first convolutional layer and a second convolutional layer connected in sequence, both one-dimensional, with 32 and 16 feature maps respectively of kernel size 8 and stride 1, each followed by a pooling layer of size 4 and stride 4.
7. The method for music emotion recognition based on audio signal and lyrics of claim 6, wherein the second convolutional neural network of step S2 includes a third convolutional layer and an LSTM layer connected in sequence, and the third convolutional layer is a one-dimensional convolutional layer structure.
8. The method for music emotion recognition based on audio signals and lyrics of claim 7, wherein the input of the first convolutional neural network is the audio signal and its output is the audio signal features, the network parameters of the trained first convolutional neural network being obtained by gradient descent; and the input of the second convolutional neural network is the lyric vector and its output is the lyric vector features, the network parameters of the trained second convolutional neural network being obtained by gradient descent.
9. The method of claim 8, wherein the fully connected layer module comprises a first fully connected layer and a second fully connected layer, an input of the first fully connected layer is connected to the common output of the fusion module, an output of the first fully connected layer is connected to an input of the second fully connected layer, the first fully connected layer and the second fully connected layer jointly predict from the fused audio signal features and lyric vector features, and an output of the second fully connected layer outputs the music emotion recognition result.
10. A multimodal music emotion recognition system based on audio signals and lyrics, characterized in that the system is used for implementing the music emotion recognition method based on audio signals and lyrics according to any one of claims 1 to 9, the system comprising:
the music data acquisition module comprises an audio data acquisition module and a lyric data acquisition module; the audio data acquisition module is used for acquiring an audio data sample of music to be recognized, and the lyric data acquisition module is used for acquiring a lyric data sample of the music to be recognized;
the preprocessing module is used for respectively preprocessing the audio data samples and the lyric data samples to obtain audio signals and lyric vectors;
the feature extraction module is used for extracting the audio signal features and the lyric vector features;
the fusion module is used for fusing the audio signal features and the lyric vector features extracted by the feature extraction module;
and the fully connected layer module is used for receiving the fused audio signal features and lyric vector features output by the fusion module, analyzing and predicting from them, and outputting the music emotion recognition result.
Application CN202110328406.2A, priority date 2021-03-26, filed 2021-03-26: Music emotion recognition method and system based on audio signal and lyrics. Published as CN113129871A; status: Pending.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110328406.2A | 2021-03-26 | 2021-03-26 | Music emotion recognition method and system based on audio signal and lyrics


Publications (1)

Publication Number | Publication Date
CN113129871A | 2021-07-16

Family

Family ID: 76773908

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110328406.2A (pending, published as CN113129871A) | Music emotion recognition method and system based on audio signal and lyrics | 2021-03-26 | 2021-03-26

Country Status (1)

Country | Publications
CN | CN113129871A


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101599271A * | 2009-07-07 | 2009-12-09 | 华中科技大学 (Huazhong University of Science and Technology) | A kind of recognition methods of digital music emotion
US20140058735A1 * | 2012-08-21 | 2014-02-27 | David A. Sharp | Artificial Neural Network Based System for Classification of the Emotional Content of Digital Music
CN106228977A * | 2016-08-02 | 2016-12-14 | 合肥工业大学 (Hefei University of Technology) | The song emotion identification method of multi-modal fusion based on degree of depth study
CN108648767A * | 2018-04-08 | 2018-10-12 | 中国传媒大学 (Communication University of China) | A kind of popular song emotion is comprehensive and sorting technique
CN110674339A * | 2019-09-18 | 2020-01-10 | 北京工业大学 (Beijing University of Technology) | Chinese song emotion classification method based on multi-mode fusion
CN111858943A * | 2020-07-30 | 2020-10-30 | 杭州网易云音乐科技有限公司 (Hangzhou NetEase Cloud Music Technology Co., Ltd.) | Music emotion recognition method and device, storage medium and electronic equipment

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114242070A * | 2021-12-20 | 2022-03-25 | 阿里巴巴(中国)有限公司 (Alibaba (China) Co., Ltd.) | Video generation method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109817213B (en) Method, device and equipment for performing voice recognition on self-adaptive language
CN110853618B (en) Language identification method, model training method, device and equipment
CN108564942B (en) Voice emotion recognition method and system based on adjustable sensitivity
EP3346463B1 (en) Identity verification method and apparatus based on voiceprint
CN105976809B (en) Identification method and system based on speech and facial expression bimodal emotion fusion
CN111402857A (en) Speech synthesis model training method and device, electronic equipment and storage medium
CN112349301A (en) Information processing apparatus, information processing method, and recording medium
CN107221344A (en) A kind of speech emotional moving method
CN112999490A (en) Music healing system based on brain wave emotion recognition and processing method thereof
CN109377986A (en) A kind of non-parallel corpus voice personalization conversion method
CN113129871A (en) Music emotion recognition method and system based on audio signal and lyrics
CN110348482A (en) A kind of speech emotion recognition system based on depth model integrated architecture
CN117198338B (en) Interphone voiceprint recognition method and system based on artificial intelligence
CN112466287B (en) Voice segmentation method, device and computer readable storage medium
CN113571095A (en) Speech emotion recognition method and system based on nested deep neural network
CN116758451A (en) Audio-visual emotion recognition method and system based on multi-scale and global cross attention
CN114626424B (en) Data enhancement-based silent speech recognition method and device
CN115376560A (en) Voice feature coding model for early screening of mild cognitive impairment and training method thereof
CN111785262B (en) Speaker age and gender classification method based on residual error network and fusion characteristics
Bansod et al. Speaker Recognition using Marathi (Varhadi) Language
Wang et al. Speech Swin-Transformer: Exploring a Hierarchical Transformer with Shifted Windows for Speech Emotion Recognition
CN114420086B (en) Speech synthesis method and device
Pagidirayi et al. An efficient Speech Emotion Recognition using LSTM model.
KR100202424B1 (en) Real time speech recognition method
CN116959499A (en) Method for recognizing audio emotion with indefinite length based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination