CN113035203A - Control method for dynamically changing voice response style - Google Patents

Info

Publication number
CN113035203A
CN113035203A (application CN202110326327.8A)
Authority
CN
China
Prior art keywords
audio
voice
voiceprint
user
frame
Prior art date
Legal status
Pending
Application number
CN202110326327.8A
Other languages
Chinese (zh)
Inventor
焦其意
陆涛
郭杰
Current Assignee
Hefei Meiling Union Technology Co Ltd
Original Assignee
Hefei Meiling Union Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hefei Meiling Union Technology Co Ltd filed Critical Hefei Meiling Union Technology Co Ltd
Priority to CN202110326327.8A priority Critical patent/CN113035203A/en
Publication of CN113035203A publication Critical patent/CN113035203A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 — Speaker identification or verification techniques
    • G10L17/18 — Artificial neural networks; connectionist approaches
    • G10L13/00 — Speech synthesis; text-to-speech systems
    • G10L13/02 — Methods for producing synthetic speech; speech synthesisers
    • G10L15/00 — Speech recognition
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 — Execution procedure of a spoken command
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00–G10L21/00
    • G10L25/03 — characterised by the type of extracted parameters
    • G10L25/18 — the extracted parameters being spectral information of each sub-band

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a control method for dynamically changing a voice response style, and relates to the field of intelligent household appliances. The method comprises the following steps: collecting voice data of different age groups and establishing a sample database; processing audio files in the sample database to obtain an audio frame sequence; performing a Fourier transform on each frame in the audio frame sequence to obtain the frame's spectrogram information, and extracting features from it to obtain a voiceprint feature vector; training on the voiceprint feature vectors to obtain a voiceprint feature model; detecting the user's voice with a microphone (MIC), collecting the voiceprint, and inputting it into the voiceprint feature model; and dynamically changing the voice response style according to the user type. The invention collects the voice of the refrigerator user, judges the user's age group from the voiceprint, and enables the voice response style corresponding to that age group, so that the intelligent appliance is more convenient for the elderly and children to use and the device's degree of intelligence is improved.

Description

Control method for dynamically changing voice response style
Technical Field
The invention belongs to the technical field of intelligent household appliances, and particularly relates to a control method for dynamically changing a voice response style.
Background
With the development of artificial intelligence, more and more devices can realize the function of voice interaction with users, for example, an intelligent robot can perform dialogue communication with users.
In the prior art, various devices can recognize the voice of a user through a voice recognition technology, determine the dialogue content with the user according to a pre-trained voice dialogue model, and finally play the audio of the dialogue content through a terminal, thereby completing the voice interaction with the user.
As devices continue to develop and intelligent equipment is constantly updated, voice response content becomes increasingly complex. For the elderly or for children it is difficult to understand and inconvenient to use, and this is a problem that needs to be solved.
Disclosure of Invention
The invention aims to provide a control method for dynamically changing a voice response style. By collecting the voice of a refrigerator user, judging the user's age group from the voiceprint, and enabling the voice response style corresponding to that age group, the method solves the problems that existing intelligent devices are difficult for the elderly and children to understand, inconvenient to use, and insufficiently engaging.
In order to solve the technical problems, the invention is realized by the following technical scheme:
the invention relates to a control method for dynamically changing a voice response style, which comprises the following steps:
step S1: collecting voice data of different age groups and establishing a sample database;
step S2: processing audio files in the sample database to obtain an audio frame sequence;
step S3: fourier transformation is carried out on each frame in the audio frame sequence to obtain spectrogram information of the frame;
step S4: extracting the features of the spectrogram information to obtain a voiceprint feature vector;
step S5: inputting the voiceprint feature vector into a convolutional neural network model for training to obtain a voiceprint feature model;
step S6: the microphone (MIC) detects the user's voice, collects the voiceprint, and inputs it into the voiceprint feature model;
step S7: the intelligent refrigerator determines the type of the user according to the voiceprint recognition result;
step S8: and dynamically changing the voice response style according to the user type.
Preferably, in step S1, users are divided into three age groups: children, young adults and the elderly, whose corresponding voice response styles are lively, humorous and traditional, respectively.
Preferably, in step S2, the audio frame sequence obtaining step includes:
step S21: sampling and quantizing the audio file;
step S22: converting the audio into a digital signal with a fixed number of bits at a fixed sampling rate;
step S23: performing pre-emphasis processing on the audio digital signal;
step S24: performing framing and windowing on the voice signal;
step S25: obtaining the speech frame sequence.
Preferably, in step S3, a Fourier transform is performed on each frame in the audio frame sequence to obtain the frequency spectrum of each frame; the squared modulus of each frame's spectrum is taken to obtain the power spectrum of the audio sequence; the power spectrum is filtered through a preset filter to obtain the logarithmic energy of the audio sequence; and a discrete cosine transform is applied to the logarithmic energy to obtain the audio feature vector.
Preferably, in step S4, the time domain information and frequency domain information of the spectrogram are input into a two-dimensional convolutional neural network to obtain the time domain features and frequency domain features of the sound data; after feature aggregation, the aggregated features are input into a fully connected layer to obtain a voiceprint feature vector.
Preferably, in step S5, inputting the voiceprint feature vector into a convolutional neural network model for training to obtain a voiceprint model for recognizing voiceprints includes:
extracting local voiceprint information from the voiceprint feature vector through the convolutional layers of the convolutional neural network model;
connecting the extracted local voiceprint information through a fully connected layer of the model to obtain multi-dimensional local voiceprint information;
and performing dimensionality reduction on the multi-dimensional local voiceprint information through a pooling layer of the model to obtain the voiceprint feature model.
Preferably, in step S7, after the user type is determined, matching is performed against a pre-trained interaction style model and a response text library to determine the target interactive text; according to the adjustment parameters corresponding to the current user type, the target text is converted into response audio consistent with the voice response style, and the audio is played.
The invention has the following beneficial effects:
An audio file in the sample database is processed to obtain an audio frame sequence; each frame of the sequence is Fourier-transformed and a voiceprint feature vector is extracted; the voiceprint feature vectors are input into a convolutional neural network model for training to obtain a voiceprint feature model. The user's voice data is then input into the voiceprint feature model, the user's age group is judged from the voiceprint, and the voice response style corresponding to that age group is enabled, so that the elderly and children can use the intelligent device more conveniently and the device's degree of intelligence is improved.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a step diagram of a control method for dynamically changing the voice response style according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, the present invention provides a control method for dynamically changing the voice response style, comprising the following steps:
step S1: collecting voice data of different age groups and establishing a sample database;
step S2: processing audio files in the sample database to obtain an audio frame sequence;
step S3: fourier transformation is carried out on each frame in the audio frame sequence to obtain spectrogram information of the frame;
step S4: extracting the features of the spectrogram information to obtain a voiceprint feature vector;
step S5: inputting the voiceprint feature vector into a convolutional neural network model for training to obtain a voiceprint feature model;
step S6: the microphone (MIC) detects the user's voice, collects the voiceprint, and inputs it into the voiceprint feature model;
step S7: the intelligent refrigerator determines the type of the user according to the voiceprint recognition result;
step S8: and dynamically changing the voice response style according to the user type.
In step S1, users are divided into three age groups: children, young adults and the elderly, whose corresponding voice response styles are lively, humorous and traditional, respectively. Here children are defined as under 14 years old, young adults as 14 to 60, and the elderly as over 60. When a child turns on the refrigerator and gives a voice instruction, the refrigerator converses with the child in a lively tone and guides the child in using it, which both aids understanding and steers the child toward correct operation, for example: "Children should eat less ice cream, and remember to close the refrigerator door." Similarly, when an elderly person uses the refrigerator, it communicates in steady, temperate wording and offers reminders and care, for example: "Food just taken out of the refrigerator should not be eaten directly; it is best to heat it before eating."
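For illustration only, this age-band-to-style mapping can be sketched in a few lines of Python; the band boundaries (under 14, 14 to 60, over 60) are those given in the text, while the function and style names are assumptions of this sketch, not part of the disclosure:

```python
# Minimal sketch of the age-band-to-style mapping described in step S1.
# Band boundaries follow the text; identifier names are illustrative.
def response_style_for_age(age: int) -> str:
    """Map an estimated speaker age to a voice-response style."""
    if age < 14:
        return "lively"       # children: lively, guiding tone
    if age <= 60:
        return "humorous"     # young adults: humorous tone
    return "traditional"      # elderly: steady, temperate tone
```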
A voiceprint captures physiological or behavioral characteristics of a speaker extracted from the speech waveform, which are then used for feature matching. To implement voiceprint recognition, speakers of different age groups first input multiple voice samples into the system, and personal features are extracted with voiceprint feature extraction techniques. These data are stored in a database through voiceprint modeling; recognition then compares the models stored in the database against the voiceprint features to be verified, finally identifying the age group of the speaker.
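As a rough sketch of this enrollment-and-matching idea (not the patent's exact procedure), voiceprint vectors collected per age group could be averaged into reference models and a query compared by cosine similarity; all names below are hypothetical:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two voiceprint vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def enroll(samples_by_group: dict) -> dict:
    """Average each age group's voiceprint vectors into one reference model."""
    return {g: np.mean(np.stack(v), axis=0) for g, v in samples_by_group.items()}

def identify_group(query: np.ndarray, models: dict) -> str:
    """Return the age group whose reference vector best matches the query."""
    return max(models, key=lambda g: cosine(query, models[g]))
```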
In step S2, the audio frame sequence obtaining step includes:
step S21: sampling and quantizing the audio file;
step S22: converting the audio into a digital signal with a fixed number of bits at a fixed sampling rate;
step S23: performing pre-emphasis processing on the audio digital signal;
step S24: performing framing and windowing on the voice signal;
step S25: obtaining the speech frame sequence.
The fundamental frequency of speech is about 100 Hz for men and about 200 Hz for women, corresponding to periods of 10 ms and 5 ms respectively. An audio frame should contain several such periods, generally at least 20 ms, and the speaker's gender can be judged from the audio frame.
In step S23, pre-emphasis boosts the high-frequency components of the audio signal so that its spectrum becomes relatively flat from low frequency to high frequency. A high-pass filter is used to boost the high-frequency components; its response characteristic is

H(z) = 1 − u·z⁻¹

where the pre-emphasis coefficient u takes values in the range [0.9, 1].
pre-emphasis (Pre-emphasis) is a method of compensating for high frequency components of a transmission signal in advance at a transmitting end. Pre-emphasis is performed because the signal energy distribution is not uniform, and the signal-to-noise ratio (SNR) at the high frequency end of the speech signal may drop to the threshold range. The power spectrum of the voice signal is in inverse proportion to the frequency, the energy of the low-frequency region is high, the energy of the high-frequency region is low, and the reason of uneven distribution is considered, so that the signal amplitude generating the maximum frequency deviation can be speculatively judged to be mostly in the low frequency. And the noise power spectrum is pre-emphasized by changing the expression mode. This is an undesirable result for both people and therefore counter-balancing pre-emphasis and de-emphasis occurs. The pre-emphasis is to improve the high-frequency signal, remove the influence of glottis and lips, and facilitate the research on the influence of sound channels. However, in order to restore the original signal power distribution as much as possible, it is necessary to perform a reverse process, that is, a de-emphasis technique for de-emphasizing a high-frequency signal. In the process of the step, the high-frequency component of the noise is reduced, and it is unexpected that pre-emphasis has no influence on the noise, so that the output signal-to-noise ratio (SNR) is effectively improved.
In step S3, a Fourier transform is performed on each frame in the audio frame sequence to obtain the frequency spectrum of each frame; the squared modulus of each frame's spectrum is taken to obtain the power spectrum of the audio sequence; the power spectrum is filtered through a preset filter to obtain the logarithmic energy of the audio sequence; and a discrete cosine transform is applied to the logarithmic energy to obtain the audio feature vector.
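The per-frame chain in step S3 (FFT, squared modulus, filter bank, logarithm, discrete cosine transform) is essentially an MFCC-style computation. A minimal NumPy/SciPy sketch follows, where `mel_filters` is assumed to be a precomputed filter-bank matrix, since the text says only "a preset filter":

```python
import numpy as np
from scipy.fftpack import dct

def frame_features(frame: np.ndarray, mel_filters: np.ndarray, n_fft: int = 512) -> np.ndarray:
    """FFT -> power spectrum -> filter-bank log energy -> DCT feature vector."""
    spectrum = np.fft.rfft(frame, n=n_fft)        # frequency spectrum of the frame
    power = (np.abs(spectrum) ** 2) / n_fft       # power spectrum (squared modulus)
    energy = mel_filters @ power                  # preset filter bank
    log_energy = np.log(energy + 1e-10)           # logarithmic energy
    return dct(log_energy, type=2, norm="ortho")  # discrete cosine transform
```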
In step S24, the sampled and normalized audio signal x(n) is framed and windowed: a window function w(n) of a certain length is multiplied with the audio signal x(n) to obtain the windowed signal of each frame, x_i(n). Commonly used window functions are the Hamming, Hanning and rectangular windows. The formula is as follows:

x_i(n) = w(n) · x(n)
Hamming window:

w(n) = 0.54 − 0.46·cos(2πn/(N−1)), 0 ≤ n ≤ N−1

Hanning window:

w(n) = 0.5·[1 − cos(2πn/(N−1))], 0 ≤ n ≤ N−1

Rectangular window:

w(n) = 1 for 0 ≤ n ≤ N−1, and w(n) = 0 otherwise

where N is the window (frame) length.
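A short sketch of the framing-and-windowing step with a Hamming window is given below; the 25 ms frame length and 10 ms shift are common values consistent with the "at least 20 ms" guideline above, not figures mandated by the text:

```python
import numpy as np

def frame_and_window(x: np.ndarray, fs: int, frame_ms: int = 25, shift_ms: int = 10) -> np.ndarray:
    """Split signal x into overlapping frames and apply x_i(n) = w(n) * x(n)."""
    frame_len = int(fs * frame_ms / 1000)
    shift = int(fs * shift_ms / 1000)
    assert len(x) >= frame_len, "signal shorter than one frame"
    w = np.hamming(frame_len)  # Hamming window w(n)
    n_frames = 1 + (len(x) - frame_len) // shift
    return np.stack([x[i * shift:i * shift + frame_len] * w for i in range(n_frames)])
```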
In step S4, the time domain information and frequency domain information of the spectrogram are input into a two-dimensional convolutional neural network to obtain the time domain features and frequency domain features of the sound data; after feature aggregation, the aggregated features are input into a fully connected layer to obtain a voiceprint feature vector.
In step S5, inputting the voiceprint feature vector into the convolutional neural network model for training to obtain a voiceprint model for recognizing voiceprints includes:
extracting local voiceprint information from the voiceprint feature vector through the convolutional layers of the convolutional neural network model;
connecting the extracted local voiceprint information through a fully connected layer of the model to obtain multi-dimensional local voiceprint information;
and performing dimensionality reduction on the multi-dimensional local voiceprint information through a pooling layer of the model to obtain the voiceprint feature model.
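Under the assumption of a PyTorch implementation (the patent does not name a framework), the convolution / pooling / fully connected structure described in steps S4 and S5 might look like the following sketch, with illustrative layer sizes:

```python
import torch
import torch.nn as nn

class VoiceprintNet(nn.Module):
    """Sketch: convolutions extract local voiceprint information, pooling
    reduces dimensionality, and fully connected layers yield an embedding
    plus an age-group prediction (child / young adult / elderly)."""
    def __init__(self, n_groups: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                # dimensionality reduction
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),   # feature aggregation
        )
        self.embed = nn.Linear(32 * 4 * 4, 128)   # voiceprint feature vector
        self.classify = nn.Linear(128, n_groups)  # age-group logits

    def forward(self, spec: torch.Tensor):
        z = self.features(spec).flatten(1)
        e = self.embed(z)
        return e, self.classify(e)
```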
In step S7, after the user type is determined, matching is performed against the pre-trained interaction style model and the response text library to determine the target interactive text; according to the adjustment parameters corresponding to the current user type, the target text is converted into response audio consistent with the voice response style, and the audio is played.
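A hypothetical sketch of this last stage is given below; the text states only that adjustment parameters for the current user type convert the target text into style-consistent response audio, so the parameter names (speaking rate, pitch shift) and values here are illustrative assumptions:

```python
# Illustrative style-adjustment parameters per user type (values are assumptions).
STYLE_PARAMS = {
    "lively":      {"rate": 1.1, "pitch_shift": +2.0},  # children
    "humorous":    {"rate": 1.0, "pitch_shift":  0.0},  # young adults
    "traditional": {"rate": 0.9, "pitch_shift": -1.0},  # elderly
}

def render_response(text: str, style: str) -> dict:
    """Bundle the matched response text with the style's TTS adjustment parameters."""
    return {"text": text, **STYLE_PARAMS[style]}
```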
It should be noted that, in the above embodiment, the units included are divided only according to functional logic; the division is not limited thereto as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for convenience of distinguishing them from each other and do not limit the protection scope of the present invention.
In addition, it is understood by those skilled in the art that all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing associated hardware, and the corresponding program may be stored in a computer-readable storage medium.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (7)

1. A control method for dynamically changing a voice response style is characterized by comprising the following steps:
step S1: collecting voice data of different age groups and establishing a sample database;
step S2: processing audio files in the sample database to obtain an audio frame sequence;
step S3: fourier transformation is carried out on each frame in the audio frame sequence to obtain spectrogram information of the frame;
step S4: extracting the features of the spectrogram information to obtain a voiceprint feature vector;
step S5: inputting the voiceprint feature vector into a convolutional neural network model for training to obtain a voiceprint feature model;
step S6: the microphone (MIC) detects the user's voice, collects the voiceprint, and inputs it into the voiceprint feature model;
step S7: the intelligent refrigerator determines the type of the user according to the voiceprint recognition result;
step S8: and dynamically changing the voice response style according to the user type.
2. The control method according to claim 1, wherein in step S1, users are divided into three age groups: children, young adults and the elderly, whose corresponding voice response styles are lively, humorous and traditional, respectively.
3. The control method for dynamically changing the voice response style according to claim 1, wherein in step S2, the audio frame sequence obtaining step comprises:
step S21: sampling and quantizing the audio file;
step S22: converting the audio into a digital signal with a fixed number of bits at a fixed sampling rate;
step S23: performing pre-emphasis processing on the audio digital signal;
step S24: performing framing and windowing on the voice signal;
step S25: obtaining the speech frame sequence.
4. The method as claimed in claim 1, wherein in step S3, a Fourier transform is performed on each frame in the audio frame sequence to obtain the frequency spectrum of each frame; the squared modulus of each frame's spectrum is taken to obtain the power spectrum of the audio sequence; the power spectrum is filtered through a preset filter to obtain the logarithmic energy of the audio sequence; and a discrete cosine transform is applied to the logarithmic energy to obtain the audio feature vector.
5. The method as claimed in claim 1, wherein in step S4, the time domain information and frequency domain information of the spectrogram are input into a two-dimensional convolutional neural network to obtain the time domain features and frequency domain features of the sound data; after feature aggregation, the aggregated features are input into a fully connected layer to obtain a voiceprint feature vector.
6. The method as claimed in claim 1, wherein in step S5, inputting the voiceprint feature vector into the convolutional neural network model for training to obtain a voiceprint model for recognizing voiceprints comprises:
extracting local voiceprint information from the voiceprint feature vector through the convolutional layers of the convolutional neural network model;
connecting the extracted local voiceprint information through a fully connected layer of the model to obtain multi-dimensional local voiceprint information;
and performing dimensionality reduction on the multi-dimensional local voiceprint information through a pooling layer of the model to obtain the voiceprint feature model.
7. The method as claimed in claim 1, wherein in step S7, after the user type is determined, matching is performed against a pre-trained interaction style model and a response text library to determine the target interactive text; according to the adjustment parameters corresponding to the current user type, the target text is converted into response audio consistent with the voice response style, and the audio is played.
Application CN202110326327.8A, filed 2021-03-26 (priority date 2021-03-26) — Control method for dynamically changing voice response style — publication CN113035203A (pending)

Priority Applications (1)

CN202110326327.8A — priority date 2021-03-26, filing date 2021-03-26 — Control method for dynamically changing voice response style (CN113035203A)


Publications (1)

CN113035203A — published 2021-06-25

Family

ID=76474338

Family Applications (1)

CN202110326327.8A — Control method for dynamically changing voice response style; priority date 2021-03-26; filing date 2021-03-26; status: pending

Country Status (1)

CN — CN113035203A


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106648082A (en) * 2016-12-09 2017-05-10 厦门快商通科技股份有限公司 Intelligent service device capable of simulating human interactions and method
CN110398897A (en) * 2018-04-25 2019-11-01 北京快乐智慧科技有限责任公司 A kind of Multi-mode switching method and system of intellectual product
WO2020006935A1 (en) * 2018-07-05 2020-01-09 平安科技(深圳)有限公司 Method and device for extracting animal voiceprint features and computer readable storage medium
CN111352348A (en) * 2018-12-24 2020-06-30 北京三星通信技术研究有限公司 Device control method, device, electronic device and computer-readable storage medium
CN112331193A (en) * 2019-07-17 2021-02-05 华为技术有限公司 Voice interaction method and related device
CN110336723A (en) * 2019-07-23 2019-10-15 珠海格力电器股份有限公司 Control method and device of intelligent household appliance and intelligent household appliance
CN110544472A (en) * 2019-09-29 2019-12-06 上海依图信息技术有限公司 Method for improving performance of voice task using CNN network structure
CN110648669A (en) * 2019-09-30 2020-01-03 上海依图信息技术有限公司 Multi-frequency shunt voiceprint recognition method, device and system and computer readable storage medium
CN111061953A (en) * 2019-12-18 2020-04-24 深圳市优必选科技股份有限公司 Intelligent terminal interaction method and device, terminal equipment and storage medium
CN111354364A (en) * 2020-04-23 2020-06-30 上海依图网络科技有限公司 Voiceprint recognition method and system based on RNN aggregation mode
CN111833884A (en) * 2020-05-27 2020-10-27 北京三快在线科技有限公司 Voiceprint feature extraction method and device, electronic equipment and storage medium
CN111951791A (en) * 2020-08-26 2020-11-17 上海依图网络科技有限公司 Voiceprint recognition model training method, recognition method, electronic device and storage medium
CN112489636A (en) * 2020-10-15 2021-03-12 南京创维信息技术研究院有限公司 Intelligent voice broadcast assistant selection method and system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113707154A (en) * 2021-09-03 2021-11-26 上海瑾盛通信科技有限公司 Model training method and device, electronic equipment and readable storage medium
CN113707154B (en) * 2021-09-03 2023-11-10 上海瑾盛通信科技有限公司 Model training method, device, electronic equipment and readable storage medium
CN115101048A (en) * 2022-08-24 2022-09-23 深圳市人马互动科技有限公司 Science popularization information interaction method, device, system, interaction equipment and storage medium
CN115101048B (en) * 2022-08-24 2022-11-11 深圳市人马互动科技有限公司 Science popularization information interaction method, device, system, interaction equipment and storage medium
CN117975971A (en) * 2024-04-02 2024-05-03 暨南大学 Voiceprint age group estimation method and system based on privacy protection

Similar Documents

Publication Publication Date Title
CN113035203A (en) Control method for dynamically changing voice response style
CN103280220B (en) A kind of real-time recognition method for baby cry
CN109147763B (en) Audio and video keyword identification method and device based on neural network and inverse entropy weighting
CN109999314A (en) One kind is based on brain wave monitoring Intelligent sleep-assisting system and its sleep earphone
CN106847281A (en) Intelligent household voice control system and method based on voice fuzzy identification technology
CN102404278A (en) Song request system based on voiceprint recognition and application method thereof
CN111192598A (en) Voice enhancement method for jump connection deep neural network
CN110136709A (en) Audio recognition method and video conferencing system based on speech recognition
CN112382301B (en) Noise-containing voice gender identification method and system based on lightweight neural network
CN103021405A (en) Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter
CN111583936A (en) Intelligent voice elevator control method and device
CN115602165B (en) Digital employee intelligent system based on financial system
WO2023184942A1 (en) Voice interaction method and apparatus and electric appliance
CN110867192A (en) Speech enhancement method based on gated cyclic coding and decoding network
CN112102846A (en) Audio processing method and device, electronic equipment and storage medium
CN111798875A (en) VAD implementation method based on three-value quantization compression
CN112151071A (en) Speech emotion recognition method based on mixed wavelet packet feature deep learning
CN113571078A (en) Noise suppression method, device, medium, and electronic apparatus
CN112017658A (en) Operation control system based on intelligent human-computer interaction
CN111081249A (en) Mode selection method, device and computer readable storage medium
WO2017177629A1 (en) Far-talking voice recognition method and device
CN107393539A (en) A kind of sound cipher control method
US20230186943A1 (en) Voice activity detection method and apparatus, and storage medium
CN116434758A (en) Voiceprint recognition model training method and device, electronic equipment and storage medium
CN114023352B (en) Voice enhancement method and device based on energy spectrum depth modulation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210625)