CN113035203A - Control method for dynamically changing voice response style - Google Patents
Control method for dynamically changing voice response style
- Publication number
- CN113035203A CN113035203A CN202110326327.8A CN202110326327A CN113035203A CN 113035203 A CN113035203 A CN 113035203A CN 202110326327 A CN202110326327 A CN 202110326327A CN 113035203 A CN113035203 A CN 113035203A
- Authority
- CN
- China
- Prior art keywords
- audio
- voice
- voiceprint
- user
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 230000004044 response Effects 0.000 title claims abstract description 34
- 238000000034 method Methods 0.000 title claims abstract description 21
- 239000013598 vector Substances 0.000 claims abstract description 22
- 238000012545 processing Methods 0.000 claims abstract description 15
- 238000012549 training Methods 0.000 claims abstract description 11
- 230000008859 change Effects 0.000 claims abstract description 4
- 238000013527 convolutional neural network Methods 0.000 claims description 16
- 238000001228 spectrum Methods 0.000 claims description 14
- 230000003993 interaction Effects 0.000 claims description 7
- 238000005070 sampling Methods 0.000 claims description 7
- 230000002452 interceptive effect Effects 0.000 claims description 4
- 230000002776 aggregation Effects 0.000 claims description 3
- 238000004220 aggregation Methods 0.000 claims description 3
- 238000001914 filtration Methods 0.000 claims description 3
- 238000009432 framing Methods 0.000 claims description 3
- 238000003062 neural network model Methods 0.000 claims description 3
- 238000011176 pooling Methods 0.000 claims description 3
- 230000009467 reduction Effects 0.000 claims description 3
- 230000009466 transformation Effects 0.000 claims description 3
- 238000005406 washing Methods 0.000 claims 1
- 238000000605 extraction Methods 0.000 abstract description 2
- 230000006870 function Effects 0.000 description 4
- 230000005236 sound signal Effects 0.000 description 4
- 238000011161 development Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 230000008569 process Effects 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000003542 behavioural effect Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 210000004704 glottis Anatomy 0.000 description 1
- 235000015243 ice cream Nutrition 0.000 description 1
- 235000012054 meals Nutrition 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention discloses a control method for dynamically changing a voice response style, and relates to the field of intelligent household appliances. The method comprises the following steps: collecting voice data from different age groups and establishing a sample database; processing the audio files in the sample database to obtain an audio frame sequence; performing a Fourier transform on each frame in the audio frame sequence to obtain spectrogram information for the frame, and extracting features from the spectrogram information to obtain a voiceprint feature vector; training on the voiceprint feature vectors to obtain a voiceprint feature model; detecting the user's voice with a microphone (MIC), collecting the voiceprint, and inputting it into the voiceprint feature model; and dynamically changing the voice response style according to the user type. By collecting the voice of the refrigerator user, judging the user's age group from the voiceprint, and enabling the voice response style corresponding to that age group, the invention makes intelligent devices more convenient for the elderly and children to use and improves the degree of intelligence of the device.
Description
Technical Field
The invention belongs to the technical field of intelligent household appliances, and particularly relates to a control method for dynamically changing a voice response style.
Background
With the development of artificial intelligence, more and more devices can realize the function of voice interaction with users, for example, an intelligent robot can perform dialogue communication with users.
In the prior art, various devices can recognize the voice of a user through a voice recognition technology, determine the dialogue content with the user according to a pre-trained voice dialogue model, and finally play the audio of the dialogue content through a terminal, thereby completing the voice interaction with the user.
As devices continue to develop, intelligent devices are constantly updated and voice response content grows more and more complex. For the elderly or for children, such responses are difficult to understand and inconvenient to use, which has become a problem that needs to be solved.
Disclosure of Invention
The invention aims to provide a control method for dynamically changing a voice response style. By collecting the voice of the refrigerator user, judging the user's age group from the voiceprint, and enabling the voice response style corresponding to that age group, it solves the problems that existing intelligent devices are difficult for the elderly and children to understand, inconvenient to use, and insufficiently engaging.
In order to solve the technical problems, the invention is realized by the following technical scheme:
the invention relates to a control method for dynamically changing a voice response style, which comprises the following steps:
step S1: collecting voice data of different age groups and establishing a sample database;
step S2: processing audio files in the sample database to obtain an audio frame sequence;
step S3: fourier transformation is carried out on each frame in the audio frame sequence to obtain spectrogram information of the frame;
step S4: extracting the features of the spectrogram information to obtain a voiceprint feature vector;
step S5: inputting the voiceprint feature vector into a convolutional neural network model for training to obtain a voiceprint feature model;
step S6: a microphone (MIC) detects the user's voice, collects the voiceprint, and inputs it into the voiceprint feature model;
step S7: the intelligent refrigerator determines the type of the user according to the voiceprint recognition result;
step S8: and dynamically changing the voice response style according to the user type.
Preferably, in the step S1, the users' ages are divided into three groups: children, young adults, and the elderly, whose corresponding voice response styles are lively, humorous, and traditional, respectively.
Preferably, in step S2, the audio frame sequence obtaining step includes:
step S21: sampling and quantizing the audio file;
step S22: converting the audio signal into an audio digital signal with a fixed number of bits at a fixed sampling frequency;
step S23: pre-emphasis processing is carried out on the audio digital signal;
step S24: performing framing and windowing processing on a voice signal;
step S25: and obtaining a speech frame sequence.
Preferably, in step S3, a Fourier transform is performed on each frame in the audio frame sequence to obtain the frequency spectrum of each audio frame, and the squared modulus of each frame's spectrum is taken to obtain the power spectrum of the audio sequence; the power spectrum of the audio sequence is filtered through a preset filter to obtain the logarithmic energy of the audio sequence; and a discrete cosine transform is applied to the logarithmic energy of the audio sequence to obtain the feature vector of the audio.
Preferably, in step S4, the time domain information and the frequency domain information of the spectrogram information are input into a two-dimensional convolutional neural network, so as to obtain the time domain feature and the frequency domain feature of the sound data; and after the time domain characteristics and the frequency domain characteristics of the sound data are subjected to characteristic aggregation, inputting the aggregated characteristics into a full connection layer to obtain a voiceprint characteristic vector.
Preferably, in step S5, inputting the voiceprint feature vector into a convolutional neural network model for training, and obtaining a voiceprint model for identifying a voiceprint includes:
extracting local voiceprint information of the voiceprint characteristic vector through a convolution layer of the convolution neural network model;
connecting the extracted local voiceprint information through a full connection layer of the convolutional neural network model to obtain multi-dimensional local voiceprint information;
and performing dimensionality reduction processing on the multi-dimensional local voiceprint information through a pooling layer of the convolutional neural network model to obtain a voiceprint characteristic model.
Preferably, in step S7, after the user type is determined, matching is performed against a pre-trained interaction style training model and a response text library to determine the target interaction text; according to the adjustment parameters corresponding to the current user type, the target interaction text is converted into response voice audio consistent with the voice response style, and the response voice audio is played.
The invention has the following beneficial effects:
Processing the audio files in a sample database yields an audio frame sequence; a Fourier transform is applied to each frame of the audio sequence, a voiceprint feature vector is extracted, and the voiceprint feature vectors are input into a convolutional neural network model for training to obtain a voiceprint feature model. The user's voice data is input into the voiceprint feature model, the user's age group is judged from the voiceprint, and the voice response style corresponding to that age group is enabled, so that the elderly and children can use the intelligent device more conveniently and the degree of intelligence of the device is improved.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a step diagram of a control method for dynamically changing the voice response style according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention is a control method for dynamically changing a voice response style, comprising the following steps:
step S1: collecting voice data of different age groups and establishing a sample database;
step S2: processing audio files in the sample database to obtain an audio frame sequence;
step S3: fourier transformation is carried out on each frame in the audio frame sequence to obtain spectrogram information of the frame;
step S4: extracting the features of the spectrogram information to obtain a voiceprint feature vector;
step S5: inputting the voiceprint feature vector into a convolutional neural network model for training to obtain a voiceprint feature model;
step S6: a microphone (MIC) detects the user's voice, collects the voiceprint, and inputs it into the voiceprint feature model;
step S7: the intelligent refrigerator determines the type of the user according to the voiceprint recognition result;
step S8: and dynamically changing the voice response style according to the user type.
In step S1, the users' ages are divided into three groups: children, young adults, and the elderly, whose corresponding voice response styles are lively, humorous, and traditional, respectively. Herein, children are defined as under 14 years of age, young adults between 14 and 60, and the elderly above 60. When a child switches on the refrigerator and gives a voice instruction, the refrigerator converses with the child in a lively tone and guides the child in using it, which aids understanding while steering the child toward correct operation, for example: "Children should eat less ice cream, and remember to close the refrigerator door." Similarly, when an elderly person uses the refrigerator, it communicates in steady, temperate language and offers reminders and care, for example: "Food just taken out of the refrigerator should not be eaten directly; it is advisable to heat it before eating."
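The age-group boundaries and style names above can be captured in a small lookup; this is an illustrative sketch of the mapping only, not the patent's implementation (the thresholds 14 and 60 come from the text, the function name is an assumption):

```python
def response_style(age: int) -> str:
    """Map a user's estimated age to a voice response style,
    using the age boundaries given in the description (14 and 60)."""
    if age < 14:
        return "lively"       # children: playful tone, usage guidance
    elif age < 60:
        return "humorous"     # young adults
    else:
        return "traditional"  # elderly: steady, temperate tone
```

A caller would pass the age group estimated from the voiceprint model and select dialogue templates accordingly.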
A voiceprint captures physiological and behavioral characteristics of a speaker extracted from the speech waveform, which are then used for feature matching. To implement voiceprint recognition, speakers first input voice samples from multiple age groups into the system, and personal features are extracted with voiceprint feature extraction techniques. These data are stored in a database through voiceprint modeling; recognition then compares the voiceprint features to be verified against the models stored in the database, and finally identifies the age group corresponding to the speaker.
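The matching step described above — comparing an extracted voiceprint embedding against stored per-age-group models — can be sketched with cosine similarity. This is a hedged illustration under the assumption that each group is represented by a mean embedding vector; the patent does not specify the similarity measure:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_age_group(embedding, group_models):
    """Return the group whose stored model vector is most similar.

    group_models: dict mapping group name -> stored mean embedding.
    """
    return max(group_models, key=lambda g: cosine(embedding, group_models[g]))
```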
In step S2, the audio frame sequence obtaining step includes:
step S21: sampling and quantizing the audio file;
step S22: converting the audio signal into an audio digital signal with a fixed number of bits at a fixed sampling frequency;
step S23: pre-emphasis processing is carried out on the audio digital signal;
step S24: performing framing and windowing processing on a voice signal;
step S25: and obtaining a speech frame sequence.
The fundamental frequency of speech is about 100 Hz for men and about 200 Hz for women, corresponding to periods of 10 ms and 5 ms respectively. An audio frame should contain several such periods, generally at least 20 ms, and the speaker's gender can be judged from the audio frame.
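The framing of steps S21–S25 can be sketched as follows; the 25 ms frame length and 10 ms hop are common assumed defaults that satisfy the "at least 20 ms" guideline above, not values stated in the patent:

```python
import numpy as np

def frame_signal(x, sample_rate, frame_ms=25, hop_ms=10):
    """Split a 1-D audio signal into overlapping frames.

    25 ms frames cover several pitch periods (~10 ms male,
    ~5 ms female fundamental), as required above.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len]
                     for i in range(n_frames)])
```

For one second of 16 kHz audio this yields 98 frames of 400 samples each.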
In step S23, pre-emphasis is performed to boost the high-frequency components of the audio signal so that its spectrum becomes relatively flat from low to high frequencies. A high-pass filter is used to boost the high-frequency components, with the transfer function
H(z) = 1 − u·z⁻¹
where the coefficient u is the pre-emphasis coefficient and takes values in the range [0.9, 1];
pre-emphasis (Pre-emphasis) is a method of compensating for high frequency components of a transmission signal in advance at a transmitting end. Pre-emphasis is performed because the signal energy distribution is not uniform, and the signal-to-noise ratio (SNR) at the high frequency end of the speech signal may drop to the threshold range. The power spectrum of the voice signal is in inverse proportion to the frequency, the energy of the low-frequency region is high, the energy of the high-frequency region is low, and the reason of uneven distribution is considered, so that the signal amplitude generating the maximum frequency deviation can be speculatively judged to be mostly in the low frequency. And the noise power spectrum is pre-emphasized by changing the expression mode. This is an undesirable result for both people and therefore counter-balancing pre-emphasis and de-emphasis occurs. The pre-emphasis is to improve the high-frequency signal, remove the influence of glottis and lips, and facilitate the research on the influence of sound channels. However, in order to restore the original signal power distribution as much as possible, it is necessary to perform a reverse process, that is, a de-emphasis technique for de-emphasizing a high-frequency signal. In the process of the step, the high-frequency component of the noise is reduced, and it is unexpected that pre-emphasis has no influence on the noise, so that the output signal-to-noise ratio (SNR) is effectively improved.
In step S3, a Fourier transform is performed on each frame in the audio frame sequence to obtain the frequency spectrum of each audio frame, and the squared modulus of each frame's spectrum is taken to obtain the power spectrum of the audio sequence; the power spectrum of the audio sequence is filtered through a preset filter to obtain the logarithmic energy of the audio sequence; and a discrete cosine transform is applied to the logarithmic energy of the audio sequence to obtain the feature vector of the audio.
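The step S3 chain (FFT → squared modulus → filterbank → log energy → DCT) can be sketched per frame as follows. This is a simplified illustration: the patent's "preset filter" is presumably a mel filterbank, replaced here by plain rectangular bands for brevity, and the FFT size and band count are assumptions:

```python
import numpy as np

def frame_features(frame, n_fft=512, n_bands=13):
    """Cepstral-style feature vector for one audio frame."""
    spec = np.fft.rfft(frame, n_fft)
    power = np.abs(spec) ** 2                 # squared modulus: power spectrum
    # Simplified filterbank: rectangular bands instead of mel triangles.
    bands = np.array_split(power, n_bands)
    log_e = np.log(np.array([b.sum() for b in bands]) + 1e-10)
    # Type-II DCT of the log band energies yields the feature vector.
    n = np.arange(n_bands)
    return np.array([np.sum(log_e * np.cos(np.pi * k * (2 * n + 1)
                                           / (2 * n_bands)))
                     for k in range(n_bands)])
```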
In step S24, the sampled and normalized audio data x(n) is framed and windowed: a window function w(n) of a certain length is multiplied with the audio signal x(n) to obtain the windowed signal xᵢ(n) for each frame. Commonly used window functions are the Hamming, Hanning, and rectangular windows. The formula is:
xᵢ(n) = w(n) · x(n)
Hamming window: w(n) = 0.54 − 0.46·cos(2πn/(N−1)), 0 ≤ n ≤ N−1
Hanning window: w(n) = 0.5·(1 − cos(2πn/(N−1))), 0 ≤ n ≤ N−1
Rectangular window: w(n) = 1, 0 ≤ n ≤ N−1
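The three window functions can be generated directly from the standard formulas; the frame length N = 400 (25 ms at 16 kHz) is an assumption for illustration:

```python
import numpy as np

N = 400                     # samples per frame (assumed: 25 ms at 16 kHz)
n = np.arange(N)

hamming = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
hanning = 0.5 * (1 - np.cos(2 * np.pi * n / (N - 1)))
rectangular = np.ones(N)

# x_i(n) = w(n) * x(n): apply the window to one frame elementwise.
frame = np.sin(2 * np.pi * 440 * n / 16000)
windowed = hamming * frame
```

These explicit formulas match NumPy's built-in `np.hamming(N)` and `np.hanning(N)` windows.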
in step S4, inputting the time domain information and the frequency domain information of the spectrogram information into a two-dimensional convolutional neural network, so as to obtain the time domain feature and the frequency domain feature of the sound data; and after the time domain characteristics and the frequency domain characteristics of the sound data are subjected to characteristic aggregation, inputting the aggregated characteristics into a full connection layer to obtain a voiceprint characteristic vector.
In step S5, inputting the voiceprint feature vector into the convolutional neural network model for training, and obtaining a voiceprint model for identifying a voiceprint includes:
extracting local voiceprint information of the voiceprint characteristic vector through a convolution layer of the convolution neural network model;
connecting the extracted local voiceprint information through a full connection layer of the convolutional neural network model to obtain multi-dimensional local voiceprint information;
and performing dimensionality reduction processing on the multi-dimensional local voiceprint information through a pooling layer of the convolutional neural network model to obtain a voiceprint characteristic model.
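The three stages named above — a convolution layer extracting local voiceprint information, a fully connected layer combining it, and a pooling step for dimensionality reduction — can be sketched with plain NumPy. All shapes and random weights are illustrative placeholders, not the patent's trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernel):
    """Valid-mode 1-D convolution: slide the kernel over x."""
    return np.array([np.dot(x[i:i + len(kernel)], kernel)
                     for i in range(len(x) - len(kernel) + 1)])

features = rng.standard_normal(64)               # voiceprint feature vector
local = conv1d(features, rng.standard_normal(5)) # local voiceprint patterns
# Fully connected layer: combine local information into 16 dimensions.
fc = np.tanh(rng.standard_normal((16, local.size)) @ local)
# Max pooling for dimensionality reduction: 16 -> 4 dimensions.
pooled = fc.reshape(4, 4).max(axis=1)
```

A real implementation would train these weights on the labeled age-group samples from step S1.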
In step S7, after the user type is determined, matching is performed against the pre-trained interaction style training model and the response text library to determine the target interaction text; according to the adjustment parameters corresponding to the current user type, the target interaction text is converted into response voice audio consistent with the voice response style, and the response voice audio is played.
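The "adjustment parameters corresponding to the current user type" can be sketched as a per-type parameter table handed to a speech synthesizer together with the matched response text. The parameter names and values below are hypothetical illustrations; the patent does not enumerate them:

```python
# Hypothetical per-user-type TTS adjustment parameters.
STYLE_PARAMS = {
    "child":   {"rate": 0.90, "pitch": +2, "style": "lively"},
    "young":   {"rate": 1.00, "pitch": 0,  "style": "humorous"},
    "elderly": {"rate": 0.85, "pitch": -1, "style": "traditional"},
}

def respond(user_type: str, text: str) -> dict:
    """Bundle the matched response text with the style parameters
    for the current user type; the result would be passed to a
    TTS backend for synthesis and playback."""
    params = STYLE_PARAMS[user_type]
    return {"text": text, **params}
```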
It should be noted that, in the above system embodiment, each included unit is only divided according to functional logic, but is not limited to the above division as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
In addition, it is understood by those skilled in the art that all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing associated hardware, and the corresponding program may be stored in a computer-readable storage medium.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Claims (7)
1. A control method for dynamically changing a voice response style is characterized by comprising the following steps:
step S1: collecting voice data of different age groups and establishing a sample database;
step S2: processing audio files in the sample database to obtain an audio frame sequence;
step S3: fourier transformation is carried out on each frame in the audio frame sequence to obtain spectrogram information of the frame;
step S4: extracting the features of the spectrogram information to obtain a voiceprint feature vector;
step S5: inputting the voiceprint feature vector into a convolutional neural network model for training to obtain a voiceprint feature model;
step S6: a microphone (MIC) detects the user's voice, collects the voiceprint, and inputs it into the voiceprint feature model;
step S7: the intelligent refrigerator determines the type of the user according to the voiceprint recognition result;
step S8: and dynamically changing the voice response style according to the user type.
2. The control method according to claim 1, wherein in step S1, the users' ages are divided into three groups: children, young adults, and the elderly, whose corresponding voice response styles are lively, humorous, and traditional, respectively.
3. The control method for dynamically changing the style of a voice response according to claim 1, wherein in step S2, the audio frame sequence obtaining step comprises:
step S21: sampling and quantizing the audio file;
step S22: converting the audio signal into an audio digital signal with a fixed number of bits at a fixed sampling frequency;
step S23: pre-emphasis processing is carried out on the audio digital signal;
step S24: performing framing and windowing processing on a voice signal;
step S25: and obtaining a speech frame sequence.
4. The method as claimed in claim 1, wherein in step S3, a Fourier transform is performed on each frame of the audio frame sequence to obtain the frequency spectrum of each audio frame, and the squared modulus of each frame's spectrum is taken to obtain the power spectrum of the audio sequence; the power spectrum of the audio sequence is filtered through a preset filter to obtain the logarithmic energy of the audio sequence; and a discrete cosine transform is applied to the logarithmic energy of the audio sequence to obtain the feature vector of the audio.
5. The method as claimed in claim 1, wherein in step S4, the time domain information and the frequency domain information of the spectrogram information are input into a two-dimensional convolutional neural network, so as to obtain the time domain features and the frequency domain features of the sound data; and after the time domain characteristics and the frequency domain characteristics of the sound data are subjected to characteristic aggregation, inputting the aggregated characteristics into a full connection layer to obtain a voiceprint characteristic vector.
6. The method as claimed in claim 1, wherein in step S5, the input of the voiceprint feature vector into the convolutional neural network model for training, and obtaining the voiceprint model for recognizing the voiceprint includes:
extracting local voiceprint information of the voiceprint characteristic vector through a convolution layer of the convolution neural network model;
connecting the extracted local voiceprint information through a full connection layer of the convolutional neural network model to obtain multi-dimensional local voiceprint information;
and performing dimensionality reduction processing on the multi-dimensional local voiceprint information through a pooling layer of the convolutional neural network model to obtain a voiceprint characteristic model.
7. The method as claimed in claim 1, wherein in step S7, after determining the user type, matching is performed according to a pre-trained interaction style training model and a response text library to determine the target interactive text, and according to the adjustment parameter corresponding to the current user type, the target interactive text is converted into a response voice audio consistent with the voice response style, and the response voice audio is played.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110326327.8A CN113035203A (en) | 2021-03-26 | 2021-03-26 | Control method for dynamically changing voice response style |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110326327.8A CN113035203A (en) | 2021-03-26 | 2021-03-26 | Control method for dynamically changing voice response style |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113035203A true CN113035203A (en) | 2021-06-25 |
Family
ID=76474338
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110326327.8A Pending CN113035203A (en) | 2021-03-26 | 2021-03-26 | Control method for dynamically changing voice response style |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113035203A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113707154A (en) * | 2021-09-03 | 2021-11-26 | 上海瑾盛通信科技有限公司 | Model training method and device, electronic equipment and readable storage medium |
CN115101048A (en) * | 2022-08-24 | 2022-09-23 | 深圳市人马互动科技有限公司 | Science popularization information interaction method, device, system, interaction equipment and storage medium |
CN117975971A (en) * | 2024-04-02 | 2024-05-03 | 暨南大学 | Voiceprint age group estimation method and system based on privacy protection |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106648082A (en) * | 2016-12-09 | 2017-05-10 | 厦门快商通科技股份有限公司 | Intelligent service device capable of simulating human interactions and method |
CN110336723A (en) * | 2019-07-23 | 2019-10-15 | 珠海格力电器股份有限公司 | Control method and device of intelligent household appliance and intelligent household appliance |
CN110398897A (en) * | 2018-04-25 | 2019-11-01 | 北京快乐智慧科技有限责任公司 | A kind of Multi-mode switching method and system of intellectual product |
CN110544472A (en) * | 2019-09-29 | 2019-12-06 | 上海依图信息技术有限公司 | Method for improving performance of voice task using CNN network structure |
CN110648669A (en) * | 2019-09-30 | 2020-01-03 | 上海依图信息技术有限公司 | Multi-frequency shunt voiceprint recognition method, device and system and computer readable storage medium |
WO2020006935A1 (en) * | 2018-07-05 | 2020-01-09 | 平安科技(深圳)有限公司 | Method and device for extracting animal voiceprint features and computer readable storage medium |
CN111061953A (en) * | 2019-12-18 | 2020-04-24 | 深圳市优必选科技股份有限公司 | Intelligent terminal interaction method and device, terminal equipment and storage medium |
CN111354364A (en) * | 2020-04-23 | 2020-06-30 | 上海依图网络科技有限公司 | Voiceprint recognition method and system based on RNN aggregation mode |
CN111352348A (en) * | 2018-12-24 | 2020-06-30 | 北京三星通信技术研究有限公司 | Device control method, device, electronic device and computer-readable storage medium |
CN111833884A (en) * | 2020-05-27 | 2020-10-27 | 北京三快在线科技有限公司 | Voiceprint feature extraction method and device, electronic equipment and storage medium |
CN111951791A (en) * | 2020-08-26 | 2020-11-17 | 上海依图网络科技有限公司 | Voiceprint recognition model training method, recognition method, electronic device and storage medium |
CN112331193A (en) * | 2019-07-17 | 2021-02-05 | 华为技术有限公司 | Voice interaction method and related device |
CN112489636A (en) * | 2020-10-15 | 2021-03-12 | 南京创维信息技术研究院有限公司 | Intelligent voice broadcast assistant selection method and system |
-
2021
- 2021-03-26 CN CN202110326327.8A patent/CN113035203A/en active Pending
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106648082A (en) * | 2016-12-09 | 2017-05-10 | 厦门快商通科技股份有限公司 | Intelligent service device capable of simulating human interactions and method |
CN110398897A (en) * | 2018-04-25 | 2019-11-01 | 北京快乐智慧科技有限责任公司 | Multi-mode switching method and system for an intelligent product |
WO2020006935A1 (en) * | 2018-07-05 | 2020-01-09 | 平安科技(深圳)有限公司 | Method and device for extracting animal voiceprint features and computer readable storage medium |
CN111352348A (en) * | 2018-12-24 | 2020-06-30 | 北京三星通信技术研究有限公司 | Device control method, device, electronic device and computer-readable storage medium |
CN112331193A (en) * | 2019-07-17 | 2021-02-05 | 华为技术有限公司 | Voice interaction method and related device |
CN110336723A (en) * | 2019-07-23 | 2019-10-15 | 珠海格力电器股份有限公司 | Control method and device of intelligent household appliance and intelligent household appliance |
CN110544472A (en) * | 2019-09-29 | 2019-12-06 | 上海依图信息技术有限公司 | Method for improving performance of voice task using CNN network structure |
CN110648669A (en) * | 2019-09-30 | 2020-01-03 | 上海依图信息技术有限公司 | Multi-frequency shunt voiceprint recognition method, device and system and computer readable storage medium |
CN111061953A (en) * | 2019-12-18 | 2020-04-24 | 深圳市优必选科技股份有限公司 | Intelligent terminal interaction method and device, terminal equipment and storage medium |
CN111354364A (en) * | 2020-04-23 | 2020-06-30 | 上海依图网络科技有限公司 | Voiceprint recognition method and system based on RNN aggregation mode |
CN111833884A (en) * | 2020-05-27 | 2020-10-27 | 北京三快在线科技有限公司 | Voiceprint feature extraction method and device, electronic equipment and storage medium |
CN111951791A (en) * | 2020-08-26 | 2020-11-17 | 上海依图网络科技有限公司 | Voiceprint recognition model training method, recognition method, electronic device and storage medium |
CN112489636A (en) * | 2020-10-15 | 2021-03-12 | 南京创维信息技术研究院有限公司 | Intelligent voice broadcast assistant selection method and system |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113707154A (en) * | 2021-09-03 | 2021-11-26 | 上海瑾盛通信科技有限公司 | Model training method and device, electronic equipment and readable storage medium |
CN113707154B (en) * | 2021-09-03 | 2023-11-10 | 上海瑾盛通信科技有限公司 | Model training method, device, electronic equipment and readable storage medium |
CN115101048A (en) * | 2022-08-24 | 2022-09-23 | 深圳市人马互动科技有限公司 | Science popularization information interaction method, device, system, interaction equipment and storage medium |
CN115101048B (en) * | 2022-08-24 | 2022-11-11 | 深圳市人马互动科技有限公司 | Science popularization information interaction method, device, system, interaction equipment and storage medium |
CN117975971A (en) * | 2024-04-02 | 2024-05-03 | 暨南大学 | Voiceprint age group estimation method and system based on privacy protection |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113035203A (en) | Control method for dynamically changing voice response style | |
CN103280220B (en) | Real-time recognition method for baby cries | |
CN109147763B (en) | Audio and video keyword identification method and device based on neural network and inverse entropy weighting | |
CN109999314A (en) | One kind is based on brain wave monitoring Intelligent sleep-assisting system and its sleep earphone | |
CN106847281A (en) | Intelligent household voice control system and method based on voice fuzzy identification technology | |
CN102404278A (en) | Song request system based on voiceprint recognition and application method thereof | |
CN111192598A (en) | Voice enhancement method for jump connection deep neural network | |
CN110136709A (en) | Audio recognition method and video conferencing system based on speech recognition | |
CN112382301B (en) | Noise-containing voice gender identification method and system based on lightweight neural network | |
CN103021405A (en) | Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter | |
CN111583936A (en) | Intelligent voice elevator control method and device | |
CN115602165B (en) | Digital employee intelligent system based on financial system | |
WO2023184942A1 (en) | Voice interaction method and apparatus and electric appliance | |
CN110867192A (en) | Speech enhancement method based on gated cyclic coding and decoding network | |
CN112102846A (en) | Audio processing method and device, electronic equipment and storage medium | |
CN111798875A (en) | VAD implementation method based on three-value quantization compression | |
CN112151071A (en) | Speech emotion recognition method based on mixed wavelet packet feature deep learning | |
CN113571078A (en) | Noise suppression method, device, medium, and electronic apparatus | |
CN112017658A (en) | Operation control system based on intelligent human-computer interaction | |
CN111081249A (en) | Mode selection method, device and computer readable storage medium | |
WO2017177629A1 (en) | Far-talking voice recognition method and device | |
CN107393539A (en) | Voice password control method | |
US20230186943A1 (en) | Voice activity detection method and apparatus, and storage medium | |
CN116434758A (en) | Voiceprint recognition model training method and device, electronic equipment and storage medium | |
CN114023352B (en) | Voice enhancement method and device based on energy spectrum depth modulation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2021-06-25