CN117095674B - Interactive control method and system for intelligent doors and windows - Google Patents

Interactive control method and system for intelligent doors and windows

Info

Publication number
CN117095674B
CN117095674B (application CN202311086549.2A)
Authority
CN
China
Prior art keywords
door
window control
control instruction
voice
global context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311086549.2A
Other languages
Chinese (zh)
Other versions
CN117095674A (en)
Inventor
梁晓东
胡新尧
张俊峰
梁恒
林狄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Fulinmen Shijia Smart Home Co ltd
Original Assignee
Guangdong Fulinmen Shijia Smart Home Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Fulinmen Shijia Smart Home Co ltd filed Critical Guangdong Fulinmen Shijia Smart Home Co ltd
Priority to CN202311086549.2A priority Critical patent/CN117095674B/en
Publication of CN117095674A publication Critical patent/CN117095674A/en
Application granted granted Critical
Publication of CN117095674B publication Critical patent/CN117095674B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • EFIXED CONSTRUCTIONS
    • E05LOCKS; KEYS; WINDOW OR DOOR FITTINGS; SAFES
    • E05FDEVICES FOR MOVING WINGS INTO OPEN OR CLOSED POSITION; CHECKS FOR WINGS; WING FITTINGS NOT OTHERWISE PROVIDED FOR, CONCERNED WITH THE FUNCTIONING OF THE WING
    • E05F15/00Power-operated mechanisms for wings
    • E05F15/70Power-operated mechanisms for wings with automatic actuation
    • E05F15/77Power-operated mechanisms for wings with automatic actuation using wireless control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/04Segmentation; Word boundary detection
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • EFIXED CONSTRUCTIONS
    • E05LOCKS; KEYS; WINDOW OR DOOR FITTINGS; SAFES
    • E05YINDEXING SCHEME ASSOCIATED WITH SUBCLASSES E05D AND E05F, RELATING TO CONSTRUCTION ELEMENTS, ELECTRIC CONTROL, POWER SUPPLY, POWER SIGNAL OR TRANSMISSION, USER INTERFACES, MOUNTING OR COUPLING, DETAILS, ACCESSORIES, AUXILIARY OPERATIONS NOT OTHERWISE PROVIDED FOR, APPLICATION THEREOF
    • E05Y2900/00Application of doors, windows, wings or fittings thereof
    • E05Y2900/10Application of doors, windows, wings or fittings thereof for buildings or parts thereof
    • E05Y2900/13Type of wing
    • E05Y2900/132Doors
    • EFIXED CONSTRUCTIONS
    • E05LOCKS; KEYS; WINDOW OR DOOR FITTINGS; SAFES
    • E05YINDEXING SCHEME ASSOCIATED WITH SUBCLASSES E05D AND E05F, RELATING TO CONSTRUCTION ELEMENTS, ELECTRIC CONTROL, POWER SUPPLY, POWER SIGNAL OR TRANSMISSION, USER INTERFACES, MOUNTING OR COUPLING, DETAILS, ACCESSORIES, AUXILIARY OPERATIONS NOT OTHERWISE PROVIDED FOR, APPLICATION THEREOF
    • E05Y2900/00Application of doors, windows, wings or fittings thereof
    • E05Y2900/10Application of doors, windows, wings or fittings thereof for buildings or parts thereof
    • E05Y2900/13Type of wing
    • E05Y2900/148Windows
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L2021/02082Noise filtering the noise being echo, reverberation of the speech
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an interactive control method and system for intelligent doors and windows, which preprocess and recognize a user's door and window control instruction voice input by means of artificial intelligence technology and deep-learning-based speech recognition technology, so as to intelligently generate door and window control instructions.

Description

Interactive control method and system for intelligent doors and windows
Technical Field
The application relates to the field of intelligent control, and more particularly, to an interactive control method and system for intelligent doors and windows.
Background
The reason for interactive control of doors and windows is to improve the convenience and comfort of users. Conventional door and window control schemes typically require manual operation, such as using a switch or a remote control, which may require the user to leave a comfortable position or walk a considerable distance to the door or window.
Therefore, an optimized interactive control scheme for doors and windows is desired.
Disclosure of Invention
The present application has been made in order to solve the above technical problems. The embodiments of the application provide an interactive control method and system for intelligent doors and windows, which preprocess and recognize a user's door and window control instruction voice input by means of artificial intelligence technology and deep-learning-based speech recognition technology, so as to intelligently generate door and window control instructions.
According to one aspect of the present application, there is provided an interactive control method for an intelligent door and window, including:
acquiring a door and window control instruction voice input provided by a user;
performing noise reduction processing on the door and window control instruction voice input to obtain a noise-reduced door and window control instruction voice input; and
performing voice recognition on the noise-reduced door and window control instruction voice input to generate a door and window control instruction.
According to another aspect of the present application, there is provided an interactive control system for a smart door and window, including:
the control instruction acquisition module is used for acquiring door and window control instruction voice input provided by a user;
the noise reduction module is used for carrying out noise reduction processing on the door and window control instruction voice input so as to obtain the door and window control instruction voice input after noise reduction; and
the control instruction generation module is used for performing voice recognition on the noise-reduced door and window control instruction voice input to generate a door and window control instruction.
Compared with the prior art, the interactive control method and system for intelligent doors and windows provided by the application preprocess and recognize a user's door and window control instruction voice input by means of artificial intelligence technology and deep-learning-based speech recognition technology, so that door and window control instructions are generated intelligently.
Drawings
The foregoing and other objects, features and advantages of the present application will become more apparent from the following more particular description of embodiments of the present application, as illustrated in the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application, are incorporated in and constitute a part of this specification, illustrate the application, and do not constitute a limitation of the application. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 is a flowchart of an interactive control method of a smart door and window according to an embodiment of the present application;
fig. 2 is a system architecture diagram of an interactive control method of a smart door and window according to an embodiment of the present application;
fig. 3 is a flowchart of substep S2 of the interactive control method of the smart door and window according to an embodiment of the present application;
fig. 4 is a flowchart of substep S21 of the interactive control method of the smart door and window according to the embodiment of the present application;
fig. 5 is a flowchart of substep S22 of the interactive control method of the smart door and window according to an embodiment of the present application;
fig. 6 is a flowchart of substep S23 of the interactive control method of the smart door and window according to an embodiment of the present application;
fig. 7 is a block diagram of an interactive control system for a smart door and window according to an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application and not all of the embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
As used in this application and in the claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may include the plural unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
Although the present application makes various references to certain modules in a system according to embodiments of the present application, any number of different modules may be used and run on a user terminal and/or server. The modules are merely illustrative, and different aspects of the systems and methods may use different modules.
Flowcharts are used in this application to describe the operations performed by systems according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed in order precisely. Rather, the various steps may be processed in reverse order or simultaneously, as desired. Also, other operations may be added to or removed from these processes.
The reason for interactive control of doors and windows is to improve the convenience and comfort of users. Conventional door and window control schemes typically require manual operation, such as using a switch or a remote control, which may require the user to leave a comfortable position or walk a considerable distance to the door or window. Therefore, an optimized interactive control scheme for doors and windows is desired.
In the technical scheme of the application, an interactive control method for intelligent doors and windows is provided. Fig. 1 is a flowchart of an interactive control method of a smart door and window according to an embodiment of the present application. Fig. 2 is a system architecture diagram of an interactive control method for a smart door and window according to an embodiment of the present application. As shown in fig. 1 and fig. 2, the interactive control method for the smart door and window according to the embodiment of the application includes the steps of: s1, acquiring door and window control instruction voice input provided by a user; s2, carrying out noise reduction treatment on the door and window control instruction voice input to obtain the door and window control instruction voice input after noise reduction; and S3, performing voice recognition on the noise-reduced door and window control instruction voice input to generate a door and window control instruction.
Specifically, in step S1, a door and window control instruction voice input provided by a user is acquired.
Specifically, in step S2, the door and window control command speech input is subjected to noise reduction processing to obtain a noise-reduced door and window control command speech input. In particular, in one specific example of the present application, as shown in fig. 3, step S2 includes: S21, performing data preprocessing on the door and window control command speech input to obtain a plurality of source domain enhanced door and window control command speech segments; S22, performing feature extraction and sequence encoding on the plurality of source domain enhanced door and window control command speech segments to obtain a global context door and window control command speech feature vector; and S23, generating the noise-reduced door and window control command speech input based on the global context door and window control command speech feature vector.
Specifically, the step S21 is to perform data preprocessing on the input of the door and window control command voice to obtain a plurality of source domain enhanced door and window control command voice segments. It should be appreciated that with speech recognition technology, a user's voice input may provide door and window control commands, and the system then converts the voice input to text commands via the speech recognition technology, and then controls the movement of the door and window according to the text commands. However, the complexity of speech input is affected by user habits and expressions. In practice, the user may use different languages, dialects, accents, speech speeds, intonation, etc. to provide the door and window control instructions, such as different expressions of "open window", "open window a bit", "open window half", etc. These expressions increase the difficulty of speech recognition, resulting in reduced speech recognition efficiency, thereby affecting the generation and execution of door and window control commands. Therefore, in the technical scheme of the application, the door and window control command voice input is subjected to data preprocessing to obtain a plurality of source domain enhanced door and window control command voice fragments. In particular, in one specific example of the present application, as shown in fig. 4, the S21 includes: s211, segmenting the door and window control instruction voice input to obtain a plurality of door and window control instruction voice fragments; and S212, respectively up-sampling the door and window control command voice fragments to obtain the source domain enhanced door and window control command voice fragments.
More specifically, in S211, the door and window control command speech input is segmented to obtain a plurality of door and window control command speech segments. It should be appreciated that the function of splitting the door and window control command speech input is to split the entire speech sample into a plurality of segments, each segment corresponding to a particular door and window control command. The purpose of this is to facilitate the subsequent processing steps of data enhancement, data balancing, and data annotation, as well as the input processing when training and evaluating the model.
Accordingly, in one possible implementation manner, the door and window control command speech input may be segmented to obtain a plurality of door and window control command speech segments, for example: the door and window control command voice input is audio aligned to ensure that the start and end times of the command are consistent. Audio alignment may be implemented using speech processing tools or algorithms; a voice recognition technique or instruction detection algorithm is used to detect the position of the door and window control instructions in the audio. This may be accomplished by identifying a particular sound pattern or voice feature of the instruction; and according to the instruction detection result, cutting the audio sample into a plurality of fragments, wherein each fragment corresponds to a door and window control instruction. The segmentation can be performed according to the starting and ending time points of the instruction, so that each segment is ensured to contain a complete instruction; and carrying out necessary pretreatment on each door and window control instruction voice segment. This may include removing mute portions, noise reduction processing, audio enhancement, etc., to improve the recognition accuracy and quality of the instructions; the correct tags are added to each door and window control command speech segment for subsequent training and model evaluation. A tag may represent a specific instruction type or action.
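For instance, a minimal sketch of this segmentation step, assuming silence separates successive spoken commands and using librosa's energy-based splitter as a simple stand-in for the instruction-detection step described above, might look as follows (the sampling rate and threshold are illustrative):

```python
# Sketch of step S211: split a recording into one segment per spoken command.
import librosa
import numpy as np

def split_into_command_segments(path: str, top_db: float = 30.0):
    y, sr = librosa.load(path, sr=16000, mono=True)       # load and resample
    intervals = librosa.effects.split(y, top_db=top_db)   # non-silent (start, end) pairs
    segments = [y[start:end] for start, end in intervals] # one segment per command
    return segments, sr

# Each returned segment would then be labeled with its command type
# (e.g. "open window", "close door") for later training and evaluation.
```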
More specifically, in step S212, the respective door and window control command speech segments are up-sampled to obtain the plurality of source domain enhanced door and window control command speech segments. It should be appreciated that up-sampling the door and window control command speech segments serves to increase the number and diversity of samples, so as to improve the training effect and robustness of the model.
Notably, up-sampling is a data processing technique that increases the number of samples in a data set. For audio or speech data, up-sampling generally refers to increasing the number of audio samples. This may be accomplished by different methods, including but not limited to the following: repeated replication: existing samples are simply copied to generate multiple identical samples; this method is straightforward, but may result in training samples that are too similar and lack diversity. Interpolation: new samples are generated between existing samples using an interpolation algorithm; the interpolation may be time-based linear interpolation or feature-based interpolation, producing new samples that are similar to, but slightly different from, the original samples. Generative models: new samples are generated using a generative model (e.g., a generative adversarial network); the generative model can learn the distribution of the data and then generate new samples that are similar to, but not exactly the same as, the original samples, which increases the diversity of the data.
Accordingly, in one possible implementation manner, the respective door and window control command speech segments may be up-sampled to obtain the plurality of source-domain enhanced door and window control command speech segments, for example, by: a speech segment data set containing different door and window control instructions is collected. These speech segments should come from different speakers, speech rates, pitch, and environmental conditions to increase the diversity of the data; the collected speech segments are preprocessed, including noise removal, normalized audio quality, volume balance, and the like. This may enhance the effect of subsequent processing steps; the speech segments are processed using various data enhancement techniques to generate additional samples. Common data enhancement techniques include speed perturbation (e.g., accelerating or decelerating speech), pitch perturbation (e.g., increasing or decreasing pitch), time stretching and compression, and the like. These techniques can increase the diversity of data; the class distribution between the generated enhanced samples and the original samples is checked to ensure a sufficient number of samples per door and window control command. If the number of samples of certain instructions is small, the data set may be balanced by copying or generating more samples; labeling the generated enhanced samples to ensure that each sample has a correct door and window control instruction label.
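A minimal sketch of such up-sampling through data enhancement, assuming librosa for speed and pitch perturbation and simple Gaussian noise injection (the perturbation ranges are illustrative, not taken from the patent), might look as follows:

```python
# Sketch of step S212: enlarge the segment set with speed perturbation,
# pitch perturbation and noise injection.
import librosa
import numpy as np

def augment_segment(seg: np.ndarray, sr: int, rng: np.random.Generator):
    out = []
    out.append(librosa.effects.time_stretch(seg, rate=float(rng.uniform(0.9, 1.1))))
    out.append(librosa.effects.pitch_shift(seg, sr=sr, n_steps=float(rng.uniform(-2, 2))))
    noisy = seg + 0.005 * rng.standard_normal(seg.shape).astype(seg.dtype)
    out.append(noisy)
    return out

def upsample_dataset(segments, sr, copies_per_segment: int = 3, seed: int = 0):
    rng = np.random.default_rng(seed)
    enhanced = []
    for seg in segments:
        enhanced.append(seg)                        # keep the original sample
        for _ in range(copies_per_segment):
            enhanced.extend(augment_segment(seg, sr, rng))
    return enhanced
```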
It should be noted that, in other specific examples of the present application, the door and window control command voice input may be further subjected to data preprocessing in other manners to obtain a plurality of source domain enhanced door and window control command voice segments, for example: voice sample data including door and window control instructions is collected. These samples may come from different speakers, environments, and recording devices to obtain diverse source domain data; audio pretreatment: audio alignment: performing audio alignment on the collected voice samples to ensure that the starting time and the ending time of the door and window control instruction are consistent; audio segmentation: dividing an audio sample into a plurality of fragments according to the starting time and the ending time of the door and window control instruction, wherein each fragment corresponds to one instruction; data enhancement: noise injection: selecting different types of noise from the noise library, and mixing the noise with each voice segment to simulate different environmental noise; speed of sound variation: the speed of the voice is increased or reduced by changing the playing speed of the voice fragment so as to simulate different speaking speeds; tone variation: by changing the tone of the speech segment, the pitch of the speech is increased or decreased to simulate the tone characteristics of different speakers; reverberation effect: adding a proper amount of reverberation effect on the voice segment to simulate different recording environments; ensuring that the number of samples of each door and window control instruction in the generated enhanced data set is relatively balanced, and avoiding that certain instructions are too concentrated or too rare; a correct door and window control command label is added to each enhanced speech segment for subsequent training and model evaluation.
Specifically, in step S22, feature extraction and sequence encoding are performed on the plurality of source domain enhanced door and window control command speech segments to obtain a global context door and window control command speech feature vector. That is, features are first extracted independently from each source domain enhanced door and window control command speech segment to capture the local speech characteristics it contains; the global context associations among these local speech features are then extracted to enhance feature interaction and communication among them. In particular, in one specific example of the present application, as shown in fig. 5, step S22 includes: S221, passing the plurality of source domain enhanced door and window control command speech segments through a speech feature extractor based on a one-dimensional convolution layer to obtain a plurality of door and window control command speech feature vectors; and S222, passing the plurality of door and window control command speech feature vectors through a converter-based intermediate feature sequence encoder to obtain the global context door and window control command speech feature vector.
More specifically, in step S221, the plurality of source domain enhanced door and window control command speech segments are passed through a speech feature extractor based on a one-dimensional convolution layer to obtain a plurality of door and window control command speech feature vectors. It should be appreciated that a one-dimensional convolution layer slides its convolution kernel over the input speech segment to capture local speech features. These local features may include the spectral morphology of the audio, the formants of the speech, and so forth. Through convolution operations, the speech feature extractor can automatically learn and extract local speech features related to the door and window control commands. Specifically, step S221 includes: each layer of the speech feature extractor based on the one-dimensional convolution layer performs the following operations on its input data in the forward pass of that layer: performing convolution processing on the input data to obtain a convolution feature map; pooling the convolution feature map along the feature matrix to obtain a pooled feature map; and performing nonlinear activation on the pooled feature map to obtain an activated feature map; wherein the output of the last layer of the speech feature extractor based on the one-dimensional convolution layer is the plurality of door and window control command speech feature vectors, and the input of the first layer of the speech feature extractor based on the one-dimensional convolution layer is the plurality of source domain enhanced door and window control command speech segments.
Notably, one-dimensional convolutional layers are a common layer type in convolutional neural networks for processing one-dimensional sequence data. The method can effectively extract local features in the sequence data, and filter and map the features on the input through convolution operation. The structure of the one-dimensional convolution layer is as follows: input: the input of the one-dimensional convolution layer is a one-dimensional sequence, such as a word sequence in time sequence data or text data; convolution kernel: the one-dimensional convolution layer contains a plurality of convolution kernels, each of which is a small matrix of learnable parameters. The size of the convolution kernel is typically a window size that slides in the time dimension for capturing local features of the input sequence; convolution operation: for each convolution kernel, a one-dimensional convolution layer convolves it with the input sequence. The convolution operation is performed by sliding a convolution kernel over the input sequence and calculating a dot product of the convolution kernel and the input sequence. This process can be seen as filtering the input sequence, highlighting local features in the sequence; activation function: an activation function is applied to the result of the convolution operation to introduce nonlinearities. Common activation functions include ReLU, sigmoid, tanh, etc.; pooling layer: after the one-dimensional convolution layer, a pooling layer may be added to reduce the dimension of the feature map. Common pooling operations include maximum pooling and average pooling, which can extract the most significant portion of a feature or downsample a feature. The one-dimensional convolution layer can extract local features at different positions through sliding operation of a plurality of convolution kernels on an input sequence, and processes and compresses the features through an activation function and a pooling operation. The one-dimensional convolution layer is very effective in the aspects of feature extraction and pattern recognition of sequence data, and is often applied to the fields of voice recognition, text classification, time sequence analysis and the like.
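As an illustration of the extractor structure just described (convolution, then pooling, then nonlinear activation, repeated over several layers), a minimal PyTorch sketch might look as follows; the channel counts, kernel sizes and pooling configuration are illustrative assumptions, not values from the patent.

```python
# Sketch of the one-dimensional-convolution speech feature extractor of step S221.
import torch
import torch.nn as nn

class Conv1dSpeechFeatureExtractor(nn.Module):
    def __init__(self, in_channels: int = 1, feature_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=9, stride=2, padding=4),
            nn.MaxPool1d(kernel_size=2),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=2, padding=4),
            nn.MaxPool1d(kernel_size=2),
            nn.ReLU(),
            nn.Conv1d(64, feature_dim, kernel_size=9, stride=2, padding=4),
            nn.AdaptiveAvgPool1d(1),              # collapse the time axis
        )

    def forward(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (batch, 1, num_samples) -> (batch, feature_dim)
        return self.net(segment).squeeze(-1)

# Usage: one feature vector per source-domain enhanced command segment.
extractor = Conv1dSpeechFeatureExtractor()
vec = extractor(torch.randn(1, 1, 16000))          # shape (1, 128)
```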
More specifically, in step S222, the plurality of door and window control command speech feature vectors are passed through a converter-based intermediate feature sequence encoder to obtain the global context door and window control command speech feature vector. It should be understood that the converter-based intermediate feature sequence encoder performs context modeling, long-distance dependency modeling, and feature interaction and integration on the plurality of door and window control command speech feature vectors, and adds position codes, so that the context association relationships among the plurality of feature vectors are integrated to obtain a door and window control command speech feature vector with global context. Specifically: the plurality of door and window control command speech feature vectors are arranged into one dimension to obtain a global door and window control command speech feature vector; the product between the global door and window control command speech feature vector and the transpose of each door and window control command speech feature vector is calculated to obtain a plurality of self-attention association matrices; each of the plurality of self-attention association matrices is normalized to obtain a plurality of normalized self-attention association matrices; each normalized self-attention association matrix is passed through a Softmax classification function to obtain a plurality of probability values; each door and window control command speech feature vector is weighted with the corresponding probability value as a weight to obtain a plurality of context semantic door and window control command speech feature vectors; and the plurality of context semantic door and window control command speech feature vectors are concatenated to obtain the global context door and window control command speech feature vector.
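The per-vector weighting just described can be sketched as follows in PyTorch; this is one plausible reading of the text (in particular, each normalized association matrix is reduced to a single score before the Softmax), not a reference transformer implementation, and the dimensions are illustrative assumptions.

```python
# Sketch of step S222: build per-vector association matrices, reduce them to
# softmax weights, weight each vector, and concatenate into one global vector.
import torch
import torch.nn.functional as F

def global_context_encode(vectors: list) -> torch.Tensor:
    # vectors: K feature vectors, each of shape (feature_dim,)
    g = torch.cat(vectors, dim=0)                               # global vector, (K*D,)
    assoc = [torch.outer(g, v) for v in vectors]                # self-attention matrices
    assoc = [(a - a.mean()) / (a.std() + 1e-6) for a in assoc]  # per-matrix normalization
    scores = torch.stack([a.mean() for a in assoc])             # one scalar per matrix
    weights = F.softmax(scores, dim=0)                          # K probability values
    weighted = [w * v for w, v in zip(weights, vectors)]        # context-semantic vectors
    return torch.cat(weighted, dim=0)                           # global context vector

# Example: three 128-dimensional command feature vectors -> one 384-dim vector.
out = global_context_encode([torch.randn(128) for _ in range(3)])
```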
Notably, the intermediate feature sequence encoder uses a self-attention mechanism to interact and integrate feature vectors in the input sequence. The self-attention mechanism allows each feature vector to interact with other feature vectors and weight them according to their relative importance. Thus, the model can capture the dependency of the global context and perform feature integration according to the relationship between features. In addition to the self-attention mechanism, the intermediate signature sequence encoder also includes a feed-forward neural network layer. The feed forward neural network layer is typically a fully connected layer or layers for non-linear transformation and mapping of the feature vectors for each location. This helps to enhance the representation capability of the features and the non-linear modeling capability of the model.
It should be noted that, in other specific examples of the present application, the feature extraction and the sequence encoding may be performed on the plurality of source domain enhanced door and window control command speech segments in other manners to obtain a global context door and window control command speech feature vector, for example: collecting enhanced door and window control instruction voice fragments of a plurality of source domains as training data; preprocessing the collected voice fragments, including removing noise, reducing noise, adjusting audio gain and the like, so as to improve voice quality and reduce interference; and performing data enhancement operation on the preprocessed voice fragments to expand the diversity of training data. Common data enhancement methods include speed disturbance, voice tonal modification, noise addition, etc.; the sample quantity balance of each category is ensured, and the problem of unbalanced category during model training is avoided. The data set may be balanced by over-sampling, under-sampling, or generating a composite sample, etc.; labeling each voice segment with a corresponding door and window control instruction label so as to train and evaluate the model later; and extracting the characteristics of the enhanced door and window control instruction voice fragments. Common feature extraction methods include short-time fourier transforms, and mel-frequency cepstral coefficients. These features may capture spectral information and acoustic properties of the speech segments; the signature sequence is encoded using a recurrent neural network or a variant thereof (e.g., long and short term memory network, LSTM). The RNN can establish a context dependency relationship in the sequence data and capture global context information of door and window control instruction voice; and processing the sequence coding result through pooling operation to obtain the global context door and window control instruction voice feature vector with fixed dimension. This feature vector will incorporate information of the enhanced samples for training the model or other related tasks.
Specifically, in step S23, the noise-reduced door and window control command voice input is generated based on the global context door and window control command voice feature vector. In particular, in one specific example of the present application, as shown in fig. 6, step S23 includes: S231, performing feature distribution optimization on the global context door and window control command voice feature vector to obtain an optimized global context door and window control command voice feature vector; and S232, passing the optimized global context door and window control command voice feature vector through a voice generator based on a countermeasure generation network to obtain the noise-reduced door and window control command voice input.
More specifically, in step S231, feature distribution optimization is performed on the global context door and window control command voice feature vector to obtain an optimized global context door and window control command voice feature vector. In particular, in the technical scheme of the present application, when the plurality of source domain enhanced door and window control command voice segments are passed through the voice feature extractor based on the one-dimensional convolution layer, each door and window control command voice feature vector expresses the local amplitude correlation features, in the time-sequence direction, of the corresponding door and window control command voice segment, so that after the plurality of door and window control command voice feature vectors are passed through the converter-based intermediate feature sequence encoder, the obtained global context door and window control command voice feature vector further expresses the global context correlation of these local time-sequence amplitude correlation features. In this way, the amplitude correlation features of the local time sequences serve as foreground object features; when the global time-sequence context correlation is established, background distribution noise related to feature distribution interference at each local time sequence is also introduced, and the global context door and window control command voice feature vector thus carries a time-space hierarchical semantic expression at both the local time-sequence and global time-sequence levels. It is therefore desirable to enhance its expression effect based on the distribution characteristics of the global context door and window control command voice feature vector. Accordingly, the applicant of the present application applies, to the global context door and window control command voice feature vector, a distribution gain based on a probability density feature imitation paradigm, specifically expressed as:
where V is the global context door and window control command voice feature vector, L is the length of the global context door and window control command voice feature vector, v_i is the feature value at the i-th position of the global context door and window control command voice feature vector, ||V||_2^2 represents the square of the two-norm of the global context door and window control command voice feature vector, α is a weighting hyper-parameter, exp(·) represents the exponential operation on a vector, and v_i' represents the optimized global context door and window control command voice feature vector. Here, by means of the feature imitation paradigm of the standard Cauchy distribution with respect to the natural Gaussian distribution on the probability density, the distribution gain based on the probability density feature imitation paradigm can distinguish foreground object features from background distribution noise in the high-dimensional feature space, with the feature scale serving as an imitation mask, so as to perform semantics-aware distribution soft matching of the feature-space mapping on the high-dimensional space based on the spatially hierarchical semantics of the high-dimensional features and obtain an unconstrained distribution gain of the high-dimensional feature distribution. This improves the expression effect of the global context door and window control command voice feature vector based on its feature distribution characteristics, and improves the voice quality of the door and window control command voice input obtained by passing the global context door and window control command voice feature vector through the voice generator based on the countermeasure generation network, thereby improving the control quality of the door and window control command generated by performing semantic recognition on the noise-reduced door and window control command voice input.
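The optimization formula itself appears only as an image in the original publication and is not reproduced in this text. Purely as an illustration, the sketch below combines the quantities the passage names (a standard Cauchy density term, an exponential term in the squared two-norm, the weighting hyper-parameter α and the length L) into one possible gain; it should not be read as the patented formula.

```python
# Illustrative (assumed) combination of the named quantities; NOT the patent's formula.
import math
import torch

def distribution_gain(v: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    L = v.numel()                                   # length of the feature vector
    norm_sq = torch.sum(v ** 2)                     # squared two-norm ||V||_2^2
    cauchy = 1.0 / (math.pi * (1.0 + v ** 2))       # standard Cauchy density per element
    gain = alpha * L * cauchy * torch.exp(-norm_sq / L)
    return v * (1.0 + gain)                         # gently re-weighted feature vector

v_opt = distribution_gain(torch.randn(384))
```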
More specifically, S232, the optimized global context door and window control command voice feature vector is passed through a voice generator based on a countermeasure generation network to obtain the noise-reduced door and window control command voice input. It should be appreciated that for door and window control command speech input, noise and interference may be present, which can be confusing for speech recognition and understanding. By the voice generator based on the countermeasure generation network, the noise-reduced voice input can be generated, thereby reducing the influence of noise and improving the definition and the understandability of the voice signal. The noise-reduced door and window control command voice input is cleaner and more accurate, which is helpful for improving the reliability and accuracy of the voice recognition system. The generated voice input is closer to a real sample, so that the subsequent door and window control instruction recognition and processing tasks are more reliable.
Notably, the countermeasure generation network (Generative Adversarial Network, GAN for short) is a neural network structure composed of a generator and a discriminator, used for generating realistic data samples. The core idea of a GAN is to improve the quality of the samples produced by the generator through adversarial training between the generator and the discriminator. The structure of a GAN includes the following two main components: a generator: the generator is a neural network model that receives a random noise vector as input and attempts to generate new samples similar to the training data samples; the goal of the generator is to learn to generate realistic data samples that resemble the real samples in appearance and distribution. A discriminator: the discriminator is also a neural network model that receives the samples generated by the generator and the real data samples as input and attempts to distinguish them; the goal of the discriminator is to learn to distinguish the generator-generated samples from the real samples, i.e., to determine whether an input sample is a real sample or a sample produced by the generator.
Accordingly, in one possible implementation, the optimized global context door and window control command voice feature vector may be passed through the voice generator based on a countermeasure generation network to obtain the noise-reduced door and window control command voice input, for example, in the following manner: a voice data set containing door and window control commands is collected and labeled, ensuring that the data set covers a variety of speakers, environments and voice characteristics so as to improve the generalization capability of the model; the collected voice data is preprocessed, including removing silent segments, performing voice segmentation, audio format conversion and the like, to prepare the data for subsequent processing; features are extracted from the preprocessed voice data using a one-dimensional convolution layer or the like, where common feature extraction methods include Mel spectrum features, MFCC (Mel-frequency cepstral coefficients) and the like; to increase the diversity and richness of the data, data enhancement operations such as adding noise, time shifting and pitch shifting may be performed on the features, which helps improve the generalization ability of the generator; countermeasure generation network training: the network structures of the generator and the discriminator are defined, where the generator receives the optimized global context door and window control command voice feature vector as input and attempts to generate a noise-reduced voice input, while the discriminator receives the real voice input and the voice input generated by the generator and tries to distinguish between them; adversarial training is then performed, alternately training the generator and the discriminator: the generator improves the quality of the generated samples by maximizing the error of the discriminator, while the discriminator accurately distinguishes real samples from generated samples by minimizing its own error; appropriate loss functions, such as a generator loss and a discriminator loss, are used to guide the training process of the network; generating the noise-reduced door and window control command voice input: given the optimized global context door and window control command voice feature vector as input, a noise-reduced voice input is generated by the generator; the output of the generator can be adjusted as needed, such as adjusting the volume or speech rate of the audio; finally, post-processing operations are carried out on the generated noise-reduced voice input, including audio format conversion, audio gain adjustment and the like, to ensure that the generated voice input meets system requirements.
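A minimal PyTorch sketch of such a countermeasure generation network, with a generator mapping the optimized feature vector to a denoised audio frame and a discriminator scoring frames as real or generated, might look as follows; the network sizes, frame length and optimizer settings are illustrative assumptions.

```python
# Sketch of the adversarial speech generator of step S232 with one training step.
import torch
import torch.nn as nn

FEAT_DIM, FRAME_LEN = 384, 1024

generator = nn.Sequential(                 # feature vector -> denoised audio frame
    nn.Linear(FEAT_DIM, 512), nn.ReLU(),
    nn.Linear(512, FRAME_LEN), nn.Tanh(),
)
discriminator = nn.Sequential(             # audio frame -> real/generated score
    nn.Linear(FRAME_LEN, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(feat: torch.Tensor, clean_frame: torch.Tensor) -> None:
    # 1) discriminator step: real frames -> 1, generated frames -> 0
    fake = generator(feat).detach()
    d_loss = bce(discriminator(clean_frame), torch.ones(clean_frame.size(0), 1)) + \
             bce(discriminator(fake), torch.zeros(fake.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) generator step: try to make the discriminator predict 1 for generated frames
    g_loss = bce(discriminator(generator(feat)), torch.ones(feat.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

train_step(torch.randn(8, FEAT_DIM), torch.rand(8, FRAME_LEN) * 2 - 1)
```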
It should be noted that, in other specific examples of the present application, the noise-reduced door and window control instruction speech input may be generated by other manners based on the global context door and window control instruction speech feature vector, for example: a speech sample data set containing door and window control instructions is collected. These samples may include different voice instructions, different speakers, and recordings of different environmental conditions; the collected speech samples are preprocessed. This includes removing noise, reducing background noise interference, etc. Common noise reduction methods include the use of noise reduction filters, spectral subtraction, or noise reduction models based on deep learning; in order to increase the diversity and robustness of the data, data enhancement can be performed on the preprocessed speech samples. Data enhancement techniques include speed disturbances, voice pitch changes, adding noise, and the like. Thus, more samples can be generated, and the generalization capability of the model is enhanced; and labeling the voice samples after preprocessing and data enhancement. The labels can be corresponding door and window control instruction labels so as to conduct supervised learning during model training; and extracting the characteristics of the preprocessed and labeled voice samples by using a voice characteristic extractor based on a one-dimensional convolution layer. These feature extractors can extract local features, multichannel features, and capture important features of speech through nonlinear mapping, downsampling, and other operations; the result of the feature extraction is input into a converter-based intermediate feature sequence encoder. The encoder can perform context modeling, long-distance dependence modeling, feature interaction and integration on the feature sequence, and generate a door and window control instruction voice feature vector with global context; and restoring the generated door and window control instruction voice feature vector into a voice signal through inverse transformation. The inverse transform may be implemented using techniques such as inverse convolution, inverse fourier transform, and the like.
It should be noted that, in other specific examples of the present application, the door and window control command voice input may be further subjected to noise reduction processing in other manners to obtain a post-noise reduction door and window control command voice input, for example: a microphone or suitable recording device is used to record the voice input containing the door and window control instructions. Ensuring the recorded audio quality as clear as possible for subsequent processing; audio pretreatment: noise detection: analyzing a noise level in the audio using a noise detection algorithm; noise reduction: noise is reduced or eliminated from the audio signal using noise reduction techniques, such as adaptive filters or spectral subtraction. This will improve the clarity of the instruction speech; and (3) voice recognition: feature extraction: extracting features of audio using an audio processing algorithm, such as mel-frequency cepstral coefficient (MFCC) or Linear Predictive Coding (LPC), etc.; speech recognition model: inputting the extracted features into a speech recognition model, such as a Deep Neural Network (DNN) or a Recurrent Neural Network (RNN), to identify a door and window control command; instruction analysis: and matching the recognized text instruction with an instruction in the door and window control system to execute corresponding operation. This may involve communicating with an interface of the door and window control system to send the correct instructions.
Specifically, in step S3, the noise-reduced door and window control command voice input is subjected to voice recognition to generate a door and window control command. It should be understood that the voice input after noise reduction is clearer and more accurate, and the recognition system can more quickly convert the voice command into door and window control operation. This can increase the response speed of the door and window control system, so that the user can control the door and window more quickly.
Accordingly, in one possible implementation manner, the voice input of the door and window control command after noise reduction may be subjected to voice recognition to generate a door and window control command, for example: and preprocessing the noise-reduced door and window control instruction voice input. This includes audio format conversion, audio segmentation, audio gain adjustment, etc. operations to prepare the data for subsequent processing; and extracting features from the preprocessed door and window control instruction voice input. Common feature extraction methods include Mel-frequency spectrum features, MFCC (Mel-frequency cepstral coefficient), and the like. These features represent the spectral information of the speech, providing input for subsequent speech recognition; and training a voice recognition model by using the labeled door and window control instruction voice data set. Common speech recognition models include deep learning based end-to-end models, such as a combination model of a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN), and traditional combination models of acoustic models and language models, such as HMM and GMM; and (3) reasoning the noise-reduced door and window control instruction voice input by using the trained voice recognition model. Inputting the feature vector into a voice recognition model, and outputting a text representation of a corresponding door and window control instruction by the model; and carrying out post-processing on the door and window control instruction text output by the voice recognition model. This includes voice command parsing, semantic understanding, etc., converting text commands into specific door and window control operations, such as opening and closing doors and windows, adjusting curtains, etc.
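A minimal sketch of this recognition step, assuming MFCC features (via librosa) and a small classifier over a fixed command set (the command list and model shape are illustrative assumptions, and the classifier shown is untrained), might look as follows:

```python
# Sketch of step S3: MFCC features from the denoised audio -> command label.
import librosa
import numpy as np
import torch
import torch.nn as nn

COMMANDS = ["open window", "close window", "open door", "close door"]

classifier = nn.Sequential(nn.Linear(13, 64), nn.ReLU(), nn.Linear(64, len(COMMANDS)))

def recognize_command(wav: np.ndarray, sr: int = 16000) -> str:
    mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=13)          # (13, num_frames)
    feat = torch.tensor(mfcc.mean(axis=1), dtype=torch.float32)   # average over frames
    with torch.no_grad():
        logits = classifier(feat)                                  # untrained here: demo only
    return COMMANDS[int(torch.argmax(logits))]

print(recognize_command(np.random.randn(16000).astype(np.float32)))
```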
In summary, the interactive control method for intelligent doors and windows according to the embodiment of the application has been described. It preprocesses and recognizes a user's door and window control instruction voice input by means of artificial intelligence technology and deep-learning-based speech recognition technology, so that door and window control instructions are generated intelligently.
Further, an interactive control system of the intelligent door and window is also provided.
Fig. 7 is a block diagram of an interactive control system for a smart door and window according to an embodiment of the present application. As shown in fig. 7, an interactive control system 300 for a smart door and window according to an embodiment of the present application includes: a control instruction acquisition module 310, configured to acquire a door and window control instruction voice input provided by a user; the noise reduction module 320 is configured to perform noise reduction processing on the door and window control command voice input to obtain a door and window control command voice input after noise reduction; and a control instruction generating module 330, configured to perform speech recognition on the noise-reduced door and window control instruction speech input to generate a door and window control instruction.
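A minimal sketch of how the three modules might be composed in code is shown below; the internal behaviour of each module here is a deliberately simplified placeholder (a moving-average smoother instead of the full S21-S23 pipeline, an energy check instead of real speech recognition), used only to illustrate the module structure.

```python
# Illustrative composition of modules 310, 320 and 330; all internals are stubs.
import numpy as np

class ControlInstructionAcquisitionModule:
    """Stand-in for module 310: here it simply accepts a waveform array."""
    def acquire(self, source: np.ndarray) -> np.ndarray:
        return source.astype(np.float32)

class NoiseReductionModule:
    """Stand-in for module 320: a moving-average smoother as a placeholder
    for the segment/encode/generate pipeline of steps S21-S23."""
    def denoise(self, wav: np.ndarray, k: int = 5) -> np.ndarray:
        kernel = np.ones(k, dtype=np.float32) / k
        return np.convolve(wav, kernel, mode="same")

class ControlInstructionGenerationModule:
    """Stand-in for module 330: maps recognized speech to an actuator command."""
    def generate(self, wav: np.ndarray) -> str:
        # A real system would run speech recognition here; this stub checks energy only.
        return "open window" if float(np.mean(wav ** 2)) > 1e-4 else "no command"

class SmartDoorWindowControlSystem:
    def __init__(self):
        self.acq = ControlInstructionAcquisitionModule()
        self.den = NoiseReductionModule()
        self.gen = ControlInstructionGenerationModule()

    def handle(self, source: np.ndarray) -> str:
        wav = self.acq.acquire(source)
        wav = self.den.denoise(wav)
        return self.gen.generate(wav)

if __name__ == "__main__":
    fake_input = np.random.randn(16000) * 0.05      # 1 s of synthetic audio at 16 kHz
    print(SmartDoorWindowControlSystem().handle(fake_input))
```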
As described above, the smart door and window interactive control system 300 according to the embodiment of the present application may be implemented in various wireless terminals, such as a server having a smart door and window interactive control algorithm. In one possible implementation, the smart door and window interactive control system 300 according to the embodiment of the present application may be integrated into the wireless terminal as a software module and/or a hardware module. For example, the smart door and window interactive control system 300 may be a software module in the operating system of the wireless terminal, or may be an application developed for the wireless terminal; of course, the smart door and window interactive control system 300 may also be one of a plurality of hardware modules of the wireless terminal.
Alternatively, in another example, the smart door and window interactive control system 300 and the wireless terminal may be separate devices, and the smart door and window interactive control system 300 may be connected to the wireless terminal through a wired and/or wireless network and transmit the interaction information in an agreed data format.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (4)

1. An interactive control method for intelligent doors and windows, characterized by comprising the following steps:
acquiring a door and window control instruction voice input provided by a user;
carrying out noise reduction treatment on the door and window control instruction voice input to obtain a door and window control instruction voice input after noise reduction; and
performing voice recognition on the noise-reduced door and window control instruction voice input to generate a door and window control instruction;
noise reduction processing is carried out on the door and window control instruction voice input to obtain the door and window control instruction voice input after noise reduction, and the method comprises the following steps:
performing data preprocessing on the door and window control instruction voice input to obtain a plurality of source domain enhanced door and window control instruction voice fragments;
performing feature extraction and sequence coding on the plurality of source domain enhanced door and window control instruction voice fragments to obtain a global context door and window control instruction voice feature vector; and
generating the noise-reduced door and window control instruction voice input based on the global context door and window control instruction voice feature vector;
wherein generating the noise-reduced door and window control instruction voice input based on the global context door and window control instruction voice feature vector comprises:
performing feature distribution optimization on the global context door and window control instruction voice feature vector to obtain an optimized global context door and window control instruction voice feature vector; and
passing the optimized global context door and window control instruction voice feature vector through a speech generator based on a generative adversarial network to obtain the noise-reduced door and window control instruction voice;
wherein performing feature distribution optimization on the global context door and window control instruction voice feature vector to obtain the optimized global context door and window control instruction voice feature vector comprises: performing feature distribution optimization on the global context door and window control instruction voice feature vector using the following optimization formula to obtain the optimized global context door and window control instruction voice feature vector;
wherein, the formula is:
wherein V is the global context door and window control instruction voice feature vector, L is the length of the global context door and window control instruction voice feature vector, v_i is the feature value at the i-th position of the global context door and window control instruction voice feature vector, ||V||_2^2 represents the square of the two-norm of the global context door and window control instruction voice feature vector, α is a weighting hyper-parameter, exp(·) represents the exponential operation on a vector, and v_i' represents the optimized global context door and window control instruction voice feature vector.
2. The interactive control method for intelligent doors and windows according to claim 1, wherein performing data preprocessing on the door and window control instruction voice input to obtain the plurality of source domain enhanced door and window control instruction voice fragments comprises:
dividing the door and window control instruction voice input to obtain a plurality of door and window control instruction voice fragments; and
up-sampling each of the plurality of door and window control instruction voice fragments to obtain the plurality of source domain enhanced door and window control instruction voice fragments.
3. The interactive control method for intelligent doors and windows according to claim 2, wherein performing feature extraction and sequence coding on the plurality of source domain enhanced door and window control instruction voice fragments to obtain the global context door and window control instruction voice feature vector comprises:
passing the plurality of source domain enhanced door and window control instruction voice fragments through a voice feature extractor based on a one-dimensional convolution layer to obtain a plurality of door and window control instruction voice feature vectors; and
passing the plurality of door and window control instruction voice feature vectors through a transformer-based intermediate feature sequence encoder to obtain the global context door and window control instruction voice feature vector.
4. An interactive control system for intelligent doors and windows, comprising:
the control instruction acquisition module is used for acquiring door and window control instruction voice input provided by a user;
the noise reduction module is used for carrying out noise reduction processing on the door and window control instruction voice input so as to obtain the door and window control instruction voice input after noise reduction; and
the control instruction generation module is used for carrying out voice recognition on the noise-reduced door and window control instruction voice input so as to generate a door and window control instruction;
the noise reduction module is specifically configured to:
performing data preprocessing on the door and window control instruction voice input to obtain a plurality of source domain enhanced door and window control instruction voice fragments;
performing feature extraction and sequence coding on the plurality of source domain enhanced door and window control instruction voice fragments to obtain a global context door and window control instruction voice feature vector; and
generating the noise-reduced door and window control instruction voice input based on the global context door and window control instruction voice feature vector;
the noise reduction module is specifically configured to:
performing feature distribution optimization on the global context door and window control instruction voice feature vector to obtain an optimized global context door and window control instruction voice feature vector; and
passing the optimized global context door and window control instruction voice feature vector through a speech generator based on a generative adversarial network to obtain the noise-reduced door and window control instruction voice;
performing feature distribution optimization on the global context door and window control instruction voice feature vector by using the following optimization formula to obtain an optimized global context door and window control instruction voice feature vector;
wherein, the formula is:
wherein V is the global context door and window control instruction voice feature vector, L is the length of the global context door and window control instruction voice feature vector, v_i is the feature value at the i-th position of the global context door and window control instruction voice feature vector, ||V||_2^2 represents the square of the two-norm of the global context door and window control instruction voice feature vector, α is a weighting hyper-parameter, exp(·) represents the exponential operation on a vector, and v_i' represents the optimized global context door and window control instruction voice feature vector.
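To make the preprocessing of claim 2 concrete, here is a minimal Python sketch assuming fixed-length segmentation and resampling with torchaudio. The segment length and the target sampling rate are illustrative assumptions, since the claim does not fix them.

import torch
import torchaudio

ORIG_RATE = 16_000
TARGET_RATE = 32_000
SEGMENT_SAMPLES = ORIG_RATE // 2  # 0.5 s per fragment (assumed)

resample = torchaudio.transforms.Resample(orig_freq=ORIG_RATE, new_freq=TARGET_RATE)

def preprocess(voice_input: torch.Tensor) -> list[torch.Tensor]:
    """Divide the 1-D voice input into fragments and up-sample each fragment."""
    fragments = torch.split(voice_input, SEGMENT_SAMPLES)
    # Drop a trailing fragment shorter than a full segment, if any.
    fragments = [f for f in fragments if f.numel() == SEGMENT_SAMPLES]
    return [resample(f.unsqueeze(0)).squeeze(0) for f in fragments]

voice_input = torch.randn(ORIG_RATE * 3)  # stands in for 3 s of recorded speech
enhanced_fragments = preprocess(voice_input)
print(len(enhanced_fragments), enhanced_fragments[0].shape)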
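Claim 3's feature extraction and sequence encoding could look roughly like the PyTorch sketch below, reading "converter" as the usual rendering of Transformer. The channel widths, embedding size, and mean pooling are assumptions; the claim specifies only a one-dimensional convolutional extractor followed by a sequence encoder.

import torch
import torch.nn as nn


class SegmentFeatureExtractor(nn.Module):
    """1-D convolutional extractor: one feature vector per speech fragment."""

    def __init__(self, out_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(64, out_dim)

    def forward(self, fragment: torch.Tensor) -> torch.Tensor:
        # fragment: (batch, samples) -> (batch, out_dim)
        x = self.conv(fragment.unsqueeze(1)).squeeze(-1)
        return self.proj(x)


class GlobalContextEncoder(nn.Module):
    """Transformer encoder over the fragment features, pooled to one global vector."""

    def __init__(self, dim: int = 256, heads: int = 4, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, fragment_features: torch.Tensor) -> torch.Tensor:
        # fragment_features: (batch, num_fragments, dim) -> (batch, dim)
        return self.encoder(fragment_features).mean(dim=1)


extractor, encoder = SegmentFeatureExtractor(), GlobalContextEncoder()
fragments = [torch.randn(1, 16_000) for _ in range(6)]           # six 1 s fragments
features = torch.stack([extractor(f) for f in fragments], dim=1)  # (1, 6, 256)
global_context_vector = encoder(features)
print(global_context_vector.shape)  # torch.Size([1, 256])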
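Finally, the last step of claim 1 feeds the optimized global context feature vector into a speech generator trained within a generative adversarial network. The sketch below shows one possible generator, a linear projection followed by transposed 1-D convolutions; the architecture and all dimensions are assumptions, as the claim does not define them, and the adversarial training loop is omitted.

import torch
import torch.nn as nn


class SpeechGenerator(nn.Module):
    """Maps the optimized feature vector to a denoised waveform (GAN generator)."""

    def __init__(self, feature_dim: int = 256):
        super().__init__()
        self.project = nn.Linear(feature_dim, 128 * 50)  # expand to a short time axis
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(128, 64, kernel_size=16, stride=8, padding=4),
            nn.ReLU(),
            nn.ConvTranspose1d(64, 32, kernel_size=16, stride=8, padding=4),
            nn.ReLU(),
            nn.ConvTranspose1d(32, 1, kernel_size=16, stride=5, padding=6),
            nn.Tanh(),  # waveform constrained to [-1, 1]
        )

    def forward(self, feature_vector: torch.Tensor) -> torch.Tensor:
        # feature_vector: (batch, feature_dim) -> waveform: (batch, samples)
        x = self.project(feature_vector).view(-1, 128, 50)
        return self.decoder(x).squeeze(1)


generator = SpeechGenerator()
optimized_features = torch.randn(1, 256)      # stands in for the optimized vector
denoised_waveform = generator(optimized_features)
print(denoised_waveform.shape)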
CN202311086549.2A 2023-08-25 2023-08-25 Interactive control method and system for intelligent doors and windows Active CN117095674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311086549.2A CN117095674B (en) 2023-08-25 2023-08-25 Interactive control method and system for intelligent doors and windows

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311086549.2A CN117095674B (en) 2023-08-25 2023-08-25 Interactive control method and system for intelligent doors and windows

Publications (2)

Publication Number Publication Date
CN117095674A CN117095674A (en) 2023-11-21
CN117095674B true CN117095674B (en) 2024-03-26

Family

ID=88769662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311086549.2A Active CN117095674B (en) 2023-08-25 2023-08-25 Interactive control method and system for intelligent doors and windows

Country Status (1)

Country Link
CN (1) CN117095674B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02262199A (en) * 1989-04-03 1990-10-24 Toshiba Corp Speech recognizing device with environment monitor
CN102427418A (en) * 2011-12-09 2012-04-25 福州海景科技开发有限公司 Intelligent household system based on speech recognition
CN108286386A (en) * 2018-01-22 2018-07-17 奇瑞汽车股份有限公司 The method and apparatus of vehicle window control
CN114217536A (en) * 2021-12-13 2022-03-22 安徽蓝九信息科技有限公司 Intelligent monitoring system based on Internet of things
CN114360561A (en) * 2021-12-07 2022-04-15 广东电力信息科技有限公司 Voice enhancement method based on deep neural network technology
CN115910074A (en) * 2022-10-27 2023-04-04 深圳市经纬纵横科技有限公司 Voice control method and device for intelligent access control
CN116013297A (en) * 2022-12-17 2023-04-25 西安交通大学 Audio-visual voice noise reduction method based on multi-mode gating lifting model
CN116312570A (en) * 2023-03-15 2023-06-23 山东新一代信息产业技术研究院有限公司 Voice noise reduction method, device, equipment and medium based on voiceprint recognition

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02262199A (en) * 1989-04-03 1990-10-24 Toshiba Corp Speech recognizing device with environment monitor
CN102427418A (en) * 2011-12-09 2012-04-25 福州海景科技开发有限公司 Intelligent household system based on speech recognition
CN108286386A (en) * 2018-01-22 2018-07-17 奇瑞汽车股份有限公司 The method and apparatus of vehicle window control
CN114360561A (en) * 2021-12-07 2022-04-15 广东电力信息科技有限公司 Voice enhancement method based on deep neural network technology
CN114217536A (en) * 2021-12-13 2022-03-22 安徽蓝九信息科技有限公司 Intelligent monitoring system based on Internet of things
CN115910074A (en) * 2022-10-27 2023-04-04 深圳市经纬纵横科技有限公司 Voice control method and device for intelligent access control
CN116013297A (en) * 2022-12-17 2023-04-25 西安交通大学 Audio-visual voice noise reduction method based on multi-mode gating lifting model
CN116312570A (en) * 2023-03-15 2023-06-23 山东新一代信息产业技术研究院有限公司 Voice noise reduction method, device, equipment and medium based on voiceprint recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基音调整的语音分析方法 (Speech analysis method with pitch adjustment); 杨慧敏, 陈弘毅, 孙义和; 清华大学学报(自然科学版) (Journal of Tsinghua University, Science and Technology) (S1); full text *

Also Published As

Publication number Publication date
CN117095674A (en) 2023-11-21

Similar Documents

Publication Publication Date Title
Ravanelli et al. Multi-task self-supervised learning for robust speech recognition
WO2021139294A1 (en) Method and apparatus for training speech separation model, storage medium, and computer device
CN111312245B (en) Voice response method, device and storage medium
KR100908121B1 (en) Speech feature vector conversion method and apparatus
CA2122575C (en) Speaker independent isolated word recognition system using neural networks
CN114023300A (en) Chinese speech synthesis method based on diffusion probability model
CN114550703A (en) Training method and device of voice recognition system, and voice recognition method and device
Zöhrer et al. Representation learning for single-channel source separation and bandwidth extension
Rajesh Kumar et al. Optimization-enabled deep convolutional network for the generation of normal speech from non-audible murmur based on multi-kernel-based features
CN115881164A (en) Voice emotion recognition method and system
Sivaram et al. Data-driven and feedback based spectro-temporal features for speech recognition
CN117746908A (en) Voice emotion recognition method based on time-frequency characteristic separation type transducer cross fusion architecture
Dua et al. Noise robust automatic speech recognition: review and analysis
CN117095674B (en) Interactive control method and system for intelligent doors and windows
Jagadeeshwar et al. ASERNet: Automatic speech emotion recognition system using MFCC-based LPC approach with deep learning CNN
Shome et al. Speaker Recognition through Deep Learning Techniques: A Comprehensive Review and Research Challenges
CN117980915A (en) Contrast learning and masking modeling for end-to-end self-supervised pre-training
CN115171878A (en) Depression detection method based on BiGRU and BiLSTM
CN114360491A (en) Speech synthesis method, speech synthesis device, electronic equipment and computer-readable storage medium
CN114298019A (en) Emotion recognition method, emotion recognition apparatus, emotion recognition device, storage medium, and program product
Soni et al. Label-Driven Time-Frequency Masking for Robust Speech Command Recognition
Jannu et al. An Overview of Speech Enhancement Based on Deep Learning Techniques
Maruf et al. Effects of noise on RASTA-PLP and MFCC based Bangla ASR using CNN
CN112951270A (en) Voice fluency detection method and device and electronic equipment
Iswarya et al. Speech query recognition for Tamil language using wavelet and wavelet packets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant