CN117095674B - Interactive control method and system for intelligent doors and windows - Google Patents

Interactive control method and system for intelligent doors and windows

Info

Publication number
CN117095674B
CN117095674B (application CN202311086549.2A)
Authority
CN
China
Prior art keywords
door
window control
control instruction
voice
global context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311086549.2A
Other languages
Chinese (zh)
Other versions
CN117095674A (en)
Inventor
梁晓东
胡新尧
张俊峰
梁恒
林狄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Fulinmen Shijia Smart Home Co ltd
Original Assignee
Guangdong Fulinmen Shijia Smart Home Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Fulinmen Shijia Smart Home Co ltd filed Critical Guangdong Fulinmen Shijia Smart Home Co ltd
Priority to CN202311086549.2A priority Critical patent/CN117095674B/en
Publication of CN117095674A publication Critical patent/CN117095674A/en
Application granted granted Critical
Publication of CN117095674B publication Critical patent/CN117095674B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • EFIXED CONSTRUCTIONS
    • E05LOCKS; KEYS; WINDOW OR DOOR FITTINGS; SAFES
    • E05FDEVICES FOR MOVING WINGS INTO OPEN OR CLOSED POSITION; CHECKS FOR WINGS; WING FITTINGS NOT OTHERWISE PROVIDED FOR, CONCERNED WITH THE FUNCTIONING OF THE WING
    • E05F15/00Power-operated mechanisms for wings
    • E05F15/70Power-operated mechanisms for wings with automatic actuation
    • E05F15/77Power-operated mechanisms for wings with automatic actuation using wireless control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/04Segmentation; Word boundary detection
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • EFIXED CONSTRUCTIONS
    • E05LOCKS; KEYS; WINDOW OR DOOR FITTINGS; SAFES
    • E05YINDEXING SCHEME ASSOCIATED WITH SUBCLASSES E05D AND E05F, RELATING TO CONSTRUCTION ELEMENTS, ELECTRIC CONTROL, POWER SUPPLY, POWER SIGNAL OR TRANSMISSION, USER INTERFACES, MOUNTING OR COUPLING, DETAILS, ACCESSORIES, AUXILIARY OPERATIONS NOT OTHERWISE PROVIDED FOR, APPLICATION THEREOF
    • E05Y2900/00Application of doors, windows, wings or fittings thereof
    • E05Y2900/10Application of doors, windows, wings or fittings thereof for buildings or parts thereof
    • E05Y2900/13Type of wing
    • E05Y2900/132Doors
    • EFIXED CONSTRUCTIONS
    • E05LOCKS; KEYS; WINDOW OR DOOR FITTINGS; SAFES
    • E05YINDEXING SCHEME ASSOCIATED WITH SUBCLASSES E05D AND E05F, RELATING TO CONSTRUCTION ELEMENTS, ELECTRIC CONTROL, POWER SUPPLY, POWER SIGNAL OR TRANSMISSION, USER INTERFACES, MOUNTING OR COUPLING, DETAILS, ACCESSORIES, AUXILIARY OPERATIONS NOT OTHERWISE PROVIDED FOR, APPLICATION THEREOF
    • E05Y2900/00Application of doors, windows, wings or fittings thereof
    • E05Y2900/10Application of doors, windows, wings or fittings thereof for buildings or parts thereof
    • E05Y2900/13Type of wing
    • E05Y2900/148Windows
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L2021/02082Noise filtering the noise being echo, reverberation of the speech
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an interactive control method and system for intelligent doors and windows, which preprocess and recognize a user's door and window control instruction voice input by means of artificial intelligence technology and deep-learning-based speech recognition technology, so as to intelligently generate door and window control instructions.

Description

Interactive control method and system for intelligent doors and windows
Technical Field
The application relates to the field of intelligent control, and more particularly, to an interactive control method and system for intelligent doors and windows.
Background
The reason for interactive control of doors and windows is to improve the convenience and comfort of users. Conventional door and window control schemes typically require manual operation, such as using a switch or a remote control, which may require the user to leave a comfortable position or walk a considerable distance to the door or window.
Therefore, an optimized interactive control scheme for doors and windows is desired.
Disclosure of Invention
The present application has been made in order to solve the above technical problems. The embodiments of the application provide an interactive control method and system for intelligent doors and windows, which preprocess and recognize a user's door and window control instruction voice input by means of artificial intelligence technology and deep-learning-based speech recognition technology, so as to intelligently generate door and window control instructions.
According to one aspect of the present application, there is provided an interactive control method for an intelligent door and window, including:
acquiring a door and window control instruction voice input provided by a user;
performing noise reduction processing on the door and window control instruction voice input to obtain a noise-reduced door and window control instruction voice input; and
performing voice recognition on the noise-reduced door and window control instruction voice input to generate a door and window control instruction.
According to another aspect of the present application, there is provided an interactive control system for a smart door and window, including:
the control instruction acquisition module is used for acquiring door and window control instruction voice input provided by a user;
the noise reduction module is used for carrying out noise reduction processing on the door and window control instruction voice input so as to obtain the door and window control instruction voice input after noise reduction; and
the control instruction generation module is used for performing voice recognition on the noise-reduced door and window control instruction voice input to generate a door and window control instruction.
Compared with the prior art, the interactive control method and system for intelligent doors and windows provided by the application preprocess and recognize a user's door and window control instruction voice input by means of artificial intelligence technology and deep-learning-based speech recognition technology, so that door and window control instructions are generated intelligently.
Drawings
The foregoing and other objects, features and advantages of the present application will become more apparent from the following more particular description of embodiments of the present application, as illustrated in the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application, are incorporated in and constitute a part of this specification, illustrate the application, and do not constitute a limitation of the application. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 is a flowchart of an interactive control method of a smart door and window according to an embodiment of the present application;
fig. 2 is a system architecture diagram of an interactive control method of a smart door and window according to an embodiment of the present application;
fig. 3 is a flowchart of substep S2 of the interactive control method of the smart door and window according to an embodiment of the present application;
fig. 4 is a flowchart of substep S21 of the interactive control method of the smart door and window according to the embodiment of the present application;
fig. 5 is a flowchart of substep S22 of the interactive control method of the smart door and window according to an embodiment of the present application;
fig. 6 is a flowchart of substep S23 of the interactive control method of the smart door and window according to an embodiment of the present application;
fig. 7 is a block diagram of an interactive control system for a smart door and window according to an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application and not all of the embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
As used in this application and in the claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may include the plural unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
Although the present application makes various references to certain modules in a system according to embodiments of the present application, any number of different modules may be used and run on a user terminal and/or server. The modules are merely illustrative, and different aspects of the systems and methods may use different modules.
Flowcharts are used in this application to describe the operations performed by systems according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed in order precisely. Rather, the various steps may be processed in reverse order or simultaneously, as desired. Also, other operations may be added to or removed from these processes.
The reason for interactive control of doors and windows is to improve the convenience and comfort of users. Conventional door and window control schemes typically require manual operation, such as using a switch or a remote control, which may require the user to leave a comfortable position or walk a considerable distance to the door or window. Therefore, an optimized interactive control scheme for doors and windows is desired.
In the technical scheme of the application, an interactive control method for intelligent doors and windows is provided. Fig. 1 is a flowchart of an interactive control method of a smart door and window according to an embodiment of the present application. Fig. 2 is a system architecture diagram of an interactive control method for a smart door and window according to an embodiment of the present application. As shown in fig. 1 and fig. 2, the interactive control method for the smart door and window according to the embodiment of the application includes the steps of: s1, acquiring door and window control instruction voice input provided by a user; s2, carrying out noise reduction treatment on the door and window control instruction voice input to obtain the door and window control instruction voice input after noise reduction; and S3, performing voice recognition on the noise-reduced door and window control instruction voice input to generate a door and window control instruction.
Specifically, in step S1, a door and window control instruction voice input provided by a user is acquired.
Specifically, in step S2, the door and window control command speech input is subjected to noise reduction processing to obtain a noise-reduced door and window control command speech input. In particular, in one specific example of the present application, as shown in fig. 3, step S2 includes: S21, performing data preprocessing on the door and window control command speech input to obtain a plurality of source domain enhanced door and window control command speech segments; S22, performing feature extraction and sequence encoding on the plurality of source domain enhanced door and window control command speech segments to obtain a global context door and window control command speech feature vector; and S23, generating the noise-reduced door and window control command speech input based on the global context door and window control command speech feature vector.
Specifically, the step S21 is to perform data preprocessing on the input of the door and window control command voice to obtain a plurality of source domain enhanced door and window control command voice segments. It should be appreciated that with speech recognition technology, a user's voice input may provide door and window control commands, and the system then converts the voice input to text commands via the speech recognition technology, and then controls the movement of the door and window according to the text commands. However, the complexity of speech input is affected by user habits and expressions. In practice, the user may use different languages, dialects, accents, speech speeds, intonation, etc. to provide the door and window control instructions, such as different expressions of "open window", "open window a bit", "open window half", etc. These expressions increase the difficulty of speech recognition, resulting in reduced speech recognition efficiency, thereby affecting the generation and execution of door and window control commands. Therefore, in the technical scheme of the application, the door and window control command voice input is subjected to data preprocessing to obtain a plurality of source domain enhanced door and window control command voice fragments. In particular, in one specific example of the present application, as shown in fig. 4, the S21 includes: s211, segmenting the door and window control instruction voice input to obtain a plurality of door and window control instruction voice fragments; and S212, respectively up-sampling the door and window control command voice fragments to obtain the source domain enhanced door and window control command voice fragments.
More specifically, in S211, the door and window control command speech input is segmented to obtain a plurality of door and window control command speech segments. It should be appreciated that the function of splitting the door and window control command speech input is to split the entire speech sample into a plurality of segments, each segment corresponding to a particular door and window control command. The purpose of this is to facilitate the subsequent processing steps of data enhancement, data balancing, and data annotation, as well as the input processing when training and evaluating the model.
Accordingly, in one possible implementation manner, the door and window control command speech input may be segmented to obtain a plurality of door and window control command speech segments, for example: the door and window control command voice input is audio aligned to ensure that the start and end times of the command are consistent. Audio alignment may be implemented using speech processing tools or algorithms; a voice recognition technique or instruction detection algorithm is used to detect the position of the door and window control instructions in the audio. This may be accomplished by identifying a particular sound pattern or voice feature of the instruction; and according to the instruction detection result, cutting the audio sample into a plurality of fragments, wherein each fragment corresponds to a door and window control instruction. The segmentation can be performed according to the starting and ending time points of the instruction, so that each segment is ensured to contain a complete instruction; and carrying out necessary pretreatment on each door and window control instruction voice segment. This may include removing mute portions, noise reduction processing, audio enhancement, etc., to improve the recognition accuracy and quality of the instructions; the correct tags are added to each door and window control command speech segment for subsequent training and model evaluation. A tag may represent a specific instruction type or action.
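For instance, a minimal sketch of this segmentation step, assuming silence separates successive spoken commands and using librosa's energy-based splitter as a simple stand-in for the instruction-detection step described above, might look as follows (the sampling rate and threshold are illustrative):

```python
# Sketch of step S211: split a recording into one segment per spoken command.
import librosa
import numpy as np

def split_into_command_segments(path: str, top_db: float = 30.0):
    y, sr = librosa.load(path, sr=16000, mono=True)       # load and resample
    intervals = librosa.effects.split(y, top_db=top_db)   # non-silent (start, end) pairs
    segments = [y[start:end] for start, end in intervals] # one segment per command
    return segments, sr

# Each returned segment would then be labeled with its command type
# (e.g. "open window", "close door") for later training and evaluation.
```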
More specifically, in step S212, the respective door and window control command speech segments are up-sampled to obtain the plurality of source domain enhanced door and window control command speech segments. It should be appreciated that up-sampling the door and window control command speech segments serves to increase the number and diversity of samples, so as to improve the training effect and robustness of the model.
Notably, up-sampling is a data processing technique that increases the number of samples in a data set. For audio or speech data, up-sampling generally refers to increasing the number of audio samples. This may be accomplished by different methods, including but not limited to the following: repeated replication: existing samples are simply copied to generate multiple identical samples; this method is straightforward, but may result in training samples that are too similar and lack diversity. Interpolation: new samples are generated between existing samples using an interpolation algorithm; the interpolation may be time-based linear interpolation or feature-based interpolation, producing new samples that are similar to, but slightly different from, the original samples. Generative models: new samples are generated using a generative model (e.g., a generative adversarial network); the generative model can learn the distribution of the data and then generate new samples that are similar to, but not exactly the same as, the original samples, which increases the diversity of the data.
Accordingly, in one possible implementation manner, the respective door and window control command speech segments may be up-sampled to obtain the plurality of source-domain enhanced door and window control command speech segments, for example, by: a speech segment data set containing different door and window control instructions is collected. These speech segments should come from different speakers, speech rates, pitch, and environmental conditions to increase the diversity of the data; the collected speech segments are preprocessed, including noise removal, normalized audio quality, volume balance, and the like. This may enhance the effect of subsequent processing steps; the speech segments are processed using various data enhancement techniques to generate additional samples. Common data enhancement techniques include speed perturbation (e.g., accelerating or decelerating speech), pitch perturbation (e.g., increasing or decreasing pitch), time stretching and compression, and the like. These techniques can increase the diversity of data; the class distribution between the generated enhanced samples and the original samples is checked to ensure a sufficient number of samples per door and window control command. If the number of samples of certain instructions is small, the data set may be balanced by copying or generating more samples; labeling the generated enhanced samples to ensure that each sample has a correct door and window control instruction label.
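A minimal sketch of such up-sampling through data enhancement, assuming librosa for speed and pitch perturbation and simple Gaussian noise injection (the perturbation ranges are illustrative, not taken from the patent), might look as follows:

```python
# Sketch of step S212: enlarge the segment set with speed perturbation,
# pitch perturbation and noise injection.
import librosa
import numpy as np

def augment_segment(seg: np.ndarray, sr: int, rng: np.random.Generator):
    out = []
    out.append(librosa.effects.time_stretch(seg, rate=float(rng.uniform(0.9, 1.1))))
    out.append(librosa.effects.pitch_shift(seg, sr=sr, n_steps=float(rng.uniform(-2, 2))))
    noisy = seg + 0.005 * rng.standard_normal(seg.shape).astype(seg.dtype)
    out.append(noisy)
    return out

def upsample_dataset(segments, sr, copies_per_segment: int = 3, seed: int = 0):
    rng = np.random.default_rng(seed)
    enhanced = []
    for seg in segments:
        enhanced.append(seg)                        # keep the original sample
        for _ in range(copies_per_segment):
            enhanced.extend(augment_segment(seg, sr, rng))
    return enhanced
```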
It should be noted that, in other specific examples of the present application, the door and window control command voice input may be further subjected to data preprocessing in other manners to obtain a plurality of source domain enhanced door and window control command voice segments, for example: voice sample data including door and window control instructions is collected. These samples may come from different speakers, environments, and recording devices to obtain diverse source domain data; audio pretreatment: audio alignment: performing audio alignment on the collected voice samples to ensure that the starting time and the ending time of the door and window control instruction are consistent; audio segmentation: dividing an audio sample into a plurality of fragments according to the starting time and the ending time of the door and window control instruction, wherein each fragment corresponds to one instruction; data enhancement: noise injection: selecting different types of noise from the noise library, and mixing the noise with each voice segment to simulate different environmental noise; speed of sound variation: the speed of the voice is increased or reduced by changing the playing speed of the voice fragment so as to simulate different speaking speeds; tone variation: by changing the tone of the speech segment, the pitch of the speech is increased or decreased to simulate the tone characteristics of different speakers; reverberation effect: adding a proper amount of reverberation effect on the voice segment to simulate different recording environments; ensuring that the number of samples of each door and window control instruction in the generated enhanced data set is relatively balanced, and avoiding that certain instructions are too concentrated or too rare; a correct door and window control command label is added to each enhanced speech segment for subsequent training and model evaluation.
Specifically, in step S22, feature extraction and sequence encoding are performed on the plurality of source domain enhanced door and window control command speech segments to obtain a global context door and window control command speech feature vector. That is, features are first extracted independently from each source domain enhanced door and window control command speech segment to capture the local speech characteristics it contains; the global context associations among these local speech features are then extracted to enhance feature interaction and communication among them. In particular, in one specific example of the present application, as shown in fig. 5, step S22 includes: S221, passing the plurality of source domain enhanced door and window control command speech segments through a speech feature extractor based on a one-dimensional convolution layer to obtain a plurality of door and window control command speech feature vectors; and S222, passing the plurality of door and window control command speech feature vectors through a converter-based intermediate feature sequence encoder to obtain the global context door and window control command speech feature vector.
More specifically, in step S221, the plurality of source domain enhanced door and window control command speech segments are passed through a speech feature extractor based on a one-dimensional convolution layer to obtain a plurality of door and window control command speech feature vectors. It should be appreciated that a one-dimensional convolution layer slides its convolution kernel over the input speech segment to capture local speech features. These local features may include the spectral morphology of the audio, the formants of the speech, and so forth. Through convolution operations, the speech feature extractor can automatically learn and extract local speech features related to the door and window control commands. Specifically, step S221 includes: each layer of the speech feature extractor based on the one-dimensional convolution layer performs the following operations on its input data in the forward pass of that layer: performing convolution processing on the input data to obtain a convolution feature map; pooling the convolution feature map along the feature matrix to obtain a pooled feature map; and performing nonlinear activation on the pooled feature map to obtain an activated feature map; wherein the output of the last layer of the speech feature extractor based on the one-dimensional convolution layer is the plurality of door and window control command speech feature vectors, and the input of the first layer of the speech feature extractor based on the one-dimensional convolution layer is the plurality of source domain enhanced door and window control command speech segments.
Notably, one-dimensional convolutional layers are a common layer type in convolutional neural networks for processing one-dimensional sequence data. The method can effectively extract local features in the sequence data, and filter and map the features on the input through convolution operation. The structure of the one-dimensional convolution layer is as follows: input: the input of the one-dimensional convolution layer is a one-dimensional sequence, such as a word sequence in time sequence data or text data; convolution kernel: the one-dimensional convolution layer contains a plurality of convolution kernels, each of which is a small matrix of learnable parameters. The size of the convolution kernel is typically a window size that slides in the time dimension for capturing local features of the input sequence; convolution operation: for each convolution kernel, a one-dimensional convolution layer convolves it with the input sequence. The convolution operation is performed by sliding a convolution kernel over the input sequence and calculating a dot product of the convolution kernel and the input sequence. This process can be seen as filtering the input sequence, highlighting local features in the sequence; activation function: an activation function is applied to the result of the convolution operation to introduce nonlinearities. Common activation functions include ReLU, sigmoid, tanh, etc.; pooling layer: after the one-dimensional convolution layer, a pooling layer may be added to reduce the dimension of the feature map. Common pooling operations include maximum pooling and average pooling, which can extract the most significant portion of a feature or downsample a feature. The one-dimensional convolution layer can extract local features at different positions through sliding operation of a plurality of convolution kernels on an input sequence, and processes and compresses the features through an activation function and a pooling operation. The one-dimensional convolution layer is very effective in the aspects of feature extraction and pattern recognition of sequence data, and is often applied to the fields of voice recognition, text classification, time sequence analysis and the like.
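As an illustration of the extractor structure just described (convolution, then pooling, then nonlinear activation, repeated over several layers), a minimal PyTorch sketch might look as follows; the channel counts, kernel sizes and pooling configuration are illustrative assumptions, not values from the patent.

```python
# Sketch of the one-dimensional-convolution speech feature extractor of step S221.
import torch
import torch.nn as nn

class Conv1dSpeechFeatureExtractor(nn.Module):
    def __init__(self, in_channels: int = 1, feature_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=9, stride=2, padding=4),
            nn.MaxPool1d(kernel_size=2),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=2, padding=4),
            nn.MaxPool1d(kernel_size=2),
            nn.ReLU(),
            nn.Conv1d(64, feature_dim, kernel_size=9, stride=2, padding=4),
            nn.AdaptiveAvgPool1d(1),              # collapse the time axis
        )

    def forward(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (batch, 1, num_samples) -> (batch, feature_dim)
        return self.net(segment).squeeze(-1)

# Usage: one feature vector per source-domain enhanced command segment.
extractor = Conv1dSpeechFeatureExtractor()
vec = extractor(torch.randn(1, 1, 16000))          # shape (1, 128)
```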
More specifically, in step S222, the plurality of door and window control command speech feature vectors are passed through a converter-based intermediate feature sequence encoder to obtain the global context door and window control command speech feature vector. It should be understood that the converter-based intermediate feature sequence encoder performs context modeling, long-distance dependency modeling, and feature interaction and integration on the plurality of door and window control command speech feature vectors, and adds position codes, so that the context association relationships among the plurality of feature vectors are integrated to obtain a door and window control command speech feature vector with global context. Specifically: the plurality of door and window control command speech feature vectors are arranged into one dimension to obtain a global door and window control command speech feature vector; the product between the global door and window control command speech feature vector and the transpose of each door and window control command speech feature vector is calculated to obtain a plurality of self-attention association matrices; each of the plurality of self-attention association matrices is normalized to obtain a plurality of normalized self-attention association matrices; each normalized self-attention association matrix is passed through a Softmax classification function to obtain a plurality of probability values; each door and window control command speech feature vector is weighted with the corresponding probability value as a weight to obtain a plurality of context semantic door and window control command speech feature vectors; and the plurality of context semantic door and window control command speech feature vectors are concatenated to obtain the global context door and window control command speech feature vector.
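The per-vector weighting just described can be sketched as follows in PyTorch; this is one plausible reading of the text (in particular, each normalized association matrix is reduced to a single score before the Softmax), not a reference transformer implementation, and the dimensions are illustrative assumptions.

```python
# Sketch of step S222: build per-vector association matrices, reduce them to
# softmax weights, weight each vector, and concatenate into one global vector.
import torch
import torch.nn.functional as F

def global_context_encode(vectors: list) -> torch.Tensor:
    # vectors: K feature vectors, each of shape (feature_dim,)
    g = torch.cat(vectors, dim=0)                               # global vector, (K*D,)
    assoc = [torch.outer(g, v) for v in vectors]                # self-attention matrices
    assoc = [(a - a.mean()) / (a.std() + 1e-6) for a in assoc]  # per-matrix normalization
    scores = torch.stack([a.mean() for a in assoc])             # one scalar per matrix
    weights = F.softmax(scores, dim=0)                          # K probability values
    weighted = [w * v for w, v in zip(weights, vectors)]        # context-semantic vectors
    return torch.cat(weighted, dim=0)                           # global context vector

# Example: three 128-dimensional command feature vectors -> one 384-dim vector.
out = global_context_encode([torch.randn(128) for _ in range(3)])
```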
Notably, the intermediate feature sequence encoder uses a self-attention mechanism to interact and integrate feature vectors in the input sequence. The self-attention mechanism allows each feature vector to interact with other feature vectors and weight them according to their relative importance. Thus, the model can capture the dependency of the global context and perform feature integration according to the relationship between features. In addition to the self-attention mechanism, the intermediate signature sequence encoder also includes a feed-forward neural network layer. The feed forward neural network layer is typically a fully connected layer or layers for non-linear transformation and mapping of the feature vectors for each location. This helps to enhance the representation capability of the features and the non-linear modeling capability of the model.
It should be noted that, in other specific examples of the present application, the feature extraction and the sequence encoding may be performed on the plurality of source domain enhanced door and window control command speech segments in other manners to obtain a global context door and window control command speech feature vector, for example: collecting enhanced door and window control instruction voice fragments of a plurality of source domains as training data; preprocessing the collected voice fragments, including removing noise, reducing noise, adjusting audio gain and the like, so as to improve voice quality and reduce interference; and performing data enhancement operation on the preprocessed voice fragments to expand the diversity of training data. Common data enhancement methods include speed disturbance, voice tonal modification, noise addition, etc.; the sample quantity balance of each category is ensured, and the problem of unbalanced category during model training is avoided. The data set may be balanced by over-sampling, under-sampling, or generating a composite sample, etc.; labeling each voice segment with a corresponding door and window control instruction label so as to train and evaluate the model later; and extracting the characteristics of the enhanced door and window control instruction voice fragments. Common feature extraction methods include short-time fourier transforms, and mel-frequency cepstral coefficients. These features may capture spectral information and acoustic properties of the speech segments; the signature sequence is encoded using a recurrent neural network or a variant thereof (e.g., long and short term memory network, LSTM). The RNN can establish a context dependency relationship in the sequence data and capture global context information of door and window control instruction voice; and processing the sequence coding result through pooling operation to obtain the global context door and window control instruction voice feature vector with fixed dimension. This feature vector will incorporate information of the enhanced samples for training the model or other related tasks.
Specifically, in step S23, the noise-reduced door and window control command voice input is generated based on the global context door and window control command voice feature vector. In particular, in one specific example of the present application, as shown in fig. 6, step S23 includes: S231, performing feature distribution optimization on the global context door and window control command voice feature vector to obtain an optimized global context door and window control command voice feature vector; and S232, passing the optimized global context door and window control command voice feature vector through a voice generator based on a countermeasure generation network to obtain the noise-reduced door and window control command voice input.
More specifically, in step S231, feature distribution optimization is performed on the global context door and window control command voice feature vector to obtain an optimized global context door and window control command voice feature vector. In particular, in the technical scheme of the present application, when the plurality of source domain enhanced door and window control command voice segments are passed through the voice feature extractor based on the one-dimensional convolution layer, each door and window control command voice feature vector expresses the local amplitude correlation features, in the time-sequence direction, of the corresponding door and window control command voice segment, so that after the plurality of door and window control command voice feature vectors are passed through the converter-based intermediate feature sequence encoder, the obtained global context door and window control command voice feature vector further expresses the global context correlation of these local time-sequence amplitude correlation features. In this way, the amplitude correlation features of the local time sequences serve as foreground object features; when the global time-sequence context correlation is established, background distribution noise related to feature distribution interference at each local time sequence is also introduced, and the global context door and window control command voice feature vector thus carries a time-space hierarchical semantic expression at both the local time-sequence and global time-sequence levels. It is therefore desirable to enhance its expression effect based on the distribution characteristics of the global context door and window control command voice feature vector. Accordingly, the applicant of the present application applies, to the global context door and window control command voice feature vector, a distribution gain based on a probability density feature imitation paradigm, specifically expressed as:
where V is the global context door and window control command voice feature vector, L is the length of the global context door and window control command voice feature vector, v_i is the feature value at the i-th position of the global context door and window control command voice feature vector, ||V||_2^2 represents the square of the two-norm of the global context door and window control command voice feature vector, α is a weighting hyper-parameter, exp(·) represents the exponential operation on a vector, and v_i' represents the optimized global context door and window control command voice feature vector. Here, by means of the feature imitation paradigm of the standard Cauchy distribution with respect to the natural Gaussian distribution on the probability density, the distribution gain based on the probability density feature imitation paradigm can distinguish foreground object features from background distribution noise in the high-dimensional feature space, with the feature scale serving as an imitation mask, so as to perform semantics-aware distribution soft matching of the feature-space mapping on the high-dimensional space based on the spatially hierarchical semantics of the high-dimensional features and obtain an unconstrained distribution gain of the high-dimensional feature distribution. This improves the expression effect of the global context door and window control command voice feature vector based on its feature distribution characteristics, and improves the voice quality of the door and window control command voice input obtained by passing the global context door and window control command voice feature vector through the voice generator based on the countermeasure generation network, thereby improving the control quality of the door and window control command generated by performing semantic recognition on the noise-reduced door and window control command voice input.
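The optimization formula itself appears only as an image in the original publication and is not reproduced in this text. Purely as an illustration, the sketch below combines the quantities the passage names (a standard Cauchy density term, an exponential term in the squared two-norm, the weighting hyper-parameter α and the length L) into one possible gain; it should not be read as the patented formula.

```python
# Illustrative (assumed) combination of the named quantities; NOT the patent's formula.
import math
import torch

def distribution_gain(v: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    L = v.numel()                                   # length of the feature vector
    norm_sq = torch.sum(v ** 2)                     # squared two-norm ||V||_2^2
    cauchy = 1.0 / (math.pi * (1.0 + v ** 2))       # standard Cauchy density per element
    gain = alpha * L * cauchy * torch.exp(-norm_sq / L)
    return v * (1.0 + gain)                         # gently re-weighted feature vector

v_opt = distribution_gain(torch.randn(384))
```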
More specifically, S232, the optimized global context door and window control command voice feature vector is passed through a voice generator based on a countermeasure generation network to obtain the noise-reduced door and window control command voice input. It should be appreciated that for door and window control command speech input, noise and interference may be present, which can be confusing for speech recognition and understanding. By the voice generator based on the countermeasure generation network, the noise-reduced voice input can be generated, thereby reducing the influence of noise and improving the definition and the understandability of the voice signal. The noise-reduced door and window control command voice input is cleaner and more accurate, which is helpful for improving the reliability and accuracy of the voice recognition system. The generated voice input is closer to a real sample, so that the subsequent door and window control instruction recognition and processing tasks are more reliable.
Notably, the countermeasure generation network (Generative Adversarial Network, GAN for short) is a neural network structure composed of a generator and a discriminator, used for generating realistic data samples. The core idea of a GAN is to improve the quality of the samples produced by the generator through adversarial training between the generator and the discriminator. The structure of a GAN includes the following two main components: a generator: the generator is a neural network model that receives a random noise vector as input and attempts to generate new samples similar to the training data samples; the goal of the generator is to learn to generate realistic data samples that resemble the real samples in appearance and distribution. A discriminator: the discriminator is also a neural network model that receives the samples generated by the generator and the real data samples as input and attempts to distinguish them; the goal of the discriminator is to learn to distinguish the generator-generated samples from the real samples, i.e., to determine whether an input sample is a real sample or a sample produced by the generator.
Accordingly, in one possible implementation, the optimized global context door and window control command voice feature vector may be passed through the voice generator based on a countermeasure generation network to obtain the noise-reduced door and window control command voice input, for example, in the following manner: a voice data set containing door and window control commands is collected and labeled, ensuring that the data set covers a variety of speakers, environments and voice characteristics so as to improve the generalization capability of the model; the collected voice data is preprocessed, including removing silent segments, performing voice segmentation, audio format conversion and the like, to prepare the data for subsequent processing; features are extracted from the preprocessed voice data using a one-dimensional convolution layer or the like, where common feature extraction methods include Mel spectrum features, MFCC (Mel-frequency cepstral coefficients) and the like; to increase the diversity and richness of the data, data enhancement operations such as adding noise, time shifting and pitch shifting may be performed on the features, which helps improve the generalization ability of the generator; countermeasure generation network training: the network structures of the generator and the discriminator are defined, where the generator receives the optimized global context door and window control command voice feature vector as input and attempts to generate a noise-reduced voice input, while the discriminator receives the real voice input and the voice input generated by the generator and tries to distinguish between them; adversarial training is then performed, alternately training the generator and the discriminator: the generator improves the quality of the generated samples by maximizing the error of the discriminator, while the discriminator accurately distinguishes real samples from generated samples by minimizing its own error; appropriate loss functions, such as a generator loss and a discriminator loss, are used to guide the training process of the network; generating the noise-reduced door and window control command voice input: given the optimized global context door and window control command voice feature vector as input, a noise-reduced voice input is generated by the generator; the output of the generator can be adjusted as needed, such as adjusting the volume or speech rate of the audio; finally, post-processing operations are carried out on the generated noise-reduced voice input, including audio format conversion, audio gain adjustment and the like, to ensure that the generated voice input meets system requirements.
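A minimal PyTorch sketch of such a countermeasure generation network, with a generator mapping the optimized feature vector to a denoised audio frame and a discriminator scoring frames as real or generated, might look as follows; the network sizes, frame length and optimizer settings are illustrative assumptions.

```python
# Sketch of the adversarial speech generator of step S232 with one training step.
import torch
import torch.nn as nn

FEAT_DIM, FRAME_LEN = 384, 1024

generator = nn.Sequential(                 # feature vector -> denoised audio frame
    nn.Linear(FEAT_DIM, 512), nn.ReLU(),
    nn.Linear(512, FRAME_LEN), nn.Tanh(),
)
discriminator = nn.Sequential(             # audio frame -> real/generated score
    nn.Linear(FRAME_LEN, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(feat: torch.Tensor, clean_frame: torch.Tensor) -> None:
    # 1) discriminator step: real frames -> 1, generated frames -> 0
    fake = generator(feat).detach()
    d_loss = bce(discriminator(clean_frame), torch.ones(clean_frame.size(0), 1)) + \
             bce(discriminator(fake), torch.zeros(fake.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) generator step: try to make the discriminator predict 1 for generated frames
    g_loss = bce(discriminator(generator(feat)), torch.ones(feat.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

train_step(torch.randn(8, FEAT_DIM), torch.rand(8, FRAME_LEN) * 2 - 1)
```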
It should be noted that, in other specific examples of the present application, the noise-reduced door and window control instruction speech input may be generated by other manners based on the global context door and window control instruction speech feature vector, for example: a speech sample data set containing door and window control instructions is collected. These samples may include different voice instructions, different speakers, and recordings of different environmental conditions; the collected speech samples are preprocessed. This includes removing noise, reducing background noise interference, etc. Common noise reduction methods include the use of noise reduction filters, spectral subtraction, or noise reduction models based on deep learning; in order to increase the diversity and robustness of the data, data enhancement can be performed on the preprocessed speech samples. Data enhancement techniques include speed disturbances, voice pitch changes, adding noise, and the like. Thus, more samples can be generated, and the generalization capability of the model is enhanced; and labeling the voice samples after preprocessing and data enhancement. The labels can be corresponding door and window control instruction labels so as to conduct supervised learning during model training; and extracting the characteristics of the preprocessed and labeled voice samples by using a voice characteristic extractor based on a one-dimensional convolution layer. These feature extractors can extract local features, multichannel features, and capture important features of speech through nonlinear mapping, downsampling, and other operations; the result of the feature extraction is input into a converter-based intermediate feature sequence encoder. The encoder can perform context modeling, long-distance dependence modeling, feature interaction and integration on the feature sequence, and generate a door and window control instruction voice feature vector with global context; and restoring the generated door and window control instruction voice feature vector into a voice signal through inverse transformation. The inverse transform may be implemented using techniques such as inverse convolution, inverse fourier transform, and the like.
It should be noted that, in other specific examples of the present application, the door and window control command voice input may be further subjected to noise reduction processing in other manners to obtain a post-noise reduction door and window control command voice input, for example: a microphone or suitable recording device is used to record the voice input containing the door and window control instructions. Ensuring the recorded audio quality as clear as possible for subsequent processing; audio pretreatment: noise detection: analyzing a noise level in the audio using a noise detection algorithm; noise reduction: noise is reduced or eliminated from the audio signal using noise reduction techniques, such as adaptive filters or spectral subtraction. This will improve the clarity of the instruction speech; and (3) voice recognition: feature extraction: extracting features of audio using an audio processing algorithm, such as mel-frequency cepstral coefficient (MFCC) or Linear Predictive Coding (LPC), etc.; speech recognition model: inputting the extracted features into a speech recognition model, such as a Deep Neural Network (DNN) or a Recurrent Neural Network (RNN), to identify a door and window control command; instruction analysis: and matching the recognized text instruction with an instruction in the door and window control system to execute corresponding operation. This may involve communicating with an interface of the door and window control system to send the correct instructions.
Specifically, in step S3, the noise-reduced door and window control command voice input is subjected to voice recognition to generate a door and window control command. It should be understood that the voice input after noise reduction is clearer and more accurate, and the recognition system can more quickly convert the voice command into door and window control operation. This can increase the response speed of the door and window control system, so that the user can control the door and window more quickly.
Accordingly, in one possible implementation manner, the voice input of the door and window control command after noise reduction may be subjected to voice recognition to generate a door and window control command, for example: and preprocessing the noise-reduced door and window control instruction voice input. This includes audio format conversion, audio segmentation, audio gain adjustment, etc. operations to prepare the data for subsequent processing; and extracting features from the preprocessed door and window control instruction voice input. Common feature extraction methods include Mel-frequency spectrum features, MFCC (Mel-frequency cepstral coefficient), and the like. These features represent the spectral information of the speech, providing input for subsequent speech recognition; and training a voice recognition model by using the labeled door and window control instruction voice data set. Common speech recognition models include deep learning based end-to-end models, such as a combination model of a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN), and traditional combination models of acoustic models and language models, such as HMM and GMM; and (3) reasoning the noise-reduced door and window control instruction voice input by using the trained voice recognition model. Inputting the feature vector into a voice recognition model, and outputting a text representation of a corresponding door and window control instruction by the model; and carrying out post-processing on the door and window control instruction text output by the voice recognition model. This includes voice command parsing, semantic understanding, etc., converting text commands into specific door and window control operations, such as opening and closing doors and windows, adjusting curtains, etc.
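A minimal sketch of this recognition step, assuming MFCC features (via librosa) and a small classifier over a fixed command set (the command list and model shape are illustrative assumptions, and the classifier shown is untrained), might look as follows:

```python
# Sketch of step S3: MFCC features from the denoised audio -> command label.
import librosa
import numpy as np
import torch
import torch.nn as nn

COMMANDS = ["open window", "close window", "open door", "close door"]

classifier = nn.Sequential(nn.Linear(13, 64), nn.ReLU(), nn.Linear(64, len(COMMANDS)))

def recognize_command(wav: np.ndarray, sr: int = 16000) -> str:
    mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=13)          # (13, num_frames)
    feat = torch.tensor(mfcc.mean(axis=1), dtype=torch.float32)   # average over frames
    with torch.no_grad():
        logits = classifier(feat)                                  # untrained here: demo only
    return COMMANDS[int(torch.argmax(logits))]

print(recognize_command(np.random.randn(16000).astype(np.float32)))
```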
In summary, the interactive control method for intelligent doors and windows according to the embodiment of the application has been described. It preprocesses and recognizes a user's door and window control instruction voice input by means of artificial intelligence technology and deep-learning-based speech recognition technology, so that door and window control instructions are generated intelligently.
Further, an interactive control system of the intelligent door and window is also provided.
Fig. 7 is a block diagram of an interactive control system for a smart door and window according to an embodiment of the present application. As shown in fig. 7, an interactive control system 300 for a smart door and window according to an embodiment of the present application includes: a control instruction acquisition module 310, configured to acquire a door and window control instruction voice input provided by a user; the noise reduction module 320 is configured to perform noise reduction processing on the door and window control command voice input to obtain a door and window control command voice input after noise reduction; and a control instruction generating module 330, configured to perform speech recognition on the noise-reduced door and window control instruction speech input to generate a door and window control instruction.
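A minimal sketch of how the three modules might be composed in code is shown below; the internal behaviour of each module here is a deliberately simplified placeholder (a moving-average smoother instead of the full S21-S23 pipeline, an energy check instead of real speech recognition), used only to illustrate the module structure.

```python
# Illustrative composition of modules 310, 320 and 330; all internals are stubs.
import numpy as np

class ControlInstructionAcquisitionModule:
    """Stand-in for module 310: here it simply accepts a waveform array."""
    def acquire(self, source: np.ndarray) -> np.ndarray:
        return source.astype(np.float32)

class NoiseReductionModule:
    """Stand-in for module 320: a moving-average smoother as a placeholder
    for the segment/encode/generate pipeline of steps S21-S23."""
    def denoise(self, wav: np.ndarray, k: int = 5) -> np.ndarray:
        kernel = np.ones(k, dtype=np.float32) / k
        return np.convolve(wav, kernel, mode="same")

class ControlInstructionGenerationModule:
    """Stand-in for module 330: maps recognized speech to an actuator command."""
    def generate(self, wav: np.ndarray) -> str:
        # A real system would run speech recognition here; this stub checks energy only.
        return "open window" if float(np.mean(wav ** 2)) > 1e-4 else "no command"

class SmartDoorWindowControlSystem:
    def __init__(self):
        self.acq = ControlInstructionAcquisitionModule()
        self.den = NoiseReductionModule()
        self.gen = ControlInstructionGenerationModule()

    def handle(self, source: np.ndarray) -> str:
        wav = self.acq.acquire(source)
        wav = self.den.denoise(wav)
        return self.gen.generate(wav)

if __name__ == "__main__":
    fake_input = np.random.randn(16000) * 0.05      # 1 s of synthetic audio at 16 kHz
    print(SmartDoorWindowControlSystem().handle(fake_input))
```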
As described above, the smart door and window interactive control system 300 according to the embodiment of the present application may be implemented in various wireless terminals, such as a server having a smart door and window interactive control algorithm. In one possible implementation, the smart door and window interactive control system 300 according to the embodiment of the present application may be integrated into the wireless terminal as a software module and/or a hardware module. For example, the smart door and window interactive control system 300 may be a software module in the operating system of the wireless terminal, or may be an application developed for the wireless terminal; of course, the smart door and window interactive control system 300 may also be one of a plurality of hardware modules of the wireless terminal.
Alternatively, in another example, the smart door and window interactive control system 300 and the wireless terminal may be separate devices, and the smart door and window interactive control system 300 may be connected to the wireless terminal through a wired and/or wireless network and transmit the interaction information in an agreed data format.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (4)

1. An interactive control method for intelligent doors and windows, characterized by comprising the following steps:
acquiring a door and window control instruction voice input provided by a user;
carrying out noise reduction treatment on the door and window control instruction voice input to obtain a door and window control instruction voice input after noise reduction; and
performing voice recognition on the noise-reduced door and window control instruction voice input to generate a door and window control instruction;
noise reduction processing is carried out on the door and window control instruction voice input to obtain the door and window control instruction voice input after noise reduction, and the method comprises the following steps:
performing data preprocessing on the door and window control instruction voice input to obtain a plurality of source domain enhanced door and window control instruction voice fragments;
performing feature extraction and sequence coding on the plurality of source domain enhanced door and window control instruction voice fragments to obtain a global context door and window control instruction voice feature vector; and
generating the noise-reduced door and window control instruction voice input based on the global context door and window control instruction voice feature vector;
wherein generating the noise-reduced door and window control instruction voice input based on the global context door and window control instruction voice feature vector comprises:
performing feature distribution optimization on the global context door and window control instruction voice feature vector to obtain an optimized global context door and window control instruction voice feature vector; and
passing the optimized global context door and window control instruction voice feature vector through a speech generator based on a generative adversarial network to obtain the noise-reduced door and window control instruction voice;
wherein performing feature distribution optimization on the global context door and window control instruction voice feature vector to obtain the optimized global context door and window control instruction voice feature vector comprises: performing feature distribution optimization on the global context door and window control instruction voice feature vector using the following optimization formula to obtain the optimized global context door and window control instruction voice feature vector;
wherein, the formula is:
wherein V is the global context door and window control instruction voice feature vector, L is the length of the global context door and window control instruction voice feature vector, v_i is the feature value at the i-th position of the global context door and window control instruction voice feature vector, ||V||_2^2 represents the square of the two-norm of the global context door and window control instruction voice feature vector, α is a weighting hyper-parameter, exp(·) represents the exponential operation on a vector, and v_i' represents the optimized global context door and window control instruction voice feature vector.
2. The interactive control method for intelligent doors and windows according to claim 1, wherein performing data preprocessing on the door and window control instruction voice input to obtain the plurality of source domain enhanced door and window control instruction voice fragments comprises:
dividing the door and window control instruction voice input to obtain a plurality of door and window control instruction voice fragments; and
up-sampling each of the plurality of door and window control instruction voice fragments to obtain the plurality of source domain enhanced door and window control instruction voice fragments.
3. The interactive control method for intelligent doors and windows according to claim 2, wherein performing feature extraction and sequence coding on the plurality of source domain enhanced door and window control instruction voice fragments to obtain the global context door and window control instruction voice feature vector comprises:
passing the plurality of source domain enhanced door and window control instruction voice fragments through a voice feature extractor based on a one-dimensional convolution layer to obtain a plurality of door and window control instruction voice feature vectors; and
passing the plurality of door and window control instruction voice feature vectors through a transformer-based intermediate feature sequence encoder to obtain the global context door and window control instruction voice feature vector.
4. An interactive control system for intelligent doors and windows, comprising:
the control instruction acquisition module is used for acquiring door and window control instruction voice input provided by a user;
the noise reduction module is used for carrying out noise reduction processing on the door and window control instruction voice input so as to obtain the door and window control instruction voice input after noise reduction; and
the control instruction generation module is used for carrying out voice recognition on the noise-reduced door and window control instruction voice input so as to generate a door and window control instruction;
the noise reduction module is specifically configured to:
performing data preprocessing on the door and window control instruction voice input to obtain a plurality of source domain enhanced door and window control instruction voice fragments;
performing feature extraction and sequence coding on the plurality of source domain enhanced door and window control instruction voice fragments to obtain a global context door and window control instruction voice feature vector; and
generating the noise-reduced door and window control instruction voice input based on the global context door and window control instruction voice feature vector;
the noise reduction module is specifically configured to:
performing feature distribution optimization on the global context door and window control instruction voice feature vector to obtain an optimized global context door and window control instruction voice feature vector; and
passing the optimized global context door and window control instruction voice feature vector through a speech generator based on a generative adversarial network to obtain the noise-reduced door and window control instruction voice;
performing feature distribution optimization on the global context door and window control instruction voice feature vector by using the following optimization formula to obtain an optimized global context door and window control instruction voice feature vector;
wherein, the formula is:
wherein V is the global context door and window control instruction voice feature vector, L is the length of the global context door and window control instruction voice feature vector, v_i is the feature value at the i-th position of the global context door and window control instruction voice feature vector, ||V||_2^2 represents the square of the two-norm of the global context door and window control instruction voice feature vector, α is a weighting hyper-parameter, exp(·) represents the exponential operation on a vector, and v_i' represents the optimized global context door and window control instruction voice feature vector.
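To make the preprocessing of claim 2 concrete, here is a minimal Python sketch assuming fixed-length segmentation and resampling with torchaudio. The segment length and the target sampling rate are illustrative assumptions, since the claim does not fix them.

import torch
import torchaudio

ORIG_RATE = 16_000
TARGET_RATE = 32_000
SEGMENT_SAMPLES = ORIG_RATE // 2  # 0.5 s per fragment (assumed)

resample = torchaudio.transforms.Resample(orig_freq=ORIG_RATE, new_freq=TARGET_RATE)

def preprocess(voice_input: torch.Tensor) -> list[torch.Tensor]:
    """Divide the 1-D voice input into fragments and up-sample each fragment."""
    fragments = torch.split(voice_input, SEGMENT_SAMPLES)
    # Drop a trailing fragment shorter than a full segment, if any.
    fragments = [f for f in fragments if f.numel() == SEGMENT_SAMPLES]
    return [resample(f.unsqueeze(0)).squeeze(0) for f in fragments]

voice_input = torch.randn(ORIG_RATE * 3)  # stands in for 3 s of recorded speech
enhanced_fragments = preprocess(voice_input)
print(len(enhanced_fragments), enhanced_fragments[0].shape)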
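Claim 3's feature extraction and sequence encoding could look roughly like the PyTorch sketch below, reading "converter" as the usual rendering of Transformer. The channel widths, embedding size, and mean pooling are assumptions; the claim specifies only a one-dimensional convolutional extractor followed by a sequence encoder.

import torch
import torch.nn as nn


class SegmentFeatureExtractor(nn.Module):
    """1-D convolutional extractor: one feature vector per speech fragment."""

    def __init__(self, out_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(64, out_dim)

    def forward(self, fragment: torch.Tensor) -> torch.Tensor:
        # fragment: (batch, samples) -> (batch, out_dim)
        x = self.conv(fragment.unsqueeze(1)).squeeze(-1)
        return self.proj(x)


class GlobalContextEncoder(nn.Module):
    """Transformer encoder over the fragment features, pooled to one global vector."""

    def __init__(self, dim: int = 256, heads: int = 4, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, fragment_features: torch.Tensor) -> torch.Tensor:
        # fragment_features: (batch, num_fragments, dim) -> (batch, dim)
        return self.encoder(fragment_features).mean(dim=1)


extractor, encoder = SegmentFeatureExtractor(), GlobalContextEncoder()
fragments = [torch.randn(1, 16_000) for _ in range(6)]           # six 1 s fragments
features = torch.stack([extractor(f) for f in fragments], dim=1)  # (1, 6, 256)
global_context_vector = encoder(features)
print(global_context_vector.shape)  # torch.Size([1, 256])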
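Finally, the last step of claim 1 feeds the optimized global context feature vector into a speech generator trained within a generative adversarial network. The sketch below shows one possible generator, a linear projection followed by transposed 1-D convolutions; the architecture and all dimensions are assumptions, as the claim does not define them, and the adversarial training loop is omitted.

import torch
import torch.nn as nn


class SpeechGenerator(nn.Module):
    """Maps the optimized feature vector to a denoised waveform (GAN generator)."""

    def __init__(self, feature_dim: int = 256):
        super().__init__()
        self.project = nn.Linear(feature_dim, 128 * 50)  # expand to a short time axis
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(128, 64, kernel_size=16, stride=8, padding=4),
            nn.ReLU(),
            nn.ConvTranspose1d(64, 32, kernel_size=16, stride=8, padding=4),
            nn.ReLU(),
            nn.ConvTranspose1d(32, 1, kernel_size=16, stride=5, padding=6),
            nn.Tanh(),  # waveform constrained to [-1, 1]
        )

    def forward(self, feature_vector: torch.Tensor) -> torch.Tensor:
        # feature_vector: (batch, feature_dim) -> waveform: (batch, samples)
        x = self.project(feature_vector).view(-1, 128, 50)
        return self.decoder(x).squeeze(1)


generator = SpeechGenerator()
optimized_features = torch.randn(1, 256)      # stands in for the optimized vector
denoised_waveform = generator(optimized_features)
print(denoised_waveform.shape)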
CN202311086549.2A 2023-08-25 2023-08-25 Interactive control method and system for intelligent doors and windows Active CN117095674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311086549.2A CN117095674B (en) 2023-08-25 2023-08-25 Interactive control method and system for intelligent doors and windows

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311086549.2A CN117095674B (en) 2023-08-25 2023-08-25 Interactive control method and system for intelligent doors and windows

Publications (2)

Publication Number Publication Date
CN117095674A CN117095674A (en) 2023-11-21
CN117095674B true CN117095674B (en) 2024-03-26

Family

ID=88769662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311086549.2A Active CN117095674B (en) 2023-08-25 2023-08-25 Interactive control method and system for intelligent doors and windows

Country Status (1)

Country Link
CN (1) CN117095674B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02262199A (en) * 1989-04-03 1990-10-24 Toshiba Corp Speech recognizing device with environment monitor
CN102427418A (en) * 2011-12-09 2012-04-25 福州海景科技开发有限公司 Intelligent household system based on speech recognition
CN108286386A (en) * 2018-01-22 2018-07-17 奇瑞汽车股份有限公司 The method and apparatus of vehicle window control
CN114217536A (en) * 2021-12-13 2022-03-22 安徽蓝九信息科技有限公司 Intelligent monitoring system based on Internet of things
CN114360561A (en) * 2021-12-07 2022-04-15 广东电力信息科技有限公司 Voice enhancement method based on deep neural network technology
CN115910074A (en) * 2022-10-27 2023-04-04 深圳市经纬纵横科技有限公司 Voice control method and device for intelligent access control
CN116013297A (en) * 2022-12-17 2023-04-25 西安交通大学 Audio-visual voice noise reduction method based on multi-mode gating lifting model
CN116312570A (en) * 2023-03-15 2023-06-23 山东新一代信息产业技术研究院有限公司 Voice noise reduction method, device, equipment and medium based on voiceprint recognition

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02262199A (en) * 1989-04-03 1990-10-24 Toshiba Corp Speech recognizing device with environment monitor
CN102427418A (en) * 2011-12-09 2012-04-25 福州海景科技开发有限公司 Intelligent household system based on speech recognition
CN108286386A (en) * 2018-01-22 2018-07-17 奇瑞汽车股份有限公司 The method and apparatus of vehicle window control
CN114360561A (en) * 2021-12-07 2022-04-15 广东电力信息科技有限公司 Voice enhancement method based on deep neural network technology
CN114217536A (en) * 2021-12-13 2022-03-22 安徽蓝九信息科技有限公司 Intelligent monitoring system based on Internet of things
CN115910074A (en) * 2022-10-27 2023-04-04 深圳市经纬纵横科技有限公司 Voice control method and device for intelligent access control
CN116013297A (en) * 2022-12-17 2023-04-25 西安交通大学 Audio-visual voice noise reduction method based on multi-mode gating lifting model
CN116312570A (en) * 2023-03-15 2023-06-23 山东新一代信息产业技术研究院有限公司 Voice noise reduction method, device, equipment and medium based on voiceprint recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基音调整的语音分析方法 (Speech analysis method with pitch adjustment); 杨慧敏, 陈弘毅, 孙义和; 清华大学学报(自然科学版) (Journal of Tsinghua University, Science and Technology) (S1); full text *

Also Published As

Publication number Publication date
CN117095674A (en) 2023-11-21

Similar Documents

Publication Publication Date Title
Ravanelli et al. Multi-task self-supervised learning for robust speech recognition
WO2021139294A1 (en) Method and apparatus for training speech separation model, storage medium, and computer device
CN111312245B (en) Voice response method, device and storage medium
KR100908121B1 (en) Speech feature vector conversion method and apparatus
CA2122575C (en) Speaker independent isolated word recognition system using neural networks
CN114023300A (en) Chinese speech synthesis method based on diffusion probability model
CN114550703A (en) Training method and device of voice recognition system, and voice recognition method and device
Zöhrer et al. Representation learning for single-channel source separation and bandwidth extension
Rajesh Kumar et al. Optimization-enabled deep convolutional network for the generation of normal speech from non-audible murmur based on multi-kernel-based features
CN115881164A (en) Voice emotion recognition method and system
Sivaram et al. Data-driven and feedback based spectro-temporal features for speech recognition
CN117746908A (en) Voice emotion recognition method based on time-frequency characteristic separation type transducer cross fusion architecture
Dua et al. Noise robust automatic speech recognition: review and analysis
CN117095674B (en) Interactive control method and system for intelligent doors and windows
Jagadeeshwar et al. ASERNet: Automatic speech emotion recognition system using MFCC-based LPC approach with deep learning CNN
Shome et al. Speaker Recognition through Deep Learning Techniques: A Comprehensive Review and Research Challenges
CN117980915A (en) Contrast learning and masking modeling for end-to-end self-supervised pre-training
CN115171878A (en) Depression detection method based on BiGRU and BiLSTM
CN114360491A (en) Speech synthesis method, speech synthesis device, electronic equipment and computer-readable storage medium
CN114298019A (en) Emotion recognition method, emotion recognition apparatus, emotion recognition device, storage medium, and program product
Soni et al. Label-Driven Time-Frequency Masking for Robust Speech Command Recognition
Jannu et al. An Overview of Speech Enhancement Based on Deep Learning Techniques
Maruf et al. Effects of noise on RASTA-PLP and MFCC based Bangla ASR using CNN
CN112951270A (en) Voice fluency detection method and device and electronic equipment
Iswarya et al. Speech query recognition for Tamil language using wavelet and wavelet packets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant