CN115373280A - Remote voice control method, device and system - Google Patents
Remote voice control method, device and system Download PDFInfo
- Publication number
- CN115373280A (application CN202110549978.3A)
- Authority
- CN
- China
- Prior art keywords
- voice
- user
- instruction
- voiceprint
- voice instruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B15/00—Systems controlled by a computer
- G05B15/02—Systems controlled by a computer electric
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B19/00—Programme-control systems
- G05B19/02—Programme-control systems electric
- G05B19/418—Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/20—Pc systems
- G05B2219/26—Pc applications
- G05B2219/2642—Domotique, domestic, home control, automation, smart house
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
The application discloses a remote voice control method, device and system, which verify identity through the voiceprint of a voice instruction and thereby improve the security of remote voice control. The application provides a remote voice control method comprising the following steps: when a user has successfully logged in to a local server through a terminal, receiving the user's voice instruction sent by the terminal; performing voiceprint verification on the voice instruction, and, when the voiceprint verification passes, remotely controlling the smart home device through the voice instruction.
Description
Technical Field
The application relates to the technical field of smart homes, and in particular to a remote voice control method, device and system.
Background
With the development of science and technology, remote voice control has gradually emerged: a user issues a voice instruction from a mobile phone to remotely control smart home devices, and so on.
However, existing remote voice control merely transmits the voice instruction without verifying the sender's identity, so the voice instruction is insecure.
Disclosure of Invention
The embodiment of the application provides a remote voice control method, device and system, which verify identity through the voiceprint of a voice instruction and improve the security of remote voice control.
The embodiment of the application provides a remote voice control method, comprising the following steps:
when a user has successfully logged in to a local server through a terminal, receiving the user's voice instruction sent by the terminal;
performing voiceprint verification on the voice instruction, and, when the voiceprint verification passes, remotely controlling the smart home device through the voice instruction.
By this method, when a user has successfully logged in to a local server through a terminal, the user's voice instruction sent by the terminal is received; voiceprint verification is performed on the voice instruction, and when the verification passes, the smart home device is remotely controlled through the voice instruction. Identity verification based on the voiceprint of the voice instruction is thus realized, improving the security of remote voice control.
Optionally, the method further comprises:
receiving a face image and voice information acquired by a user through a terminal;
extracting the face features of the face image and the voiceprint features of the voice information;
and fusing the voiceprint features and the face features, and judging whether the user can log in a local server or not based on a fusion result.
In view of the problems that face verification alone can be defeated by a photo or a mask, and that voiceprint recognition alone is vulnerable to recorded or cloned voices, the embodiment of the application fuses the face features with the voiceprint features, improving the reliability of the user's login verification information.
Optionally, the fusing the voiceprint feature and the face feature, and determining whether the user can log in a local server based on a fusion result, specifically including:
constructing a fusion feature vector based on the vector of the face feature and the vector of the voiceprint feature;
determining a weight matrix of the fusion feature vector;
and performing summation operation on each feature weight in the weight matrix to finally obtain a numerical value, and if the numerical value belongs to a preset numerical value range, determining that the user successfully logs in the local server through the terminal.
Optionally, the voice instruction is used to implement remote control on the smart home device, and specifically includes:
and the voice instruction is transmitted to the coordinator and the voice box module respectively, the coordinator controls a power switch of the intelligent household equipment, and the voice box module performs instruction classification transmission according to the keyword of the intelligent household equipment contained in the voice instruction, so that the voice instruction is transmitted to the voice box corresponding to the intelligent household equipment corresponding to the keyword, and the voice instruction is played through the voice box and is given to the intelligent household equipment corresponding to the keyword.
In view of the problem that instructions for remotely voice-controlled home devices cannot be fed back in time, the embodiment of the application classifies user instructions and establishes two-way communication between the user, the local server and the smart home devices, so that information about remotely controlled home devices is fed back and optimized in time.
Optionally, the method further comprises:
and when the voice instruction cannot be executed by the corresponding intelligent household equipment, feeding error instruction prompt information back to the user terminal.
Optionally, when the verification passes, the method further comprises:
and if the voice instruction can not be processed offline in the local server, calling a cloud server to process the voice instruction.
The embodiment of the application provides a remote voice control device, comprising:
a memory for storing program instructions;
and the processor is used for calling the program instructions stored in the memory and executing any one of the methods according to the obtained program.
The remote voice control system provided by the embodiment of the application comprises the above remote voice control device, a coordinator and a sound box module respectively connected to the remote voice control device, and at least one sound box connected to the sound box module; wherein:
the coordinator is used for receiving the voice command sent by the remote voice control device and controlling the power switch of the intelligent household equipment based on the voice command;
the sound box module is used for determining the smart-home-device keyword contained in the voice instruction sent by the remote voice control device, sending the voice instruction to the sound box arranged for the device corresponding to the keyword, and playing the voice instruction to that device through the sound box.
Optionally, the system further includes smart home devices corresponding to the speakers.
Another embodiment of the present application provides a computing device, which includes a memory and a processor, wherein the memory is used for storing program instructions, and the processor is used for calling the program instructions stored in the memory and executing any one of the methods according to the obtained program.
Another embodiment of the present application provides a computer storage medium having stored thereon computer-executable instructions for causing a computer to perform any one of the methods described above.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an overall system framework provided by an embodiment of the present application;
fig. 2 is a schematic diagram of voiceprint feature acquisition and identification provided in an embodiment of the present application;
fig. 3 is a schematic view of face feature acquisition and recognition provided in the embodiment of the present application;
fig. 4 is a schematic diagram illustrating fusion of human face and voiceprint features according to an embodiment of the present application;
fig. 5 is a schematic diagram illustrating a process of user identity registration and identity login provided in an embodiment of the present application;
fig. 6 is a schematic diagram of a voice signal remote transmission and voice command parsing framework according to an embodiment of the present application;
fig. 7 is a schematic diagram of a remote voice-controlled smart home device framework provided in an embodiment of the present application;
fig. 8 is a schematic flowchart of a remote voice control method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a remote voice control apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of another remote voice control apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The embodiment of the application provides a remote voice control method and device, which verify the identity of the sender of a user's voice instruction, increase the reliability of the user's login verification information, and improve the security of remote voice control.
The method and the device are based on the same inventive concept; since the principles by which they solve the problem are similar, the implementations of the device and the method can refer to each other, and repeated descriptions are omitted.
Various embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that the display sequence of the embodiment of the present application only represents the sequence of the embodiment, and does not represent the merits of the technical solutions provided by the embodiments.
The embodiment of the application provides a smart home system with reliable remote voice control by the user, which specifically includes:
1. In view of the fact that existing remote voice control merely transmits a voice instruction without verifying the sender's identity, leaving the instruction insecure, the embodiment of the application splits the voice signal into a voiceprint signal and a voice instruction, and verifies the user's identity for every utterance;
2. In view of the fact that face verification alone can be defeated by a photo or a mask, and that voiceprint recognition alone is vulnerable to recorded or cloned voices, the face features and voiceprint features are fused, improving the reliability of the user's login verification information;
3. In view of the problem that instructions for remotely voice-controlled home devices cannot be fed back in time, the embodiment of the application classifies user instructions and establishes two-way communication between the user, the local server and the smart home devices, so that information about remotely controlled home devices is fed back and optimized in time.
In summary, the embodiments of the present application have the following advantages:
1. A user registration and login page is built on voiceprint-face fused features, implementing login verification with the fused features; this avoids the vulnerabilities of face-only or voiceprint-only verification and improves the security of the login system;
2. After the user logs in successfully, every voice utterance is parsed, with voiceprint verification performed again first, so that each message is verified; the legitimacy of user behavior is monitored in real time, further ensuring the security of remote voice control;
3. A smart home hardware system is built in which user information is played to the relevant device through the loudspeaker function of the smart sound box; the relevant device carries out the voice operation and the result can be fed back to the user in time, completing the remote control function and its feedback.
The embodiment of the application mainly relates to a remote control system based on real-time monitoring. It provides the user with a new mode of remote instruction identification and protects the user's privacy and property. The system framework provided by the embodiment of the application is shown in fig. 1; its main contents include:
(1) Acquiring the user's voiceprint features and constructing a voiceprint database.
(2) Acquiring the user's face features and constructing a face database.
(3) Fusing the voiceprint features and the face features.
(4) Constructing the user identity registration and identity login systems.
(5) The voice signal remote transmission and voice instruction parsing system.
(6) The framework design of remotely voice-controlled smart home devices.
With respect to the above, specific embodiments are described as follows:
(1) Collecting user voiceprint characteristics and constructing a voiceprint database:
the voiceprint feature acquisition and recognition of the embodiment of the application are shown in fig. 2 and are divided into five stages of voice input, preprocessing, voice feature extraction, classifier and voiceprint database construction.
Voice input: the mobile terminal displays a designated text, and the user's voice is collected as the user reads the text aloud.
Preprocessing: the voiced segment is located in the original speech through time-domain analysis; pre-emphasis with a high-pass filter improves the recognizability of the audio; the audio is then split into frames with a window function; the short-time energy and short-time zero-crossing rate of each frame are measured, and the start and end points of the voice signal are determined by thresholding.
Wherein, the short-time energy represents the magnitude of the speech signal energy of each frame.
The short-time zero crossing rate represents the number of times that the waveform of each frame of the voice signal passes through a zero axis.
Regarding the thresholds: the average short-time zero-crossing rate and short-time energy are used as thresholds for voice-signal endpoint detection; this is the basic double-threshold endpoint detection algorithm. Endpoint detection works best when the high threshold of the short-time energy is set to 4-5 times the low threshold and the high threshold of the short-time zero-crossing rate to about 2 times the low threshold.
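A minimal sketch of such a double-threshold detector, taking the low thresholds from the average energy and zero-crossing rate and the high thresholds from the ratios named in the text (frame length, hop size and all parameter names are illustrative assumptions):

```python
import numpy as np

def endpoint_detect(signal, frame_len=256, hop=128,
                    energy_ratio_hi=4.5, zcr_ratio_hi=2.0):
    """Return (start_frame, end_frame) of the detected speech segment,
    or None if no frame exceeds the thresholds."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    energy = np.array([np.sum(f.astype(float) ** 2) for f in frames])
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f)))) / 2 for f in frames])

    # Low thresholds = average values; high thresholds are 4-5x (energy)
    # and ~2x (zero-crossing rate) the low thresholds, as in the text.
    e_lo, z_lo = energy.mean(), zcr.mean()
    e_hi, z_hi = energy_ratio_hi * e_lo, zcr_ratio_hi * z_lo

    # A frame counts as voiced if it clears the high energy threshold,
    # or clears the low energy threshold with a high zero-crossing rate.
    voiced = (energy > e_hi) | ((energy > e_lo) & (zcr > z_hi))
    idx = np.flatnonzero(voiced)
    return (int(idx[0]), int(idx[-1])) if idx.size else None
```

For example, a signal of silence, a 440 Hz tone, then silence again yields start/end frames inside the tone region.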
Voice feature extraction: Mel-frequency cepstral coefficients (MFCC) and their first-order difference coefficients are used for feature extraction; the preprocessed voice signal is processed into feature vectors with a Gaussian mixture model combined with a background model, and dimensionality reduction of the feature vectors yields low-dimensional, discriminative voiceprint features.
Here, the Mel-frequency cepstral coefficient (MFCC) is a cepstral coefficient computed on the mel frequency scale, which is based on how the human ear perceives changes in pitch.
The Gaussian mixture model decomposes a phenomenon into several components, each quantized precisely by a Gaussian probability density function.
The background model refers to the background sound of the speech; the background sound can be removed by subtracting adjacent frames of the voice signal, avoiding interference.
The first-order difference coefficient: between two consecutive frames, the feature parameters of the later frame minus those of the earlier frame, reflecting the relationship between the current speech frame and the preceding one.
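The first-order difference (delta) coefficients just described can be sketched as follows; padding the first frame with zeros is an assumption, since the text does not specify boundary handling:

```python
import numpy as np

def delta_features(feat):
    """First-order difference coefficients: each frame's feature vector
    minus the previous frame's, capturing frame-to-frame dynamics.
    `feat` has shape (num_frames, num_coeffs); a zero row is kept for
    the first frame so the output shape matches the input."""
    d = np.diff(feat, axis=0)
    return np.vstack([np.zeros((1, feat.shape[1])), d])
```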
The feature vector: a vector of parameters characterizing the speech signal.
Dimensionality reduction: reducing high-dimensional feature vectors to low dimensions with an algorithm such as principal component analysis (PCA).
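As one possible instance of the PCA-style reduction mentioned above (the SVD-based implementation is an assumption, not taken from the original):

```python
import numpy as np

def pca_reduce(X, k):
    """Reduce feature vectors X (n_samples x n_features) to k dimensions
    via principal component analysis, implemented with an SVD of the
    mean-centred data."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors = principal axes; project onto the top k.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```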
The classifier: the speech database VoxForge is selected, and a voiceprint identification classifier is built from the extracted speech feature vectors using a support vector machine (SVM).
The speech database VoxForge: VoxForge is an open-source speech corpus and acoustic model library, commonly used in academia, with strong robustness for testing speech models across different intonations and accents.
The support vector machine (SVM): a classifier that performs binary classification of data in a supervised learning manner.
Constructing the voiceprint database: a database is built from the voiceprint recognition classifier of the previous step. When voice is input again, the user's voiceprint information is found in the voiceprint database through signal processing, feature extraction and recognition, completing the match.
(2) Collecting the face features of the user and constructing a face database:
the face feature acquisition and recognition in the embodiment of the application are shown in fig. 3 and are divided into five stages of face acquisition, image preprocessing, image feature extraction, classifier and face database construction.
Face acquisition: collecting a face image through a mobile terminal camera;
Image preprocessing: the collected face image is converted to grayscale, filtered to reduce noise, and then histogram-equalized to enhance its features.
Image feature extraction: the face image is described with a histogram of oriented gradients (HOG); the image is divided into 3 x 3 sub-blocks, HOG features are extracted from each sub-block, and the high-dimensional HOG features are then reduced with principal component analysis (PCA);
Classifier: the face database CAS-PEAL is selected, the image features extracted in the previous step serve as the classification basis, and a face recognition classifier is built with a support vector machine (SVM);
Constructing the face recognition database: a face database is built from the face recognition classifier of the previous step. When a face image is input again, the user's face information is found in the face database through image preprocessing, feature extraction and recognition, completing the match.
(3) Fusing the voiceprint features and the face features:
The fusion of face and voiceprint features in the embodiment of the application is shown in fig. 4: the voiceprint features and face features are fused at the feature level, in three parts: constructing the fused feature vector, setting a weight for each dimension, and identification.
Constructing the fused feature vector: let F = (f1, f2, f3, ..., fm) denote the face feature vector, where f1, f2, f3, ..., fm are the values of the features describing the face. Let V = (v1, v2, v3, ..., vn) denote the voiceprint feature vector, where v1, v2, v3, ..., vn are the values of the features describing the voiceprint. After dimension normalization, F and V are fused into a new feature vector S = (s1, s2, s3, ..., s(m+n)), where s1, s2, s3, ..., s(m+n) are the values of the features of the fused face-voiceprint feature vector;
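A sketch of this feature-level fusion: normalize each modality's vector, then concatenate F (length m) and V (length n) into S of length m + n. The min-max normalization is an assumption; the text says only "dimension normalization" without specifying the scheme.

```python
import numpy as np

def fuse_features(face_vec, voice_vec):
    """Concatenate a face feature vector and a voiceprint feature vector
    into one fused vector after per-modality min-max normalization."""
    def norm(v):
        v = np.asarray(v, dtype=float)
        rng = v.max() - v.min()
        return (v - v.min()) / rng if rng else np.zeros_like(v)
    return np.concatenate([norm(face_vec), norm(voice_vec)])
```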
determining a weight: selecting two groups of fused feature vectors S 1 =(s 11 ,s 12 ,s 13 ,…,s 1(m+n) )、S 2 =(s 21 ,s 22 ,s 23 ,…,s 2(m+n) ) Wherein s is 11 ,s 12 ,s 13 ,…,s 1(m+n) A feature value, s, representing each feature of a first set of fused features describing a face and a voiceprint 21 ,s 22 ,s 23 ,…,s 2(m+n A feature value representing each feature of the second set of fused features describing the face and the voiceprint. The average distance between S1 and S2 is calculated according to the following formula:
calculating the weight of the feature vector of the fusion feature of each face and the voiceprint according to the following formula:
obtaining a weight matrix W = (W) 1 ,w 2 ,w 3 ,…,w m+n ) Repeating the above calculation process, obtaining a plurality of different weight matrixes after a plurality of calculations, and then determining the average value of the weight matrixes as a final weight;
identification: and (3) obtaining a fused feature vector by inputting face and voice information, obtaining a weight matrix of the feature vector through the step (3), enabling each element of the weight matrix to represent a feature weight, performing addition operation on the feature weights to finally obtain a numerical value, and obtaining an identification result of the feature vector according to a numerical value range where the numerical value is located, wherein for example, eighty percent is matched, the identification is considered to be successful.
Specifically, suppose the fused feature vector is S = (0.8, 0.7, 0.5) and the weight matrix is W = (0.5, 0.4, 0.1). Summing the weighted features gives the matching degree P = 0.8 x 0.5 + 0.7 x 0.4 + 0.5 x 0.1 = 0.73. If the criterion is that P >= 0.8 counts as a successful match, then since 0.73 < 0.8 the identification is unsuccessful, i.e. the identification fails.
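This worked example can be reproduced directly:

```python
# Fused feature values S and per-feature weights W from the example;
# the matching degree P is the weighted sum of the features.
S = [0.8, 0.7, 0.5]
W = [0.5, 0.4, 0.1]
P = sum(s * w for s, w in zip(S, W))  # 0.8*0.5 + 0.7*0.4 + 0.5*0.1 = 0.73
threshold = 0.8
matched = P >= threshold  # 0.73 < 0.8, so the identification fails
```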
(4) Constructing the user identity registration and identity login systems:
the user identity registration and identity login system is shown in fig. 5, and the embodiment of the application divides the construction of the user identity registration system and the identity login system into two parts, namely training of voiceprint and human face characteristics and identity login verification.
Training the voiceprint and face features: the voiceprint and face information of the user's family members is collected and transmitted, per person, to the family's local server database; the voiceprint and face features are extracted, fused, and trained on the local server, and the training result is saved to the local hard disk when training finishes;
Identity login verification: the user logs in at the mobile terminal, which sends the face and voiceprint information of the login attempt to the home local server; the personal face and voiceprint data on the local server's hard disk are retrieved for matching; if the match succeeds, the user is logged in, otherwise the login fails;
In the embodiment of the application, to register user information, a user who wants to remotely control home devices must first log in to the mobile terminal App, which performs synchronized verification of voiceprint and face at login, ensuring the reliability of the user's login identity.
(5) Voice signal remote transmission and voice command analysis system:
in the embodiment of the present application, a frame for voice signal remote transmission and voice instruction parsing is shown in fig. 6 and is divided into three parts, namely, voice signal remote transmission, voice signal data parsing and identity verification, and voice instruction transmission.
Voice signal remote transmission: inputting a voice signal at a mobile terminal, and transmitting the voice signal to a local server;
Voice signal data parsing and identity verification: after the voice signal is obtained, the voice data is split into voiceprint data and voice instruction data; the voiceprint data is first sent to the local server for verification against the voiceprint database: if the voiceprint exists in the local database, verification passes, otherwise it fails;
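The split-and-verify flow can be sketched as follows (all function and parameter names are illustrative, not from the original): the voiceprint is checked against the local database first, and the command is parsed and forwarded only when verification passes.

```python
def handle_voice_signal(signal, voiceprint_db, parse_command, extract_voiceprint):
    """Verify the voiceprint in `signal` against `voiceprint_db`; only
    on success is the voice instruction parsed and returned."""
    voiceprint = extract_voiceprint(signal)
    if voiceprint not in voiceprint_db:
        return {"ok": False, "reason": "voiceprint verification failed"}
    command = parse_command(signal)
    return {"ok": True, "command": command}
```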
and voice instruction transmission: after the voiceprint information is verified, the voice command is transmitted to the related intelligent home equipment through the local server, and the voice remote control is achieved.
In the embodiment of the application, after the user logs in successfully, each instruction of the user is further verified, the condition that the sender of each instruction of the user is the registrant of the family member of the user is ensured, and the condition that each instruction of the user remotely controls the smart home is safe and reliable is ensured.
(6) The frame design of the remote voice control intelligent household equipment is as follows:
the remote voice control smart home framework designed in the embodiment of the application is shown in fig. 7, where each dotted frame represents a smart household appliance together with the sound box device serving as its loudspeaker. For example, sound box 1 serves as the loudspeaker of the indoor environment acquisition module, which includes a humidifier, an air purifier, and the like; sound box 2 serves as the loudspeaker of the home equipment module, which includes an air conditioner, a refrigerator, a television, and the like; sound box 3 serves as the loudspeaker of the basic module, which includes lamps, curtains, and the like. The security module in fig. 7 is shown only for completeness of the home equipment and is not necessarily within the scope of remote control.
According to the remote voice control smart home framework designed in the embodiment of the application, the user inputs voice information at the terminal and sends it to the local server; the local server processes locally whatever can be handled offline, while information that cannot be processed locally requires the cloud to invoke related services.
The services that need to be acquired in the cloud include, for example:
internet resources such as songs, voices, etc.;
in speech recognition, where the computing power of the local server is insufficient, the computing power of a cloud server needs to be utilized, for example for voice cloning and voice synthesis.
The local server supports two-way communication. The voice module in the local server transmits the processing result of the user's voice signal (for example, when the user inputs "open the curtain", the local server executes the curtain-opening operation and then returns to the user whether the curtain was opened) through the control module to the coordinator and the sound box module respectively. (The sound box module can operate independently as a smart speaker, performing basic smart-speaker operations, and can also relay the user's remote appliance-control information.) The coordinator is responsible for switching on the power supply of the relevant smart home devices (this operation is skipped if the device is already powered on). The sound box module uses distributed sound boxes (deployed in multiple rooms and able to communicate with one another): on receiving an instruction, the module classifies and forwards it according to the appliance keyword it contains (for example, "open the curtain" contains the keyword "curtain", so the instruction is routed to the basic module), transmitting it to the sound box nearest the corresponding device. Voice instructions of different types are thus sent to and played by different sound boxes; at this point sound box 1, sound box 2, and sound box 3 act as loudspeakers that play the relevant voice instruction, so the audio module of the relevant appliance receives the instruction and voice control is achieved.
When the voice module in the local server detects that a voice instruction sent by the user terminal is not a smart home device instruction, the instruction is fed back to the user terminal, prompting the user that the instruction failed.
In the embodiment of the application, according to the type of the received instruction, the remote voice control smart home is divided into three modules:
a basic module: comprises lamps and curtains and accepts only the relevant on/off instructions; any other instruction, i.e., one the basic module cannot recognize or execute, is treated as an invalid instruction, and the module returns a cannot-execute signal to the local server, which feeds back to the user that the instruction failed;
an indoor environment acquisition module: comprises a humidifier and an air purifier, which accept on/off and mode instructions; for any other instruction, i.e., one the module cannot recognize or execute, a cannot-execute signal is returned to the local server and fed back to the user;
a home equipment module: comprises an air conditioner, a television, a refrigerator, and the like, and involves other complex instructions beyond on, off, and mode; if a received instruction is not supported, i.e., the module cannot execute it, a cannot-execute signal is returned to the local server and fed back to the user through the local server.
The above classification is only an example, and the technical solution provided in the embodiment of the present application is not limited to this classification manner.
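A minimal sketch of this keyword-based classification is given below, under the assumption that each module advertises the device keywords and actions it accepts. The module names, keywords, and allowed actions are illustrative, not an exhaustive list from the source.

```python
# Illustrative sketch of routing an instruction to one of the three modules
# by the appliance keyword it contains; an instruction whose action a module
# cannot execute yields ok=False, i.e., an error prompt is fed back.

MODULES = {
    "basic":       {"keywords": {"lamp", "curtain"},
                    "actions":  {"open", "close"}},
    "environment": {"keywords": {"humidifier", "air purifier"},
                    "actions":  {"open", "close", "mode"}},
    "home":        {"keywords": {"air conditioner", "television", "refrigerator"},
                    "actions":  {"open", "close", "mode", "temperature"}},
}

def route_instruction(action, device):
    """Return (module, ok); ok=False means an error prompt is fed back."""
    for name, spec in MODULES.items():
        if device in spec["keywords"]:
            return name, action in spec["actions"]
    return None, False  # not a smart home instruction at all

print(route_instruction("open", "curtain"))     # ('basic', True)
print(route_instruction("mode", "curtain"))     # ('basic', False) -> error prompt
print(route_instruction("mode", "humidifier"))  # ('environment', True)
```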
To sum up, to address the problem that the local server cannot process certain voice instructions, the local server transmits part of the instructions to the cloud while protecting user privacy to a certain extent, thereby ensuring that those instructions can be fulfilled. For example, when the user needs voice cloning, the computation-intensive training of the voice cloning model is performed in the cloud, and the voice model obtained from cloud training is then deployed on the local server;
the embodiment of the application also solves the power-on problem of the relevant household appliances: the coordinator turns on the relevant contact switch to connect the device to the power supply;
by using distributed sound boxes as loudspeakers, the embodiment of the application solves the problem of delivering instructions to home devices distributed across different rooms; classifying the smart home devices also makes it convenient to feed erroneous user instructions back to the user in time.
Referring to fig. 8, on the local server side, a remote voice control method provided in the embodiment of the present application includes:
s101, when a user successfully logs in a local server through a terminal, receiving a user voice command sent by the terminal;
and S102, performing voiceprint verification on the voice command, and when the voiceprint verification is passed, realizing remote control on the intelligent household equipment through the voice command.
In the embodiment of the application, after the user successfully logs in the local server through the terminal, voiceprint verification can be further performed on each voice command, and subsequent operation can be executed only after the voiceprint verification is passed, so that the safety of remote voice control is improved.
Optionally, the method further comprises:
receiving a face image and voice information acquired by a user through a terminal;
extracting the face features of the face image and the voiceprint features of the voice information;
and fusing the voiceprint features and the face features, and judging whether the user can log in a local server or not based on a fusion result.
In view of the problems that face verification alone can be defeated by a photo or a mask and that voiceprint recognition is vulnerable to recorded or cloned voices, the embodiment of the application fuses the face features with the voiceprint features, thereby improving the reliability of the user's login verification information.
Optionally, the fusing the voiceprint features and the face features, and determining whether the user can log in a local server based on a fusion result, specifically including:
constructing a fusion feature vector based on the vector of the face feature and the vector of the voiceprint feature; for example, from the face feature vector F = (f1, f2, f3, …, fm) and the voiceprint feature vector V = (v1, v2, v3, …, vn), the fusion feature vector S = (s1, s2, s3, …, sm+n) is obtained;
determining a weight matrix of the fusion feature vector; for example, the weight matrix W = (w1, w2, w3, …, wm+n);
and performing a weighted summation of the features with the weight matrix to obtain a single value; if the value falls within a preset range, it is determined that the user has successfully logged in to the local server through the terminal. For example, given the fused feature vector S = (0.8, 0.7, 0.5) and the weight matrix W = (0.5, 0.4, 0.1), the weighted sum gives the matching degree P = 0.8 × 0.5 + 0.7 × 0.4 + 0.5 × 0.1 = 0.73. If the criterion for a successful match is P ≥ 0.8, then the result 0.73 < 0.8 means the match fails, i.e., recognition is unsuccessful.
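The weighted fusion score above can be sketched in a few lines, using the example numbers from the text; `fusion_score` is a hypothetical helper name, not from the source.

```python
# Sketch of the weighted fusion score: a weighted sum over the fused
# face/voiceprint feature values, compared against a login threshold.

def fusion_score(features, weights):
    """Weighted sum over the fused feature values."""
    assert len(features) == len(weights)
    return sum(f * w for f, w in zip(features, weights))

S = (0.8, 0.7, 0.5)     # fused feature vector from the example
W = (0.5, 0.4, 0.1)     # weight matrix (here a single row of weights)
P = fusion_score(S, W)  # 0.8*0.5 + 0.7*0.4 + 0.5*0.1 = 0.73
print(round(P, 2), P >= 0.8)  # 0.73 False -> match fails against threshold 0.8
```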
Optionally, the voice instruction is used to implement remote control on the smart home device, which specifically includes:
and respectively transmitting the voice instruction to a coordinator and a sound box module, controlling a power switch of the intelligent household equipment through the coordinator, and performing instruction classification transmission through the sound box module according to a keyword of the intelligent household equipment contained in the voice instruction, so that the voice instruction is transmitted to a sound box which is correspondingly arranged for the intelligent household equipment corresponding to the keyword, and the voice instruction is played through the sound box and is given to the intelligent household equipment corresponding to the keyword.
The embodiment of the application solves the problem that instructions remotely controlling home equipment could not be fed back in time: by classifying user instructions, two-way communication among the user, the local server, and the smart home devices is achieved, enabling timely feedback and optimization of remote home-equipment control information.
Optionally, the method further comprises:
and when the voice instruction cannot be executed by the corresponding intelligent household equipment, feeding error instruction prompt information back to the user terminal.
For example, if a signal that the basic module cannot execute the instruction return is received, an error instruction prompt message is fed back to the user terminal.
Optionally, when the verification passes, the method further comprises:
and if the voice instruction can not be processed offline in the local server, calling a cloud server to process the voice instruction.
For example, the local server processes information that can be processed offline locally, and for information that cannot be processed locally, the local server needs to call the cloud server to retrieve related services, which includes:
obtaining internet resources such as songs, voices and the like from a cloud server;
tasks in speech recognition for which the local server is not powerful enough, such as voice cloning and voice synthesis, which need to be processed using the computing power of the cloud server.
Referring to fig. 9, an embodiment of the present application provides a remote voice control apparatus, including:
a memory 520 for storing program instructions;
a processor 500 for calling the program instructions stored in the memory, and executing, according to the obtained program:
when a user successfully logs in a local server through a terminal, receiving a user voice instruction sent by the terminal;
and carrying out voiceprint verification on the voice command, and when the voiceprint verification passes, realizing remote control on the intelligent household equipment through the voice command.
Optionally, the processor 500 is further configured to call the program instructions stored in the memory, and execute, according to the obtained program:
receiving a face image and voice information collected by a user through a terminal;
extracting the face features of the face image and the voiceprint features of the voice information;
and fusing the voiceprint features and the face features, and judging whether the user can log in a local server or not based on a fusion result.
Optionally, the fusing the voiceprint feature and the face feature, and determining whether the user can log in a local server based on a fusion result, specifically including:
constructing a fusion feature vector based on the vector of the face feature and the vector of the voiceprint feature;
determining a weight matrix of the fusion feature vector;
and performing summation operation on each feature weight in the weight matrix to finally obtain a numerical value, and if the numerical value belongs to a preset numerical value range, determining that the user successfully logs in the local server through the terminal.
Optionally, the voice instruction is used to implement remote control on the smart home device, and specifically includes:
and respectively transmitting the voice instruction to a coordinator and a sound box module, controlling a power switch of the intelligent household equipment through the coordinator, and performing instruction classification transmission through the sound box module according to a keyword of the intelligent household equipment contained in the voice instruction, so that the voice instruction is transmitted to a sound box which is correspondingly arranged for the intelligent household equipment corresponding to the keyword, and the voice instruction is played through the sound box and is given to the intelligent household equipment corresponding to the keyword.
Optionally, the processor 500 is further configured to call the program instructions stored in the memory, and execute, according to the obtained program:
and when the voice instruction cannot be executed by the corresponding intelligent household equipment, feeding error instruction prompt information back to the user terminal.
Optionally, when the check is passed, the processor 500 is further configured to call the program instructions stored in the memory, and execute, according to the obtained program:
and if the voice instruction can not be processed offline in the local server, calling a cloud server to process the voice instruction.
A transceiver 510 for receiving and transmitting data under the control of the processor 500.
In fig. 9, the bus architecture may comprise any number of interconnected buses and bridges, specifically linking together one or more processors, represented by the processor 500, and various memory circuits, represented by the memory 520. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore not described further herein. The bus interface provides an interface. The transceiver 510 may be a number of elements, including a transmitter and a receiver, that provide a means for communicating with various other apparatus over a transmission medium. The processor 500 is responsible for managing the bus architecture and general processing, and the memory 520 may store data used by the processor 500 in performing operations.
The processor 500 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or a Complex Programmable Logic Device (CPLD).
A remote voice control system provided in an embodiment of the present application may refer to fig. 7 (but is not limited to the structure shown in fig. 7), and includes the remote voice control device (i.e., a local server), a coordinator and a sound box module respectively connected to the remote voice control device, and at least one sound box connected to the sound box module; wherein:
the coordinator is used for receiving the voice command sent by the remote voice control device and controlling a power switch of the intelligent household equipment based on the voice command;
the sound box module is used for determining keywords of the intelligent home equipment contained in the voice command sent by the remote voice control device, sending the voice command to a sound box correspondingly arranged on the intelligent home equipment corresponding to the keywords, and playing the voice command to the intelligent home equipment corresponding to the keywords through the sound box.
Optionally, the system further includes an intelligent home device corresponding to each sound box, for example, an indoor environment acquisition module corresponding to the sound box 1, a home device module corresponding to the sound box 2, and a base module corresponding to the sound box 3 in fig. 7.
It should be noted that, each sound box and the corresponding smart home device may not have a connection relationship, and only serve as a speaker, that is, play a voice instruction to the corresponding smart home device, and the smart home device executes a corresponding operation after receiving the voice instruction.
Referring to fig. 10, another remote voice control apparatus provided in an embodiment of the present application includes:
the first unit 11 is used for receiving a user voice instruction sent by a terminal when a user successfully logs in a local server through the terminal;
and the second unit 12 is used for performing voiceprint verification on the voice command, and when the voiceprint verification is passed, the remote control of the intelligent household equipment is realized through the voice command.
Optionally, the first unit 11 is further configured to:
receiving a face image and voice information acquired by a user through a terminal;
extracting the face features of the face image and the voiceprint features of the voice information;
and fusing the voiceprint features and the face features, and judging whether the user can log in a local server or not based on a fusion result.
Optionally, the fusing the voiceprint features and the face features, and determining whether the user can log in a local server based on a fusion result, specifically including:
constructing a fusion feature vector based on the vector of the face feature and the vector of the voiceprint feature;
determining a weight matrix of the fusion feature vector;
and performing summation operation on each feature weight in the weight matrix to finally obtain a numerical value, and if the numerical value belongs to a preset numerical value range, determining that the user successfully logs in the local server through the terminal.
Optionally, the voice instruction is used to implement remote control on the smart home device, which specifically includes:
and the voice instruction is transmitted to the coordinator and the voice box module respectively, the coordinator controls a power switch of the intelligent household equipment, and the voice box module performs instruction classification transmission according to the keyword of the intelligent household equipment contained in the voice instruction, so that the voice instruction is transmitted to the voice box corresponding to the intelligent household equipment corresponding to the keyword, and the voice instruction is played through the voice box and is given to the intelligent household equipment corresponding to the keyword.
Optionally, the second unit 12 is further configured to:
and when the voice instruction cannot be executed by the corresponding intelligent household equipment, feeding error instruction prompt information back to the user terminal.
Optionally, when the verification passes, the first unit 11 is further configured to:
and if the voice instruction can not be processed offline in the local server, calling a cloud server to process the voice instruction.
It should be noted that, in the embodiment of the present application, the division of the unit is schematic, and is only one logic function division, and when the actual implementation is realized, another division manner may be provided. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the present application provides a computing device, which may specifically be a desktop computer, a portable computer, a smart phone, a tablet computer, a Personal Digital Assistant (PDA), and the like. The computing device may include a Central Processing Unit (CPU), memory, input/output devices, etc., the input devices may include a keyboard, mouse, touch screen, etc., and the output devices may include a Display device, such as a Liquid Crystal Display (LCD), cathode Ray Tube (CRT), etc.
The memory may include Read Only Memory (ROM) and Random Access Memory (RAM), and provides the processor with program instructions and data stored in the memory. In the embodiments of the present application, the memory may be used for storing a program of any one of the methods provided by the embodiments of the present application.
The processor is used for executing any one of the methods provided by the embodiment of the application according to the obtained program instructions by calling the program instructions stored in the memory.
Embodiments of the present application provide a computer storage medium for storing computer program instructions for an apparatus provided in the embodiments of the present application, which includes a program for executing any one of the methods provided in the embodiments of the present application.
The computer storage media may be any available media or data storage device that can be accessed by a computer, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memories (NAND FLASH), solid State Disks (SSDs)), etc.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Claims (10)
1. A remote voice control method, the method comprising:
when a user successfully logs in a local server through a terminal, receiving a user voice instruction sent by the terminal;
and carrying out voiceprint verification on the voice command, and when the voiceprint verification passes, realizing remote control on the intelligent household equipment through the voice command.
2. The method of claim 1, further comprising:
receiving a face image and voice information acquired by a user through a terminal;
extracting the face features of the face image and the voiceprint features of the voice information;
and fusing the voiceprint features and the face features, and judging whether the user can log in a local server or not based on a fusion result.
3. The method according to claim 2, wherein the fusing the voiceprint features and the face features, and determining whether the user can log in a local server based on a fusion result includes:
constructing a fusion feature vector based on the vector of the face feature and the vector of the voiceprint feature;
determining a weight matrix of the fusion feature vector;
and performing summation operation on each feature weight in the weight matrix to finally obtain a numerical value, and if the numerical value belongs to a preset numerical value range, determining that the user successfully logs in the local server through the terminal.
4. The method according to claim 1, wherein the remote control of the smart home device is realized through the voice instruction, and specifically comprises:
and respectively transmitting the voice instruction to a coordinator and a sound box module, controlling a power switch of the intelligent household equipment through the coordinator, and performing instruction classification transmission through the sound box module according to a keyword of the intelligent household equipment contained in the voice instruction, so that the voice instruction is transmitted to a sound box which is correspondingly arranged for the intelligent household equipment corresponding to the keyword, and the voice instruction is played through the sound box and is given to the intelligent household equipment corresponding to the keyword.
5. The method of claim 4, further comprising:
and when the voice instruction cannot be executed by the corresponding intelligent household equipment, feeding back error instruction prompt information to the user terminal.
6. The method of claim 1, wherein when the verification passes, the method further comprises:
and if the voice instruction can not be processed offline in the local server, calling a cloud server to process the voice instruction.
7. A remote voice control apparatus, comprising:
a memory for storing program instructions;
a processor for invoking program instructions stored in said memory for executing the method of any of claims 1 to 6 in accordance with the obtained program.
8. A remote voice control system, comprising the apparatus of claim 7, a coordinator and a sound box module respectively connected to the apparatus, and at least one sound box connected to the sound box module; wherein:
the coordinator is used for receiving the voice instruction sent by the device and controlling the power switch of the intelligent household equipment based on the voice instruction;
the sound box module is used for determining keywords of the intelligent home equipment contained in the voice instruction sent by the device, sending the voice instruction to a sound box correspondingly arranged on the intelligent home equipment corresponding to the keywords, and playing the voice instruction to the intelligent home equipment corresponding to the keywords through the sound box.
9. The system according to claim 8, further comprising smart home devices corresponding to each of the sound boxes.
10. A computer storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110549978.3A CN115373280A (en) | 2021-05-20 | 2021-05-20 | Remote voice control method, device and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115373280A true CN115373280A (en) | 2022-11-22 |
Family
ID=84059882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110549978.3A Pending CN115373280A (en) | 2021-05-20 | 2021-05-20 | Remote voice control method, device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115373280A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104834849A (en) * | 2015-04-14 | 2015-08-12 | 时代亿宝(北京)科技有限公司 | Dual-factor identity authentication method and system based on voiceprint recognition and face recognition |
CN104935615A (en) * | 2014-03-19 | 2015-09-23 | 重庆深蜀科技有限公司 | System and method for realizing voice control for household electrical equipment |
CN108172227A (en) * | 2018-02-09 | 2018-06-15 | 深圳市沃特沃德股份有限公司 | Voice remote control method and device |
CN110246483A (en) * | 2019-07-30 | 2019-09-17 | 安徽立果智能科技有限公司 | A kind of appliance control method and its system based on interactive voice |
CN110942773A (en) * | 2019-12-10 | 2020-03-31 | 上海雷盎云智能技术有限公司 | Method and device for controlling intelligent household equipment through voice |
CN112328999A (en) * | 2021-01-05 | 2021-02-05 | 北京远鉴信息技术有限公司 | Double-recording quality inspection method and device, server and storage medium |
CN112332990A (en) * | 2020-09-09 | 2021-02-05 | 深圳市奥拓电子股份有限公司 | Security control method, device and storage medium for commanding and scheduling seats |
History
Date | Event |
---|---|
2021-05-20 | Application CN202110549978.3A filed (CN); publication CN115373280A; status: Pending |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116913258A (en) * | 2023-09-08 | 2023-10-20 | 鹿客科技(北京)股份有限公司 | Speech signal recognition method, device, electronic equipment and computer readable medium |
CN116913258B (en) * | 2023-09-08 | 2023-11-24 | 鹿客科技(北京)股份有限公司 | Speech signal recognition method, device, electronic equipment and computer readable medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021159688A1 (en) | Voiceprint recognition method and apparatus, and storage medium and electronic apparatus | |
US8554563B2 (en) | Method and system for speaker diarization | |
JP6876752B2 (en) | Response method and equipment | |
KR101963993B1 (en) | Identification system and method with self-learning function based on dynamic password voice | |
EP1989701B1 (en) | Speaker authentication | |
JP2019139211A (en) | Voice wake-up method and device | |
CN108364662B (en) | Voice emotion recognition method and system based on paired identification tasks | |
CN107886957A (en) | The voice awakening method and device of a kind of combination Application on Voiceprint Recognition | |
CN104575504A (en) | Method for personalized television voice wake-up by voiceprint and voice identification | |
US20200296098A1 (en) | Voiceprint security with messaging services | |
EP3989217B1 (en) | Method for detecting an audio adversarial attack with respect to a voice input processed by an automatic speech recognition system, corresponding device, computer program product and computer-readable carrier medium | |
CN109462482B (en) | Voiceprint recognition method, voiceprint recognition device, electronic equipment and computer readable storage medium | |
CN112820291A (en) | Intelligent household control method, system and storage medium | |
CN110930987B (en) | Audio processing method, device and storage medium | |
KR101995443B1 (en) | Method for verifying speaker and system for recognizing speech | |
US10424292B1 (en) | System for recognizing and responding to environmental noises | |
CN115373280A (en) | Remote voice control method, device and system | |
CN115132195B (en) | Voice wakeup method, device, equipment, storage medium and program product | |
CN111669350A (en) | Identity verification method, verification information generation method, payment method and payment device | |
KR20220154655A (en) | Device, method and computer program for generating voice data based on family relationship | |
CN115762500A (en) | Voice processing method, device, equipment and storage medium | |
CN113421554A (en) | Voice keyword detection model processing method and device and computer equipment | |
CN112750448A (en) | Sound scene recognition method, device, equipment and storage medium | |
US11011174B2 (en) | Method and system for determining speaker-user of voice-controllable device | |
CN112037772B (en) | Response obligation detection method, system and device based on multiple modes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||