CN115701562A - Equipment control method and server - Google Patents

Publication number
CN115701562A
CN115701562A (application CN202110882844.3A)
Authority
CN
China
Prior art keywords: dark; home; secret; voice; words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110882844.3A
Other languages
Chinese (zh)
Inventor
汪瀛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110882844.3A priority Critical patent/CN115701562A/en
Priority to US18/291,350 priority patent/US20240206545A1/en
Publication of CN115701562A publication Critical patent/CN115701562A/en
Pending legal-status Critical Current

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The embodiment of the application provides a device control method and a server. The method can preset a first number of secret phrases. After a first speech containing a second number of secret phrases is picked up by a sound pickup device, the intention expressed by the first speech, i.e. the target domain, can be determined from the first speech; when the target domain is the home domain, the second number of secret phrases can be matched against a pre-configured secret-phrase library to obtain a target control instruction, which is then output to control the corresponding device. In this way, the smart home is controlled through secret phrases, so that the user's privacy is protected during voice control and the security of smart home control is improved.

Description

Equipment control method and server
Technical Field
The present application relates to the field of information processing, and in particular, to an apparatus control method and a server.
Background
With the application and development of intelligent voice in the internet of things (IoT) home field, a user can control smart home devices through voice commands. Although voice control of the smart home brings great convenience to the user's interaction experience, the user's private information is currently easily revealed during voice control, so the security of the voice-controlled smart home is low; in addition, existing voice control offers the user little entertainment or playfulness.
Disclosure of Invention
To achieve this technical purpose, the present application provides a device control method, an electronic device, a computer-readable storage medium, and a computer program product, which can protect user privacy during voice control and improve the security of smart home control.
In a first aspect, the present application provides a device control method applied to a server. The method may include the following steps: acquiring a first speech uttered by a user through at least one sound pickup device, and processing the first speech to obtain m scores for m domains, where the m domains are the domains corresponding to the intentions expressed by the first speech, the m domains include the home domain, and m is a positive integer greater than or equal to 1; correcting the score of the home domain according to the first speech; determining a target domain according to the corrected score of the home domain and the scores of the m-1 domains other than the home domain; when the target domain is the home domain, obtaining a target control instruction according to the first speech and a pre-configured secret-phrase library; and outputting the target control instruction. In this way, the smart home is controlled through secret phrases, so that the user's privacy is protected during voice control and the security of smart home control is improved.
According to the first aspect, correcting the score of the home domain according to the first speech specifically includes: matching each of the p words contained in the first speech against the secret phrases in the secret-phrase library to obtain the number of secret phrases contained in the first speech, where p is a positive integer greater than or equal to 1; and correcting the score of the home domain according to the number of secret phrases contained in the first speech. In this way, the number of secret phrases in the first speech can be determined by matching its words against the library, and the home-domain score can be corrected accordingly.
According to the first aspect, or according to any implementation manner of the first aspect, determining the target domain according to the corrected score of the home domain and the scores of the m-1 domains other than the home domain specifically includes: selecting the domain with the highest score as the target domain. Because the score of the home skill is corrected according to the number of secret phrases, an expression in secret phrases is more easily matched to the home skill, which avoids failure of secret-phrase control.
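As a concrete illustration of the scoring steps above, the following sketch counts the secret phrases in an utterance, corrects the home-domain score, and selects the target domain. All names and the correction weight are illustrative assumptions, not taken from the patent:

```python
BOOST_PER_PHRASE = 0.15  # assumed correction weight per matched secret phrase

def count_secret_phrases(words, phrase_library):
    # Number of words in the utterance that match a configured secret phrase.
    return sum(1 for w in words if w in phrase_library)

def pick_target_domain(scores, words, phrase_library):
    # Correct the home-domain score by the number of secret phrases, then
    # select the highest-scoring domain as the target domain.
    corrected = dict(scores)
    n = count_secret_phrases(words, phrase_library)
    corrected["home"] = corrected.get("home", 0.0) + n * BOOST_PER_PHRASE
    target = max(corrected, key=corrected.get)
    return target, corrected

phrases = {"sesame", "flower open", "flower fall"}
scores = {"home": 0.30, "music": 0.45, "weather": 0.10}
target, corrected = pick_target_domain(scores, ["sesame", "flower open"], phrases)
```

With two secret phrases present, the corrected home score overtakes the music score, so the utterance is routed to the home domain even though the uncorrected scorer preferred another domain.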
According to the first aspect, or according to any implementation manner of the first aspect, obtaining the target control instruction according to the first speech and the pre-configured secret-phrase library specifically includes: determining j secret phrases contained in the first speech, where j is a positive integer greater than or equal to 1; and matching the j secret phrases against the secret-phrase library to obtain the target control instruction, where the secret-phrase library contains correspondences between secret phrases and target data, and the target data includes one or more of a device, a function, a service, and a scene.
According to the first aspect, or according to any implementation manner of the first aspect, matching the j secret phrases against the secret-phrase library to obtain the target control instruction specifically includes: mapping the words corresponding to the j secret phrases to the home-skill slots of the home domain according to the correspondence between the secret phrases in the library and the target data, to obtain home-skill slot information; and obtaining the target control instruction according to the home-skill slot information. In this way, the corresponding control instruction can be obtained by matching the secret phrases to the corresponding slots of the home skill.
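The slot-matching step might look like the following sketch, where the phrase-to-target mapping and the instruction format are purely hypothetical:

```python
# Assumed secret-phrase library: phrase -> (home-skill slot, slot value).
PHRASE_LIBRARY = {
    "sesame": ("device", "smart door lock"),
    "flower open": ("function", "unlock"),
    "flower fall": ("function", "lock"),
}

def fill_slots(secret_phrases):
    # Map each recognized secret phrase to its home-skill slot.
    slots = {}
    for phrase in secret_phrases:
        slot_name, value = PHRASE_LIBRARY[phrase]
        slots[slot_name] = value
    return slots

def build_instruction(slots):
    # Assemble a target control instruction from the filled slot information.
    return {"target": slots["device"], "action": slots["function"]}

instruction = build_instruction(fill_slots(["sesame", "flower open"]))
```

Here the pair "sesame" + "flower open" fills the device and function slots, yielding an unlock instruction for the smart door lock.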
According to the first aspect, or according to any implementation manner of the first aspect, before the first speech uttered by the user is acquired through the at least one sound pickup device, the method further includes: acquiring n secret phrases configured by the user through an electronic device, where n is a positive integer greater than or equal to 1; checking the validity of the n secret phrases, and storing the valid secret phrases in the secret-phrase library; and outputting the k invalid secret phrases when k of the n secret phrases are invalid. In this way, only valid secret phrases enter the library, and the user is informed of any invalid ones.
According to the first aspect, or according to any implementation manner of the first aspect, checking the validity of the i-th secret phrase among the n secret phrases specifically includes: detecting whether the i-th secret phrase satisfies a preset condition, where the preset condition is one or more of the following: the secret phrase is multi-syllable, differs from the wake-up word, or differs from the hot words in a preset hot-word library; if the i-th secret phrase satisfies the preset condition, it is valid; if not, it is invalid. In this way, the n secret phrases are configured in advance and the validity of each is checked, so that the valid secret phrases are determined; when an invalid secret phrase exists, the user is prompted so that the user can reset it.
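The validity check could be sketched as follows; the wake word, the hot-word list, and the word-count approximation of "multi-syllable" are all assumptions for illustration:

```python
WAKE_WORD = "xiaoyi"            # hypothetical wake-up word
HOT_WORDS = {"door", "light"}   # hypothetical preset hot-word library

def is_valid_phrase(phrase, min_words=2):
    # "Multi-syllable" is approximated here by word count or length; a real
    # implementation would test the syllables of the recognized text.
    multi_syllable = len(phrase.split()) >= min_words or len(phrase) >= 4
    return multi_syllable and phrase != WAKE_WORD and phrase not in HOT_WORDS

def validate_phrases(phrases):
    # Split the n configured phrases into valid and invalid ones, so the
    # invalid ones can be reported back to the user for reconfiguration.
    valid = [p for p in phrases if is_valid_phrase(p)]
    invalid = [p for p in phrases if not is_valid_phrase(p)]
    return valid, invalid

valid, invalid = validate_phrases(["open sesame", "door", "xiaoyi"])
```

"door" is rejected because it is a hot word, and "xiaoyi" because it collides with the wake word; only "open sesame" would be stored in the library.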
According to the first aspect, or according to any implementation manner of the first aspect, the first speech includes an identity of a controlled device and at least one secret phrase.
According to the first aspect, or according to any implementation manner of the first aspect, the identity of the controlled device is itself expressed by a secret phrase.
In a second aspect, the present application provides a server. The server includes at least one memory for storing a program and at least one processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to perform the method of the first aspect or of any implementation manner of the first aspect.
In a third aspect, the present application provides a computer-readable storage medium. The computer readable storage medium stores a computer program that, when run on an electronic device, causes the electronic device to perform the method of the first aspect or any of the ways of the first aspect.
In a fourth aspect, the present application provides a computer program product. The computer program product, when run on an electronic device, causes the electronic device to perform the method of the first aspect or of any implementation manner of the first aspect.
Any one of the implementation manners of the second aspect and the second aspect, any one of the implementation manners of the third aspect and the third aspect, and any one of the implementation manners of the fourth aspect and the fourth aspect correspond to any one of the implementation manners of the first aspect and the first aspect, respectively. For technical effects corresponding to any one implementation manner of the second aspect and the second aspect, reference may be made to the technical effects corresponding to any one implementation manner of the first aspect and the first aspect, and details are not repeated here.
Drawings
FIG. 1 is a schematic diagram of the relationship between an intent and its slots according to an embodiment of the present application;
FIG. 2 is a schematic view of a scenario provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of the hardware structure of a sound pickup apparatus according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the hardware structure of a server according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of a device control method according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of the secret-phrase configuration step in the device control method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of interface changes of an electronic device during secret-phrase configuration according to an embodiment of the present application;
FIG. 8 is a schematic flowchart of the step of identifying and responding to a collected audio signal containing a secret phrase in the device control method according to an embodiment of the present application;
FIG. 9 is a schematic view of an application process in one scenario provided by an embodiment of the present application;
FIG. 10 is a schematic view of an application process in another scenario provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The terminology used in the following examples is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include plural forms such as "one or more" as well, unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of the present application, "at least one" and "one or more" mean one, two, or more than two. The term "and/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather mean "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise. The term "coupled" includes both direct and indirect connections, unless otherwise noted.
In the following, the terms "first", "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as examples, illustrations or descriptions. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present relevant concepts in a concrete fashion.
For the sake of understanding, the related terms and related concepts related to the embodiments of the present application will be described below.
(1) Intention and slot position
Intent refers to the electronic device identifying the user's actual or potential need. Fundamentally, intent recognition is a classifier that assigns a user need to a certain type; alternatively, it is a ranker that orders the set of the user's potential needs by likelihood.
Together, an intent and its slots constitute a "user action." Since the electronic device cannot directly understand natural language, intent recognition serves to map natural language or operations into a machine-understandable structured semantic representation.
As the name implies, intent recognition (SUC) classifies the natural language input by the user into categories, and these categories correspond to user intentions. For example, for "how is the weather today," the intent is "ask about the weather." Naturally, intent recognition can be seen as a typical classification problem. For example, the classification and definition of intents may follow the ISO 24617-2 standard, which contains 56 detailed definitions. The definition of intents depends strongly on the positioning of the system itself and the knowledge base it has; that is, the definition of intents has very strong domain relevance. It should be understood that in the embodiments of the present application, the classification and definition of intents is not limited to the ISO 24617-2 standard.
A slot is a parameter that accompanies an intent. One intent may correspond to several slots. For example, when inquiring about a bus route, necessary parameters such as the departure place, the destination, and the time must be given. These parameters are the slots corresponding to the "inquire bus route" intent.
For example, the main goal of the semantic slot filling task is to extract the values of predefined semantic slots in a semantic frame (semantic frame) from an input sentence on the premise that the semantic frame is known for a specific domain or a specific intent. The semantic slot filling task can be converted into a sequence labeling task, namely, a classical IOB labeling method is used for labeling a word as the beginning (begin), the continuation (inside) or the non-semantic slot (outside) of a certain semantic slot.
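The IOB labeling described above can be illustrated with the weather example used in this section; the tag set and the extraction helper are a minimal sketch, not a trained sequence model:

```python
# IOB tags for "what is the weather today in Shanghai": "today" begins a
# "time" slot and "Shanghai" begins a "location" slot; everything else is O.
tokens = ["what", "is", "the", "weather", "today", "in", "Shanghai"]
tags   = ["O",    "O",  "O",   "O",       "B-time", "O", "B-location"]

def extract_slots(tokens, tags):
    # Collect the token spans labeled B-/I- for each semantic slot.
    slots, current = {}, None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            slots[current] = [tok]
        elif tag.startswith("I-") and current == tag[2:]:
            slots[current].append(tok)
        else:
            current = None
    return {name: " ".join(span) for name, span in slots.items()}

slots = extract_slots(tokens, tags)
```

Multi-token slot values are handled by the I- continuation tag, e.g. tagging "next week" as `B-time I-time` yields the single value "next week".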
To make a system work properly, the intent and slot are first designed. The intent and slot position allow the system to know which particular task should be performed and to give the type of parameters needed to perform the task.
Taking the specific requirement of "asking about the weather" as an example, the design of intents and slots in a task-oriented dialog system is introduced:
an example of user input: "What is the weather today in Shanghai?";
the user intent definition: askWeather (ask about the weather);
the slot definitions: first slot: time (date); second slot: location (place).
Illustratively, as shown in FIG. 1, two necessary slots are defined for the "ask about the weather" task in this example, namely "time" and "place".
(2) Intent recognition and slot filling
After the intent and slot position are defined, the user intent and the slot value corresponding to the corresponding slot can be identified from the user input.
The goal of intent recognition is to recognize the user's intent from the input. A single task can be modeled simply as a binary classification problem; for example, the "ask weather" intent can be modeled at recognition time as the binary question "ask weather" vs. "not ask weather." When the system must handle multiple tasks, it needs to discriminate among the various intents, in which case the binary classification problem becomes a multi-class classification problem.
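As a toy illustration of the multi-class case, the sketch below routes an input to one of several intents by keyword matching; a production system would use a trained classifier, and the keywords and intent names here are invented:

```python
# Hypothetical keyword-to-intent table covering two tasks.
INTENT_KEYWORDS = {
    "weather": "askWeather",
    "bus": "askBusRoute",
}

def classify_intent(text):
    # Return the first intent whose keyword occurs in the input, or
    # "unknown" when no intent matches (multi-class with a reject class).
    lowered = text.lower()
    for keyword, intent in INTENT_KEYWORDS.items():
        if keyword in lowered:
            return intent
    return "unknown"

result = classify_intent("What is the weather today in Shanghai")
```

Adding a task means adding another keyword-to-intent entry, which is exactly the shift from a binary to a multi-class decision described above.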
The task of slot filling is to extract information from the input and fill the predefined slots. For example, in FIG. 1, the intent and its slots have been defined; for the user input "What is the weather today in Shanghai," the system should be able to extract "today" and "Shanghai" and fill them into the "time" and "location" slots, respectively.
For illustrative purposes, an application scenario of the present application is explained with reference to FIG. 2, taking the sound pickup apparatus 100 as a smart speaker equipped with a sound pickup and the controlled apparatus 200 as a smart door lock. It should be noted that the sound pickup apparatus 100 in the present application is not limited to the smart speaker; other apparatuses having a sound pickup fall within its scope. Likewise, the controlled apparatus 200 is not limited to the smart door lock; other controllable devices fall within its scope.
Exemplarily, fig. 2 is a schematic view of a scenario provided in an embodiment of the present application. As shown in fig. 2, the sound pickup apparatus 100 is provided with sound pickups 11, and the number of the sound pickups 11 may be any number, which is not limited herein. The sound pickup inlet of the sound pickup 11 may be located on the upper surface of the sound pickup apparatus 100. Alternatively, the sound pickup inlet of the sound pickup 11 may be located on another surface of the sound pickup apparatus 100, which is not limited herein.
Illustratively, the user may control the controlled device 200 through voice. When the user performs voice control on the controlled apparatus 200, the sound pickup apparatus 100 may pick up an audio signal including a control instruction through the sound pickup 11. Thereafter, the sound pickup apparatus 100 may transmit an audio signal containing the control instruction to the server 300. After analyzing the sound signal containing the control command, the server 300 may determine the control command for the controlled apparatus 200 and send the control command to the controlled apparatus 200. Thereafter, the controlled device 200 may perform corresponding operations (e.g., unlocking, locking, etc.) according to the control instructions.
Alternatively, after the sound pickup apparatus 100 analyzes and processes the sound signal containing the control command, the control command for the controlled apparatus 200 may be determined, and the control command may be directly sent to the controlled apparatus 200. After that, the controlled device 200 can perform corresponding operations (e.g., unlocking, locking, etc.) according to the control instruction.
For example, during voice control of the controlled device 200, if a criminal learns the interaction between the user and the controlled device 200, the criminal may enter the home to steal after learning that the user has left, creating a great safety hazard. In addition, when a user faces an illegal intrusion at home and needs a smart device to execute an alarm command, the user must utter a speech containing an explicit alarm phrase (for example, "xx, please call the police for me"), which will inevitably provoke the criminal and likewise endanger the user.
In order to reduce security risks caused by an interaction process between a user and intelligent equipment and guarantee privacy security of the user, the application provides an equipment control method.
Fig. 3 is a schematic diagram illustrating the hardware structure of a sound pickup apparatus 100 according to an embodiment of the present application. Illustratively, as shown in fig. 3, the sound pickup apparatus 100 may be a smart home device such as a smart speaker, smart television, smart air conditioner, smart refrigerator, smart lamp, smart door, smart lock, or smart curtain; a wearable electronic device such as smart glasses, a smart watch, or a smart bracelet; a smart phone; a tablet computer; a laptop computer; a personal digital assistant (PDA); an in-vehicle device; a virtual reality device; an augmented reality device; and the like. This is not limited by the present application.
As shown in fig. 3, the sound pickup apparatus 100 may include a processor 110, a memory 120, an audio module 130, a sound pickup 140, and a communication module 150.
The processor 110 may be a general-purpose processor or a special-purpose processor, among others. For example, the processor 110 may include a Central Processing Unit (CPU) and/or a baseband processor. The baseband processor may be configured to process communication data, and the CPU may be configured to implement corresponding control and processing functions, execute a software program, and process data of the software program.
Illustratively, the processor 110 may include one or more processing units. For example, the processor 110 may include one or more of an Application Processor (AP), a modem (modem), a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), among others. In some embodiments, the sound pickup apparatus 100 may include one or more processors 110. The different processing units may be separate devices or may be integrated into one or more processors. Illustratively, the processor 110 may comprise an application processor AP and a digital signal processor DSP. The data output by the audio module 130 may be first transmitted to the DSP for processing, and then transmitted to the AP for processing. Thus, the data is preprocessed by the digital signal processor DSP, for example, noise reduction, etc., to increase the processing speed.
The memory 120 may store a program that may be executed by the processor 110. The memory 120 may also store data. The processor 110 may read data stored in the memory 120. The memory 120 and the processor 110 may be separately provided. Optionally, the memory 120 may also be integrated in the processor 110.
The sound pickup 140, which may also be referred to as a "microphone," is used to convert sound signals into electrical signals. The sound pickup apparatus 100 may include one or more sound pickups 140, which capture sounds in the environment where the apparatus is located. Alternatively, when the sound pickup apparatus 100 includes a plurality of sound pickups 140, some of them may be single microphones and others microphone arrays; or all of them may be microphones or microphone arrays. The pickup inlets of the sound pickups 140 may be located on the upper, lower, or side surfaces of the apparatus 100; alternatively, some of the inlets may be located on one of those surfaces and others on another surface of the apparatus 100.
The audio module 130 is configured to sample the analog sound signal collected by the sound pickup 140 at a preset sampling frequency (which may be set by the processor, at the factory, or by default), convert the sampled analog signal into a digital audio signal, and input the digital audio signal to the processor 110. Alternatively, the audio module 130 may convert a digital audio signal into an analog sound signal and output it to a speaker. The audio module 130 is also used to encode and decode audio signals. In some examples, the audio module 130, or part of its functionality, may be integrated in the processor 110. In some examples, the audio module 130 may include an analog-to-digital converter (ADC), i.e., an A/D converter.
The communication module 150 may include at least one of a mobile communication module and a wireless communication module. When the communication module 150 includes a mobile communication module, it may provide solutions for 2G/3G/4G/5G wireless communication applied on the electronic device, for example global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), new radio (NR), etc.
The communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like. It may receive electromagnetic waves through at least one antenna; filter, amplify, and otherwise process the received waves; and transmit them to the modem for demodulation. It may also amplify the signal modulated by the modem and convert it into electromagnetic waves radiated through the antenna. In some examples, at least some functional modules of the communication module 150 may be disposed in the processor 110, or in the same device as some modules of the processor 110. When the communication module 150 includes a wireless communication module, it may provide solutions for wireless communication applied to the sound pickup apparatus 100, including wireless local area networks (WLAN) (e.g., wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module may be one or more devices integrating at least one communication processing module. It receives electromagnetic waves via the antenna, frequency-modulates and filters the signals, and transmits the processed signals to the processor 110; it may also receive a signal to be transmitted from the processor 110, frequency-modulate and amplify it, and convert it into electromagnetic waves radiated via the antenna.
Illustratively, the communication module 150 may transmit the digital audio signal converted by the audio module 130 to the server 300.
Alternatively, the sound pickup apparatus 100 may communicate with an external apparatus through various interfaces. For example, the communication with the external device is performed through a Universal Serial Bus (USB) interface, an ethernet interface, a firewire interface, or the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the sound pickup apparatus 100. In other embodiments of the present application, the sound pickup apparatus 100 may include more or fewer components than shown, or some components may be combined, some components may be separated, or a different arrangement of components may be used. The illustrated components may be implemented in hardware, software, or a combination of software and hardware. In some embodiments, the hardware structure of the controlled device 200 may have the same hardware structure as the sound pickup device 100, and may also have more or fewer components than the sound pickup device 100, which is not limited herein.
Illustratively, fig. 4 shows a hardware structure diagram of a server 300 provided in an embodiment of the present application. As shown in fig. 4, the server 300 may include: a processor 310, a network interface 320, and a memory 330.
The processor 310 may be a general-purpose processor or a special-purpose processor. For example, the processor 310 may include a central processing unit (CPU) and/or a baseband processor. The baseband processor may be configured to process communication data, and the CPU may be configured to implement corresponding control and processing functions, execute software programs, and process software data. For example, the processor 310 may perform speech-to-text (STT) conversion and natural language understanding (NLU) on the audio signals received by the network interface 320, determine the controlled device to be controlled and the functions or actions to be performed, and generate control instructions, etc.
The network interface 320 may optionally include a standard wired interface, a wireless interface (e.g., wi-Fi, mobile communication interface, etc.), and is controlled by the processor 310 to send and receive data, for example, to receive digital audio signals from the network, which are sent by the communication module 150 in the sound pickup apparatus 100, or to send control instructions to the controlled apparatus 200.
The memory 330 may store a program that can be executed by the processor 310 to cause the processor 310 to perform the methods provided herein. The memory 330 may also store data (e.g., the user-configured secret word library, domain hotword libraries, etc.). The processor 310 may read the data stored in the memory 330. The memory 330 and the processor 310 may be separately provided. Optionally, the memory 330 may also be integrated in the processor 310.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the server 300. In other embodiments of the present application, the server 300 may include more or fewer components than shown, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The technical solution provided by the present application is described in detail below with reference to fig. 5. Exemplarily, fig. 5 shows a flowchart of a device control method provided in an embodiment of the present application. As shown in fig. 5, the method may include the steps of:
S1, a secret word configuration step.
Specifically, a secret word is a covert word agreed upon for interacting with the smart device. The secret word configuration step is mainly used for configuring the secret words that interact with the smart device. Illustratively, the secret word "sesame" may represent "smart door lock", the secret word "blossom" may represent "unlock", the secret word "flower fall" may represent "lock", the secret word "sesame blossom" represents "unlock the smart door lock", and the secret word "sesame flower fall" represents "lock the smart door lock".
It should be noted that S1 is not an essential step of the method; that is, S1 need not be performed every time the method is performed.
After the secret word configuration is completed, S2 may be performed.
S2, recognizing and responding to the collected audio signal containing secret words.
Specifically, this step is used for detecting, after the user utters a voice containing secret words, the control information corresponding to the collected audio signal containing the secret words, and making a corresponding response. Illustratively, the control information may include the controlled device to be controlled, the function to be performed, and/or the action to be performed, etc. For example, after the control information corresponding to the collected audio signal containing the secret words is detected, an operation instruction or the like may be sent to the controlled device.
The secret word configuration procedure in the device control method provided in the present application is described in detail below with reference to fig. 6. Exemplarily, fig. 6 is a schematic flowchart of the secret word configuration step in the device control method provided in the embodiment of the present application. In the flow shown in fig. 6, the electronic device may have a display screen, and a smart home client may be installed on the electronic device. The smart home client may be software running on the electronic device that serves as a unified management platform for smart devices and may implement interconnection and intercommunication between the smart devices; the smart home client may correspond to a server. Illustratively, the smart home client may be a smart living client. In addition, in the flow shown in fig. 6, the number of secret words set by the user is n, where n is a positive integer greater than or equal to 1.
As shown in (a) of fig. 6, the secret word configuration step may include:
S11, the electronic device receives configuration operations of the user on n secret words.
Specifically, the user can start the smart home client on the electronic device and configure secret words in the smart home client. When the user configures the secret words, the electronic device may receive the user's configuration operations on the n secret words.
Illustratively, as shown in fig. 7 (A), the user may select the smart home application on the electronic device A. Thereafter, as shown in fig. 7 (B), the main interface of the smart home application may be displayed on the electronic device A. In the interface shown in fig. 7 (B), the user may select "My" below the main interface of the smart home application. Thereafter, as shown in fig. 7 (C), a "My" sub-page may be displayed on the electronic device A, in which the user may select the "Settings" key. Subsequently, the interface shown in fig. 7 (D) may be displayed on the electronic device A. In the interface shown in fig. 7 (D), the user can select "secret word settings" to enter the interface for secret word settings, i.e., the interface shown in fig. 7 (F). In the interface shown in fig. 7 (F), the user can set secret words, and after the setting is completed, the user selects the "Save" button in area b to complete the secret word configuration operation. In the interface shown in fig. 7 (F), the user may also choose to delete a set secret word or add a new secret word. Further, in the interface shown in fig. 7 (F), the user can associate a secret word with a device, a function, an action, a scene, and the like.
For example, as shown in Table 1: the secret word may be "sesame", its corresponding target device may be "smart desk lamp", its corresponding target function may be "empty" (i.e., not corresponding to any target function), its corresponding target action may be "empty" (i.e., not corresponding to any target action), and its corresponding target scene may be "empty" (i.e., not corresponding to any target scene). The secret word may be "blossom", its corresponding target device may be "empty" (i.e., not corresponding to any target device), its corresponding target function may be "switch", its corresponding target action may be "turn on", and its corresponding target scene may be "empty".
TABLE 1
Secret word | Target device | Target function | Target action | Target scene
Sesame | Smart desk lamp | Empty | Empty | Empty
Blossom | Empty | Switch | Turn on | Empty
Flower fall | Empty | Switch | Turn off | Empty
My mom | Security device | Empty | Empty | Empty
Buy fruit | Empty | Alarm | Empty | Empty
Go out | Empty | Empty | Open | Empty
Go home | Empty | Empty | Close | Empty
Ou Ye | Empty | Empty | Empty | Leaving home
Eulala japonica | Empty | Empty | Empty | Go home
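As a sketch of how such a configuration could be represented inside the smart home client, the secret word table above can be modeled as a simple mapping. This is an illustrative assumption only; the patent does not specify a data format, and the names `SECRET_WORD_LIBRARY` and `lookup` are hypothetical.

```python
# Hypothetical in-memory representation of Table 1.
# None stands for an "empty" (unconfigured) slot.
SECRET_WORD_LIBRARY = {
    # word:        (target device,     target function, target action, target scene)
    "sesame":      ("smart desk lamp", None,            None,          None),
    "blossom":     (None,              "switch",        "turn on",     None),
    "flower fall": (None,              "switch",        "turn off",    None),
    "my mom":      ("security device", None,            None,          None),
    "buy fruit":   (None,              "alarm",         None,          None),
    "go out":      (None,              None,            "open",        None),
    "go home":     (None,              None,            "close",       None),
}

def lookup(word):
    """Return the (device, function, action, scene) tuple for a secret word, or None."""
    return SECRET_WORD_LIBRARY.get(word)
```

A lookup for a word not in the library returns `None`, which corresponds to the "no secret word matched" case in the matching steps described later.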
S12, the electronic device sends the n secret words to the server.
Specifically, after receiving the user's configuration operations on the n secret words, the electronic device may send the n secret words to the server corresponding to the smart home client. The server may be a cloud server or a local server.
S13, the server checks the validity of the n secret words and stores the valid secret words into a secret word library.
Specifically, after the server receives the n secret words sent by the electronic device, the server may check the n secret words to determine whether each of them is valid, and store the valid secret words in the secret word library.
Illustratively, a secret word may be subject to the following restrictions: 1) it must contain at least 2 syllables; 2) it cannot be the smart wake-up word; 3) it cannot be a hotword in the hotword libraries of mainstream domains, where the mainstream domains may include home, music, vocal, conversation, alarm clock, encyclopedia, and the like. A monosyllabic secret word is prone to false matching, so a secret word is required to contain at least 2 syllables; a secret word identical to the smart wake-up word is easily confused with the wake-up word, which also causes false matching, so a secret word cannot be the same as the smart wake-up word; in addition, words in the mainstream-domain hotword libraries are frequently used and therefore prone to false matching, so a secret word cannot be a word in those hotword libraries.
After receiving the secret words, the server may check each secret word against these restrictions to determine whether it is valid. For example, when a secret word is a monosyllabic word such as "day", "ground", or "person", the secret word may be determined to be invalid; when the smart wake-up word is "Xiaoyi", if a secret word is "Xiaoyi", the secret word may be determined to be invalid; when the hotwords in the mainstream-domain hotword library include "play" and "listen to a song", if a secret word is "play", the secret word may be determined to be invalid.
In one example, the server may also send the detected valid secret words to a local device, so that the local device stores the valid secret words in a secret word library on the local device; meanwhile, the server deletes the information related to the valid secret words on the server, thereby improving data security. The local device may be a local server, or may be another device, such as the sound pickup device.
S14, the server sends, to the electronic device, information that k of the n secret words are invalid, where 0 ≤ k ≤ n; and stores the (n−k) valid secret words.
Specifically, when the server detects that k of the n secret words are invalid, the server may send information that the k secret words are invalid to the electronic device. In addition, the server may store the (n−k) valid secret words other than the k invalid ones.
S15, the electronic device outputs the information that the k secret words are invalid.
Specifically, after the electronic device receives the information that the k secret words are invalid, the electronic device can output this information. Illustratively, the electronic device may present the information that the k secret words are invalid to the user in text form, or may broadcast it by voice, and so on.
Exemplarily, the specific steps of S13 may be as shown in (B) of fig. 6. Specifically, S13 may include:
S131, detecting whether the ith secret word of the n secret words is monosyllabic, where the initial value of i is 1.
Specifically, the server may detect the syllables of the ith secret word of the n secret words through an Automatic Speech Recognition (ASR) technique, and then determine whether the ith secret word is monosyllabic, where the initial value of i may be 1. If the ith secret word is not monosyllabic, S132 is executed; otherwise, S135 is executed.
S132, detecting whether the ith secret word is the wake-up word.
Specifically, the server may compare the ith secret word with the wake-up word to detect whether the ith secret word is the wake-up word. If the ith secret word is not the wake-up word, S133 is executed; otherwise, S135 is executed.
S133, detecting whether the ith secret word is a hotword in a preset hotword library.
Specifically, the server may compare the ith secret word with the hotwords in the preset hotword library to detect whether the ith secret word is a hotword in the preset hotword library. If the ith secret word is not a hotword in the preset hotword library, S134 is executed; otherwise, S135 is executed.
S134, the ith secret word is valid; i = i + 1, and it is determined whether the updated i is less than or equal to n.
Specifically, when the ith secret word passes all the checks in S131 to S133, the ith secret word is valid. At this time, i = i + 1 may be updated so as to check the next secret word, and at the same time it is determined whether the updated i is less than or equal to n. If the updated i is less than or equal to n, the process returns to S131; otherwise the process ends, and k invalid secret words and (n−k) valid secret words are obtained, where 0 ≤ k ≤ n.
S135, the ith secret word is invalid; i = i + 1, and it is determined whether the updated i is less than or equal to n.
Specifically, when the ith secret word fails any one of the checks in S131 to S133, the ith secret word is invalid. At this time, i = i + 1 may be updated so as to check the next secret word, and at the same time it is determined whether the updated i is less than or equal to n. If the updated i is less than or equal to n, the process returns to S131; otherwise the process ends, and k invalid secret words and (n−k) valid secret words are obtained, where 0 ≤ k ≤ n.
It should be noted that the execution order of S131, S132, and S133 shown in fig. 6 (B) may be chosen arbitrarily and is not limited herein. For example, S132 may be executed first, then S131, and finally S133; or S133 first, then S132, and finally S131; or S133 first, then S131, and finally S132; and so on.
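The validity loop of S131 to S135 can be sketched as follows. This is a minimal illustration under stated assumptions: the wake-up word set and hotword set are hypothetical, and the syllable count, which the patent obtains from ASR, is approximated here by counting whitespace-separated tokens.

```python
# Hypothetical sketch of the S131-S135 validity loop.
WAKE_WORDS = {"xiaoyi"}                       # assumed wake-up word set
HOT_WORDS = {"play", "listen to a song"}      # assumed mainstream-domain hotwords

def syllable_count(word):
    # Placeholder: in the patent the syllable count comes from ASR;
    # here we approximate one syllable per whitespace-separated token.
    return len(word.split())

def check_secret_words(words):
    """Split candidate secret words into (valid, invalid) lists."""
    valid, invalid = [], []
    for w in words:
        if (syllable_count(w) < 2          # S131: monosyllabic
                or w in WAKE_WORDS         # S132: identical to wake-up word
                or w in HOT_WORDS):        # S133: mainstream-domain hotword
            invalid.append(w)              # S135: fails any check
        else:
            valid.append(w)                # S134: passes all checks
    return valid, invalid
```

As the text notes, the three checks can run in any order; the short-circuit order here is only one of the possibilities.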
The step of recognizing and responding to the collected audio signal containing secret words in the device control method provided by the present application is described in detail below with reference to fig. 8. Fig. 8 is a schematic flowchart of this step in the device control method according to the embodiment of the present application. In the flow shown in fig. 8, the sound pickup device may be the sound pickup device 100 described above; the controlled device may be the controlled device 200 described above; the server may be the server 300 described above, where the server may be the server corresponding to the smart home client described above.
As shown in fig. 8 (a), the step of recognizing and responding to the collected audio signal containing secret words may include:
S21, the sound pickup device receives a first voice containing secret words uttered by the user.
Specifically, the sound pickup device may continuously or periodically pick up sounds in the environment. After the user utters the first voice containing secret words, the sound pickup device may receive it and convert the first voice from an analog sound signal into a digital sound signal.
It can be understood that the first voice may include an identifier of the controlled device that the user intends to control, and the identifier may be, but is not limited to being, indicated by a secret word. For example, with continued reference to Table 1 above, "sesame" may represent "smart desk lamp", in which case "sesame" is the identifier of the "smart desk lamp".
And S22, the sound pickup equipment sends the first voice to the server.
Specifically, after receiving a first voice uttered by the user, the sound pickup apparatus may send the first voice to the server for analysis processing by the server. Illustratively, the first voice may be a digital sound signal.
And S23, the server recognizes the first voice and determines a control instruction.
Specifically, after the server receives the first voice, it can recognize the first voice and determine the user's intention and the secret words contained in the first voice. When the user's intention is to control the smart home, the secret words are matched with the home skill slots corresponding to the home domain, so that the corresponding control instruction is determined, that is, the controlled device that the user intends to control is determined.
And S24, the server sends a control instruction to the controlled device.
Specifically, after determining the control instruction, the server may send the control instruction to the controlled device.
And S25, the controlled equipment executes the control instruction.
Specifically, after the controlled device receives the control instruction, the controlled device may execute it. For example, if the controlled device is a smart door lock and the control instruction is unlocking, the smart door lock is unlocked.
Exemplarily, the specific steps of S23 may be as shown in (B) of fig. 8. Specifically, S23 may include:
and S231, converting the first voice into a text sentence.
Specifically, after receiving the first voice, the server may perform a Speech To Text (STT) operation on the first voice, so as to convert the first voice into a text sentence.
S232, performing word segmentation on the converted text sentence to obtain p words, where p is a positive integer greater than or equal to 1.
Specifically, after obtaining the text sentence corresponding to the first voice, the server may perform word segmentation on the text sentence through Natural Language Understanding (NLU), thereby dividing the text sentence into p words.
S233, performing secret word matching on the p words to obtain words corresponding to j secret words, where 0 ≤ j ≤ p.
Specifically, after the server obtains the p words, the p words may be respectively matched with the secret words in the secret word library configured in advance by the user, so as to obtain the words corresponding to j secret words.
Exemplarily, taking the user's pre-configured secret word library to be Table 1 above, when the first voice is "sesame blossom", word segmentation of "sesame blossom" yields the two words "sesame" and "blossom". Both words are in Table 1, so both are secret words; that is, words corresponding to 2 secret words are obtained.
When the first voice is "My mom went out to buy fruit", word segmentation yields words such as "my", "mom", "go out", and "buy fruit". From Table 1 it can be seen that "my mom", "go out", and "buy fruit" are secret words, so words corresponding to 3 secret words are obtained.
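Steps S232 and S233 can be sketched as follows. This is an illustrative greedy matcher over an assumed word-segmentation result; the actual NLU segmentation and matching in the patent may differ, and the secret word set here is a small excerpt from Table 1.

```python
# Hypothetical sketch of secret-word matching (S233).
SECRET_WORDS = {"sesame", "blossom", "my mom", "go out", "buy fruit"}

def match_secret_words(tokens):
    """Greedily merge adjacent tokens and collect those that are secret words."""
    matched, i = [], 0
    while i < len(tokens):
        # Try a two-token merge first ("my" + "mom" -> "my mom"), then a single token.
        for span in (2, 1):
            candidate = " ".join(tokens[i:i + span])
            if candidate in SECRET_WORDS:
                matched.append(candidate)
                i += span
                break
        else:
            i += 1  # no secret word starts at this token
    return matched
```

The length of the returned list corresponds to j in S233: for the segmentation of "My mom went out to buy fruit" it would yield 3 matched secret words.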
S234, performing domain hotword matching and scoring on the p words to obtain scores of m domains including the home domain, where m ≥ 1.
Specifically, the scores of the m domains including the home domain can be obtained by respectively matching the p words with the hotwords of each domain.
Illustratively, the scores of the m domains including the home domain may be determined according to the number of domain hotwords matched by the p words. For example, when the p words are "nearby", "where", and "buy", and the corresponding hotwords of the shopping domain are "nearby", "where", and "buy", it may be determined that the shopping domain scores full marks (e.g., 10 points) and the other domains score 0. Illustratively, the m domains may include: home, music, shopping, alarm clock, encyclopedia, and the like.
And S235, correcting the score of the home domain according to the size of j.
Specifically, after the words corresponding to j secret words are obtained in S233, it is known that j secret words are contained in the first voice. At this time, the score of the home domain can be corrected according to the size of j and a preset home domain correction rule.
Illustratively, the preset home domain correction rule may be:
a. one home secret word matched: home domain score × 2;
b. two home secret words matched: home domain score × 5;
c. three or more home secret words matched: home domain score × 10.
For example, if the score of the home domain obtained in S234 is 0.2 and j is 2, the corrected score of the home domain is 0.2 × 5 = 1.
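The correction rule of S235 can be sketched as follows. The multipliers 2, 5, and 10 come from rules a to c above; the function name and everything else are illustrative.

```python
def correct_home_score(home_score, j):
    """Apply the preset home-domain correction rule for j matched secret words."""
    if j >= 3:
        return home_score * 10   # rule c: three or more secret words
    if j == 2:
        return home_score * 5    # rule b: two secret words
    if j == 1:
        return home_score * 2    # rule a: one secret word
    return home_score            # no secret words matched: score unchanged
```

This reproduces the example in the text: a raw home score of 0.2 with j = 2 is corrected to 1.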
And S236, determining the domain with the highest score according to the scores of the m domains, to obtain the target domain.
Specifically, after the scores of the m domains (with the home domain score corrected) are obtained, the domain with the highest score can be selected as the target domain. Illustratively, this step may be understood as the process of determining the user's intention, i.e., the target domain represents the user's intention.
And S237, judging whether the target domain is the home domain.
Specifically, after the target domain is obtained, it can be judged whether the target domain is the home domain. If not, S238 is executed, i.e., the skill corresponding to the non-home domain, such as a shopping skill, is executed; otherwise, the scene corresponding to the home domain is executed, i.e., S239 is executed. For example, if the obtained target domain is the encyclopedia domain, it may be determined that the target domain is not the home domain, and S238 is executed; if the obtained target domain is the home domain, S239 is executed.
And S238, executing the non-home skill.
Specifically, when it is determined that the target domain is not the home domain, the skill corresponding to the non-home domain is executed. For example, when the target domain is the shopping domain, the shopping skill corresponding to the shopping domain is executed.
And S239, matching the words corresponding to the j secret words to the home skill slots according to the pre-configured secret word library, to obtain home skill slot information.
Specifically, after it is determined that the skill corresponding to the home domain needs to be executed, the words corresponding to the j secret words can be matched to the home skill slots according to the pre-configured secret word library, so as to obtain the home skill slot information.
Illustratively, the user's pre-configured secret word library may be Table 1 above, and the home skill slots include the target device, the target function, and the target action. For example, when the j words are "sesame" and "blossom", "sesame" can be matched with "smart desk lamp", completing the matching of the target device slot in the home skill; "blossom" can be matched with "switch", completing the matching of the target function slot in the home skill; and "blossom" can be matched with "turn on", completing the matching of the target action slot in the home skill. At this time, the home skill slot information is: the target device is the smart desk lamp, the target function is the switch, and the target action is turning on. The process of matching the words corresponding to the j secret words to the home skill slots can be understood as a slot filling process.
And S2310, converting the obtained home skill slot information into a control instruction.
Specifically, after the home skill slot information is obtained, it can be converted into a control instruction. For example, when the home skill slot information is "the target device is the smart desk lamp, the target function is the switch, and the target action is turning on", the control instruction may be "turn on the smart desk lamp".
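Steps S239 and S2310 can be sketched as follows. This is an illustrative slot-filling pass; the patent does not specify a data format, so the per-word slot contributions and the instruction string are hypothetical.

```python
# Hypothetical slot filling (S239) and instruction conversion (S2310).
# Each secret word contributes the slots it is associated with in Table 1.
LIBRARY = {
    "sesame":  {"device": "smart desk lamp"},
    "blossom": {"function": "switch", "action": "turn on"},
}

def fill_slots(matched_words):
    """Merge the slot contributions of each matched secret word."""
    slots = {"device": None, "function": None, "action": None}
    for word in matched_words:
        slots.update(LIBRARY.get(word, {}))
    return slots

def to_instruction(slots):
    """Render the filled slots as a human-readable control instruction."""
    return f"{slots['action']} the {slots['device']} ({slots['function']})"
```

For the words "sesame" and "blossom", the filled slots correspond to "target device: smart desk lamp, target function: switch, target action: turn on", matching the example in the text.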
It is understood that S233 and S234 may be executed simultaneously or at different times, which is not limited herein. In addition, in the embodiments of the present application, each domain may have corresponding skill slots.
It can be understood that after smart home operations are expressed with secret words, a complete home control expression may contain multiple secret words, so that the expression may no longer contain hotword information of the smart home domain. The speech recognition system scores the domain to which a sentence belongs according to the hotword probabilities of different domains, so as to decide in which domain the sentence should finally be recognized. Therefore, when a complete home control expression contains multiple secret words, it is easily matched preferentially to domains such as chat and encyclopedia, which may cause the secret-word home control function to fail. By introducing the home domain score correction mechanism, after the user utters the first voice containing secret words, the server can match the secret words in the text sentence corresponding to the first voice and correct the home domain score according to the rules, so that expressions containing secret words are more easily matched to the home domain, and secret word control failures are avoided.
It should be noted that, when the server corresponding to the smart home client sends the detected valid secret words to the local device in the secret word configuration phase, the server described in fig. 8 may be replaced with the local device, for example, a local server. In addition, if the local device is the sound pickup device, after the sound pickup device acquires the first voice in fig. 8, the sound pickup device may recognize the first voice, determine the control instruction, and send the control instruction to the controlled device.
The above is an introduction to the device control method provided in the present application. For ease of understanding, the following description is presented in terms of exemplary scenarios.
Scene one
In this scenario, the first voice uttered by the user and received by the sound pickup device is: "Where can I buy fruit nearby?". The pre-configured secret word library contains the contents shown in Table 1 above. The home domain correction rule is the rule described in S235 above.
As shown in fig. 9, after receiving the first voice sent by the sound pickup apparatus, the server may perform the following steps:
And S901, converting the first voice "Where can I buy fruit nearby" into a text sentence.
S902, segmenting the sentence into the five words "nearby", "where", "can", "buy", and "fruit".
S903, calling the pre-configured secret word library, finding the definition of the word "buy fruit" in the library, and matching it with "buy" and "fruit" in the segmentation result; the number of matched secret words is counted as 1.
And S904, performing hotword scoring on the words generated in S902: the combination of "nearby", "where", and "buy" consists of hotwords of the shopping domain, which therefore scores 0.8; the hotword matching degree of the other domains is weak, with average scores not exceeding 0.2; no matched hotword appears in the home domain, which scores 0.1.
S905, correcting the score of the home domain. Since 1 secret word is matched, according to the score correction rule: 0.1 × 2 = 0.2, so the home domain score is corrected to 0.2.
And S906, making a decision according to the scoring results of the domains: the shopping domain score of 0.8 is the highest value.
S907, executing the shopping skill.
Specifically, the server may pass the parsed related words to the shopping skill for further processing. The shopping skill may call the system of a third-party commodity operator, query the delivery addresses of fruit suppliers, and push a list of nearby fruit suppliers to the user according to the home location information.
Scene two
In this scenario, the first voice uttered by the user and received by the sound pickup device is: "My mom went out to buy fruit". The pre-configured secret word library contains the contents shown in Table 1 above. The home domain correction rule is the rule described in S235 above.
As shown in fig. 10, after receiving the first voice sent by the sound pickup apparatus, the server may perform the following steps:
S1001, converting the first voice "My mom went out to buy fruit" into a text sentence.
S1002, segmenting the sentence into six words: "my", "mom", "go out", "buy", "fruit", and a sentence-final particle.
S1003, calling the pre-configured secret word library, finding the definitions of the three words "my mom", "go out", and "buy fruit" in the library, and matching them with "my", "mom", "go out", "buy", and "fruit" in the segmentation result; the number of matched secret words is counted as 3.
And S1004, performing hotword scoring on the words generated in S1002: no key hotword of any domain appears in the sentence, so the ordinary domains score no higher than 0.2; the system's default chat domain scores about 0.4 to 0.5; no matched hotword appears in the home domain, which scores 0.1.
S1005, correcting the score of the home domain. Since 3 secret words are matched, according to the score correction rule: 0.1 × 10 = 1, so the home domain score is corrected to 1.
And S1006, making a decision according to the scoring results of the domains: the home domain score of 1 is the highest value.
And S1007, executing the home skill, and filling the home skill slots with the results of the secret word matching.
Specifically, the matching information according to the home secret word library is as follows:
"my mom" is matched with "security device", completing the matching of the target device slot in the home skill;
"buy fruit" is matched with "alarm", completing the matching of the target function slot in the home skill;
"go out" is matched with "open", completing the matching of the target action slot in the home skill.
S1008, the expression after the secret word slot matching is completed is: <target device> <target action> <target function>.
And S1009, converting the extracted home control slot information into a smart home control command, which is issued and executed.
Specifically, the converted control instruction may be "the security device starts the alarm". Thus, after the user utters the first voice "My mom went out to buy fruit", the security device starts the alarm, realizing a home secret word alarm: after an unexpected intrusion at home, an alarm can be raised through ordinary-sounding secret words, avoiding alerting the intruder.
Based on the method in the above embodiment, the embodiment of the present application further provides a chip. Referring to fig. 11, fig. 11 is a schematic structural diagram of a chip according to an embodiment of the present disclosure. As shown in fig. 11, chip 1100 includes one or more processors 1101 and interface circuits 1102. Optionally, chip 1100 may also include a bus 1103. Wherein:
the processor 1101 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by an integrated logic circuit of hardware in the processor 1101 or by instructions in the form of software. The processor 1101 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods and steps disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and so on. The interface circuit 1102 may be used for transmitting or receiving data, instructions, or information; the processor 1101 may perform processing using the data, instructions, or other information received by the interface circuit 1102, and may transmit the processing result through the interface circuit 1102.
Optionally, the chip further includes a memory, which may include a read-only memory and a random access memory and provides operating instructions and data to the processor. A portion of the memory may also include a non-volatile random access memory (NVRAM). Optionally, the memory stores executable software modules or data structures, and the processor may perform corresponding operations by invoking the operation instructions stored in the memory (the operation instructions may be stored in an operating system).
Optionally, the interface circuit 1102 may be used to output the results of the execution by the processor 1101.
It should be noted that the functions corresponding to the processor 1101 and the interface circuit 1102 may be implemented by a hardware design, by a software design, or by a combination of software and hardware, which is not limited herein.
It will be appreciated that the steps of the above-described method embodiments may be performed by logic circuits in the form of hardware or instructions in the form of software in a processor. The chip can be applied to the server 300 to implement the method provided in the embodiment of the present application.
It can be understood that the processor in the embodiments of this application may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general-purpose processor may be a microprocessor, or may be any conventional processor.
The method steps in the embodiments of this application may be implemented by hardware, or may be implemented by software instructions executed by a processor. The software instructions may consist of corresponding software modules, which may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an ASIC.
In the foregoing embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take, in whole or in part, the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted through a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or a data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
It is to be understood that the various numerical references referred to in the embodiments of the present application are merely for descriptive convenience and are not intended to limit the scope of the embodiments of the present application.

Claims (12)

1. A device control method, applied to a server, the method comprising:
acquiring, through at least one sound pickup device, a first voice uttered by a user;
processing the first voice to obtain m scores of m fields, wherein the m fields are fields corresponding to an intention expressed by the first voice, the m fields comprise a home field, and m is a positive integer greater than or equal to 1;
correcting the score of the home field according to the first voice;
determining a target field according to the corrected score of the home field and the scores of the m-1 fields other than the home field;
when the target field is the home field, obtaining a target control instruction according to the first voice and a preconfigured secret word library; and
outputting the target control instruction.
2. The method according to claim 1, wherein the correcting the score of the home field according to the first voice specifically comprises:
matching p words contained in the first voice against the secret words in the secret word library to obtain the number of secret words contained in the first voice, wherein p is a positive integer greater than or equal to 1; and
correcting the score of the home field according to the number of secret words contained in the first voice.
3. The method according to claim 1 or 2, wherein the determining a target field according to the corrected score of the home field and the scores of the m-1 fields other than the home field specifically comprises:
selecting the field with the highest score as the target field.
4. The method according to any one of claims 1 to 3, wherein the obtaining a target control instruction according to the first voice and a preconfigured secret word library specifically comprises:
determining j secret words contained in the first voice, wherein j is a positive integer greater than or equal to 1; and
matching the j secret words against the secret word library to obtain the target control instruction, wherein the secret word library comprises correspondences between secret words and target data, and the target data comprises one or more of a device, a function, a service, and a scene.
5. The method according to claim 4, wherein the matching the j secret words against the secret word library to obtain the target control instruction specifically comprises:
matching the words corresponding to the j secret words to home skill slots corresponding to the home field according to the correspondences between the secret words in the secret word library and the target data, to obtain home skill slot information; and
obtaining the target control instruction according to the home skill slot information.
6. The method according to any one of claims 1 to 5, wherein before the acquiring a first voice uttered by a user through at least one sound pickup device, the method further comprises:
acquiring n secret words configured by the user through an electronic device, wherein n is a positive integer greater than or equal to 1;
detecting the validity of the n secret words, and storing the valid secret words into the secret word library; and
when k of the n secret words are invalid, outputting the k secret words.
7. The method according to claim 6, wherein, for an i-th secret word of the n secret words, the detecting the validity of the i-th secret word comprises:
detecting whether the i-th secret word meets a preset condition, wherein the preset condition is one or more of the following: the secret word is multi-syllabic, different from a wake-up word, or different from hot words in a preset hot-word library;
if the i-th secret word meets the preset condition, the i-th secret word is valid; and
if the i-th secret word does not meet the preset condition, the i-th secret word is invalid.
8. The method according to any one of claims 1 to 7, wherein the first voice comprises an identity of the controlled device and at least one secret word.
9. The method according to claim 8, wherein the identity of the controlled device is indicated by a secret word.
10. A server, comprising:
at least one memory, configured to store a program; and
at least one processor, configured to execute the program stored in the memory, wherein when the program stored in the memory is executed, the processor is configured to perform the method according to any one of claims 1 to 9.
11. A computer-readable storage medium, storing a computer program which, when run on an electronic device, causes the electronic device to perform the method according to any one of claims 1 to 9.
12. A computer program product, which, when run on an electronic device, causes the electronic device to perform the method according to any one of claims 1 to 9.
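The score correction, target-field selection, and secret-word validity rules of claims 2, 3, and 7 can be sketched as follows. This is one reading of the claims, not the patented implementation: the boost factor, wake-up word, hot words, library contents, and the vowel-group syllable heuristic are all assumptions.

```python
import re

WAKE_WORD = "hello assistant"        # assumed wake-up word
HOT_WORDS = {"music", "weather"}     # assumed preset hot-word library
SECRET_WORDS = {"bought fruit", "watered the plants"}  # assumed library

def count_secret_words(utterance: str) -> int:
    # Claim 2: match the first voice against the secret word library
    # and count how many secret words it contains.
    return sum(1 for s in SECRET_WORDS if s in utterance)

def correct_home_score(home_score: float, num_secret_words: int,
                       boost: float = 0.2) -> float:
    # Claim 2: correct the home-field score according to the number of
    # secret words found; a linear boost is one plausible choice.
    return home_score + boost * num_secret_words

def select_target_field(scores: dict) -> str:
    # Claim 3: the field with the highest score becomes the target field.
    return max(scores, key=scores.get)

def is_valid_secret_word(word: str) -> bool:
    # Claim 7: valid if multi-syllabic (crudely estimated from vowel
    # groups), different from the wake-up word, and not a preset hot word.
    multi_syllabic = len(re.findall(r"[aeiouy]+", word)) >= 2
    return multi_syllabic and word != WAKE_WORD and word not in HOT_WORDS
```

For example, with raw scores `{"home": 0.4, "music": 0.5}` and one secret word in the utterance, the corrected home score exceeds 0.5, so the home field is selected as the target field.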
CN202110882844.3A 2021-08-02 2021-08-02 Equipment control method and server Pending CN115701562A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110882844.3A CN115701562A (en) 2021-08-02 2021-08-02 Equipment control method and server
US18/291,350 US20240206545A1 (en) 2021-08-02 2022-08-02 Heating assembly and aerosol-generating device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110882844.3A CN115701562A (en) 2021-08-02 2021-08-02 Equipment control method and server

Publications (1)

Publication Number Publication Date
CN115701562A true CN115701562A (en) 2023-02-10

Family

ID=85142568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110882844.3A Pending CN115701562A (en) 2021-08-02 2021-08-02 Equipment control method and server

Country Status (1)

Country Link
CN (1) CN115701562A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination