WO2020014899A1 - Voice control method, central control device, and storage medium - Google Patents

Voice control method, central control device, and storage medium

Publication number
WO2020014899A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
instruction
instructions
cloud server
execution instruction
Application number
PCT/CN2018/096150
Other languages
French (fr)
Chinese (zh)
Inventor
谢冠宏
廖明进
高铭坤
Original Assignee
深圳魔耳智能声学科技有限公司
Application filed by 深圳魔耳智能声学科技有限公司
Priority to CN201880000938.XA (CN109074808B)
Priority to PCT/CN2018/096150 (WO2020014899A1)
Publication of WO2020014899A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Definitions

  • the present application relates to the technical field of voice recognition, and in particular, to a voice control method, a central control device, and a storage medium.
  • voice recognition has played an increasingly important role.
  • In multi-point speech recognition technology, such as in smart home systems, multiple pickup devices are usually deployed in the corresponding space to collect voice signals from users and obtain voice instructions. A recognition device then recognizes the multiple voice instructions in order to control the corresponding device to perform the operation corresponding to the instruction.
  • However, the voice instructions obtained by the different pickup devices differ, and the control instructions obtained by recognizing those voice instructions differ as well, which makes it difficult to achieve accurate control of the smart home.
  • A voice control method, a central control device, and a storage medium are provided.
  • a voice control method includes:
  • a central control device includes a memory and a processor.
  • the memory stores computer-readable instructions.
  • When executed by the processor, the computer-readable instructions cause the processor to perform the following steps:
  • One or more non-volatile storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • FIG. 1 is an application environment diagram of a voice control method in an embodiment
  • FIG. 2 is a schematic flowchart of a voice control method according to an embodiment
  • FIG. 3 is a schematic flowchart of steps of selecting and sending a voice instruction according to an embodiment
  • FIG. 4 is a schematic diagram of an interaction process of a voice control method according to an embodiment
  • FIG. 5 is a structural block diagram of a voice control device according to an embodiment
  • FIG. 6 is a structural block diagram of a central control device in an embodiment.
  • FIG. 1 is a schematic diagram of an application environment of a voice control method according to an embodiment.
  • the application environment includes a pickup device 102, a central control device 104, and a cloud server 106.
  • Each sound pickup device 102 is connected to the central control device 104 through a network, and the central control device 104 is connected to the cloud server 106 through a network.
  • The central control device 104 may specifically be a terminal device, such as a desktop terminal or a mobile terminal, for example a gateway device having a voice processing capability, a central management device, or a smart home device.
  • the cloud server 106 is a server or a server cluster having a speech recognition function and capable of implementing complex speech recognition.
  • the sound pickup device 102 is configured to receive a sound signal sent by a user, convert it into a corresponding voice instruction, and send it to the central control device 104.
  • the sound pickup device refers to an electroacoustic instrument that converts sound into a voice signal by receiving sound vibration.
  • The voice signal refers to a signal carrying voice data, obtained by collecting, through a sound pickup device, a sound signal sent by a user; the voice data refers to data used to represent the sound signal. To meet different speech recognition needs, the sound pickup device collects the sound signals in the current environment to obtain a voice signal, which is subsequently recognized so that the corresponding function can be performed.
  • the voice instruction refers to a voice signal carrying a control instruction, and the relevant equipment in the smart home system can be controlled by the voice instruction. Taking a smart home system as an example, the voice commands collected by the sound pickup device include a wake-up command or a switch command.
  • Multiple sound pickup devices 102 are deployed at different positions in the same space to collect sound signals from different positions, ensuring that sound signals emitted by users or other personnel at any of those positions can be collected.
  • The central control device 104 is connected to each of the sound pickup devices 102 through a network, and is configured to receive the voice instructions collected by the sound pickup devices 102, analyze each voice instruction, and send the voice instructions that meet the volume condition to the cloud server 106.
  • The volume condition is a volume limitation condition preset according to the required accuracy of speech recognition.
  • For example, the volume condition may be that a voice instruction is among a preset number of voice instructions with the highest volume, or that its volume is greater than a set volume threshold.
  • the central control device analyzes each received voice instruction to determine whether each voice instruction meets a preset volume condition, and sends the voice instruction that meets the volume condition to the cloud server 106.
  • The central control device 104 filters the received voice instructions based on the volume condition, discarding voice instructions of relatively poor quality, and sends the remaining voice instructions that meet the volume condition to the cloud server 106 for recognition. This avoids excessive error in the recognition result caused by poor-quality voice instructions.
  • It also reduces the voice recognition workload of the cloud server, which speeds up obtaining the recognition results.
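  • As an illustration only, the two forms of the volume condition described above can be sketched in Python as follows. The `VoiceInstruction` type, its field names, the device names, and the dB values are hypothetical and are not part of the application; the volume coefficient is assumed to have been computed already.

```python
from dataclasses import dataclass


@dataclass
class VoiceInstruction:
    """A voice instruction received from one pickup device (hypothetical type)."""
    device_id: str
    volume_db: float  # volume coefficient in dB, assumed to be computed beforehand


def select_loudest(instructions, preset_number):
    """Volume condition, form 1: keep the preset number of loudest instructions."""
    ranked = sorted(instructions, key=lambda v: v.volume_db, reverse=True)
    return ranked[:preset_number]


def select_above_threshold(instructions, threshold_db):
    """Volume condition, form 2: keep instructions louder than a set threshold."""
    return [v for v in instructions if v.volume_db > threshold_db]


# Made-up sample data: one utterance heard by four pickup devices.
received = [
    VoiceInstruction("kitchen", -32.0),
    VoiceInstruction("living_room", -12.5),
    VoiceInstruction("bedroom", -20.0),
    VoiceInstruction("hallway", -41.0),
]

print([v.device_id for v in select_loudest(received, 2)])
print([v.device_id for v in select_above_threshold(received, -25.0)])
```

  • With this sample data both forms keep the same two instructions; which form, preset number, or threshold to use would depend on the required recognition accuracy.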
  • The cloud server 106 receives the voice instructions sent by the central control device 104, recognizes them to obtain the recognition result corresponding to each voice instruction, and returns each recognition result to the central control device 104, so that the central control device 104 can determine the operation to be performed based on the recognition results.
  • the recognition result refers to an output result corresponding to the voice instruction after the cloud server 106 recognizes the received voice instruction based on a preset voice recognition model.
  • the speech recognition model is a traditional speech recognition model, such as a neural network-based speech recognition model.
  • The central control device 104 receives each recognition result returned by the cloud server 106 and determines whether each recognition result satisfies a consistency condition. When the number of recognition results satisfying the consistency condition reaches a preset threshold, the operation corresponding to those recognition results is executed.
  • The consistency condition refers to the condition that the compared recognition results must satisfy in order to be judged consistent.
  • The condition may be that the compared recognition results are the same, or that the similarity between the compared recognition results reaches a preset value, and so on; it may be set according to requirements.
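  • A minimal sketch of one way the consistency condition and the preset threshold could be checked is given below. It assumes, purely for illustration, that each recognition result is a text string and that the similarity between results is measured with Python's standard `difflib.SequenceMatcher`; the application specifies neither. The function names and sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def is_consistent(result_a, result_b, min_similarity=1.0):
    """Two results are consistent if identical, or if similar enough."""
    if result_a == result_b:
        return True
    return SequenceMatcher(None, result_a, result_b).ratio() >= min_similarity


def largest_consistent_group(results, min_similarity=1.0):
    """Return the largest group of results consistent with one of its members."""
    best = []
    for anchor in results:
        group = [r for r in results if is_consistent(anchor, r, min_similarity)]
        if len(group) > len(best):
            best = group
    return best


def operation_to_execute(results, preset_threshold, min_similarity=1.0):
    """Return a result to act on, or None if too few results are consistent."""
    group = largest_consistent_group(results, min_similarity)
    return group[0] if len(group) >= preset_threshold else None


# Made-up recognition results for three voice instructions.
results = ["turn on speaker", "turn on speaker", "turn on heater"]
print(operation_to_execute(results, preset_threshold=2))
print(operation_to_execute(results, preset_threshold=3))
```

  • With the default `min_similarity` of 1.0 only identical results count as consistent; lowering it corresponds to the "similarity reaches a preset value" form of the condition.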
  • each recognition result includes at least one control instruction obtained by recognizing a voice instruction and the similarity of each control instruction. It can be understood that after a voice command is recognized by the voice recognition model, the output result includes multiple control commands matching the voice command, and the similarity between the voice command and the matched control command.
  • the cloud server 106 stores control instructions in advance, and the cloud server 106 recognizes the voice instructions to obtain multiple control instructions that match the voice instructions and their similarities.
  • the central control device 104 receives the recognition result returned by the cloud server, and determines an execution instruction from each recognition result according to the similarity of the control instructions.
  • the similarity can effectively represent the correlation between the control instruction and the voice instruction.
  • the execution instruction is finally determined based on the similarity, which can ensure the accuracy of the execution instruction.
  • the execution instruction refers to an instruction that finally controls a controlled device to perform an operation.
  • The central control device 104 is further configured to control the controlled device to perform an operation corresponding to the execution instruction according to the determined execution instruction. Specifically, when the central control device 104 is itself the controlled device, after determining an execution instruction according to the control instructions and their similarities, the central control device 104 controls itself to perform the operation corresponding to the execution instruction. Taking as an example the case where the central control device 104 is a smart home device, such as a smart speaker or a smart TV: when the smart home device obtains the determined execution instruction, it performs the operation corresponding to the execution instruction. For example, when the execution instruction is an "on" instruction, the smart home device performs a turn-on operation.
  • the central control device 104 is further connected to the controlled device, and is configured to control the controlled device to perform an operation corresponding to the execution instruction according to the determined execution instruction.
  • the central control device 104 may be a gateway device or other central management device.
  • The central control device 104 determines the controlled device to be controlled according to the determined execution instruction, and either controls the determined controlled device to perform the related operation according to the execution instruction, or sends the execution instruction to the determined controlled device, which then performs the related operation according to the execution instruction.
  • the controlled device may include, but is not limited to, a smart speaker, a smart TV, and a smart air conditioner.
  • For example, when the determined execution instruction is a “speaker on” instruction, the central control device 104 determines that the controlled device to be controlled is a smart speaker, and then controls the smart speaker to turn on; alternatively, it sends the “speaker on” instruction to the smart speaker, and an internal control unit of the smart speaker controls the turn-on operation.
  • the sound pickup device 102 is further configured to perform noise reduction and compression processing on the collected voice instructions, and send the voice instructions after the noise reduction and compression processing to the central control device 104.
  • the central control device 104 decompresses the received voice instructions, analyzes the decompressed voice instructions, and sends the voice instructions that meet the volume conditions to the cloud server 106.
  • the sound pickup device 102 is further configured to perform compression processing on the collected voice instructions, and send the compressed voice instructions to the central control device 104.
  • The central control device 104 decompresses and denoises the received voice instructions, analyzes the voice instructions after the decompression and noise reduction processing, and sends the voice instructions that meet the volume condition to the cloud server 106.
  • the central control device 104 itself includes a pickup device.
  • The central control device 104 collects voice instructions through its own pickup device, receives the voice instructions collected by the pickup devices 102, analyzes each voice instruction (both those it collected itself and those it received), and sends the voice instructions that satisfy the volume condition to the cloud server 106.
  • a voice control method is provided.
  • The method is described below using its application to the central control device 104 in FIG. 1 as an example.
  • the method includes the following steps:
  • the sound pickup device includes a sound pickup device provided separately from the central control device, and a sound pickup device provided in the central control device itself. That is to say, the voice instructions collected by each pickup device received by the central control device include the voice instructions collected by each pickup device independently set, and the voice instructions collected by the central control device itself.
  • S204 Analyze each voice instruction and send the voice instructions that meet the volume condition to the cloud server; the cloud server recognizes each voice instruction to obtain the recognition result corresponding to each voice instruction.
  • the central control device analyzes each received voice instruction to determine whether each voice instruction meets a preset volume condition, and sends the voice instruction that meets the volume condition to a cloud server for identification. After the voice command is recognized by the voice recognition model of the cloud server, the recognition result corresponding to each voice command is obtained.
  • the cloud server stores control instructions in advance.
  • the cloud server recognizes the voice instructions to obtain the control instructions that match the voice instructions.
  • the matching control instructions and related information form the recognition result.
  • the cloud server returns the recognition result of each voice instruction to the central control device through the network.
  • the central control device receives each recognition result returned by the cloud server, so as to determine an operation to be performed based on each recognition result.
  • Based on a preset consistency condition, each received recognition result is judged to determine whether it meets the consistency condition and whether the number of recognition results satisfying the consistency condition reaches a preset threshold. If the number of recognition results satisfying the consistency condition reaches the preset threshold, the corresponding operation is performed according to those recognition results.
  • In the above voice control method, the voice instructions collected by each pickup device are received and analyzed, and the voice instructions that meet the volume condition are sent to the cloud server, so that the cloud server recognizes only the relatively clear voice instructions and obtains more accurate recognition results.
  • The recognition results are then further filtered: when the number of recognition results that meet the consistency condition reaches a preset threshold, the operation corresponding to those recognition results is performed, so that the recognition result behind the operation finally performed effectively represents the key information of the voice instruction, which improves the accuracy of multipoint voice control.
  • Analyzing each voice instruction and sending the voice instructions that meet the volume condition to the cloud server, with the cloud server recognizing each voice instruction to obtain the corresponding recognition result, includes: analyzing each voice instruction to obtain the volume coefficient of each voice instruction; and determining, according to the volume coefficient, the voice instructions that satisfy the volume condition and sending them to the cloud server, where the cloud server recognizes each voice instruction to obtain the recognition result corresponding to each voice instruction.
  • The volume coefficient refers to a coefficient used to indicate the volume level, that is, the strength of a sound, in units of decibels (dB). Because the distance from the position where the sound is produced to each pickup device is different, the volume of the sound signal collected by each pickup device is also different. Specifically, the volume coefficient of each voice instruction is obtained by analyzing the vibration amplitude parameter of the voice instruction; it is then determined whether the volume coefficient of each voice instruction satisfies a preset volume condition, and the voice instructions meeting the volume condition are sent to the cloud server.
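  • The application does not give a formula for the volume coefficient. One common way to derive a decibel value from amplitude, shown here only as a hedged sketch, is the root-mean-square (RMS) level relative to full scale. The function name, the normalized sample format, the floor value, and the synthetic test tones are all assumptions.

```python
import math


def volume_coefficient_db(samples, floor_db=-120.0):
    """Return the RMS level in dB relative to full scale (amplitude 1.0).

    `samples` is assumed to be a sequence of floats in [-1.0, 1.0].
    Silence or an empty input is clamped to `floor_db` to avoid log(0).
    """
    if not samples:
        return floor_db
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(rms))


# The same synthetic 440 Hz tone as it might be captured by a nearby
# pickup device and by a distant one (ten times smaller amplitude).
RATE = 16000
near = [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]
far = [0.05 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]

near_db = volume_coefficient_db(near)
far_db = volume_coefficient_db(far)
print(round(near_db, 1), round(far_db, 1), round(near_db - far_db, 1))
```

  • In this sketch a tenfold drop in amplitude lowers the coefficient by about 20 dB, which is the kind of difference the volume condition would then act on.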
  • Determining the voice instructions that satisfy the volume condition and sending them to the cloud server, with the cloud server recognizing each voice instruction to obtain the corresponding recognition result, includes:
  • Each received voice instruction corresponds to a volume coefficient, and the voice instructions are sorted by volume coefficient, for example in descending or ascending order.
  • A voice instruction with a smaller volume coefficient is usually not clear enough, which easily leads to misrecognition during voice recognition and thus to wrong recognition results.
  • a preset number of voice instructions with the largest volume coefficients are selected and sent to the cloud server for recognition. For example, select the three voice commands with the highest volume coefficients, or select the two voice commands with the highest volume coefficients.
  • the preset number can be set based on the requirements for the accuracy of the recognition results.
  • S306 Send a preset number of voice instructions to the cloud server, and the cloud server recognizes each voice instruction to obtain a recognition result corresponding to each voice instruction.
  • Before analyzing each voice instruction and sending the voice instructions that satisfy the volume condition to the cloud server for recognition, the method further includes: performing an integrity check on each received voice instruction to determine whether it is complete, and deleting any incomplete voice instruction. As a result, only complete voice instructions are analyzed, and the voice instructions that meet the volume condition are sent to the cloud server, which further ensures the accuracy of the recognition results.
  • the voice instruction sent by the sound pickup device includes voice data and a check value calculated according to the voice data.
  • The central control device parses the received voice instruction to obtain the voice data and the check value, calculates a check value from the parsed voice data using the same check value calculation method as the pickup device, and judges whether the calculated check value is equal to the parsed check value. If so, the received voice instruction is complete; otherwise, the received voice instruction is incomplete and data loss has occurred. The integrity check ensures the accuracy of the voice instructions that are recognized.
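  • The integrity check can be sketched as follows. The check value algorithm is not named in the application; this sketch assumes CRC-32 (Python's standard `zlib.crc32`) appended to the voice data as four big-endian bytes. The packet layout, function names, and sample payload are hypothetical.

```python
import struct
import zlib


def pack_voice_instruction(voice_data: bytes) -> bytes:
    """Pickup device side: append a CRC-32 check value to the voice data."""
    check_value = zlib.crc32(voice_data)
    return voice_data + struct.pack(">I", check_value)


def unpack_voice_instruction(packet: bytes):
    """Central control side: return the voice data, or None if incomplete."""
    if len(packet) < 4:
        return None
    voice_data = packet[:-4]
    (received_check,) = struct.unpack(">I", packet[-4:])
    # Recompute with the same method as the pickup device and compare.
    if zlib.crc32(voice_data) != received_check:
        return None
    return voice_data


packet = pack_voice_instruction(b"placeholder audio bytes")
damaged = bytes([packet[0] ^ 0x01]) + packet[1:]  # flip one bit in transit

print(unpack_voice_instruction(packet) is not None)
print(unpack_voice_instruction(damaged) is None)
```

  • Returning `None` for a failed check lets the caller simply drop incomplete voice instructions before the volume analysis.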
  • each recognition result includes at least one control instruction obtained by recognizing a voice instruction and the similarity of each control instruction.
  • Performing the operation corresponding to the recognition results satisfying the consistency condition includes: when the control instructions with the highest similarity in at least two recognition results are the same, determining that same highest-similarity control instruction as the execution instruction; and controlling, according to the execution instruction, the controlled device to perform the operation corresponding to the execution instruction.
  • The control instruction with the highest similarity is taken from each recognition result, and the extracted control instructions are compared to determine whether they are the same. If they are the same, that control instruction is taken as the final execution instruction. It can be understood that the control instruction with the highest similarity in a recognition result is the control instruction that most closely matches the voice instruction. If the best-matching control instructions of different recognition results are the same, this indicates to a certain extent that the control instruction is accurate, so it is taken as the final execution instruction.
  • For example, the voice instructions sent to the cloud server for recognition include voice instructions I, II, and III, and recognition results I, II, and III are obtained through recognition. The control instructions with the highest similarity in recognition results I and II are the same, both being A. Therefore, control instruction A is taken as the final execution instruction.
  • Performing the operation corresponding to the recognition results satisfying the consistency condition includes: when the control instructions with the highest similarity in at least three recognition results are the same, determining that same highest-similarity control instruction as the execution instruction; and controlling, according to the execution instruction, the controlled device to perform the operation corresponding to the execution instruction. That is, when the execution instruction is determined from the highest-similarity control instructions of the recognition results, the required number of identical control instructions can be set according to requirements.
  • A preset number of voice instructions with relatively high volume are selected and sent to the cloud server for recognition, which avoids excessive error in the recognition results caused by poor-quality voice instructions and the resulting loss of voice recognition accuracy.
  • the same control instruction with the highest similarity among the recognition results is preferentially used as the execution instruction to ensure the accuracy of the voice control.
  • The voice control method further includes: when the control instructions with the highest similarity in any two recognition results are different, obtaining the control instruction with the highest similarity among all the recognition results; determining that control instruction as the execution instruction; and controlling, according to the execution instruction, the controlled device to perform the operation corresponding to the execution instruction.
  • The control instructions with the highest similarity in the respective recognition results are compared. If they all differ from one another, the control instructions in the respective recognition results are merged, the control instruction with the highest similarity in the merged control instruction set is taken as the final execution instruction, and the controlled device is controlled to perform the operation corresponding to the execution instruction.
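  • The selection of the execution instruction described above can be sketched as follows. This is an illustrative reading, not the applicant's implementation: each recognition result is assumed to be a list of hypothetical `(control_instruction, similarity)` pairs, the similarity values are made up, and falling back to the merged set whenever fewer than `min_agreeing` results agree is an assumption for thresholds above two.

```python
from collections import Counter


def determine_execution_instruction(recognition_results, min_agreeing=2):
    """Pick the execution instruction from several recognition results.

    Each recognition result is a list of (control_instruction, similarity)
    pairs. Returns None when there are no candidates at all.
    """
    # Best-matching control instruction of each recognition result.
    top_choices = [
        max(result, key=lambda pair: pair[1])[0]
        for result in recognition_results
        if result
    ]
    if not top_choices:
        return None

    # If enough results share the same best match, use it.
    instruction, votes = Counter(top_choices).most_common(1)[0]
    if votes >= min_agreeing:
        return instruction

    # Otherwise merge all candidates and take the most similar one overall.
    merged = [pair for result in recognition_results for pair in result]
    return max(merged, key=lambda pair: pair[1])[0]


# Results I and II agree on A, so A is chosen.
agreeing = [
    [("A", 0.92), ("B", 0.40)],
    [("A", 0.88), ("C", 0.35)],
    [("B", 0.75), ("A", 0.60)],
]
# All best matches differ, so the globally most similar candidate is chosen.
disagreeing = [[("A", 0.70)], [("B", 0.90)], [("C", 0.80)]]

print(determine_execution_instruction(agreeing))
print(determine_execution_instruction(disagreeing))
```

  • Raising `min_agreeing` to 3 corresponds to the stricter variant in which at least three recognition results must share the same best-matching control instruction.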
  • the controlled device is the central control device itself, and controlling the controlled device to perform the operation corresponding to the execution instruction according to the execution instruction includes: performing the operation corresponding to the execution instruction according to the execution instruction.
  • Taking as an example the case where the central control device is a smart home device, such as a smart speaker or a smart TV, the smart home device executes the operation corresponding to the execution instruction. For example, when the execution instruction is an "on" instruction, the smart home device performs a turn-on operation.
  • controlling the controlled device to perform an operation corresponding to the execution instruction according to the execution instruction includes: determining the controlled device to be controlled according to the execution instruction; and controlling the determined controlled device to perform the operation corresponding to the execution instruction.
  • Controlling the controlled device to perform an operation corresponding to the execution instruction according to the execution instruction includes: determining the controlled device to be controlled according to the execution instruction; and sending the execution instruction to the determined controlled device, which performs the related operation according to the execution instruction.
  • the controlled device may include, but is not limited to, a smart speaker, a smart TV, and a smart air conditioner.
  • When the determined execution instruction is a “speaker on” instruction, it is determined that the controlled device to be controlled is a smart speaker, and the smart speaker is then controlled to turn on; alternatively, the “speaker on” instruction is sent to the smart speaker, and the control unit inside the smart speaker controls the turn-on operation.
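  • A hedged sketch of the second option, in which the central control device determines the target and forwards the instruction, is shown below. The instruction format (`"<device> <action>"`), the `ControlledDevice` class, and the device names are hypothetical; the application does not define how an execution instruction is encoded.

```python
class ControlledDevice:
    """Stand-in for a controlled device with its own internal control unit."""

    def __init__(self, name):
        self.name = name
        self.is_on = False

    def handle(self, action):
        # The device's internal control unit carries out the action.
        if action == "on":
            self.is_on = True
        elif action == "off":
            self.is_on = False
        else:
            raise ValueError(f"unsupported action: {action}")


def dispatch(execution_instruction, devices):
    """Determine the target device from the instruction and forward the action."""
    target, action = execution_instruction.split(" ", 1)
    device = devices.get(target)
    if device is None:
        raise KeyError(f"no controlled device named {target!r}")
    device.handle(action)
    return device


devices = {name: ControlledDevice(name) for name in ("speaker", "tv", "air_conditioner")}
dispatch("speaker on", devices)
print(devices["speaker"].is_on, devices["tv"].is_on)
```

  • Only the addressed device changes state; the other registered devices are left untouched.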
  • each pickup device collects a voice signal sent by a user to obtain a voice instruction, and compresses the voice instruction and sends the voice instruction to the central control device.
  • The central control device receives the voice instructions sent by each pickup device and the voice instructions collected by itself, performs decompression and noise reduction processing on the voice instructions, and analyzes the processed voice instructions to obtain the volume coefficient of each voice instruction. The voice instructions are then sorted according to the volume coefficient, and a preset number of voice instructions with the largest volume coefficients are selected and sent to the cloud server.
  • the cloud server separately recognizes each voice instruction, obtains the recognition result corresponding to each voice instruction, and returns it to the central control device.
  • Each recognition result includes at least one control instruction obtained by recognizing the voice instruction and the similarity of each control instruction.
  • The central control device receives each recognition result and judges whether there are at least two recognition results whose highest-similarity control instructions are the same. If so, it determines that same highest-similarity control instruction as the execution instruction; otherwise, it merges all the recognition results and determines the control instruction with the highest similarity in the merged recognition result set as the execution instruction.
  • the central control device determines the controlled device to be controlled according to the execution instruction, and controls the determined controlled device to perform the operation corresponding to the execution instruction.
  • The voice instructions collected by each pickup device are received and analyzed, and a preset number of voice instructions with the largest volume coefficients are sent to the cloud server, so that the cloud server recognizes only the relatively clear voice instructions, obtains more accurate recognition results, and reduces the interference of wrong recognition results.
  • The control instructions in the recognition results are further filtered according to the similarity to determine the execution instruction. Taking similarity into account fully reflects the correlation between the control instruction and the voice instruction, so that the final execution instruction accurately matches the voice instruction and effectively represents its key information, which improves the accuracy of multi-point voice control.
  • A voice control device includes a signal receiving module 502, a volume analysis module 504, a feedback receiving module 506, and an execution module 508, wherein:
  • the signal receiving module 502 is configured to receive a voice instruction collected by each sound pickup device. Specifically, voice instructions collected by each pickup device and collected by the central control device itself are received.
  • the volume analysis module 504 is configured to analyze each voice instruction, and send the voice instruction that meets the volume condition to the cloud server.
  • the cloud server recognizes each voice instruction to obtain a recognition result corresponding to each voice instruction.
  • the volume analysis module 504 analyzes each received voice instruction to determine whether each voice instruction meets a preset volume condition, and sends the voice instruction that meets the volume condition to a cloud server for identification. After the voice command is recognized by the voice recognition model of the cloud server, the recognition result corresponding to each voice command is obtained.
  • the feedback receiving module 506 is configured to receive each recognition result returned by the cloud server.
  • the cloud server returns the recognition result of each voice instruction to the central control device through the network.
  • the central control device receives each recognition result returned by the cloud server to determine the operations to be performed based on each recognition result.
  • the execution module 508 is configured to execute an operation corresponding to the recognition result satisfying the consistency condition when the number of recognition results meeting the consistency condition reaches a preset threshold.
  • the execution module 508 checks each received recognition result against a preset consistency condition and determines whether the number of recognition results that meet the consistency condition reaches a preset threshold. If it does, the execution module performs the operation corresponding to the recognition results that meet the consistency condition.
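  • The threshold check performed by the execution module 508 can be sketched as follows. This is a minimal illustration, assuming that each recognition result is reduced to a single string and that the consistency condition is exact equality; the function name `select_consistent_result` and its parameters are hypothetical and do not come from the application.

```python
from collections import Counter


def select_consistent_result(results, threshold):
    """Return the result shared by at least `threshold` recognition results.

    Assumption: two results are "consistent" when they are equal strings.
    Returns None when no group of equal results reaches the threshold.
    """
    if not results:
        return None
    # most_common(1) yields the largest group of identical results.
    result, count = Counter(results).most_common(1)[0]
    return result if count >= threshold else None


# Two of three pickup devices agree, so a threshold of 2 is reached.
print(select_consistent_result(["speaker on", "speaker on", "tv on"], 2))
```

  • Returning `None` rather than a best guess mirrors the text above: no operation is performed unless enough recognition results agree.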
  • the above voice control device receives and analyzes the voice instructions collected by each pickup device and sends the voice instructions that meet the volume condition to the cloud server, so that the cloud server recognizes only relatively clear voice instructions and obtains more accurate recognition results.
  • the recognition results are further filtered. When the number of recognition results that meet the consistency condition reaches a preset threshold, the operation corresponding to those recognition results is performed, so that the recognition result behind the operation finally performed effectively represents the key information of the voice instruction, which improves the accuracy of multi-point voice control.
  • the volume analysis module 504 further includes: a volume coefficient acquisition module and a determination module.
  • the volume coefficient acquisition module is used to analyze each voice instruction to obtain the volume coefficient of each voice instruction; the determination module is used to determine the voice instruction that meets the volume condition according to the volume coefficient and send it to the cloud server.
  • the volume coefficient acquisition module analyzes the vibration amplitude parameters of the voice instructions to obtain the volume coefficient of each voice instruction. The determination module then determines whether the volume coefficient of each voice instruction meets a preset volume condition and sends the voice instructions that meet the volume condition to the cloud server.
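  • The application does not say how the volume coefficient is derived from the amplitude. As one possible sketch, the root-mean-square (RMS) amplitude of the PCM samples could serve as the coefficient; the choice of RMS and the name `volume_coefficient` are assumptions made here for illustration only.

```python
import math


def volume_coefficient(samples):
    """Return the RMS amplitude of a sequence of PCM samples.

    Assumption: RMS is used as the "volume coefficient"; the application
    only states that amplitude parameters are analyzed.
    """
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# A louder capture of the same utterance yields a larger coefficient.
near_microphone = [3000, -3000, 3000, -3000]
far_microphone = [300, -300, 300, -300]
print(volume_coefficient(near_microphone) > volume_coefficient(far_microphone))
```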
  • the determination module further includes a sorting module, an instruction obtaining module, and a sending module, wherein:
  • the sorting module is used to sort the voice instructions according to the volume coefficient, for example in ascending or descending order. The larger the volume coefficient, the clearer and more accurate the corresponding voice instruction.
  • the instruction obtaining module is configured to obtain a preset number of voice instructions with the largest volume coefficient according to the sorting result.
  • a voice instruction with a smaller volume coefficient is usually not clear enough, so it is prone to misrecognition during voice recognition and yields wrong recognition results.
  • based on the sorting result, the instruction obtaining module selects the preset number of voice instructions with the largest volume coefficients to be sent to the cloud server for recognition. For example, it selects the three voice instructions with the highest volume coefficients, or the two with the highest volume coefficients.
  • the preset number can be set on the basis of the requirements for the accuracy of the recognition result.
  • the sending module is configured to send a preset number of voice instructions to the cloud server.
  • the cloud server recognizes the preset number of voice commands to obtain a recognition result corresponding to each voice command.
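  • The sort-and-select step handled by the sorting module and the instruction obtaining module can be sketched as follows. The representation of each voice instruction as a `(device_id, volume_coefficient)` pair and the name `select_loudest` are assumptions for illustration.

```python
def select_loudest(instructions, preset_number):
    """Return the `preset_number` instructions with the largest coefficients.

    instructions: list of (device_id, volume_coefficient) pairs.
    """
    # Descending order puts the clearest (loudest) instructions first.
    ranked = sorted(instructions, key=lambda item: item[1], reverse=True)
    return ranked[:preset_number]


captured = [("kitchen", 0.2), ("sofa", 0.9), ("hallway", 0.5)]
print(select_loudest(captured, 2))
```

  • A larger preset number sends more audio to the cloud server and allows more cross-checking; a smaller one reduces recognition work, which matches the remark above that the number can be set according to the required accuracy.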
  • the execution module includes an execution instruction determination module and an execution sub-module.
  • the execution instruction determination module is used to determine, when the highest-similarity control instructions of at least two recognition results are the same, that shared control instruction as the execution instruction; the execution sub-module is used to control the controlled device, according to the execution instruction, to perform the operation corresponding to the execution instruction.
  • the execution instruction determination module is configured to obtain the control instruction with the highest similarity from each recognition result and compare the retrieved control instructions to determine whether they are the same; if they are, that control instruction is used as the final execution instruction. It can be understood that the control instruction with the highest similarity in a recognition result is the one that most closely matches the voice instruction. When the best-matching control instructions of several recognition results agree, the agreement indicates, to a certain extent, that the control instruction is accurate, so it is used as the final execution instruction.
  • the execution instruction determination module is further configured to, when the highest-similarity control instructions of any two recognition results are different, obtain the control instruction with the highest similarity among all the recognition results and determine it as the execution instruction.
  • the control instructions with the highest similarity in the respective recognition results are compared. When they all differ, the control instructions in the respective recognition results are merged, and the control instruction with the highest similarity in the merged control instruction set is taken as the final execution instruction; the controlled device is then controlled to perform the operation corresponding to the execution instruction.
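  • The two-stage rule described above (agreement between best matches first, highest overall similarity as the fallback) can be sketched as follows. The data layout, with each recognition result as a list of `(control_instruction, similarity)` pairs, and the function name are assumptions for illustration.

```python
from collections import Counter


def determine_execution_instruction(recognition_results):
    """Pick the execution instruction from several recognition results.

    Each recognition result is a list of (control_instruction, similarity)
    pairs. Returns None when there are no candidates at all.
    """
    non_empty = [r for r in recognition_results if r]
    if not non_empty:
        return None
    # Best-matching control instruction of each recognition result.
    tops = [max(r, key=lambda pair: pair[1])[0] for r in non_empty]
    instruction, count = Counter(tops).most_common(1)[0]
    if count >= 2:
        # At least two recognition results agree on their best match.
        return instruction
    # Otherwise merge all candidates and take the highest similarity overall.
    merged = [pair for r in non_empty for pair in r]
    return max(merged, key=lambda pair: pair[1])[0]


agreeing = [
    [("speaker on", 0.9), ("speaker off", 0.4)],
    [("speaker on", 0.8)],
    [("tv on", 0.95)],
]
print(determine_execution_instruction(agreeing))
```

  • In the example, agreement between two results takes priority over the single higher-scoring "tv on" candidate, which is the behavior the first rule describes.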
  • the execution sub-module is further configured to execute an operation corresponding to the execution instruction according to the execution instruction.
  • taking the central control device as an example of a smart home device, such as a smart speaker or a smart TV: when the smart home device obtains a determined execution instruction, it executes the operation corresponding to the execution instruction. For example, when the execution instruction is an "on" instruction, the smart home device is caused to perform an opening operation.
  • the execution sub-module is further configured to determine the controlled device to be controlled according to the execution instruction; control the determined controlled device to perform an operation corresponding to the execution instruction.
  • the execution sub-module is further configured to determine the controlled device to be controlled according to the execution instruction; send the execution instruction to the determined controlled device, and the controlled device performs related operations according to the execution instruction.
  • the controlled device may include, but is not limited to, a smart speaker, a smart TV, and a smart air conditioner.
  • for example, when the determined execution instruction is a “speaker on” instruction, the execution sub-module determines that the controlled device to be controlled is a smart speaker, and then controls the smart speaker to turn on; or it sends the “speaker on” instruction to the smart speaker, and the control unit inside the smart speaker performs the opening operation.
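  • How the execution sub-module maps an execution instruction to a controlled device is not specified. The sketch below assumes, purely for illustration, that an instruction is a string of the form "<device> <action>" and that devices are kept in a lookup table; `ControlledDevice` and `dispatch` are hypothetical names.

```python
class ControlledDevice:
    """Hypothetical controlled device that only understands on/off."""

    def __init__(self, name):
        self.name = name
        self.is_on = False

    def execute(self, action):
        if action not in ("on", "off"):
            raise ValueError(f"unsupported action: {action}")
        self.is_on = action == "on"


def dispatch(execution_instruction, devices):
    """Route an instruction such as "speaker on" to the matching device."""
    target, action = execution_instruction.rsplit(" ", 1)
    device = devices[target]  # raises KeyError for an unknown device
    device.execute(action)
    return device


devices = {name: ControlledDevice(name) for name in ("speaker", "tv")}
dispatch("speaker on", devices)
print(devices["speaker"].is_on, devices["tv"].is_on)
```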
  • the above voice control device receives and analyzes the voice instructions collected by each pickup device and sends the preset number of voice instructions with the largest volume coefficients to the cloud server, so that the cloud server recognizes only relatively clear voice instructions, obtains more accurate recognition results, and suffers less interference from wrong recognition results.
  • the control instructions in the recognition result are further filtered according to similarity to determine the execution instruction. Because similarity reflects the correlation between a control instruction and the voice instruction, the final execution instruction accurately matches the voice instruction and effectively represents its key information, which improves the accuracy of multi-point voice control.
  • Each module in the above-mentioned voice control device may be implemented in whole or in part by software, hardware, or a combination thereof.
  • the above-mentioned modules may be embedded in, or independent of, the processor in the computer device in hardware form, or may be stored in the memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to the modules.
  • in one embodiment, a central control device is provided, and its internal structure may be as shown in FIG. 6.
  • the central control device includes a processor, a memory, a network interface, and a microphone connected through a system bus.
  • the processor of the central control device is used to provide computing and control capabilities.
  • the memory of the central control device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and a computer program.
  • the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium.
  • the network interface of the central control device is used to communicate with external terminals through a network connection.
  • the computer program is executed by a processor to implement a voice control method.
  • FIG. 6 is only a block diagram of a part of the structure related to the scheme of the present application, and does not constitute a limitation on the central control equipment to which the scheme of the present application is applied.
  • the device may include more or fewer components than shown in the figure, or some components may be combined, or have different component arrangements.
  • a central control device is provided, including a memory and a processor.
  • the memory stores computer-readable instructions.
  • when executed by the processor, the computer-readable instructions cause the processor to perform the following steps:
  • the computer-readable instructions further cause the processor to perform the following steps:
  • a voice command that satisfies the volume condition is determined and sent to the cloud server.
  • the cloud server recognizes each voice command to obtain a recognition result corresponding to each voice command.
  • the computer-readable instructions further cause the processor to perform the following steps:
  • each recognition result includes at least one control instruction obtained by recognizing a voice instruction and the similarity of each control instruction.
  • the computer-readable instructions further cause the processor to perform the following steps:
  • the controlled device is controlled to perform an operation corresponding to the execution instruction.
  • the computer-readable instructions further cause the processor to perform the following steps:
  • the controlled device is controlled to perform an operation corresponding to the execution instruction.
  • the computer-readable instructions further cause the processor to perform the following steps:
  • the computer-readable instructions further cause the processor to perform the following steps:
  • the execution instruction is sent to the determined controlled device, and the controlled device performs related operations according to the execution instruction.
  • one or more non-volatile storage media storing computer-readable instructions are provided.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • the one or more processors execute the following steps:
  • a voice command that satisfies the volume condition is determined and sent to the cloud server, and the cloud server recognizes each voice command to obtain a recognition result corresponding to each voice command.
  • the one or more processors execute the following steps:
  • each recognition result includes at least one control instruction obtained by recognizing a voice instruction and the similarity of each control instruction.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • the controlled device is controlled to perform an operation corresponding to the execution instruction.
  • the one or more processors execute the following steps:
  • the controlled device is controlled to perform an operation corresponding to the execution instruction.
  • the one or more processors execute the following steps:
  • the one or more processors execute the following steps:
  • the execution instruction is sent to the determined controlled device, and the controlled device performs related operations according to the execution instruction.
  • the steps in the embodiments of the present application are not necessarily performed in the order indicated by the step numbers. Unless explicitly stated in this document, the order of execution of these steps is not strictly limited, and the steps can be performed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or stages. These sub-steps or stages are not necessarily performed at the same time, but may be performed at different times; nor are they necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present application relates to a voice control method, a central control device, and a storage medium. The method comprises: receiving voice commands acquired by sound pick-up devices; analyzing the voice commands, sending to a cloud server voice commands that meet a volume condition, and recognizing each voice command by the cloud server to obtain a recognition result corresponding to each voice command; receiving each recognition result returned by the cloud server; and if the number of recognition results that meet a consistency condition reaches a preset threshold, performing operations corresponding to the recognition results that meet the consistency condition. The voice commands that meet the volume condition are recognized, the recognition results are screened, when the number of recognition results that meet the consistency condition reaches the preset threshold, the operations corresponding to the recognition results that meet the consistency condition are performed, so that the recognition results corresponding to the finally performed operations can effectively represent key information of the voice commands, thereby improving the accuracy of multi-point voice control.

Description

Voice control method, central control device, and storage medium
Technical field
The present application relates to the technical field of voice recognition, and in particular to a voice control method, a central control device, and a storage medium.
Background
With the development of the mobile Internet, the Internet of Vehicles, and smart homes, voice recognition plays an increasingly important role. In multi-point voice recognition in particular, for example in a smart home system, multiple sound pickup devices are usually deployed in a space to collect the sound signals uttered by a user and obtain voice instructions; a recognition device then recognizes the multiple voice instructions in order to control the corresponding device to perform the operation corresponding to the instruction. However, because the sound pickup devices are deployed at different spatial locations, the voice instructions they obtain differ, and the control instructions obtained by recognizing those voice instructions differ as well, which makes it difficult to control the smart home accurately.
Therefore, in multi-point voice control, how to effectively identify key information from multiple voice instructions and perform accurate control has become a key difficulty in the current development of voice control technology.
Summary of the application
According to various embodiments provided in the present application, a voice control method, a central control device, and a storage medium are provided.
A voice control method includes:
receiving voice instructions collected by each sound pickup device;
analyzing each of the voice instructions, and sending the voice instructions that meet a volume condition to a cloud server, where the cloud server recognizes each of the voice instructions to obtain a recognition result corresponding to each of the voice instructions;
receiving each of the recognition results returned by the cloud server;
when the number of recognition results that meet a consistency condition reaches a preset threshold, performing the operation corresponding to the recognition results that meet the consistency condition.
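The four steps of the method can be sketched end to end as follows. This is a minimal illustration under several assumptions: the cloud server is replaced by a caller-supplied function `recognize`, the volume condition is "the loudest `preset_number` instructions", and the consistency condition is exact equality of results. All names are hypothetical.

```python
from collections import Counter


def voice_control(instructions, recognize, preset_number, threshold):
    """Run the method steps on already-collected voice instructions.

    instructions: list of (audio, volume_coefficient) pairs, one per device.
    recognize: callable standing in for the cloud server (audio -> result).
    Returns the agreed recognition result, or None if no operation should run.
    """
    # Keep the loudest `preset_number` instructions (volume condition).
    loudest = sorted(instructions, key=lambda pair: pair[1], reverse=True)
    selected = loudest[:preset_number]
    # "Send" each selected instruction and collect the recognition results.
    results = [recognize(audio) for audio, _ in selected]
    if not results:
        return None
    # Act only if enough identical results were returned.
    result, count = Counter(results).most_common(1)[0]
    return result if count >= threshold else None


# Stand-in recognizer: a lookup table instead of a real cloud service.
fake_cloud = {"clip-a": "speaker on", "clip-b": "speaker on", "clip-c": "tv on"}
collected = [("clip-a", 0.9), ("clip-b", 0.7), ("clip-c", 0.2)]
print(voice_control(collected, fake_cloud.get, preset_number=2, threshold=2))
```

Passing the recognizer in as a function keeps the sketch independent of any particular cloud API, which the application does not name.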
A central control device includes a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
receiving voice instructions collected by each sound pickup device;
analyzing each of the voice instructions, and sending the voice instructions that meet a volume condition to a cloud server, where the cloud server recognizes each of the voice instructions to obtain a recognition result corresponding to each of the voice instructions;
receiving each of the recognition results returned by the cloud server;
when the number of recognition results that meet a consistency condition reaches a preset threshold, performing the operation corresponding to the recognition results that meet the consistency condition.
One or more non-volatile storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
receiving voice instructions collected by each sound pickup device;
analyzing each of the voice instructions, and sending the voice instructions that meet a volume condition to a cloud server, where the cloud server recognizes each of the voice instructions to obtain a recognition result corresponding to each of the voice instructions;
receiving each of the recognition results returned by the cloud server;
when the number of recognition results that meet a consistency condition reaches a preset threshold, performing the operation corresponding to the recognition results that meet the consistency condition.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the application will become apparent from the description, the drawings, and the claims.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram of the application environment of a voice control method in an embodiment;
FIG. 2 is a schematic flowchart of a voice control method in an embodiment;
FIG. 3 is a schematic flowchart of the steps of selecting and sending voice instructions in an embodiment;
FIG. 4 is a schematic diagram of the interaction flow of a voice control method in an embodiment;
FIG. 5 is a structural block diagram of a voice control device in an embodiment;
FIG. 6 is a structural block diagram of a central control device in an embodiment.
Detailed description
To make the purpose, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the application and do not limit its scope of protection.
FIG. 1 is a schematic diagram of an application environment of a voice control method according to an embodiment. As shown in FIG. 1, the application environment includes sound pickup devices 102, a central control device 104, and a cloud server 106. Each sound pickup device 102 is connected to the central control device 104 through a network, and the central control device 104 is connected to the cloud server 106 through a network. The central control device 104 may specifically be a terminal device, such as a desktop or mobile terminal with voice processing capability, for example a gateway device, a central management device, or a smart home device. The cloud server 106 is a server or server cluster that has a voice recognition function and can perform complex voice recognition.
Specifically, the sound pickup device 102 is configured to receive a sound signal uttered by a user, convert it into a corresponding voice instruction, and send it to the central control device 104.
Here, a sound pickup device is an electroacoustic instrument that receives sound vibrations and converts the sound into a voice signal. A voice signal is a signal carrying voice data, obtained when a sound pickup device collects the sound signal uttered by a user; voice data is the data used to represent the sound signal. To serve different voice recognition needs, the sound pickup device collects the sound signals in the current environment to obtain a voice signal, which is subsequently recognized so that the corresponding function can be performed. A voice instruction is a voice signal that carries a control instruction; voice instructions make it possible to control the relevant devices in a smart home system. Taking a smart home system as an example, the voice instructions collected by the sound pickup device include wake-up instructions, switching instructions, and the like.
In this embodiment, multiple sound pickup devices 102 are deployed at different positions in the same space to collect sound signals from different directions, ensuring that sound signals uttered by the user or other people at different positions can all be collected.
The central control device 104 is connected to each sound pickup device 102 through a network. It receives the voice instructions collected by the sound pickup devices 102, analyzes each voice instruction, and sends the voice instructions that meet the volume condition to the cloud server 106.
The volume condition is a volume restriction set in advance according to the required voice recognition accuracy. For example, the volume condition may be the preset number of voice instructions with the highest volume, or it may be a volume greater than a set volume threshold. Specifically, the central control device analyzes each received voice instruction to determine whether it meets the preset volume condition, and sends the voice instructions that meet the volume condition to the cloud server 106.
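The threshold form of the volume condition can be sketched as a simple filter. The mapping from device identifier to volume coefficient and the name `filter_by_threshold` are assumptions for illustration; the strict "greater than" comparison follows the wording above.

```python
def filter_by_threshold(coefficients, volume_threshold):
    """Keep only the instructions whose volume exceeds the threshold.

    coefficients: dict mapping a device identifier to its volume coefficient.
    """
    return {
        device: volume
        for device, volume in coefficients.items()
        if volume > volume_threshold
    }


measured = {"mic1": 0.9, "mic2": 0.3, "mic3": 0.6}
print(filter_by_threshold(measured, 0.5))
```

Unlike the "loudest preset number" form, a threshold may pass no instructions at all when the user is far from every pickup device, so a caller would need to handle an empty result.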
The central control device 104 screens the received voice instructions by the volume condition, filters out the voice instructions of relatively poor quality, and sends the remaining voice instructions, which meet the volume condition, to the cloud server 106 for recognition. This avoids excessive recognition error caused by poor-quality voice instructions. It also reduces the cloud server's voice recognition workload, which speeds up the return of recognition results.
The cloud server 106 receives the voice instructions sent by the central control device 104, recognizes them to obtain the recognition result corresponding to each voice instruction, and returns each recognition result to the central control device 104, so that the central control device 104 can determine the operation to be performed based on the recognition results.
Here, a recognition result is the output corresponding to a voice instruction after the cloud server 106 recognizes the received voice instruction based on a preset voice recognition model. The voice recognition model is a conventional voice recognition model, such as a neural-network-based voice recognition model.
Further, the central control device 104 receives each recognition result returned by the cloud server 106 and determines whether each recognition result meets the consistency condition. When the number of recognition results that meet the consistency condition reaches a preset threshold, it performs the operation corresponding to those recognition results.
The consistency condition is the condition that must be met for the compared recognition results to be judged consistent. For example, the condition may be that the compared recognition results are identical, or that their similarity reaches a preset value; it can be set as required.
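Both forms of the consistency condition can be sketched in one predicate. The application does not define how the similarity between two recognition results is measured; the sketch below assumes text results and uses the ratio from Python's `difflib.SequenceMatcher` as a stand-in measure. The name `is_consistent` is hypothetical.

```python
from difflib import SequenceMatcher


def is_consistent(result_a, result_b, min_similarity=None):
    """Judge whether two recognition results are consistent.

    With min_similarity=None the results must be identical; otherwise their
    SequenceMatcher ratio (an assumed text-similarity measure) must reach
    min_similarity.
    """
    if min_similarity is None:
        return result_a == result_b
    ratio = SequenceMatcher(None, result_a, result_b).ratio()
    return ratio >= min_similarity


print(is_consistent("speaker on", "speaker on"))
print(is_consistent("speaker on", "tv off", min_similarity=0.9))
```

A similarity-based condition tolerates small transcription differences between pickup devices, at the cost of possibly treating distinct commands such as "speaker on" and "speaker off" as consistent if the preset value is set too low.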
在一实施例中,各识别结果包括对语音指令进行识别得到的至少一个控制指令及各控制指令的相似度。可以理解,一个语音指令经语音识别模型识别后,输出结果中会包括多个与该语音指令匹配的控制指令,以及该语音指令与所匹配的控制指令之间的相似度。其中,云服务器106中预先存储有控制指令,通过云服务器106对语音指令进行识别,得到与语音指令匹配的多个控制指令及其相似度。In one embodiment, each recognition result includes at least one control instruction obtained by recognizing a voice instruction and the similarity of each control instruction. It can be understood that after a voice command is recognized by the voice recognition model, the output result includes multiple control commands matching the voice command, and the similarity between the voice command and the matched control command. The cloud server 106 stores control instructions in advance, and the cloud server 106 recognizes the voice instructions to obtain multiple control instructions that match the voice instructions and their similarities.
中控设备104接收云服务器返回的识别结果,根据控制指令的相似度,从各识别结果中确定执行指令。相似度可有效表征控制指令与语音指令之间 的关联度,基于相似度最终确定执行指令,能够确保执行指令的准确性。其中,执行指令是指最终控制被控设备执行操作的指令。The central control device 104 receives the recognition result returned by the cloud server, and determines an execution instruction from each recognition result according to the similarity of the control instructions. The similarity can effectively represent the correlation between the control instruction and the voice instruction. The execution instruction is finally determined based on the similarity, which can ensure the accuracy of the execution instruction. Among them, the execution instruction refers to an instruction that finally controls a controlled device to perform an operation.
In one embodiment, the central control device 104 is further configured to control the controlled device, according to the determined execution instruction, to perform the operation corresponding to that instruction. Specifically, when the central control device 104 is itself the controlled device, it determines the execution instruction from the control instructions and their similarities and then performs the corresponding operation itself. Take the case in which the central control device 104 is a smart home device such as a smart speaker or a smart TV: when the smart home device obtains the determined execution instruction, it performs the operation corresponding to that instruction. For example, when the execution instruction is an "on" instruction, the smart home device turns itself on.
In another embodiment, the central control device 104 is connected to the controlled device and is configured to control the controlled device, according to the determined execution instruction, to perform the operation corresponding to that instruction. For example, the central control device 104 may be a gateway device or another central management device. The central control device 104 determines the controlled device to be controlled according to the determined execution instruction, and either controls that device directly to perform the operation or sends the execution instruction to that device, which then performs the operation itself.
Taking a smart home system as an example, assume that the central control device is a central management device; the controlled devices may include, but are not limited to, a smart speaker, a smart TV, and a smart air conditioner. When the determined execution instruction is a "speaker on" instruction, the central control device 104 determines that the device to be controlled is the smart speaker and turns it on, or sends the "speaker on" instruction to the smart speaker, whose internal control unit performs the turn-on operation.
In one embodiment, the sound pickup device 102 is further configured to perform noise reduction and compression on the collected voice instructions and to send the processed voice instructions to the central control device 104. Correspondingly, the central control device 104 decompresses the received voice instructions, analyzes each decompressed voice instruction, and sends the voice instructions that satisfy the volume condition to the cloud server 106.
In another embodiment, the sound pickup device 102 is further configured to compress the collected voice instructions and to send the compressed voice instructions to the central control device 104. Correspondingly, the central control device 104 decompresses and denoises the received voice instructions, analyzes each processed voice instruction, and sends the voice instructions that satisfy the volume condition to the cloud server 106.
Having either the sound pickup device or the central control device denoise the voice instructions filters out noise interference and retains the useful signal, which further improves the accuracy of voice recognition.
In one embodiment, the central control device 104 itself includes a sound pickup device and collects voice instructions through it. The central control device 104 receives the voice instructions collected by the sound pickup devices 102 and by its own sound pickup device, analyzes each voice instruction, and sends the voice instructions that satisfy the volume condition to the cloud server 106.
In one embodiment, as shown in FIG. 2, a voice control method is provided. The method is described using its application to the central control device 104 in FIG. 1 as an example, and includes the following steps:
S202: Receive the voice instructions collected by each sound pickup device.
In this embodiment, the sound pickup devices include those provided separately from the central control device as well as the sound pickup device built into the central control device. In other words, the voice instructions received by the central control device include those collected by each separately provided sound pickup device and those collected by the central control device itself.
S204: Analyze each voice instruction and send the voice instructions that satisfy the volume condition to the cloud server, which recognizes each voice instruction to obtain the corresponding recognition result.
Specifically, the central control device analyzes each received voice instruction to determine whether it satisfies a preset volume condition, and sends the voice instructions that satisfy the condition to the cloud server for recognition. The voice recognition model of the cloud server then produces a recognition result for each voice instruction.
Taking a smart home system as an example, the central control device analyzes each received voice instruction to determine whether it satisfies the preset volume condition and sends those that do to the cloud server for recognition. Control instructions are stored in advance in the cloud server. The cloud server recognizes a voice instruction to obtain the control instructions that match it, and the matched control instructions together with related information form the recognition result.
S206: Receive the recognition results returned by the cloud server.
The cloud server returns the recognition result of each voice instruction to the central control device over the network. The central control device receives the recognition results so that it can determine, on the basis of those results, the operation to be performed.
S208: When the number of recognition results that satisfy the consistency condition reaches a preset threshold, perform the operation corresponding to those recognition results.
Specifically, each received recognition result is evaluated against a preset consistency condition to determine whether it satisfies the condition and whether the number of recognition results that satisfy the condition reaches the preset threshold. If that number reaches the preset threshold, the operation corresponding to the recognition results that satisfy the consistency condition is performed.
In the above voice control method, the voice instructions collected by each sound pickup device are received and analyzed, and those that satisfy the volume condition are sent to the cloud server, so that the cloud server recognizes relatively clear voice instructions and produces more accurate recognition results. The recognition results are then filtered: when the number of recognition results that satisfy the consistency condition reaches the preset threshold, the operation corresponding to those results is performed. The recognition result behind the executed operation therefore reflects the key information of the voice instruction, which improves the accuracy of multipoint voice control.
In one embodiment, analyzing each voice instruction and sending the voice instructions that satisfy the volume condition to the cloud server, which recognizes each voice instruction to obtain the corresponding recognition result, includes: analyzing each voice instruction to obtain its volume coefficient; and determining, according to the volume coefficients, the voice instructions that satisfy the volume condition and sending them to the cloud server, which recognizes each voice instruction to obtain the corresponding recognition result.
Here, the volume coefficient is a coefficient that indicates the volume, that is, the strength of the sound, in decibels (dB). Because the sound source is at a different distance from each sound pickup device, the volume of the sound signal collected by each device also differs. Specifically, the vibration amplitude parameter of each voice instruction is analyzed to obtain its volume coefficient, each volume coefficient is checked against the preset volume condition, and the voice instructions that satisfy the condition are sent to the cloud server.
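The source does not say how the volume coefficient is derived from the amplitude. The following is a minimal sketch under the assumption that it is the root-mean-square level of the samples expressed in dB relative to full scale; the function names, the reference level, and the threshold of -30 dB are illustrative choices, not taken from the application.

```python
import math
from typing import Sequence


def volume_coefficient_db(samples: Sequence[float], reference: float = 1.0) -> float:
    """Return the RMS level of `samples` in dB relative to `reference`.

    Assumption: samples are normalized amplitudes in [-1.0, 1.0].
    """
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 20.0 * math.log10(rms / reference)


def meets_volume_condition(samples: Sequence[float], threshold_db: float = -30.0) -> bool:
    """Hypothetical volume condition: the level must reach `threshold_db`."""
    return volume_coefficient_db(samples) >= threshold_db
```

With this convention a full-scale square wave sits at 0 dB and quieter signals are negative, so a pickup device far from the speaker yields a lower coefficient than a nearby one.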
Specifically, as shown in FIG. 3, the step of determining, according to the volume coefficients, the voice instructions that satisfy the volume condition and sending them to the cloud server, which recognizes each voice instruction to obtain the corresponding recognition result, includes:
S302: Sort the voice instructions by volume coefficient.
Each received voice instruction has a corresponding volume coefficient. The voice instructions are arranged by volume coefficient, either in descending or in ascending order. In general, the larger the volume coefficient, the clearer and more accurate the corresponding voice instruction.
S304: According to the sorting result, obtain a preset number of voice instructions with the largest volume coefficients.
Voice instructions with smaller volume coefficients are usually less clear and tend to be misrecognized, producing incorrect recognition results. To keep the recognition results accurate and to limit interference from incorrect results, a preset number of voice instructions with the largest volume coefficients are selected from the sorted list and sent to the cloud server for recognition, for example the three or the two voice instructions with the largest volume coefficients. The preset number can be set according to the required recognition accuracy.
S306: Send the preset number of voice instructions to the cloud server, which recognizes each voice instruction to obtain the corresponding recognition result.
The selected voice instructions are sent to the cloud server, which recognizes them and obtains a recognition result for each. Selecting the voice instructions with the largest volume coefficients and sending only those to the cloud server helps, to a certain extent, to ensure the accuracy of the recognition results.
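Steps S302 to S306 amount to a sort followed by a top-N selection. A minimal sketch follows; the `VoiceInstruction` record and its field names are hypothetical and only illustrate the selection logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoiceInstruction:
    """Hypothetical record for one captured voice instruction."""
    device_id: str      # which sound pickup device captured it
    volume_db: float    # volume coefficient in dB
    audio: bytes = b""  # encoded audio payload


def select_loudest(instructions: List[VoiceInstruction], preset_count: int) -> List[VoiceInstruction]:
    """Return the `preset_count` instructions with the largest volume coefficient."""
    if preset_count <= 0:
        raise ValueError("preset_count must be positive")
    # S302: sort in descending order of volume coefficient.
    ranked = sorted(instructions, key=lambda v: v.volume_db, reverse=True)
    # S304: keep only the loudest `preset_count` entries.
    return ranked[:preset_count]


captured = [
    VoiceInstruction("kitchen", -12.0),
    VoiceInstruction("hallway", -30.0),
    VoiceInstruction("bedroom", -18.0),
    VoiceInstruction("study", -25.0),
]
# S306 would send these three to the cloud server.
to_send = select_loudest(captured, preset_count=3)
```

If fewer instructions are received than the preset number, the slice simply returns all of them.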
Further, before analyzing each voice instruction and sending the voice instructions that satisfy the volume condition to the cloud server, the method also includes: performing an integrity check on each received voice instruction to determine whether it is complete, and deleting any voice instruction that is incomplete. Only complete voice instructions are then analyzed and, if they satisfy the volume condition, sent to the cloud server, which further helps ensure the accuracy of the recognition results.
In one embodiment, the voice instruction sent by a sound pickup device includes voice data and a check value calculated from the voice data. The central control device parses the received voice instruction to obtain the voice data and the check value, calculates a check value from the parsed voice data using the same method as the sound pickup device, and compares the two check values. If they are equal, the received voice instruction is complete; otherwise it is incomplete and data has been lost. The integrity check ensures that only intact voice instructions are passed on for recognition.
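The source does not name the check value algorithm or the frame layout. The sketch below assumes a CRC-32 over the voice data, carried as a 4-byte big-endian prefix; both choices and the function names are illustrative.

```python
import struct
import zlib
from typing import Optional


def pack_instruction(voice_data: bytes) -> bytes:
    """Pickup-device side: prefix the voice data with its CRC-32 (assumed layout)."""
    return struct.pack(">I", zlib.crc32(voice_data)) + voice_data


def unpack_if_complete(frame: bytes) -> Optional[bytes]:
    """Central-control side: return the voice data if the check value matches,
    otherwise None so the caller can discard the incomplete instruction."""
    if len(frame) < 4:
        return None  # too short to even hold the check value
    (expected,) = struct.unpack(">I", frame[:4])
    voice_data = frame[4:]
    # Recompute with the same method the sender used and compare.
    return voice_data if zlib.crc32(voice_data) == expected else None


frame = pack_instruction(b"turn on the speaker")
# Flip one bit in the last byte to simulate corruption in transit.
corrupted = frame[:-1] + bytes([frame[-1] ^ 0x01])
```

A CRC detects accidental corruption and loss; it is not a defense against deliberate tampering, which would need a keyed MAC instead.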
In one embodiment, each recognition result includes at least one control instruction obtained by recognizing a voice instruction, together with the similarity of each control instruction. Performing the operation corresponding to the recognition results that satisfy the consistency condition, when their number reaches the preset threshold, includes: when the control instruction with the highest similarity is the same in at least two recognition results, determining that shared control instruction as the execution instruction; and controlling the controlled device, according to the execution instruction, to perform the corresponding operation.
Specifically, the control instruction with the highest similarity is taken from each recognition result, and the extracted control instructions are compared. If they are the same, that control instruction is taken as the final execution instruction. The control instruction with the highest similarity in a recognition result is the one that best matches the voice instruction; if the best-matching control instructions agree across recognition results, this supports, to a certain extent, the accuracy of that control instruction, which is therefore taken as the final execution instruction.
Suppose the voice instructions sent to the cloud server for recognition are voice instructions I, II, and III, and recognition yields recognition results I, II, and III respectively. Recognition result I includes control instructions A, B, and C with similarities of 98%, 90%, and 87%, which can be written as I = {A, B, C; 98%, 90%, 87%}. In the same notation, II = {A, C, B; 90%, 85%, 80%} and III = {B, D, C; 90%, 86%, 70%}. Taking the control instruction with the highest similarity from each of recognition results I, II, and III gives A, A, and B. Comparing these three shows that recognition results I and II share the same top control instruction, A, so control instruction A is taken as the final execution instruction.
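The worked example above can be reproduced with a short sketch. Recognition results are modeled here as dictionaries mapping a control instruction to its similarity; that representation, the function names, and the `min_agreeing` parameter (standing in for the preset threshold) are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical representation: control instruction -> similarity in [0, 1].
RecognitionResult = Dict[str, float]


def top_instruction(result: RecognitionResult) -> str:
    """Return the control instruction with the highest similarity in one result."""
    return max(result, key=result.get)


def consensus_instruction(results: List[RecognitionResult], min_agreeing: int = 2) -> Optional[str]:
    """Return the top instruction shared by at least `min_agreeing` results, else None."""
    tops = [top_instruction(r) for r in results if r]
    if not tops:
        return None
    instruction, count = Counter(tops).most_common(1)[0]
    return instruction if count >= min_agreeing else None


result_i = {"A": 0.98, "B": 0.90, "C": 0.87}
result_ii = {"A": 0.90, "C": 0.85, "B": 0.80}
result_iii = {"B": 0.90, "D": 0.86, "C": 0.70}

# The top instructions are A, A, B, so two results agree on A.
chosen = consensus_instruction([result_i, result_ii, result_iii])
```

The source does not say what happens when two different instructions both reach the threshold; this sketch returns whichever the counter lists first, and a real implementation would need an explicit tie-break rule.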
In one embodiment, performing the operation corresponding to the recognition results that satisfy the consistency condition, when their number reaches the preset threshold, includes: when the control instruction with the highest similarity is the same in at least three recognition results, determining that shared control instruction as the execution instruction; and controlling the controlled device, according to the execution instruction, to perform the corresponding operation. In other words, when the execution instruction is determined from the top control instruction of each recognition result, the number of recognition results that must agree can be set as required.
In the above voice control method, a preset number of voice instructions with relatively high volume are selected and sent to the cloud server for recognition, which avoids the large recognition errors that poor-quality voice instructions would cause and that would reduce the accuracy of voice recognition. In addition, the control instructions obtained are compared, and the control instruction that is shared as the highest-similarity instruction across recognition results is preferred as the execution instruction, which helps ensure the accuracy of voice control.
Further, the voice control method also includes: when no two recognition results share the same highest-similarity control instruction, obtaining the control instruction with the highest similarity among all the recognition results; determining that control instruction as the execution instruction; and controlling the controlled device, according to the execution instruction, to perform the corresponding operation.
Specifically, the control instructions with the highest similarity in each recognition result are compared. When none of them are the same, the control instructions of all recognition results are merged, the control instruction with the highest similarity in the merged set is taken as the final execution instruction, and the controlled device is controlled to perform the corresponding operation.
In other words, when the recognition results of the voice instructions all disagree, or when the number of recognition results that satisfy the consistency condition does not reach the preset threshold, all control instructions in the recognition results are merged and the one with the highest similarity overall is used as the execution instruction, so that voice control remains as accurate as possible.
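The fallback described above can be sketched as a merge followed by a maximum. As before, recognition results are modeled as dictionaries from control instruction to similarity; keeping the highest similarity when the same instruction appears in several results is an assumption, since the source only says the results are merged.

```python
from typing import Dict, List, Optional

RecognitionResult = Dict[str, float]  # hypothetical: instruction -> similarity


def fallback_instruction(results: List[RecognitionResult]) -> Optional[str]:
    """Merge all results and return the instruction with the highest similarity overall."""
    merged: Dict[str, float] = {}
    for result in results:
        for instruction, similarity in result.items():
            # Assumption: a repeated instruction keeps its best similarity.
            if similarity > merged.get(instruction, float("-inf")):
                merged[instruction] = similarity
    if not merged:
        return None
    return max(merged, key=merged.get)


# No two results share a top instruction (the tops are A, B, C).
results = [
    {"A": 0.92, "B": 0.80},
    {"B": 0.95, "C": 0.70},
    {"C": 0.88, "A": 0.60},
]
chosen = fallback_instruction(results)
```

Here B wins because its similarity of 0.95 is the largest value anywhere in the merged set.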
In one embodiment, the controlled device is the central control device itself, and controlling the controlled device to perform the operation corresponding to the execution instruction includes: performing, according to the execution instruction, the operation corresponding to that instruction.
Take the case in which the central control device is a smart home device such as a smart speaker or a smart TV: when the smart home device obtains the determined execution instruction, it performs the corresponding operation. For example, when the execution instruction is an "on" instruction, the smart home device turns itself on.
In one embodiment, controlling the controlled device to perform the operation corresponding to the execution instruction includes: determining the controlled device to be controlled according to the execution instruction; and controlling the determined device to perform the operation corresponding to the execution instruction.
In another embodiment, controlling the controlled device to perform the operation corresponding to the execution instruction includes: determining the controlled device to be controlled according to the execution instruction; and sending the execution instruction to the determined device, which performs the corresponding operation itself.
Taking a smart home system as an example, assume that the central control device is a central management device; the controlled devices may include, but are not limited to, a smart speaker, a smart TV, and a smart air conditioner. When the determined execution instruction is a "speaker on" instruction, the device to be controlled is determined to be the smart speaker, which is then turned on; alternatively, the "speaker on" instruction is sent to the smart speaker, whose internal control unit performs the turn-on operation.
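One way to picture the step of determining the controlled device from the execution instruction is a lookup table. The routing table, device names, operation names, and the `send` callback below are all hypothetical; the source does not describe how the mapping is stored or how commands are transported.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical routing table: execution instruction -> (controlled device, operation).
ROUTES: Dict[str, Tuple[str, str]] = {
    "speaker on": ("smart_speaker", "power_on"),
    "tv on": ("smart_tv", "power_on"),
    "air conditioner on": ("smart_air_conditioner", "power_on"),
}


def dispatch(execution_instruction: str, send: Callable[[str, str], None]) -> str:
    """Determine the controlled device and forward the operation to it.

    `send(device, operation)` stands in for whatever transport the system uses.
    Returns the name of the device that was addressed.
    """
    try:
        device, operation = ROUTES[execution_instruction]
    except KeyError:
        raise ValueError(f"no controlled device for {execution_instruction!r}") from None
    send(device, operation)
    return device


sent: List[Tuple[str, str]] = []
target = dispatch("speaker on", lambda device, operation: sent.append((device, operation)))
```

Whether the central control device drives the device directly or forwards the instruction for the device's own control unit to execute only changes what `send` does, not the lookup.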
The voice control method of the present application is described below in an application environment. In one embodiment, as shown in FIG. 4, each sound pickup device collects the sound signal uttered by the user to obtain a voice instruction, compresses it, and sends it to the central control device. The central control device receives the voice instructions sent by each sound pickup device together with the voice instruction it collected itself, decompresses and denoises them, and analyzes the processed voice instructions to obtain the volume coefficient of each. It then sorts the voice instructions by volume coefficient, selects a preset number with the largest volume coefficients, and sends them to the cloud server. The cloud server recognizes each voice instruction, obtains the corresponding recognition result, and returns it to the central control device; each recognition result includes at least one control instruction obtained by recognizing the voice instruction, together with the similarity of each control instruction. The central control device receives the recognition results and determines whether at least two of them share the same highest-similarity control instruction. If so, that shared control instruction is determined as the execution instruction; otherwise, all recognition results are merged and the control instruction with the highest similarity in the merged set is determined as the execution instruction. The central control device then determines the controlled device according to the execution instruction and controls it to perform the corresponding operation.
In the above voice control method, the voice instructions collected by each sound pickup device are received and analyzed, and a preset number with the largest volume coefficients are sent to the cloud server, so that the cloud server recognizes relatively clear voice instructions, produces more accurate recognition results, and is less affected by incorrect ones. The control instructions in the recognition results are then filtered by similarity to determine the execution instruction. Because similarity reflects the degree of correlation between a control instruction and the voice instruction, the final execution instruction matches the voice instruction closely and captures its key information, which improves the accuracy of multipoint voice control.
In one embodiment, as shown in FIG. 5, a voice control apparatus is provided. The apparatus includes a signal receiving module 502, a volume analysis module 504, a feedback receiving module 506, and an execution module 508, where:
The signal receiving module 502 is configured to receive the voice instructions collected by each sound pickup device, specifically those collected by each sound pickup device and those collected by the central control device itself.
The volume analysis module 504 is configured to analyze each voice instruction and send the voice instructions that satisfy the volume condition to the cloud server, so that the cloud server recognizes each voice instruction to obtain the corresponding recognition result.
Specifically, the volume analysis module 504 analyzes each received voice instruction to determine whether it satisfies a preset volume condition, and sends the voice instructions that satisfy the condition to the cloud server for recognition. The voice recognition model of the cloud server then produces a recognition result for each voice instruction.
The feedback receiving module 506 is configured to receive the recognition results returned by the cloud server.
The cloud server returns the recognition result of each voice instruction to the central control device over the network. The central control device receives the recognition results so that it can determine, on the basis of those results, the operation to be performed.
The execution module 508 is configured to perform, when the number of recognition results that satisfy the consistency condition reaches a preset threshold, the operation corresponding to those recognition results.
In this embodiment, the execution module 508 evaluates each received recognition result against a preset consistency condition to determine whether it satisfies the condition and whether the number of recognition results that satisfy the condition reaches the preset threshold. If that number reaches the preset threshold, the operation corresponding to the recognition results that satisfy the consistency condition is performed.
In the above voice control apparatus, the voice instructions collected by each sound pickup device are received and analyzed, and those that satisfy the volume condition are sent to the cloud server, so that the cloud server recognizes relatively clear voice instructions and produces more accurate recognition results. The recognition results are then filtered: when the number of recognition results that satisfy the consistency condition reaches the preset threshold, the operation corresponding to those results is performed. The recognition result behind the executed operation therefore reflects the key information of the voice instruction, which improves the accuracy of multipoint voice control.
Further, the volume analysis module 504 includes a volume coefficient acquisition module and a determination module. The volume coefficient acquisition module is configured to analyze each voice instruction to obtain its volume coefficient; the determination module is configured to determine, according to the volume coefficients, the voice instructions that satisfy the volume condition and to send them to the cloud server.
Specifically, the volume coefficient acquisition module analyzes the vibration amplitude parameter of each voice instruction to obtain its volume coefficient. The determination module then checks each volume coefficient against the preset volume condition and sends the voice instructions that satisfy the condition to the cloud server.
In an embodiment, the determination module further includes a sorting module, an instruction acquisition module, and a sending module, wherein:
The sorting module is configured to sort the voice instructions by volume coefficient, for example in descending order or in ascending order. The larger the volume coefficient, the clearer and more accurate the corresponding voice instruction.
The instruction acquisition module is configured to obtain, according to the sorting result, a preset number of voice instructions with the largest volume coefficients. A voice instruction with a smaller volume coefficient is usually not clear enough and is prone to misrecognition during speech recognition, which yields an incorrect recognition result. To ensure the accuracy of the recognition results and minimize interference from incorrect ones, the instruction acquisition module selects, according to the sorting result, the preset number of voice instructions with the largest volume coefficients to be sent to the cloud server for recognition, for example the three or the two voice instructions with the largest volume coefficients. The preset number can be set according to the required accuracy of the recognition results.
The sending module is configured to send the preset number of voice instructions to the cloud server. The cloud server recognizes the selected voice instructions and obtains a recognition result corresponding to each of them. Selecting the voice instructions with the largest volume coefficients and sending only these to the cloud server for recognition ensures, to a certain extent, the accuracy of the recognition results obtained.
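The sorting and selection steps above can be sketched as follows. The dictionary keyed by pickup-device identifier and the name `select_loudest` are hypothetical; the specification only requires that the preset number of voice instructions with the largest volume coefficients be chosen.

```python
from typing import Dict, List


def select_loudest(volume_by_device: Dict[str, float], preset_count: int) -> List[str]:
    """Return the ids of the `preset_count` instructions with the largest
    volume coefficients, loudest first.

    `volume_by_device` maps a (hypothetical) pickup-device id to the volume
    coefficient of the voice instruction that device captured.
    """
    ranked = sorted(volume_by_device, key=volume_by_device.get, reverse=True)
    return ranked[:preset_count]
```

If fewer instructions were captured than the preset number, the slice simply returns all of them, which seems the natural behavior although the specification does not address that case.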
In an embodiment, the execution module includes an execution instruction determination module and an execution submodule. The execution instruction determination module is configured to, when the control instructions with the highest similarity in at least two recognition results are the same, determine that shared control instruction as the execution instruction; the execution submodule is configured to control the controlled device, according to the execution instruction, to perform the operation corresponding to the execution instruction.
Specifically, the execution instruction determination module takes the control instruction with the highest similarity from each recognition result, compares the control instructions taken out, and judges whether they are the same; if they are, the shared control instruction is used as the final execution instruction. In other words, the control instruction with the highest similarity in a recognition result is the one that best matches the voice instruction. If the best-matching control instructions agree across recognition results, this indicates to a certain extent that the control instruction is accurate, so it is used as the final execution instruction.
Further, the execution instruction determination module is also configured to, when the control instructions with the highest similarity in any two recognition results are different, obtain the control instruction with the highest similarity among all recognition results and determine it as the execution instruction.
Specifically, the control instructions with the highest similarity in the respective recognition results are compared. When no two of them are the same, the control instructions of all recognition results are merged, the control instruction with the highest similarity in the merged set is taken as the final execution instruction, and the controlled device is controlled to perform the operation corresponding to the execution instruction.
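The two-stage decision described above (agreement between top candidates first, global maximum as fallback) can be sketched as below. The representation of a recognition result as a list of `(control_instruction, similarity)` pairs and the function name are assumptions; the agreement threshold of two results follows the "at least two" wording of this embodiment.

```python
from collections import Counter
from typing import List, Optional, Tuple

# One candidate returned by the cloud server: (control instruction, similarity).
Candidate = Tuple[str, float]


def determine_execution_instruction(results: List[List[Candidate]]) -> Optional[str]:
    """Pick the execution instruction from several recognition results."""
    results = [r for r in results if r]  # ignore empty recognition results
    if not results:
        return None

    # Stage 1: take the highest-similarity candidate of each result and
    # check whether at least two results agree on it.
    top = [max(r, key=lambda c: c[1]) for r in results]
    name, count = Counter(n for n, _ in top).most_common(1)[0]
    if count >= 2:
        return name

    # Stage 2: no agreement, so merge all candidates and take the one with
    # the highest similarity overall.
    merged = [c for r in results for c in r]
    return max(merged, key=lambda c: c[1])[0]
```

The specification does not say what happens when two different instructions are each agreed on by two or more results; this sketch returns the most frequent one, which is a design choice of the example rather than part of the described method.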
In an embodiment, the execution submodule is also configured to perform, according to the execution instruction, the operation corresponding to the execution instruction. Take the case where the central control device is itself a smart home device, such as a smart speaker or a smart TV: when the smart home device obtains the determined execution instruction, it performs the corresponding operation. For example, when the execution instruction is a "turn on" instruction, the smart home device performs the turn-on operation.
In an embodiment, the execution submodule is also configured to determine the controlled device to be controlled according to the execution instruction, and to control the determined controlled device to perform the operation corresponding to the execution instruction.
In another embodiment, the execution submodule is also configured to determine the controlled device to be controlled according to the execution instruction, and to send the execution instruction to the determined controlled device, which performs the related operation according to the execution instruction.
Take a smart home system as an example, and assume the central control device is a central management device; the controlled devices may include, but are not limited to, a smart speaker, a smart TV, and a smart air conditioner. When the determined execution instruction is a "speaker on" instruction, the execution submodule determines that the controlled device is the smart speaker and then either controls the smart speaker to turn on, or sends the "speaker on" instruction to the smart speaker so that the control unit inside the smart speaker performs the turn-on operation.
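The two dispatch variants in the smart home example (direct control versus forwarding the instruction) can be sketched as follows. The lookup table, device identifiers, operation names, and the `send` callback are all hypothetical; the specification does not define how an execution instruction is mapped to a controlled device or how messages are transported.

```python
from typing import Callable, Dict, Tuple

# Hypothetical table: execution instruction -> (controlled device, operation).
INSTRUCTION_TABLE: Dict[str, Tuple[str, str]] = {
    "speaker on": ("smart_speaker", "power_on"),
    "tv on": ("smart_tv", "power_on"),
}


def dispatch(instruction: str,
             send: Callable[[str, str], None],
             forward: bool = False) -> str:
    """Determine the controlled device and act on it.

    forward=False: the central device issues the concrete operation itself.
    forward=True:  the raw execution instruction is forwarded and the
                   device's own control unit decides what to do.
    Returns the id of the controlled device that was addressed.
    """
    device, operation = INSTRUCTION_TABLE[instruction]
    send(device, instruction if forward else operation)
    return device
```

A caller would pass a `send` function that wraps whatever transport the system uses; in a test it can simply append to a list.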
The above voice control device receives and analyzes the voice instructions collected by each sound pickup device and sends the preset number of voice instructions with the largest volume coefficients to the cloud server, so that the cloud server recognizes relatively clear voice instructions, obtains more accurate recognition results, and reduces interference from incorrect recognition results. The control instructions in the recognition results are then screened by similarity to determine the execution instruction. Because similarity reflects the degree of association between a control instruction and the voice instruction, the execution instruction finally determined accurately matches the voice instruction and effectively represents its key information, which improves the accuracy of multipoint voice control.
For specific limitations on the voice control device, refer to the limitations on the voice control method above; they are not repeated here. Each module in the above voice control device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to the modules.
In one embodiment, a central control device is provided, whose internal structure may be as shown in FIG. 6. The central control device includes a processor, a memory, a network interface, and a microphone connected through a system bus. The processor of the central control device provides computing and control capabilities. The memory of the central control device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the central control device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements a voice control method.
Those skilled in the art will understand that the structure shown in FIG. 6 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the central control device to which the solution is applied. A specific central control device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In an embodiment, a central control device is provided, including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
receiving the voice instructions collected by each sound pickup device;
analyzing each voice instruction and sending the voice instructions that satisfy the volume condition to the cloud server, so that the cloud server recognizes each voice instruction and obtains the corresponding recognition result;
receiving the recognition results returned by the cloud server;
when the number of recognition results that satisfy the consistency condition reaches a preset threshold, performing the operation corresponding to the recognition results that satisfy the consistency condition.
In an embodiment, the computer-readable instructions further cause the processor to perform the following steps:
analyzing each voice instruction to obtain the volume coefficient of each voice instruction;
determining, according to the volume coefficients, the voice instructions that satisfy the volume condition and sending them to the cloud server, so that the cloud server recognizes each voice instruction and obtains the corresponding recognition result.
In an embodiment, the computer-readable instructions further cause the processor to perform the following steps:
sorting the voice instructions by volume coefficient;
obtaining, according to the sorting result, the preset number of voice instructions with the largest volume coefficients;
sending the preset number of voice instructions to the cloud server, so that the cloud server recognizes each voice instruction and obtains the corresponding recognition result.
In an embodiment, each recognition result includes at least one control instruction obtained by recognizing the voice instruction and the similarity of each control instruction, and the computer-readable instructions further cause the processor to perform the following steps:
when the control instructions with the highest similarity in at least two recognition results are the same, determining that shared control instruction as the execution instruction;
controlling, according to the execution instruction, the controlled device to perform the operation corresponding to the execution instruction.
In an embodiment, the computer-readable instructions further cause the processor to perform the following steps:
when the control instructions with the highest similarity in any two recognition results are different, obtaining the control instruction with the highest similarity among all recognition results;
determining the control instruction with the highest similarity among all recognition results as the execution instruction;
controlling, according to the execution instruction, the controlled device to perform the operation corresponding to the execution instruction.
In an embodiment, the computer-readable instructions further cause the processor to perform the following steps:
determining the controlled device to be controlled according to the execution instruction;
controlling the determined controlled device to perform the operation corresponding to the execution instruction.
In an embodiment, the computer-readable instructions further cause the processor to perform the following steps:
determining the controlled device to be controlled according to the execution instruction;
sending the execution instruction to the determined controlled device, which performs the related operation according to the execution instruction.
In an embodiment, one or more non-volatile storage media storing computer-readable instructions are provided. The computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
receiving the voice instructions collected by each sound pickup device;
analyzing each voice instruction and sending the voice instructions that satisfy the volume condition to the cloud server, so that the cloud server recognizes each voice instruction and obtains the corresponding recognition result;
receiving the recognition results returned by the cloud server;
when the number of recognition results that satisfy the consistency condition reaches a preset threshold, performing the operation corresponding to the recognition results that satisfy the consistency condition.
In an embodiment, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
analyzing each voice instruction to obtain the volume coefficient of each voice instruction;
determining, according to the volume coefficients, the voice instructions that satisfy the volume condition and sending them to the cloud server, so that the cloud server recognizes each voice instruction and obtains the corresponding recognition result.
In an embodiment, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
sorting the voice instructions by volume coefficient;
obtaining, according to the sorting result, the preset number of voice instructions with the largest volume coefficients;
sending the preset number of voice instructions to the cloud server, so that the cloud server recognizes each voice instruction and obtains the corresponding recognition result.
In an embodiment, each recognition result includes at least one control instruction obtained by recognizing the voice instruction and the similarity of each control instruction, and the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
when the control instructions with the highest similarity in at least two recognition results are the same, determining that shared control instruction as the execution instruction;
controlling, according to the execution instruction, the controlled device to perform the operation corresponding to the execution instruction.
In an embodiment, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
when the control instructions with the highest similarity in any two recognition results are different, obtaining the control instruction with the highest similarity among all recognition results;
determining the control instruction with the highest similarity among all recognition results as the execution instruction;
controlling, according to the execution instruction, the controlled device to perform the operation corresponding to the execution instruction.
In an embodiment, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
determining the controlled device to be controlled according to the execution instruction;
controlling the determined controlled device to perform the operation corresponding to the execution instruction.
In an embodiment, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
determining the controlled device to be controlled according to the execution instruction;
sending the execution instruction to the determined controlled device, which performs the related operation according to the execution instruction.
It should be understood that the steps in the embodiments of the present application are not necessarily performed in the order indicated by the step numbers. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time but may be performed at different times, and they are not necessarily performed sequentially but may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (20)

  1. A voice control method, comprising:
    receiving voice instructions collected by each sound pickup device;
    analyzing each of the voice instructions and sending the voice instructions that satisfy a volume condition to a cloud server, so that the cloud server recognizes each of the voice instructions and obtains a recognition result corresponding to each of the voice instructions;
    receiving each of the recognition results returned by the cloud server;
    when the number of the recognition results that satisfy a consistency condition reaches a preset threshold, performing an operation corresponding to the recognition results that satisfy the consistency condition.
  2. The method according to claim 1, wherein the analyzing each of the voice instructions and sending the voice instructions that satisfy a volume condition to a cloud server, so that the cloud server recognizes each of the voice instructions and obtains a recognition result corresponding to each of the voice instructions, comprises:
    analyzing each of the voice instructions to obtain a volume coefficient of each of the voice instructions;
    determining, according to the volume coefficients, the voice instructions that satisfy the volume condition and sending them to the cloud server, so that the cloud server recognizes each of the voice instructions and obtains a recognition result corresponding to each of the voice instructions.
  3. The method according to claim 2, wherein the determining, according to the volume coefficients, the voice instructions that satisfy the volume condition and sending them to the cloud server, so that the cloud server recognizes each of the voice instructions and obtains a recognition result corresponding to each of the voice instructions, comprises:
    sorting the voice instructions by volume coefficient;
    obtaining, according to the sorting result, a preset number of the voice instructions with the largest volume coefficients;
    sending the preset number of the voice instructions to the cloud server, so that the cloud server recognizes each of the voice instructions and obtains a recognition result corresponding to each of the voice instructions.
  4. The method according to claim 1, wherein each of the recognition results comprises at least one control instruction obtained by recognizing the voice instruction and a similarity of each control instruction, and the performing, when the number of the recognition results that satisfy a consistency condition reaches a preset threshold, an operation corresponding to the recognition results that satisfy the consistency condition comprises:
    when the control instructions with the highest similarity in at least two of the recognition results are the same, determining that shared control instruction with the highest similarity as an execution instruction;
    controlling, according to the execution instruction, a controlled device to perform an operation corresponding to the execution instruction.
  5. The method according to claim 4, wherein the method further comprises:
    when the control instructions with the highest similarity in any two of the recognition results are different, obtaining the control instruction with the highest similarity among all the recognition results;
    determining the control instruction with the highest similarity among all the recognition results as the execution instruction;
    controlling, according to the execution instruction, the controlled device to perform the operation corresponding to the execution instruction.
  6. The method according to claim 4, wherein the controlling, according to the execution instruction, a controlled device to perform an operation corresponding to the execution instruction comprises:
    determining the controlled device to be controlled according to the execution instruction;
    controlling the determined controlled device to perform the operation corresponding to the execution instruction.
  7. The method according to claim 4, wherein the controlling, according to the execution instruction, a controlled device to perform an operation corresponding to the execution instruction comprises:
    determining the controlled device to be controlled according to the execution instruction;
    sending the execution instruction to the determined controlled device, which performs the related operation according to the execution instruction.
  8. A central control device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the computer-readable instructions, when executed by the processor, cause the processor to perform the following steps:
    receiving voice instructions collected by each sound pickup device;
    analyzing each of the voice instructions and sending the voice instructions that satisfy a volume condition to a cloud server, so that the cloud server recognizes each of the voice instructions and obtains a recognition result corresponding to each of the voice instructions;
    receiving each of the recognition results returned by the cloud server;
    when the number of the recognition results that satisfy a consistency condition reaches a preset threshold, performing an operation corresponding to the recognition results that satisfy the consistency condition.
  9. The central control device according to claim 8, wherein the computer-readable instructions further cause the processor to perform the following steps:
    analyzing each of the voice instructions to obtain a volume coefficient of each of the voice instructions;
    determining, according to the volume coefficients, the voice instructions that satisfy the volume condition and sending them to the cloud server, so that the cloud server recognizes each of the voice instructions and obtains a recognition result corresponding to each of the voice instructions.
  10. The central control device according to claim 9, wherein the computer-readable instructions further cause the processor to perform the following steps:
    sorting the voice instructions by volume coefficient;
    obtaining, according to the sorting result, a preset number of the voice instructions with the largest volume coefficients;
    sending the preset number of the voice instructions to the cloud server, so that the cloud server recognizes each of the voice instructions and obtains a recognition result corresponding to each of the voice instructions.
  11. The central control device according to claim 8, wherein each of the recognition results includes at least one control instruction obtained by recognizing the voice instruction and a similarity of each control instruction, and the computer-readable instructions further cause the processor to perform the following steps:
    when the control instructions with the highest similarity in at least two of the recognition results are the same, determining that shared highest-similarity control instruction as the execution instruction;
    controlling, according to the execution instruction, a controlled device to perform the operation corresponding to the execution instruction.
  12. The central control device according to claim 11, wherein the computer-readable instructions further cause the processor to perform the following steps:
    when the control instructions with the highest similarity in any two of the recognition results are different, obtaining the control instruction with the highest similarity among all the recognition results;
    determining the control instruction with the highest similarity among all the recognition results as the execution instruction;
    controlling, according to the execution instruction, a controlled device to perform the operation corresponding to the execution instruction.
  13. The central control device according to claim 11, wherein the computer-readable instructions further cause the processor to perform the following steps:
    determining, according to the execution instruction, the controlled device to be controlled;
    controlling the determined controlled device to perform the operation corresponding to the execution instruction.
  14. The central control device according to claim 11, wherein the computer-readable instructions further cause the processor to perform the following steps:
    determining, according to the execution instruction, the controlled device to be controlled;
    sending the execution instruction to the determined controlled device, the controlled device performing the related operation according to the execution instruction.
  15. One or more non-volatile storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    receiving voice instructions collected by each sound pickup device;
    analyzing each of the voice instructions and sending the voice instructions that satisfy a volume condition to a cloud server, the cloud server recognizing each of the voice instructions to obtain a recognition result corresponding to each voice instruction;
    receiving each of the recognition results returned by the cloud server;
    performing, when the number of the recognition results satisfying a consistency condition reaches a preset threshold, the operation corresponding to the recognition results that satisfy the consistency condition.
  16. The storage medium according to claim 15, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    analyzing each of the voice instructions to obtain a volume coefficient of each voice instruction;
    determining, according to the volume coefficients, the voice instructions that satisfy the volume condition and sending them to the cloud server, the cloud server recognizing each of the voice instructions to obtain a recognition result corresponding to each voice instruction.
  17. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    sorting the voice instructions by volume coefficient;
    obtaining, according to the sorting result, a preset number of the voice instructions with the largest volume coefficients;
    sending the preset number of voice instructions to the cloud server, the cloud server recognizing each of the voice instructions to obtain a recognition result corresponding to each voice instruction.
  18. The storage medium according to claim 15, wherein each of the recognition results includes at least one control instruction obtained by recognizing the voice instruction and a similarity of each control instruction, and the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    when the control instructions with the highest similarity in at least two of the recognition results are the same, determining that shared highest-similarity control instruction as the execution instruction;
    controlling, according to the execution instruction, a controlled device to perform the operation corresponding to the execution instruction.
  19. The storage medium according to claim 18, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    when the control instructions with the highest similarity in any two of the recognition results are different, obtaining the control instruction with the highest similarity among all the recognition results;
    determining the control instruction with the highest similarity among all the recognition results as the execution instruction;
    controlling, according to the execution instruction, a controlled device to perform the operation corresponding to the execution instruction.
  20. The storage medium according to claim 18, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    determining, according to the execution instruction, the controlled device to be controlled;
    controlling the determined controlled device to perform the operation corresponding to the execution instruction.
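
For illustration only, the selection and decision logic recited in the claims above can be sketched in Python. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names `VoiceInstruction`, `select_loudest`, and `decide_execution_instruction` are hypothetical, the way a volume coefficient is computed is not specified here and is taken as a given number, each recognition result is assumed to be a non-empty mapping from candidate control instruction to similarity, and the tie-breaking rule when two different instructions are equally well supported is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VoiceInstruction:
    device_id: str  # hypothetical identifier of the pickup device
    volume: float   # volume coefficient, assumed to be already computed


def select_loudest(instructions: List[VoiceInstruction],
                   preset_count: int) -> List[VoiceInstruction]:
    """Sort by volume coefficient and keep the preset number of loudest ones."""
    ranked = sorted(instructions, key=lambda v: v.volume, reverse=True)
    return ranked[:preset_count]


# Assumed shape of one recognition result: control instruction -> similarity.
RecognitionResult = Dict[str, float]


def decide_execution_instruction(results: List[RecognitionResult],
                                 threshold: int = 2) -> Optional[str]:
    """Pick the execution instruction from the cloud recognition results.

    If the highest-similarity instruction is the same in at least
    `threshold` results, that instruction wins. Otherwise fall back to the
    instruction with the highest similarity across all results.
    """
    if not results:
        return None
    # Highest-similarity candidate of each result, as (instruction, similarity).
    tops = [max(r.items(), key=lambda kv: kv[1]) for r in results]
    counts = Counter(instruction for instruction, _ in tops)
    # Assumption: on equal counts, the first instruction encountered is used.
    instruction, votes = counts.most_common(1)[0]
    if votes >= threshold:
        return instruction
    return max(tops, key=lambda t: t[1])[0]


if __name__ == "__main__":
    picked = select_loudest(
        [VoiceInstruction("a", 0.2), VoiceInstruction("b", 0.9),
         VoiceInstruction("c", 0.5)],
        preset_count=2,
    )
    print([v.device_id for v in picked])
    print(decide_execution_instruction(
        [{"light_on": 0.9, "light_off": 0.3}, {"light_on": 0.8}, {"tv_on": 0.95}]
    ))
```

The default `threshold=2` mirrors the "at least two" wording of claims 11 and 18; the "preset threshold" of claims 8 and 15 could be supplied through the same parameter.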
PCT/CN2018/096150 2018-07-18 2018-07-18 Voice control method, central control device, and storage medium WO2020014899A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880000938.XA CN109074808B (en) 2018-07-18 2018-07-18 Voice control method, central control device and storage medium
PCT/CN2018/096150 WO2020014899A1 (en) 2018-07-18 2018-07-18 Voice control method, central control device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/096150 WO2020014899A1 (en) 2018-07-18 2018-07-18 Voice control method, central control device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020014899A1 (en) 2020-01-23

Family

ID=64789414

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/096150 WO2020014899A1 (en) 2018-07-18 2018-07-18 Voice control method, central control device, and storage medium

Country Status (2)

Country Link
CN (1) CN109074808B (en)
WO (1) WO2020014899A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112885344A (en) * 2021-01-08 2021-06-01 深圳市艾特智能科技有限公司 Offline voice distributed control method, system, storage medium and equipment
CN115148202A (en) * 2022-05-31 2022-10-04 青岛海尔科技有限公司 Voice instruction processing method and device, storage medium and electronic device
CN116685032A (en) * 2023-06-20 2023-09-01 广东雅格莱灯光音响有限公司 Voice control method, device and equipment for stage lamp and storage medium

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032258A (en) * 2018-12-24 2019-07-19 四川易简天下科技股份有限公司 The mating structure of modularization, all-in-one machine and modular unit
CN111833584B (en) * 2019-04-17 2022-03-01 百度在线网络技术(北京)有限公司 Device control method, control device, control system, and storage medium
CN110246495A (en) * 2019-06-28 2019-09-17 联想(北京)有限公司 Information processing method and electronic equipment
CN112151025A (en) * 2019-06-28 2020-12-29 百度在线网络技术(北京)有限公司 Volume adjusting method, device, equipment and storage medium
CN110580904A (en) * 2019-09-29 2019-12-17 百度在线网络技术(北京)有限公司 Method and device for controlling small program through voice, electronic equipment and storage medium
CN110782891B (en) * 2019-10-10 2022-02-18 珠海格力电器股份有限公司 Audio processing method and device, computing equipment and storage medium
CN111294258A (en) * 2020-02-10 2020-06-16 成都捷顺宝信息科技有限公司 Voice interaction system and method for controlling intelligent household equipment
CN111009246A (en) * 2020-03-10 2020-04-14 展讯通信(上海)有限公司 Intelligent sound box and awakening method thereof, gateway, server and readable storage medium
CN111739531B (en) * 2020-06-11 2022-08-09 浙江沁园水处理科技有限公司 Voice control method
CN111951795B (en) * 2020-08-10 2024-04-09 中移(杭州)信息技术有限公司 Voice interaction method, server, electronic device and storage medium
CN114974228B (en) * 2022-05-24 2023-04-11 名日之梦(北京)科技有限公司 Rapid voice recognition method based on hierarchical recognition

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0440439A2 (en) * 1990-01-30 1991-08-07 NEC Corporation Method and system for controlling an external machine by a voice command
CN106328143A (en) * 2015-06-23 2017-01-11 中兴通讯股份有限公司 Voice control method and device and mobile terminal
CN106601248A (en) * 2017-01-20 2017-04-26 浙江小尤鱼智能技术有限公司 Smart home system based on distributed voice control
CN107863106A (en) * 2017-12-12 2018-03-30 长沙联远电子科技有限公司 Voice identification control method and device
CN107886946A (en) * 2017-06-07 2018-04-06 深圳市北斗车载电子有限公司 For controlling the speech control system and method for vehicle mounted guidance volume

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2143079T3 (en) * 1994-11-01 2000-05-01 British Telecomm SPEECH RECOGNITION.
CN101493987B (en) * 2008-01-24 2011-08-31 深圳富泰宏精密工业有限公司 Sound control remote-control system and method for mobile phone
CN103366740B (en) * 2012-03-27 2016-12-14 联想(北京)有限公司 Voice command identification method and device
CN102831894B (en) * 2012-08-09 2014-07-09 华为终端有限公司 Command processing method, command processing device and command processing system
CN102945672B (en) * 2012-09-29 2013-10-16 深圳市国华识别科技开发有限公司 Voice control system for multimedia equipment, and voice control method
CN103106900B (en) * 2013-02-28 2016-05-04 用友网络科技股份有限公司 Speech recognition equipment and audio recognition method
CN104378886A (en) * 2014-11-14 2015-02-25 生迪光电科技股份有限公司 Intelligent illumination control system and method
CN106469558A (en) * 2015-08-21 2017-03-01 中兴通讯股份有限公司 Audio recognition method and equipment
CN109429522A (en) * 2016-12-06 2019-03-05 吉蒂机器人私人有限公司 Voice interactive method, apparatus and system
CN107204185B (en) * 2017-05-03 2021-05-25 深圳车盒子科技有限公司 Vehicle-mounted voice interaction method and system and computer readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0440439A2 (en) * 1990-01-30 1991-08-07 NEC Corporation Method and system for controlling an external machine by a voice command
CN106328143A (en) * 2015-06-23 2017-01-11 中兴通讯股份有限公司 Voice control method and device and mobile terminal
CN106601248A (en) * 2017-01-20 2017-04-26 浙江小尤鱼智能技术有限公司 Smart home system based on distributed voice control
CN107886946A (en) * 2017-06-07 2018-04-06 深圳市北斗车载电子有限公司 For controlling the speech control system and method for vehicle mounted guidance volume
CN107863106A (en) * 2017-12-12 2018-03-30 长沙联远电子科技有限公司 Voice identification control method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112885344A (en) * 2021-01-08 2021-06-01 深圳市艾特智能科技有限公司 Offline voice distributed control method, system, storage medium and equipment
CN115148202A (en) * 2022-05-31 2022-10-04 青岛海尔科技有限公司 Voice instruction processing method and device, storage medium and electronic device
CN116685032A (en) * 2023-06-20 2023-09-01 广东雅格莱灯光音响有限公司 Voice control method, device and equipment for stage lamp and storage medium
CN116685032B (en) * 2023-06-20 2024-02-06 广东雅格莱灯光音响有限公司 Voice control method, device and equipment for stage lamp and storage medium

Also Published As

Publication number Publication date
CN109074808A (en) 2018-12-21
CN109074808B (en) 2023-05-09

Similar Documents

Publication Publication Date Title
WO2020014899A1 (en) Voice control method, central control device, and storage medium
US11289072B2 (en) Object recognition method, computer device, and computer-readable storage medium
US11646038B2 (en) Method and system for separating and authenticating speech of a speaker on an audio stream of speakers
US7620547B2 (en) Spoken man-machine interface with speaker identification
WO2017206661A1 (en) Voice recognition method and system
US20190164566A1 (en) Emotion recognizing system and method, and smart robot using the same
CN113035202B (en) Identity recognition method and device
CN109473104A (en) Speech recognition network delay optimization method and device
CN111710344A (en) Signal processing method, device, equipment and computer readable storage medium
CN110837758A (en) Keyword input method and device and electronic equipment
CN111724781A (en) Audio data storage method and device, terminal and storage medium
CN110648669B (en) Multi-frequency shunt voiceprint recognition method, device and system and computer readable storage medium
CN115050372A (en) Audio segment clustering method and device, electronic equipment and medium
CN111710332A (en) Voice processing method and device, electronic equipment and storage medium
WO2021072893A1 (en) Voiceprint clustering method and apparatus, processing device and computer storage medium
CN107767860B (en) Voice information processing method and device
CN109343481B (en) Method and device for controlling device
CN112802498B (en) Voice detection method, device, computer equipment and storage medium
CN110689885A (en) Machine-synthesized speech recognition method, device, storage medium and electronic equipment
KR20190119521A (en) Electronic apparatus and operation method thereof
CN111640450A (en) Multi-person audio processing method, device, equipment and readable storage medium
CN111986670A (en) Voice control method, device, electronic equipment and computer readable storage medium
CN111653284A (en) Interaction and recognition method, device, terminal equipment and computer storage medium
CN112992175B (en) Voice distinguishing method and voice recording device thereof
CN103390404A (en) Information processing apparatus, information processing method and information processing program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18926660

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18926660

Country of ref document: EP

Kind code of ref document: A1