CN117093972A - Voice control method and device, electronic equipment and readable storage medium - Google Patents

Info

Publication number
CN117093972A
Authority
CN
China
Prior art keywords
information
user identity
permission
semantic
identity information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210519826.3A
Other languages
Chinese (zh)
Inventor
汪浩
姜顺豹
王宇汉
田进
张震
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pateo Connect Nanjing Co Ltd
Original Assignee
Pateo Connect Nanjing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pateo Connect Nanjing Co Ltd
Priority to CN202210519826.3A
Publication of CN117093972A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/45 Structures or tools for the administration of authentication
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the application disclose a voice control method, a voice control apparatus, an electronic device and a readable storage medium. The voice control method comprises: uploading collected voice information to a cloud server, so that the cloud server identifies the semantic information and the target user identity information corresponding to the voice information and obtains a permission set matching the semantic information; receiving the semantic information, the target user identity information and the permission set sent by the cloud server, wherein the permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information; determining, from the permission set, the target permission state information corresponding to the target user identity information; and executing the operation indicated by the semantic information when the target permission state information meets a preset permission condition.

Description

Voice control method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a voice control method, a voice control device, an electronic device, and a readable storage medium.
Background
In recent years, intelligent voice services have been widely used in various electronic devices. Such a service recognizes the audio information uttered by a user and performs the corresponding operation to meet the user's needs. For example, when the user says "turn on radio", the vehicle-mounted device parses the instruction contained in the audio information "turn on radio" and automatically turns on the radio.
At present, an intelligent voice service executes the corresponding operation as soon as it detects audio information, and may even parse and execute audio information from passers-by, so the security of the vehicle-mounted device is low.
Disclosure of Invention
The embodiments of the application provide a voice control method, a voice control apparatus, an electronic device and a readable storage medium, which can solve the problem that the security of existing vehicle-mounted devices is low.
In a first aspect, an embodiment of the present application provides a voice control method applied to a vehicle-mounted device, the method including:
uploading collected voice information to a cloud server, so that the cloud server identifies the semantic information and the target user identity information corresponding to the voice information and obtains a permission set matching the semantic information;
receiving the semantic information, the target user identity information and the permission set sent by the cloud server, wherein the permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information;
determining, from the permission set, the target permission state information corresponding to the target user identity information;
and executing the operation indicated by the semantic information when the target permission state information meets a preset permission condition.
In a possible implementation, when the target user identity information sent by the cloud server is not received, the method further includes:
acquiring facial image information of the target user corresponding to the audio information;
and determining the target user identity information according to the facial image information.
In a second aspect, an embodiment of the present application provides a voice control method applied to a cloud server, the method including:
receiving voice information sent by a vehicle-mounted device;
identifying the semantic information and the target user identity information corresponding to the voice information;
obtaining a permission set matching the semantic information, wherein the permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information;
and sending the semantic information, the target user identity information and the permission set to the vehicle-mounted device, so that the vehicle-mounted device determines, from the permission set, the target permission state information corresponding to the target user identity information, and executes the operation indicated by the semantic information when the target permission state information meets a preset permission condition.
In one possible implementation, obtaining the permission set matching the semantic information includes:
determining the permission set matching the semantic information from preset configuration information, wherein the preset configuration information comprises a plurality of permission sets, each corresponding to a piece of semantic information.
In a third aspect, an embodiment of the present application provides a voice control apparatus applied to a vehicle-mounted device, the apparatus including:
an uploading module, configured to upload collected voice information to a cloud server, so that the cloud server identifies the semantic information and the target user identity information corresponding to the voice information and obtains a permission set matching the semantic information;
a first receiving module, configured to receive the semantic information, the target user identity information and the permission set sent by the cloud server, wherein the permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information;
a determining module, configured to determine, from the permission set, the target permission state information corresponding to the target user identity information;
and an execution module, configured to execute the operation indicated by the semantic information when the target permission state information meets a preset permission condition.
In a fourth aspect, an embodiment of the present application provides a voice control apparatus applied to a cloud server, the apparatus including:
a second receiving module, configured to receive voice information sent by a vehicle-mounted device;
a recognition module, configured to identify the semantic information and the target user identity information corresponding to the voice information;
an acquisition module, configured to obtain a permission set matching the semantic information, wherein the permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information;
and a sending module, configured to send the semantic information, the target user identity information and the permission set to the vehicle-mounted device, so that the vehicle-mounted device determines, from the permission set, the target permission state information corresponding to the target user identity information, and executes the operation indicated by the semantic information when the target permission state information meets a preset permission condition.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, implements the method as in the first aspect or any of the possible implementations of the first aspect.
In a sixth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method as in the first aspect or any of the possible implementations of the first aspect.
In the embodiments of the application, the collected voice information is uploaded to the cloud server, and the semantic information and target user identity information corresponding to the voice information, together with a permission set, are then received from the cloud server. The permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information; that is, different user identities carry different permissions for that operation. The target permission state information corresponding to the target user identity information is determined from the permission set, and the vehicle-mounted device judges from it whether the semantic information is executable. When the target permission state information meets the preset permission condition, the voice information is executable and the operation indicated by the semantic information is executed. The security of voice control is thereby improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. A person skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a flowchart of a voice control method provided by an embodiment of the present application;
FIG. 2 is a flowchart of another voice control method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a voice control apparatus provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of another voice control apparatus provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application are described in detail below. To make the objects, technical solutions and advantages of the present application clearer, the application is described in further detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended only to illustrate the application, not to limit it. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is intended only to provide a better understanding of the application by showing examples of it.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
The following describes the voice control method provided by the embodiment of the present application in detail.
Fig. 1 is a flowchart of a voice control method according to an embodiment of the present application.
As shown in FIG. 1, the voice control method may include steps 110 to 140. The method is applied to a vehicle-mounted device and specifically includes the following steps:
Step 110, uploading the collected voice information to a cloud server, so that the cloud server identifies the semantic information and the target user identity information corresponding to the voice information and obtains a permission set matching the semantic information.
Step 120, receiving the semantic information, the target user identity information and the permission set sent by the cloud server, wherein the permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information.
Step 130, determining, from the permission set, the target permission state information corresponding to the target user identity information.
Step 140, executing the operation indicated by the semantic information when the target permission state information meets a preset permission condition.
In the voice control method provided by the application, the collected voice information is uploaded to the cloud server, and the semantic information and target user identity information corresponding to the voice information, together with a permission set, are then received from the cloud server. The permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information; that is, different user identities carry different permissions for that operation. The target permission state information corresponding to the target user identity information is determined from the permission set, and the vehicle-mounted device judges from it whether the semantic information is executable. When the target permission state information meets the preset permission condition, the voice information is executable and the operation indicated by the semantic information is executed. The security of voice control is thereby improved.
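The device-side part of this flow (steps 120 to 140) can be sketched as follows. This is a minimal illustration only: the `CloudResponse` type, the string values used for the permission states, and the `execute` callback are all hypothetical names chosen for the example, and the "preset permission condition" is assumed to be simply "state equals executable".

```python
from dataclasses import dataclass

# Hypothetical encodings of the permission state; the source only says
# "executable" and "not executable".
EXECUTABLE = "executable"
NOT_EXECUTABLE = "not executable"


@dataclass
class CloudResponse:
    """What the vehicle-mounted device receives in step 120 (assumed shape)."""
    semantic: str          # e.g. "open window"
    target_identity: str   # e.g. "vehicle owner"
    permission_set: dict   # user identity -> permission state


def handle_response(resp, execute):
    """Run steps 130 and 140; return True if the operation was executed."""
    # Step 130: find the state that belongs to the identified speaker.
    state = resp.permission_set.get(resp.target_identity)
    # Step 140: assumed preset condition is "state is executable".
    if state == EXECUTABLE:
        execute(resp.semantic)
        return True
    return False
```

An identity that is missing from the permission set yields `None` and is treated as not permitted, which is a conservative choice made for this sketch rather than something the text specifies.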
The contents of steps 110 to 140 are described below in turn.
Regarding step 110.
The collected voice information is uploaded to a cloud server, so that the cloud server identifies the semantic information and the target user identity information corresponding to the voice information and obtains a permission set matching the semantic information.
In one possible embodiment, the following step may be performed before step 110:
preprocessing the voice information by removing its silent segments to obtain preprocessed voice information.
Accordingly, step 110 specifically includes:
uploading the preprocessed voice information to the cloud server.
Removing the silent segments of the voice information reduces the recognition workload of the cloud server and improves recognition efficiency.
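The text does not say how silent segments are detected. One simple possibility, shown here purely as an assumption, is a per-frame energy threshold: frames whose mean absolute amplitude falls below a threshold are dropped. The function name, `frame_len` and `threshold` are hypothetical.

```python
def remove_silence(samples, frame_len=160, threshold=0.01):
    """Return the samples with low-energy frames removed.

    `samples` is a list of floats in [-1.0, 1.0]. A frame is kept when its
    mean absolute amplitude is at least `threshold` (an assumed criterion).
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(abs(s) for s in frame) / len(frame)
        if energy >= threshold:
            kept.extend(frame)
    return kept
```

Production systems usually rely on a dedicated voice activity detector; the fixed threshold above is only meant to make the preprocessing step concrete.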
The preprocessed voice information is uploaded to the cloud server, so that the cloud server identifies the semantic information and the target user identity information corresponding to the voice information and obtains a permission set matching the semantic information.
After receiving the voice information, the cloud server performs noise reduction on it. The voice information is then digitized through Mel-frequency cepstral coefficient (MFCC) feature extraction, and the corresponding semantic information is identified by a speech recognition model composed of an acoustic model, a language model and a dictionary. The semantic information corresponding to the voice information can also be identified through natural language recognition technology.
The Mel frequency scale is based on the auditory characteristics of the human ear and has a nonlinear correspondence with frequency in Hz. Mel-frequency cepstral coefficients are spectral features calculated by using this relationship between the Mel frequency and the Hz frequency. They are mainly used to extract features from voice data and to reduce the dimensionality of the computation. For example, for a frame of 512-dimensional (sampling point) data, the most important 40 dimensions (typically) can be extracted after MFCC processing, which achieves the goal of dimensionality reduction.
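The nonlinear correspondence between the Mel scale and frequency in Hz can be made concrete with a small sketch. The text does not state which formula is used; the one below, m = 2595 · log10(1 + f / 700), is one widely used convention and is an assumption here, as are the function names.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using a common convention (assumed, not from the source)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

Under this convention 1000 Hz maps to roughly 1000 mel, and equal steps in Hz correspond to progressively smaller steps in mel at higher frequencies, which mirrors the ear's coarser resolution there.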
To identify the target user identity information corresponding to the voice information, the cloud server may use voiceprint recognition (Voice Print Recognition, VPR) to compare the voiceprint features previously enrolled on the vehicle-mounted device with the voiceprint features of the voice information, thereby obtaining the target user identity information of the target user, for example: vehicle owner, ordinary user, and so on.
VPR is a type of biometric identification that identifies a speaker based on the acoustic characteristics of the speaker's voice. The identification is independent of accent and language, and can be used for both speaker identification and speaker verification.
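The comparison of enrolled and incoming voiceprint features can be sketched as a nearest-match search. The text does not describe the matching method; cosine similarity over fixed-length feature vectors with an acceptance threshold is assumed here for illustration, and `identify_speaker`, `enrolled` and `threshold` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(embedding, enrolled, threshold=0.8):
    """Return the enrolled identity most similar to `embedding`, or None.

    `enrolled` maps an identity label to its enrolled voiceprint vector.
    None means no enrolled voiceprint reached the (assumed) threshold.
    """
    best_id, best_score = None, threshold
    for identity, reference in enrolled.items():
        score = cosine(embedding, reference)
        if score >= best_score:
            best_id, best_score = identity, score
    return best_id
```

Returning `None` for an unmatched speaker corresponds to the case, discussed later in the text, where no target user identity information is available from the cloud server.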
The cloud server may obtain the permission set matching the semantic information as follows: the permission set matching the semantic information is determined from preset configuration information, which comprises a plurality of permission sets, each corresponding to a piece of semantic information.
The semantic information obtained by speech recognition is looked up in the preset configuration information to obtain the content of the semantic information and the permission set matching it. For example, if the semantic information is "open window", the matching permission set includes: vehicle owner-executable; ordinary user-not executable.
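The preset configuration lookup can be pictured as a mapping from semantic information to a permission set. In the sketch below the "open window" entry follows the example in the text; the "turn on radio" entry, the `PRESET_CONFIG` name and the dictionary layout are assumptions added for illustration.

```python
# Hypothetical preset configuration: semantic information -> permission set.
PRESET_CONFIG = {
    "open window": {
        "vehicle owner": "executable",
        "ordinary user": "not executable",
    },
    # Assumed entry, not taken from the source.
    "turn on radio": {
        "vehicle owner": "executable",
        "ordinary user": "executable",
    },
}


def get_permission_set(semantic):
    """Return a copy of the permission set for `semantic`, or {} if unconfigured."""
    return dict(PRESET_CONFIG.get(semantic, {}))
```

Returning a copy keeps callers from mutating the shared configuration; an empty set for unconfigured commands leaves the decision to whatever default the device applies.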
The cloud server thus completes the identification of the semantic information and the target user identity information corresponding to the voice information, and obtains the permission set matching the semantic information. Illustratively, the vehicle-mounted device receives the semantic information "open window", the target user identity information "vehicle owner", and the permission set "vehicle owner-executable; ordinary user-not executable".
In another possible embodiment, the cloud server may itself use the target user identity information to determine, from the permission set, the corresponding target permission state information, and return that target permission state information to the vehicle-mounted device. Illustratively, the vehicle-mounted device receives the semantic information "open window" and the target permission state information as a status code (0 unknown, 1 executable, 2 not executable).
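In this variant the cloud server resolves the speaker's state itself and returns a single status code. A minimal sketch, using the codes listed above and assuming that an identity absent from the permission set maps to "unknown" (the function name is hypothetical):

```python
# Status codes as listed in the text.
UNKNOWN, EXECUTABLE, NOT_EXECUTABLE = 0, 1, 2


def resolve_status(permission_set, target_identity):
    """Cloud-side lookup: return the status code for the identified speaker.

    `permission_set` maps user identity -> status code. An identity that is
    missing (or None) is reported as UNKNOWN, which is an assumption here.
    """
    return permission_set.get(target_identity, UNKNOWN)
```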
Regarding step 120.
The semantic information, the target user identity information and the permission set sent by the cloud server are received, wherein the permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information.
Illustratively, the vehicle-mounted device receives the semantic information "open window", the target user identity information "vehicle owner", and the permission set "vehicle owner-1; ordinary user-2".
The permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information, for example: 1 executable, 2 not executable.
In a possible embodiment, when the target user identity information sent by the cloud server is not received, the method further includes the following steps:
acquiring facial image information of the target user corresponding to the audio information;
and determining the target user identity information according to the facial image information.
If the target user identity information sent by the cloud server is not received, the target user identity information of the speaker is obtained through face recognition. If the target permission state information meets the preset permission condition, the operation indicated by the semantic information is executed; if it does not meet the preset permission condition, the operation is not executed, and a message indicating insufficient permission is broadcast by voice, such as "you do not have permission to open the window".
Thus, when the target user identity information sent by the cloud server is not received, the target user identity information can still be determined accurately by collecting the facial image information of the target user corresponding to the audio information and deriving the identity from it. The corresponding target permission state information is then determined from the target user identity information, that is, whether the target user has permission to execute the operation indicated by the semantic information is determined accurately, which improves the security of voice control.
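The fallback order described above can be sketched as follows. The `capture_face` and `recognize_face` callables stand in for the camera and the face recognition component; both are hypothetical, since the text does not name any concrete interface.

```python
def resolve_identity(cloud_identity, capture_face, recognize_face):
    """Prefer the identity from the cloud server; fall back to face recognition.

    `cloud_identity` is None when no target user identity information was
    received. The face image is captured only in that case.
    """
    if cloud_identity is not None:
        return cloud_identity
    image = capture_face()
    return recognize_face(image)
```

Capturing the image lazily means the camera is only used when the voiceprint route has produced no identity.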
Regarding step 130.
The target permission state information corresponding to the target user identity information is determined from the permission set.
Illustratively, the target user identity information is "vehicle owner" and the permission set is "vehicle owner-executable; ordinary user-not executable". From the target user identity information "vehicle owner", the corresponding target permission state information is determined to be "executable".
In another possible embodiment, the vehicle-mounted device itself identifies the semantic information and the target user identity information corresponding to the voice information, and uploads them to the cloud server so that the cloud server can obtain the permission set matching the semantic information. This reduces the amount of data transmitted and increases the transmission speed.
Regarding step 140.
When the target permission state information meets the preset permission condition, the voice information is executable and the operation indicated by the semantic information is executed directly. When the target permission state information does not meet the preset permission condition, the operation indicated by the semantic information is not executed, and a message indicating insufficient permission is broadcast by voice. The security of voice control is thereby improved.
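Both branches of step 140, execution and the spoken refusal, can be sketched together. The status codes follow the 0/1/2 convention given earlier in the text; the `execute` and `speak` callbacks, the refusal wording, and the choice to refuse on "unknown" are assumptions made for this example.

```python
def act_on_permission(semantic, state, execute, speak):
    """Execute the command if state == 1 (executable); otherwise refuse aloud.

    Returns True when the operation was executed. States 0 (unknown) and
    2 (not executable) are both refused in this sketch.
    """
    if state == 1:
        execute(semantic)
        return True
    speak(f"You do not have permission to {semantic}")
    return False
```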
In summary, in the embodiments of the present application, the collected voice information is uploaded to the cloud server, and the semantic information and target user identity information corresponding to the voice information, together with a permission set, are then received from the cloud server. The permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information; that is, different user identities carry different permissions for that operation. The target permission state information corresponding to the target user identity information is determined from the permission set, and the vehicle-mounted device judges from it whether the semantic information is executable. When the target permission state information meets the preset permission condition, the voice information is executable and the operation indicated by the semantic information is executed. The security of voice control is thereby improved.
The application also provides a voice control method applied to a cloud server. FIG. 2 is a flowchart of another voice control method provided by an embodiment of the application.
As shown in FIG. 2, the voice control method may include steps 210 to 240. The method is applied to a cloud server and proceeds as follows:
Step 210, receiving voice information sent by the vehicle-mounted device.
Step 220, identifying the semantic information and the target user identity information corresponding to the voice information.
Step 230, obtaining a permission set matching the semantic information, wherein the permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information.
Step 240, sending the semantic information, the target user identity information and the permission set to the vehicle-mounted device, so that the vehicle-mounted device determines, from the permission set, the target permission state information corresponding to the target user identity information, and executes the operation indicated by the semantic information when the target permission state information meets a preset permission condition.
In the voice control method provided by the application, voice information sent by the vehicle-mounted device is received, the semantic information and target user identity information corresponding to the voice information are identified, and a permission set matched with the semantic information is obtained. The permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information; that is, different user identity information carries different permission to execute that operation. The semantic information, the target user identity information and the permission set are sent to the vehicle-mounted device, so that the vehicle-mounted device can judge from the target permission state information whether the semantic information is executable. When the target permission state information meets the preset permission condition, the voice information is executable and the operation indicated by the semantic information is executed. The security of voice control can therefore be improved.
Regarding step 210:
Voice information sent by the vehicle-mounted device is received; the voice information may be voice information from which silent segments have been removed.
Regarding step 220:
The semantic information and the target user identity information corresponding to the voice information are identified.
After receiving the voice information, the cloud server can perform noise reduction on it. The voice information is converted into digital features through MFCC (Mel-frequency cepstral coefficient) feature extraction, and the corresponding semantic information is then recognized by a speech recognition model consisting of an acoustic model, a language model and a dictionary.
To identify the target user identity information corresponding to the voice information, the cloud server may use voiceprint recognition to compare the voiceprint features previously enrolled through the vehicle-mounted device with the voiceprint features of the voice information, thereby obtaining the target user identity information of the target user, for example: vehicle owner, general user, and so on.
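The voiceprint comparison described above can be illustrated with a minimal sketch. The text does not say how voiceprint features are represented or compared, so everything here is an assumption for illustration only: voiceprints are modelled as plain feature vectors, similarity is measured with cosine similarity, and the names `identify_user`, `ENROLLED` and the threshold value `0.8` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_user(voiceprint, enrolled, threshold=0.8):
    """Return the enrolled identity most similar to `voiceprint`.

    Returns None when no enrolled voiceprint reaches `threshold`
    (an assumed cut-off; the source gives no value).
    """
    best_identity, best_score = None, threshold
    for identity, reference in enrolled.items():
        score = cosine_similarity(voiceprint, reference)
        if score >= best_score:
            best_identity, best_score = identity, score
    return best_identity


# Hypothetical voiceprints enrolled earlier through the vehicle-mounted device.
ENROLLED = {
    "vehicle owner": [0.9, 0.1, 0.3],
    "general user": [0.1, 0.8, 0.5],
}

print(identify_user([0.88, 0.12, 0.31], ENROLLED))
print(identify_user([0.0, 0.0, 1.0], ENROLLED))
```

Returning `None` for an unmatched speaker leaves the caller free to treat the identity as unknown rather than guessing a user.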
Regarding step 230:
A permission set matched with the semantic information is acquired. The permission set comprises at least one group of corresponding user identity information and permission state information, wherein the permission state information is used for describing whether the user identity information has permission to execute the operation indicated by the semantic information.
In a possible embodiment, step 230 may specifically include the following step:
determining the permission set matched with the semantic information through preset configuration information, wherein the preset configuration information comprises permission sets respectively corresponding to a plurality of pieces of semantic information.
The preset configuration information may be editable. For example, new semantic information and its matching permission set can be added, or the permission set corresponding to existing semantic information can be modified. Because such changes are made in the configuration information, the vehicle-mounted device does not need to be modified, which improves the flexibility of voice control.
The semantic information obtained by voice recognition is looked up in the preset configuration information to obtain the content of the semantic information and the permission set matched with it. For example, the semantic information is "open window", and the permission set matched with the semantic information includes: vehicle owner, executable; general user, not executable.
At this point, the cloud server has identified the semantic information and the target user identity information corresponding to the voice information and has obtained the permission set matched with the semantic information. Illustratively, the following information is sent to the vehicle-mounted device: semantic information "open window"; target user identity information "vehicle owner"; permission set "vehicle owner: executable; general user: not executable".
Because the preset configuration information comprises permission sets respectively corresponding to a plurality of pieces of semantic information, the permission set matched with the semantic information can be determined quickly and accurately through the preset configuration information.
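The configuration lookup of this embodiment can be sketched as a simple mapping. This is only an illustration under stated assumptions: the source does not specify a storage format, so the preset configuration information is modelled as a nested dictionary, and the names `PERMISSION_CONFIG`, `get_permission_set`, `build_response` and `set_permission`, as well as the payload keys, are hypothetical.

```python
# Assumed layout: semantic information -> {user identity -> permission state}.
PERMISSION_CONFIG = {
    "open window": {
        "vehicle owner": "executable",
        "general user": "not executable",
    },
}


def get_permission_set(semantic_info, config=PERMISSION_CONFIG):
    """Look up the permission set matched with the semantic information.

    Returns an empty set when the semantic information is not configured.
    """
    return dict(config.get(semantic_info, {}))


def build_response(semantic_info, target_user_identity, config=PERMISSION_CONFIG):
    """Assemble the information sent to the vehicle-mounted device."""
    return {
        "semantic_info": semantic_info,
        "target_user_identity": target_user_identity,
        "permission_set": get_permission_set(semantic_info, config),
    }


def set_permission(config, semantic_info, user_identity, state):
    """Add or modify an entry; only the configuration changes."""
    config.setdefault(semantic_info, {})[user_identity] = state


print(build_response("open window", "vehicle owner"))
```

Because new semantic information is added through `set_permission` on the configuration alone, the sketch mirrors the point made above that the vehicle-mounted device itself does not need to be modified.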
In another possible embodiment, the cloud server may use the target user identity information to determine, from the permission set, the target permission state information corresponding to the target user identity information, and return the target permission state information to the vehicle-mounted device. Illustratively, the following information is sent to the vehicle-mounted device: semantic information "open window" and target permission state information, encoded as 0 (unknown), 1 (executable) or 2 (not executable).
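A minimal sketch of this alternative embodiment follows, in which the cloud server resolves the state itself. The numeric codes 0, 1 and 2 come from the text; the enum name `PermissionState`, the function names and the payload keys are assumptions made for illustration, as is the choice to report "unknown" when the identity is absent from the permission set.

```python
from enum import IntEnum


class PermissionState(IntEnum):
    """State codes as listed in the text."""
    UNKNOWN = 0
    EXECUTABLE = 1
    NOT_EXECUTABLE = 2


def resolve_target_state(permission_set, target_user_identity):
    """Pick the state for the target user; UNKNOWN if not listed (assumed)."""
    return permission_set.get(target_user_identity, PermissionState.UNKNOWN)


def build_resolved_response(semantic_info, permission_set, target_user_identity):
    """Payload carrying only the semantic information and the resolved state."""
    state = resolve_target_state(permission_set, target_user_identity)
    return {
        "semantic_info": semantic_info,
        "target_permission_state": int(state),
    }


# Hypothetical permission set for the "open window" example.
PERMISSION_SET = {
    "vehicle owner": PermissionState.EXECUTABLE,
    "general user": PermissionState.NOT_EXECUTABLE,
}

print(build_resolved_response("open window", PERMISSION_SET, "vehicle owner"))
```

Compared with sending the whole permission set, this variant keeps the lookup on the cloud side and sends the vehicle-mounted device a single code to act on.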
Regarding step 240:
The cloud server sends the semantic information, the target user identity information and the permission set to the vehicle-mounted device, so that the vehicle-mounted device can determine the target permission state information corresponding to the target user identity information from the permission set, judge from it whether the semantic information is executable, and execute the operation indicated by the semantic information when the target permission state information meets the preset permission condition.
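The check performed on the vehicle-mounted device after step 240 can be sketched as follows. The source does not define the preset permission condition, so this sketch assumes it means "the target permission state equals executable"; the payload keys and the names `handle_cloud_response` and `execute` are hypothetical.

```python
def handle_cloud_response(response, execute, required_state="executable"):
    """Execute the indicated operation only if the target user is permitted.

    `response` is the assumed payload from the cloud server; `execute` is a
    callback that performs the operation named by the semantic information.
    Returns True when the operation was executed, False otherwise.
    """
    permission_set = response.get("permission_set", {})
    target_state = permission_set.get(response.get("target_user_identity"))
    if target_state == required_state:  # assumed preset permission condition
        execute(response["semantic_info"])
        return True
    return False


executed = []
payload = {
    "semantic_info": "open window",
    "target_user_identity": "vehicle owner",
    "permission_set": {
        "vehicle owner": "executable",
        "general user": "not executable",
    },
}
print(handle_cloud_response(payload, executed.append))
print(executed)
```

An identity that is missing from the permission set yields no state at all, so the sketch refuses the operation by default rather than executing it.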
In summary, in this embodiment of the application, voice information sent by the vehicle-mounted device is received, the semantic information and target user identity information corresponding to the voice information are identified, and a permission set matched with the semantic information is obtained. The permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information; that is, different user identity information carries different permission to execute that operation. The semantic information, the target user identity information and the permission set are sent to the vehicle-mounted device, so that the vehicle-mounted device can judge from the target permission state information whether the semantic information is executable. When the target permission state information meets the preset permission condition, the voice information is executable and the operation indicated by the semantic information is executed. The security of voice control can therefore be improved.
Based on the above voice control method shown in fig. 1, an embodiment of the present application further provides a voice control apparatus, as shown in fig. 3, the apparatus 300 may include:
The uploading module 310 is configured to upload the collected voice information to the cloud server, so that the cloud server can identify the semantic information and target user identity information corresponding to the voice information and obtain a permission set matched with the semantic information.
The first receiving module 320 is configured to receive semantic information, target user identity information, and a permission set sent by the cloud server, where the permission set includes at least one set of corresponding user identity information and permission status information, and the permission status information is used to describe whether the user identity information has permission to execute an operation indicated by the semantic information.
The determining module 330 is configured to determine, from the permission set, target permission state information corresponding to the target user identity information.
The execution module 340 is configured to execute the operation indicated by the semantic information when the target permission state information meets the preset permission condition.
In one possible implementation, the apparatus 300 may further include:
An acquisition module, configured to acquire facial image information of the target user corresponding to the voice information.
The determining module 330 is further configured to determine target user identity information according to the facial image information.
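The facial-image path handled by these two modules can be sketched as a fallback. Claim 2 applies it when the target user identity information sent by the cloud server is not received; the sketch below assumes that a missing identity is represented as `None`, and the names `resolve_identity`, `capture_face_image` and `identify_from_face` are hypothetical stand-ins for the camera and face-recognition components, which the source does not describe.

```python
def resolve_identity(cloud_identity, capture_face_image, identify_from_face):
    """Return the cloud-provided identity, or fall back to the facial image.

    `capture_face_image` and `identify_from_face` are injected callables so
    the sketch stays independent of any particular camera or recognizer.
    """
    if cloud_identity is not None:
        return cloud_identity
    image = capture_face_image()      # acquisition module (assumed interface)
    return identify_from_face(image)  # determining module (assumed interface)


# Stub components used only to make the example runnable.
def fake_camera():
    return "face-image-bytes"


def fake_recognizer(image):
    return "vehicle owner" if image == "face-image-bytes" else None


print(resolve_identity(None, fake_camera, fake_recognizer))
print(resolve_identity("general user", fake_camera, fake_recognizer))
```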
In summary, in this embodiment of the present application, the collected voice information is uploaded to the cloud server, and the semantic information, the target user identity information corresponding to the voice information, and the permission set sent by the cloud server are then received. The permission set includes at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information; that is, different user identity information carries different permission to execute that operation. The target permission state information corresponding to the target user identity information is determined from the permission set, and the vehicle-mounted device can judge from it whether the semantic information is executable. When the target permission state information meets the preset permission condition, the voice information is executable and the operation indicated by the semantic information is executed. The security of voice control can therefore be improved.
Based on the above voice control method shown in fig. 2, an embodiment of the present application further provides a voice control apparatus, as shown in fig. 4, the apparatus 400 may include:
The second receiving module 410 is configured to receive voice information sent by the vehicle-mounted device.
The recognition module 420 is configured to recognize semantic information and target user identity information corresponding to the voice information.
An obtaining module 430, configured to obtain a permission set matched with the semantic information; the permission set comprises at least one group of corresponding user identity information and permission state information, wherein the permission state information is used for describing whether the user identity information has permission for executing the operation indicated by the semantic information.
The sending module 440 is configured to send the semantic information, the target user identity information, and the permission set to the vehicle-mounted device, so that the vehicle-mounted device determines target permission state information corresponding to the target user identity information from the permission set, and executes the operation indicated by the semantic information when the target permission state information meets the preset permission condition.
In one possible implementation, the obtaining module 430 is specifically configured to:
determine the permission set matched with the semantic information through preset configuration information, wherein the preset configuration information comprises permission sets respectively corresponding to a plurality of pieces of semantic information.
In summary, in this embodiment of the application, voice information sent by the vehicle-mounted device is received, the semantic information and target user identity information corresponding to the voice information are identified, and a permission set matched with the semantic information is obtained. The permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information describes whether the user identity information has permission to execute the operation indicated by the semantic information; that is, different user identity information carries different permission to execute that operation. The semantic information, the target user identity information and the permission set are sent to the vehicle-mounted device, so that the vehicle-mounted device can judge from the target permission state information whether the semantic information is executable. When the target permission state information meets the preset permission condition, the voice information is executable and the operation indicated by the semantic information is executed. The security of voice control can therefore be improved.
Fig. 5 shows a schematic hardware structure of an electronic device according to an embodiment of the present application.
The electronic device may include a processor 501 and a memory 502 storing computer program instructions.
In particular, the processor 501 may include a Central Processing Unit (CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present application.
Memory 502 may include mass storage for data or instructions. By way of example, and not limitation, memory 502 may comprise a hard disk drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, magnetic tape, or universal serial bus (USB) drive, or a combination of two or more of the foregoing. Memory 502 may include removable or non-removable (or fixed) media, where appropriate. Memory 502 may be internal or external to the electronic device, where appropriate. In a particular embodiment, the memory 502 is a non-volatile solid state memory. In a particular embodiment, the memory 502 includes read-only memory (ROM). The ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate.
The processor 501 reads and executes the computer program instructions stored in the memory 502 to implement any one of the voice control methods of the embodiments shown in the figures.
In one example, the electronic device may also include a communication interface 503 and a bus 510. As shown in fig. 5, the processor 501, the memory 502, and the communication interface 503 are connected to each other by a bus 510 and perform communication with each other.
The communication interface 503 is mainly used to implement communication between each module, apparatus, unit and/or device in the embodiments of the present application.
Bus 510 includes hardware, software, or both that couple components of the electronic device to one another. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable bus, or a combination of two or more of the above. Bus 510 may include one or more buses, where appropriate. Although embodiments of the application have been described and illustrated with respect to a particular bus, the application contemplates any suitable bus or interconnect.
The electronic device may execute the voice control method in the embodiment of the present application, thereby implementing the voice control method described in connection with fig. 1 to 2.
In addition, in combination with the voice control method in the above embodiment, the embodiment of the present application may be implemented by providing a computer readable storage medium. The computer readable storage medium has stored thereon computer program instructions; the computer program instructions, when executed by the processor, implement the speech control method of fig. 1-2.
It should be understood that the application is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, a functional block may be, for example, an electronic circuit, an application specific integrated circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the application are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the internet, intranets, etc.
It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, or may be performed in a different order from the order in the embodiments, or several steps may be performed simultaneously.
In the foregoing, only the specific embodiments of the present application are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present application is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present application, and they should be included in the scope of the present application.

Claims (8)

1. A voice control method, applied to a vehicle-mounted device, comprising:
uploading the collected voice information to a cloud server, and using the cloud server to identify semantic information and target user identity information corresponding to the voice information and obtain a permission set matched with the semantic information;
receiving the semantic information, the target user identity information and the permission set sent by the cloud server, wherein the permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information is used for describing whether the user identity information has permission for executing the operation indicated by the semantic information;
determining target authority state information corresponding to the target user identity information from the authority set;
and executing the operation indicated by the semantic information under the condition that the target authority state information meets the preset authority condition.
2. The method of claim 1, wherein in the event that the target user identity information sent by the cloud server is not received, the method further comprises:
acquiring facial image information of a target user corresponding to the audio information;
and determining the identity information of the target user according to the facial image information.
3. A voice control method, applied to a cloud server, comprising:
receiving voice information sent by a vehicle-mounted device;
identifying semantic information and target user identity information corresponding to the voice information;
acquiring a permission set matched with the semantic information; the permission set comprises at least one group of corresponding user identity information and permission state information, wherein the permission state information is used for describing whether the user identity information has permission for executing the operation indicated by the semantic information;
sending the semantic information, the target user identity information and the permission set to the vehicle-mounted device, so that the vehicle-mounted device determines target permission state information corresponding to the target user identity information from the permission set, and executes the operation indicated by the semantic information under the condition that the target permission state information meets the preset permission condition.
4. A method according to claim 3, wherein said obtaining a set of rights matching said semantic information comprises:
determining the permission set matched with the semantic information through preset configuration information; the preset configuration information comprises permission sets respectively corresponding to a plurality of pieces of semantic information.
5. A voice control apparatus, characterized by being applied to a vehicle-mounted device, comprising:
the uploading module is used for uploading the acquired voice information to a cloud server, and is used for the cloud server to identify semantic information and target user identity information corresponding to the voice information and acquire a permission set matched with the semantic information;
the first receiving module is used for receiving the semantic information, the target user identity information and the permission set sent by the cloud server, wherein the permission set comprises at least one group of corresponding user identity information and permission state information, and the permission state information is used for describing whether the user identity information has permission for executing the operation indicated by the semantic information;
the determining module is used for determining target authority state information corresponding to the target user identity information from the authority set;
and the execution module is used for executing the operation indicated by the semantic information under the condition that the target authority state information meets the preset authority condition.
6. A voice control apparatus for use with a cloud server, the apparatus comprising:
the second receiving module is used for receiving the voice information sent by a vehicle-mounted device;
the recognition module is used for recognizing semantic information and target user identity information corresponding to the voice information;
the acquisition module is used for acquiring the permission set matched with the semantic information; the permission set comprises at least one group of corresponding user identity information and permission state information, wherein the permission state information is used for describing whether the user identity information has permission for executing the operation indicated by the semantic information;
the sending module is used for sending the semantic information, the target user identity information and the permission set to the vehicle-mounted device, so that the vehicle-mounted device determines target permission state information corresponding to the target user identity information from the permission set, and executes the operation indicated by the semantic information under the condition that the target permission state information meets the preset permission condition.
7. An electronic device, the device comprising: a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, implements the voice control method according to any one of claims 1-4.
8. A readable storage medium, characterized in that the readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the voice control method according to any one of claims 1-4.
CN202210519826.3A 2022-05-13 2022-05-13 Voice control method and device, electronic equipment and readable storage medium Pending CN117093972A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210519826.3A CN117093972A (en) 2022-05-13 2022-05-13 Voice control method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210519826.3A CN117093972A (en) 2022-05-13 2022-05-13 Voice control method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN117093972A true CN117093972A (en) 2023-11-21

Family

ID=88770381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210519826.3A Pending CN117093972A (en) 2022-05-13 2022-05-13 Voice control method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN117093972A (en)

Similar Documents

Publication Publication Date Title
CN106683680B (en) Speaker recognition method and device, computer equipment and computer readable medium
US10643605B2 (en) Automatic multi-performance evaluation system for hybrid speech recognition
CN110265037B (en) Identity verification method and device, electronic equipment and computer readable storage medium
CN108335694B (en) Far-field environment noise processing method, device, equipment and storage medium
CN105989836B (en) Voice acquisition method and device and terminal equipment
US20140379332A1 (en) Identification of a local speaker
CN109410938A (en) Control method for vehicle, device and car-mounted terminal
CN111081279A (en) Voice emotion fluctuation analysis method and device
CN109326305B (en) Method and system for batch testing of speech recognition and text synthesis
CN109920435B (en) Voiceprint recognition method and voiceprint recognition device
CN112397065A (en) Voice interaction method and device, computer readable storage medium and electronic equipment
WO2021042537A1 (en) Voice recognition authentication method and system
CN210489237U (en) Vehicle-mounted intelligent terminal voice control system
CN112382300A (en) Voiceprint identification method, model training method, device, equipment and storage medium
CN115467787A (en) Motor state detection system and method based on audio analysis
CN113112992A (en) Voice recognition method and device, storage medium and server
CN109817224A (en) A kind of voice sensitive word monitor system and method
CN117093972A (en) Voice control method and device, electronic equipment and readable storage medium
CN111128198B (en) Voiceprint recognition method, voiceprint recognition device, storage medium, server and voiceprint recognition system
CN111862946B (en) Order processing method and device, electronic equipment and storage medium
CN112382266A (en) Voice synthesis method and device, electronic equipment and storage medium
CN117095698A (en) Alarm sound identification method and device, electronic equipment and storage medium
CN112992175B (en) Voice distinguishing method and voice recording device thereof
CN114360515A (en) Information processing method, information processing apparatus, electronic device, information processing medium, and computer program product
US20030046036A1 (en) Time-series segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination