WO2021103788A1 - Smart speaker setting method and device, control method and device, and smart speaker - Google Patents

Smart speaker setting method and device, control method and device, and smart speaker

Info

Publication number
WO2021103788A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
voice information
content
guide word
custom setting
Prior art date
Application number
PCT/CN2020/117180
Other languages
English (en)
French (fr)
Inventor
吴晓洋
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Priority date
2019-11-26
Filing date
2020-09-23
Publication date
2021-06-03
Application filed by 北京沃东天骏信息技术有限公司, 北京京东世纪贸易有限公司
Publication of WO2021103788A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/02 Casings; Cabinets; Supports therefor; Mountings therein

Definitions

  • the present disclosure relates to the field of control, and in particular to a method and device for setting a smart speaker, a control method and device, and a smart speaker.
  • users give voice instructions to the smart speakers, and the smart speakers parse the voice instructions, and according to the analysis results, query the corresponding content from a preset knowledge base and play it to the user.
  • a smart speaker control method, which includes: extracting instruction content from the collected first user voice information; if the instruction content is a preset startup setting guide word, entering the custom setting mode; collecting the second user voice information; using the content extracted from the second user voice information as the custom setting guide word; collecting the third user voice information as the custom setting content; and storing the custom setting guide word and the custom setting content in association.
  • the smart speaker control method further includes: after the content extracted from the second user voice information is used as the custom setting guide word, detecting whether the custom setting guide word conflicts with an existing guide word; and, in the case that the custom setting guide word does not conflict with an existing guide word, collecting the third user voice information as the custom setting content.
  • the smart speaker control method further includes: in the case that the custom setting guide word conflicts with an existing guide word, collecting the second user voice information again, so that the content extracted from the re-collected second user voice information is used as the custom setting guide word.
  • the smart speaker control method further includes: after entering the custom setting mode, collecting the fourth user voice information; using the content extracted from the fourth user voice information as a scene instruction, and entering the corresponding scene according to the scene instruction, where the second user voice information is collected after the corresponding scene is entered.
  • storing the custom setting guide word and the custom setting content in association includes: storing the custom setting guide word and the custom setting content in association in a cloud server.
  • the smart speaker control method further includes: extracting query information from the collected fifth user voice information; using the guide word in the query information to query the custom setting content associated with the guide word; and playing the queried custom setting content.
  • the query information further includes scene information, and the customized setting content that is queried is also associated with the scene information.
  • a smart speaker control device including: an instruction extraction module configured to extract instruction content from the collected first user voice information; a mode control module configured to enter the custom setting mode if the instruction content is a preset startup setting guide word; a guide word collection module configured to collect the second user voice information and use the content extracted from the second user voice information as the custom setting guide word; a content collection module configured to collect the third user voice information as the custom setting content; and a storage module configured to store the custom setting guide word and the custom setting content in association.
  • the smart speaker control device further includes: an information extraction module configured to extract query information from the collected fifth user voice information; a query module configured to use the guide word in the query information to query the custom setting content associated with the guide word; and a playback module configured to play the queried custom setting content.
  • a smart speaker control device including: a memory configured to store instructions; and a processor coupled to the memory, the processor being configured to execute, based on the instructions stored in the memory, the method described in any of the above embodiments.
  • a smart speaker including the smart speaker control device as described in any of the foregoing embodiments.
  • a computer-readable storage medium wherein the computer-readable storage medium stores computer instructions, and when the instructions are executed by a processor, a method related to any of the above-mentioned embodiments is implemented.
  • FIG. 1 is a schematic flowchart of a method for setting a smart speaker according to some embodiments of the present disclosure
  • FIG. 2 is a schematic flowchart of a method for setting smart speakers according to other embodiments of the present disclosure
  • Fig. 3 is a schematic structural diagram of a smart speaker setting device according to some embodiments of the present disclosure.
  • Fig. 4 is a schematic structural diagram of a smart speaker setting device according to other embodiments of the present disclosure.
  • Fig. 5 is a schematic flowchart of a method for controlling a smart speaker according to some embodiments of the present disclosure
  • Fig. 6 is a schematic structural diagram of a smart speaker control device according to some embodiments of the present disclosure.
  • Fig. 7 is a schematic structural diagram of a smart speaker control device according to other embodiments of the present disclosure.
  • the knowledge base used by smart speakers is usually a pre-set knowledge base, such as public knowledge bases such as encyclopedia websites and music websites. Smart speakers will not use the private knowledge base corresponding to the user to provide users with personalized services.
  • the present disclosure provides a solution for smart speakers using a private knowledge base to provide users with personalized services.
  • Fig. 1 is a schematic flowchart of a method for setting a smart speaker according to some embodiments of the present disclosure. In some embodiments, the following steps of the smart speaker setting method are executed by the smart speaker setting device.
  • in step 101, user voice information is collected, so as to extract instruction content from the collected user voice information.
  • in step 102, if the instruction content is a preset startup setting guide word, the custom setting mode is entered.
  • in some embodiments, the voice information used to extract the startup setting guide word is the first user voice information.
  • for example, if the user says "design a custom skill", where the startup setting guide word is "custom skill", the smart speaker enters the custom setting mode.
  • in step 103, user voice information is collected, so that the content extracted from the collected user voice information is used as the custom setting guide word.
  • in some embodiments, the voice information used to extract the custom setting guide word is the second user voice information.
  • in step 104, user voice information is collected as the custom setting content.
  • in some embodiments, the voice information used as the custom setting content is the third user voice information.
  • for example, after saying the custom setting guide word, the user can enter content such as fairy tales and children's songs according to the prompts of the smart speaker. After saying "Snow White", for instance, the user can start telling the story of Snow White when prompted by the smart speaker. The user can pause while telling the story and, once the story is finished, perform a corresponding operation to end the entry, for example pressing the end button of the smart speaker, or saying a preset voice end instruction such as "the story is over" to the smart speaker, so that the smart speaker ends the voice recording.
  • in step 105, the custom setting guide word and the custom setting content are stored in association.
  • in some embodiments, the custom setting guide word and the custom setting content are stored in association in a cloud server.
  • in the smart speaker setting method provided by the above embodiments, the user enters the custom setting guide word and the custom setting content through the custom setting mode, so as to build a personalized knowledge base that can provide the user with personalized services.
  • Fig. 2 is a schematic flowchart of a method for setting a smart speaker according to some embodiments of the present disclosure. In some embodiments, the following steps of the smart speaker setting method are executed by the smart speaker setting device.
  • in step 201, user voice information is collected, so as to extract instruction content from the collected user voice information.
  • in step 202, if the instruction content is a preset startup setting guide word, the custom setting mode is entered.
  • for example, if the user says "design a custom skill", the smart speaker enters the custom setting mode.
  • in step 203, user voice information is collected, so that the content extracted from the collected user voice information is used as a scene instruction, and the corresponding scene is entered according to the scene instruction.
  • in some embodiments, the voice information used to extract the scene instruction is the fourth user voice information.
  • in some embodiments, the scenes include a shared-domain scene and a private-domain scene.
  • for example, if the user says "enter the shared domain", the smart speaker enters the shared domain.
  • content located in the shared domain can be shared over the network, which helps to build up followers.
  • if the user says "enter the private domain", the smart speaker enters the private domain.
  • content located in the private domain is for use by an individual, a family, or a specific group of people.
  • in step 204, user voice information is collected, so that the content extracted from the collected user voice information is used as the custom setting guide word.
  • in some embodiments, whether the user's custom setting guide word conflicts with an existing guide word is detected, for example through step 205.
  • in step 205, it is detected whether the custom setting guide word conflicts with an existing guide word.
  • if the custom setting guide word does not conflict with an existing guide word, step 206 is executed; if the custom setting guide word conflicts with an existing guide word, step 204 is repeated.
  • in step 206, user voice information is collected as the custom setting content.
  • for example, after saying the custom setting guide word, the user can enter content such as fairy tales and children's songs according to the prompts of the smart speaker. After saying "Snow White", for instance, the user can start telling the story of Snow White when prompted by the smart speaker. The user can pause while telling the story and, once the story is finished, perform a corresponding operation to end the entry, for example pressing the end button of the smart speaker, or saying a preset voice end instruction such as "the story is over" to the smart speaker, so that the smart speaker ends the voice recording.
  • in step 207, the custom setting guide word and the custom setting content are stored in association.
  • in some embodiments, the custom setting guide word and the custom setting content are stored in association in a cloud server.
  • Fig. 3 is a schematic structural diagram of a smart speaker setting device according to some embodiments of the present disclosure.
  • as shown in FIG. 3, the smart speaker setting device includes an instruction extraction module 31, a mode control module 32, a guide word collection module 33, a content collection module 34, and a storage module 35.
  • the instruction extraction module 31 is configured to collect user voice information, so as to extract instruction content from the collected user voice information.
  • the mode control module 32 is configured to enter the custom setting mode if the instruction content is a preset startup setting guide word.
  • in some embodiments, the voice information used to extract the startup setting guide word is the first user voice information.
  • for example, if the user says "design a custom skill", the smart speaker enters the custom setting mode.
  • the guide word collection module 33 is configured to collect user voice information, so that the content extracted from the collected user voice information is used as the custom setting guide word.
  • in some embodiments, the voice information used to extract the custom setting guide word is the second user voice information.
  • the content collection module 34 is configured to collect user voice information as the custom setting content.
  • in some embodiments, the voice information used as the custom setting content is the third user voice information.
  • for example, after saying the custom setting guide word, the user can enter content such as fairy tales and children's songs according to the prompts of the smart speaker. After saying "Snow White", for instance, the user can start telling the story of Snow White when prompted by the smart speaker. The user can pause while telling the story and, once the story is finished, perform a corresponding operation to end the entry, for example pressing the end button of the smart speaker, or saying a preset voice end instruction such as "the story is over" to the smart speaker, so that the smart speaker ends the voice recording.
  • the storage module 35 is configured to store the custom setting guide word and the custom setting content in association.
  • in some embodiments, the storage module 35 is configured to store the custom setting guide word and the custom setting content in association in a cloud server.
  • in some embodiments, the guide word collection module 33 is configured to detect, after the content extracted from the collected user voice information (for example, the second user voice information) is used as the custom setting guide word, whether the custom setting guide word conflicts with an existing guide word. If the custom setting guide word does not conflict with an existing guide word, the guide word collection module 33 instructs the content collection module 34 to collect user voice information (for example, the third user voice information) as the custom setting content. If the custom setting guide word conflicts with an existing guide word, the guide word collection module 33 collects user voice information again, so that the content extracted from the newly collected user voice information is used as the custom setting guide word.
  • in some embodiments, the guide word collection module 33 is configured to collect user voice information (for example, the fourth user voice information) after the custom setting mode is entered, use the content extracted from the collected user voice information as a scene instruction, enter the corresponding scene according to the scene instruction, and then collect user voice information so that the content extracted from it is used as the custom setting guide word.
  • Fig. 4 is a schematic structural diagram of a smart speaker setting device according to some embodiments of the present disclosure. As shown in FIG. 4, the device includes a memory 41 and a processor 42.
  • the memory 41 is used to store instructions.
  • the processor 42 is coupled to the memory 41.
  • the processor 42 is configured to execute the method related to any one of the embodiments in FIG. 1 or FIG. 2 based on instructions stored in the memory.
  • the device also includes a communication interface 43 for information exchange with other devices.
  • the device also includes a bus 44, and the processor 42, the communication interface 43, and the memory 41 communicate with each other through the bus 44.
  • the memory 41 may include high-speed RAM (Random Access Memory), and may also include NVM (Non-Volatile Memory), for example at least one disk storage.
  • the memory 41 may also be a memory array.
  • the memory 41 may also be divided into blocks, and the blocks may be combined into a virtual volume according to certain rules.
  • the processor 42 may be a central processing unit, or may be an ASIC (Application Specific Integrated Circuit), or may be one or more integrated circuits configured to implement the embodiments of the present disclosure.
  • the present disclosure also provides a computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions, and when the instructions are executed by the processor, the method involved in any one of the embodiments in FIG. 1 or FIG. 2 is implemented.
  • Fig. 5 is a schematic flowchart of a method for controlling a smart speaker according to some embodiments of the present disclosure.
  • the smart speaker is set using the method involved in any one of the embodiments in FIG. 1 or FIG. 2.
  • the following steps of the smart speaker control method are executed by the smart speaker control device.
  • in step 501, user voice information is collected, so as to extract query information from the collected user voice information.
  • in some embodiments, the voice information used to extract the query information is the fifth user voice information.
  • in step 502, the guide word in the query information is used to query the custom setting content associated with the guide word.
  • in step 503, the queried custom setting content is played.
  • in some embodiments, the query information also includes scene information.
  • the queried custom setting content is also associated with the scene information.
  • for example, the child's parents have recorded the story of Snow White in advance.
  • one day the parents are away on a business trip and cannot tell the child a story before bedtime.
  • the child's grandmother can say "Private scene, Snow White" to the smart speaker, and the smart speaker queries and plays the audio content associated with the guide word "Snow White" in the private scene. In this way, the child can still be told a story even when the parents are away.
  • Fig. 6 is a schematic structural diagram of a smart speaker control device according to some embodiments of the present disclosure.
  • the smart speaker is set using the method involved in any one of the embodiments in FIG. 1 or FIG. 2.
  • the control device includes an information extraction module 61, a query module 62, and a playback module 63.
  • the information extraction module 61 is configured to collect user voice information, so as to extract query information from the collected user voice information.
  • in some embodiments, the voice information used to extract the query information is the fifth user voice information.
  • the query module 62 is configured to use the guide word in the query information to query the custom setting content associated with the guide word.
  • the playback module 63 is configured to play the queried custom setting content.
  • in some embodiments, the query information also includes scene information.
  • the queried custom setting content is also associated with the scene information.
  • Fig. 7 is a schematic structural diagram of a smart speaker control device according to other embodiments of the present disclosure.
  • the device includes a memory 71, a processor 72, a communication interface 73 and a bus 74.
  • the difference between FIG. 7 and FIG. 4 is that, in the embodiment shown in FIG. 7, the processor 72 executes the method involved in any of the embodiments in FIG. 6 based on the instructions stored in the memory 71.
  • the present disclosure also provides a smart speaker.
  • the smart speaker includes at least one of the smart speaker setting device related to any one of the embodiments in FIG. 3 or FIG. 4 and the smart speaker control device related to any one of the embodiments in FIG. 6 or FIG. 7.
  • the present disclosure also provides a computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions, and when the instructions are executed by the processor, the method involved in any of the embodiments in FIG. 5 is implemented.
  • in some embodiments, the above-mentioned functional modules may be implemented as a general-purpose processor, a Programmable Logic Controller (PLC), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, or any appropriate combination thereof, for performing the functions described in the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

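A smart speaker setting method and device, a control method and device, and a smart speaker, relating to the field of control. The smart speaker setting method includes: collecting first user voice information, so as to extract instruction content from the first user voice information (101); if the instruction content is a preset startup setting guide word, entering a custom setting mode (102); collecting second user voice information, and using the content extracted from the second user voice information as a custom setting guide word (103); collecting third user voice information as custom setting content (104); and storing the custom setting guide word and the custom setting content in association (105). The custom setting content associated with the custom setting guide word can then be played according to the user's voice, which effectively improves the user experience.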

Description

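Smart speaker setting method and device, control method and device, and smart speaker
Cross-reference to related applications
This application is based on, and claims priority to, CN application No. 201911171447.4, filed on November 26, 2019, the disclosure of which is incorporated herein in its entirety.
Technical field
The present disclosure relates to the field of control, and in particular to a smart speaker setting method and device, a control method and device, and a smart speaker.
Background
At present, in the field of smart speakers, a user issues a voice instruction to a smart speaker; the smart speaker parses the voice instruction and, according to the parsing result, queries the corresponding content from a preset knowledge base and plays it to the user.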
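Summary
According to a first aspect of the embodiments of the present disclosure, a smart speaker control method is provided, including: extracting instruction content from collected first user voice information; if the instruction content is a preset startup setting guide word, entering a custom setting mode; collecting second user voice information; using the content extracted from the second user voice information as a custom setting guide word; collecting third user voice information as custom setting content; and storing the custom setting guide word and the custom setting content in association.
In some embodiments, the smart speaker control method further includes: after the content extracted from the second user voice information is used as the custom setting guide word, detecting whether the custom setting guide word conflicts with an existing guide word; and, if the custom setting guide word does not conflict with an existing guide word, collecting the third user voice information as the custom setting content.
In some embodiments, the smart speaker control method further includes: if the custom setting guide word conflicts with an existing guide word, collecting second user voice information again, so that the content extracted from the re-collected second user voice information is used as the custom setting guide word.
In some embodiments, the smart speaker control method further includes: after entering the custom setting mode, collecting fourth user voice information; and using the content extracted from the fourth user voice information as a scene instruction and entering the corresponding scene according to the scene instruction, wherein the second user voice information is collected after the corresponding scene is entered.
In some embodiments, storing the custom setting guide word and the custom setting content in association includes: storing the custom setting guide word and the custom setting content in association in a cloud server.
In some embodiments, the smart speaker control method further includes: extracting query information from collected fifth user voice information; using the guide word in the query information to query the custom setting content associated with the guide word; and playing the queried custom setting content.
In some embodiments, the query information further includes scene information, and the queried custom setting content is also associated with the scene information.
According to a second aspect of the embodiments of the present disclosure, a smart speaker control device is provided, including: an instruction extraction module configured to extract instruction content from collected first user voice information; a mode control module configured to enter a custom setting mode if the instruction content is a preset startup setting guide word; a guide word collection module configured to collect second user voice information and use the content extracted from the second user voice information as a custom setting guide word; a content collection module configured to collect third user voice information as custom setting content; and a storage module configured to store the custom setting guide word and the custom setting content in association.
In some embodiments, the smart speaker control device further includes: an information extraction module configured to extract query information from collected fifth user voice information; a query module configured to use the guide word in the query information to query the custom setting content associated with the guide word; and a playback module configured to play the queried custom setting content.
According to a third aspect of the embodiments of the present disclosure, a smart speaker control device is provided, including: a memory configured to store instructions; and a processor coupled to the memory, the processor being configured to execute, based on the instructions stored in the memory, the method according to any one of the above embodiments.
According to a fourth aspect of the embodiments of the present disclosure, a smart speaker is provided, including the smart speaker control device according to any one of the above embodiments.
According to a fifth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method according to any one of the above embodiments.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.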
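Brief description of the drawings
The accompanying drawings, which constitute a part of the specification, illustrate embodiments of the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
The present disclosure can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a smart speaker setting method according to some embodiments of the present disclosure;
FIG. 2 is a schematic flowchart of a smart speaker setting method according to other embodiments of the present disclosure;
FIG. 3 is a schematic structural diagram of a smart speaker setting device according to some embodiments of the present disclosure;
FIG. 4 is a schematic structural diagram of a smart speaker setting device according to other embodiments of the present disclosure;
FIG. 5 is a schematic flowchart of a smart speaker control method according to some embodiments of the present disclosure;
FIG. 6 is a schematic structural diagram of a smart speaker control device according to some embodiments of the present disclosure;
FIG. 7 is a schematic structural diagram of a smart speaker control device according to other embodiments of the present disclosure.
It should be understood that the dimensions of the parts shown in the drawings are not drawn to actual scale. In addition, the same or similar reference numerals denote the same or similar components.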
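Detailed description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. The description of the exemplary embodiments is merely illustrative and is in no way intended to limit the present disclosure, its application, or its uses. The present disclosure may be implemented in many different forms and is not limited to the embodiments described here. These embodiments are provided so that the present disclosure is thorough and complete and fully conveys its scope to those skilled in the art. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the composition of materials, and the numerical values set forth in these embodiments are to be interpreted as merely exemplary and not as limiting.
Words such as "include" or "comprise" used in the present disclosure mean that the element preceding the word covers the elements listed after the word, and do not exclude the possibility of also covering other elements.
All terms (including technical and scientific terms) used in the present disclosure have the same meaning as understood by a person of ordinary skill in the art to which the present disclosure belongs, unless specifically defined otherwise. It should also be understood that terms such as those defined in general-purpose dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted in an idealized or overly formal sense unless explicitly so defined here.
Techniques, methods, and devices known to a person of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
The inventor found through research that, in the related art, the knowledge base used by a smart speaker is usually a preset knowledge base, for example a public knowledge base such as an encyclopedia website or a music website. The smart speaker does not use a private knowledge base corresponding to the user to provide the user with personalized services.
Accordingly, the present disclosure provides a solution in which a smart speaker uses a private knowledge base to provide the user with personalized services.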
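FIG. 1 is a schematic flowchart of a smart speaker setting method according to some embodiments of the present disclosure. In some embodiments, the following steps of the smart speaker setting method are performed by a smart speaker setting device.
In step 101, user voice information is collected so that instruction content can be extracted from the collected user voice information.
In step 102, if the instruction content is a preset startup setting guide word, the custom setting mode is entered. In some embodiments, the voice information used to extract the startup setting guide word is the first user voice information.
For example, if the user says "design a custom skill", where the startup setting guide word is "custom skill", the smart speaker enters the custom setting mode.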
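The check in step 102 can be sketched as follows. This is a minimal illustration only: it assumes the utterance has already been transcribed to text, and it uses a plain substring match, which the disclosure does not specify. The function name `contains_startup_guide_word` and the English phrases are hypothetical renderings of the example above.

```python
# Hypothetical helper: decide whether an already-transcribed utterance
# should switch the speaker into custom setting mode.
STARTUP_GUIDE_WORD = "custom skill"  # English rendering of the example guide word


def contains_startup_guide_word(transcript: str,
                                guide_word: str = STARTUP_GUIDE_WORD) -> bool:
    """Return True if the transcript contains the preset startup guide word."""
    # Collapse whitespace and ignore case so minor transcription formatting
    # differences do not prevent a match (an assumption, not from the source).
    normalized = " ".join(transcript.lower().split())
    return guide_word.lower() in normalized
```

With this sketch, "design a custom skill" matches because it contains the guide word, while an unrelated request such as "play some music" does not.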
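In step 103, user voice information is collected so that the content extracted from the collected user voice information is used as the custom setting guide word. In some embodiments, the voice information used to extract the custom setting guide word is the second user voice information.
For example, after the smart speaker enters the custom setting mode, if the user says "Snow White", then "Snow White" is used as the user-defined skill guide word.
In step 104, user voice information is collected as the custom setting content. In some embodiments, the voice information used as the custom setting content is the third user voice information.
For example, after saying the custom setting guide word, the user can enter content such as fairy tales and children's songs according to the prompts of the smart speaker. After saying "Snow White", for instance, the user can start telling the story of Snow White when prompted by the smart speaker. The user can pause while telling the story and, once the story is finished, perform a corresponding operation to end the entry, for example pressing the end button of the smart speaker, or saying a preset voice end instruction such as "the story is over" to the smart speaker, so that the smart speaker ends the voice recording.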
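One way to picture the recording behaviour in step 104 is the sketch below. It is an assumption-laden illustration: the class `RecordingSession`, the chunk-plus-transcript interface, and the treatment of an empty chunk as a pause are all hypothetical, and the end phrase is an English rendering of the example instruction.

```python
from dataclasses import dataclass, field
from typing import List

# English rendering of the example end instruction; the set is an assumption.
END_PHRASES = {"the story is over"}


@dataclass
class RecordingSession:
    """Accumulates audio chunks until an end signal arrives."""
    chunks: List[bytes] = field(default_factory=list)
    finished: bool = False

    def feed(self, audio: bytes, transcript: str = "") -> None:
        """Add one chunk; `transcript` is the recognized text for it, if any."""
        if self.finished:
            return  # ignore audio after the session has ended
        if transcript.strip().lower() in END_PHRASES:
            self.finished = True  # voice end instruction: stop, do not store it
            return
        if audio:  # an empty chunk models a pause and is simply skipped
            self.chunks.append(audio)

    def press_end_button(self) -> None:
        """Models the physical end button on the speaker."""
        self.finished = True

    def content(self) -> bytes:
        """Return everything recorded so far as one byte string."""
        return b"".join(self.chunks)
```

A pause leaves the session open, and either the button or the spoken end instruction closes it; the end instruction itself is not kept as part of the story.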
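In step 105, the custom setting guide word and the custom setting content are stored in association.
In some embodiments, the custom setting guide word and the custom setting content are stored in association in a cloud server.
In the smart speaker setting method provided by the above embodiments of the present disclosure, the user enters the custom setting guide word and the custom setting content through the custom setting mode, so as to build a personalized knowledge base that can provide the user with personalized services.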
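Steps 101 to 105 can be summarized in a short sketch. It assumes speech recognition and audio capture are supplied as callables (`listen` and `record`, both hypothetical), and it stands in an in-memory dictionary for the associated storage; the disclosure leaves those details open.

```python
from typing import Callable, Dict, Optional


def run_setting_flow(listen: Callable[[], str],
                     record: Callable[[], bytes],
                     store: Dict[str, bytes],
                     startup_guide_word: str = "custom skill") -> Optional[str]:
    """Run steps 101-105 once; return the stored guide word, or None."""
    instruction = listen()                       # step 101: extract instruction content
    if startup_guide_word not in instruction:    # step 102: enter custom mode or stop
        return None
    guide_word = listen().strip()                # step 103: custom setting guide word
    content = record()                           # step 104: custom setting content
    store[guide_word] = content                  # step 105: associated storage
    return guide_word


# Usage with canned inputs in place of a microphone.
utterances = iter(["design a custom skill", "Snow White"])
knowledge_base: Dict[str, bytes] = {}
stored = run_setting_flow(lambda: next(utterances),
                          lambda: b"<recorded story audio>",
                          knowledge_base)
```

Keying the store by guide word is the simplest form of association; a cloud-backed store would replace the dictionary without changing the flow.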
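FIG. 2 is a schematic flowchart of a smart speaker setting method according to some embodiments of the present disclosure. In some embodiments, the following steps of the smart speaker setting method are performed by a smart speaker setting device.
In step 201, user voice information is collected so that instruction content can be extracted from the collected user voice information.
In step 202, if the instruction content is a preset startup setting guide word, the custom setting mode is entered.
For example, if the user says "design a custom skill", the smart speaker enters the custom setting mode.
In step 203, user voice information is collected so that the content extracted from the collected user voice information is used as a scene instruction, and the corresponding scene is entered according to the scene instruction. In some embodiments, the voice information used to extract the scene instruction is the fourth user voice information.
In some embodiments, the scenes include a shared-domain scene and a private-domain scene. For example, if the user says "enter the shared domain", the smart speaker enters the shared domain. Content located in the shared domain can be shared over the network, which helps to build up followers. If the user says "enter the private domain", the smart speaker enters the private domain. Content located in the private domain is for use by an individual, a family, or a specific group of people.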
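The scene selection in step 203 could be modelled as a small lookup, sketched below. The `Scene` enum, the `SCENE_COMMANDS` table, and the exact English command phrases are assumptions made for illustration; the disclosure only names the two domains.

```python
from enum import Enum
from typing import Optional


class Scene(Enum):
    SHARED = "shared"    # content may be shared over the network
    PRIVATE = "private"  # content for an individual, family, or specific group


# English renderings of the example commands; the exact phrases are assumptions.
SCENE_COMMANDS = {
    "enter the shared domain": Scene.SHARED,
    "enter the private domain": Scene.PRIVATE,
}


def parse_scene_instruction(transcript: str) -> Optional[Scene]:
    """Return the requested scene, or None if this is not a scene instruction."""
    return SCENE_COMMANDS.get(transcript.strip().lower())
```

Returning None for unrecognized input lets the caller decide whether to prompt again or fall back to a default scene.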
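In step 204, user voice information is collected so that the content extracted from the collected user voice information is used as the custom setting guide word.
For example, after the smart speaker enters the custom setting mode, if the user says "Snow White", then "Snow White" is used as the user-defined skill guide word.
In some embodiments, whether the user's custom setting guide word conflicts with an existing guide word is detected, for example through step 205.
In step 205, it is detected whether the custom setting guide word conflicts with an existing guide word.
If the custom setting guide word does not conflict with an existing guide word, step 206 is performed; if the custom setting guide word conflicts with an existing guide word, step 204 is repeated.
For example, if the user says "Snow White" but "Snow White" has already been used as a guide word, then to avoid the conflict the user can change what is said to "Baby's favorite Snow White".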
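The loop between steps 204 and 205 can be sketched as below. The exact-match conflict test and the `max_attempts` cap are assumptions added so the example terminates; the disclosure only says that step 204 is repeated on conflict.

```python
from typing import Callable, Optional, Set


def collect_guide_word(listen: Callable[[], str],
                       existing: Set[str],
                       max_attempts: int = 3) -> Optional[str]:
    """Repeat step 204 until the guide word passes the step 205 conflict check."""
    for _ in range(max_attempts):
        candidate = listen().strip()                  # step 204: collect guide word
        if candidate and candidate not in existing:   # step 205: conflict detection
            return candidate                          # no conflict: go on to step 206
    return None                                       # gave up after max_attempts
```

In the example from the text, a first attempt of "Snow White" conflicts with the existing entry, and the second attempt "Baby's favorite Snow White" is accepted.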
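In step 206, user voice information is collected as the custom setting content.
For example, after saying the custom setting guide word, the user can enter content such as fairy tales and children's songs according to the prompts of the smart speaker. After saying "Snow White", for instance, the user can start telling the story of Snow White when prompted by the smart speaker. The user can pause while telling the story and, once the story is finished, perform a corresponding operation to end the entry, for example pressing the end button of the smart speaker, or saying a preset voice end instruction such as "the story is over" to the smart speaker, so that the smart speaker ends the voice recording.
In step 207, the custom setting guide word and the custom setting content are stored in association.
In some embodiments, the custom setting guide word and the custom setting content are stored in association in a cloud server.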
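FIG. 3 is a schematic structural diagram of a smart speaker setting device according to some embodiments of the present disclosure. As shown in FIG. 3, the smart speaker setting device includes an instruction extraction module 31, a mode control module 32, a guide word collection module 33, a content collection module 34, and a storage module 35.
The instruction extraction module 31 is configured to collect user voice information so that instruction content can be extracted from the collected user voice information.
The mode control module 32 is configured to enter the custom setting mode if the instruction content is a preset startup setting guide word. In some embodiments, the voice information used to extract the startup setting guide word is the first user voice information.
For example, if the user says "design a custom skill", the smart speaker enters the custom setting mode.
The guide word collection module 33 is configured to collect user voice information so that the content extracted from the collected user voice information is used as the custom setting guide word. In some embodiments, the voice information used to extract the custom setting guide word is the second user voice information.
For example, after the smart speaker enters the custom setting mode, if the user says "Snow White", then "Snow White" is used as the user-defined skill guide word.
The content collection module 34 is configured to collect user voice information as the custom setting content. In some embodiments, the voice information used as the custom setting content is the third user voice information.
For example, after saying the custom setting guide word, the user can enter content such as fairy tales and children's songs according to the prompts of the smart speaker. After saying "Snow White", for instance, the user can start telling the story of Snow White when prompted by the smart speaker. The user can pause while telling the story and, once the story is finished, perform a corresponding operation to end the entry, for example pressing the end button of the smart speaker, or saying a preset voice end instruction such as "the story is over" to the smart speaker, so that the smart speaker ends the voice recording.
The storage module 35 is configured to store the custom setting guide word and the custom setting content in association.
In some embodiments, the storage module 35 is configured to store the custom setting guide word and the custom setting content in association in a cloud server.
In some embodiments, the guide word collection module 33 is configured to detect, after the content extracted from the collected user voice information (for example, the second user voice information) is used as the custom setting guide word, whether the custom setting guide word conflicts with an existing guide word. If the custom setting guide word does not conflict with an existing guide word, the guide word collection module 33 instructs the content collection module 34 to collect user voice information (for example, the third user voice information) as the custom setting content. If the custom setting guide word conflicts with an existing guide word, the guide word collection module 33 collects user voice information again, so that the content extracted from the newly collected user voice information is used as the custom setting guide word.
In some embodiments, the guide word collection module 33 is configured to collect user voice information (for example, the fourth user voice information) after the custom setting mode is entered, use the content extracted from the collected user voice information as a scene instruction, enter the corresponding scene according to the scene instruction, and then collect user voice information so that the content extracted from it is used as the custom setting guide word.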
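FIG. 4 is a schematic structural diagram of a smart speaker setting device according to some embodiments of the present disclosure. As shown in FIG. 4, the device includes a memory 41 and a processor 42.
The memory 41 is used to store instructions. The processor 42 is coupled to the memory 41. The processor 42 is configured to execute, based on the instructions stored in the memory, the method involved in any one of the embodiments in FIG. 1 or FIG. 2.
As shown in FIG. 4, the device also includes a communication interface 43 for exchanging information with other devices. The device also includes a bus 44, through which the processor 42, the communication interface 43, and the memory 41 communicate with one another.
The memory 41 may include high-speed RAM (Random Access Memory), and may also include NVM (Non-Volatile Memory), for example at least one disk storage. The memory 41 may also be a memory array. The memory 41 may also be divided into blocks, and the blocks may be combined into virtual volumes according to certain rules.
In addition, the processor 42 may be a central processing unit, or may be an ASIC (Application Specific Integrated Circuit), or may be one or more integrated circuits configured to implement the embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium. The computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method involved in any one of the embodiments in FIG. 1 or FIG. 2.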
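FIG. 5 is a schematic flowchart of a smart speaker control method according to some embodiments of the present disclosure. The smart speaker is set up using the method involved in any one of the embodiments in FIG. 1 or FIG. 2. In some embodiments, the following steps of the smart speaker control method are performed by a smart speaker control device.
In step 501, user voice information is collected so that query information can be extracted from the collected user voice information. In some embodiments, the voice information used to extract the query information is the fifth user voice information.
In step 502, the guide word in the query information is used to query the custom setting content associated with the guide word.
In step 503, the queried custom setting content is played.
In some embodiments, the query information also includes scene information. The queried custom setting content is also associated with the scene information.
For example, a child's parents have recorded the story of Snow White in advance. One day the parents are away on a business trip and cannot tell the child a story before bedtime. The child's grandmother can then say "Private scene, Snow White" to the smart speaker, and the smart speaker queries and plays the audio content associated with the guide word "Snow White" in the private scene. In this way, the child can still be told a story even when the parents are away.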
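The lookup in steps 501 and 502, including the optional scene information, might look like the sketch below. Several things here are assumptions rather than part of the disclosure: the transcript is taken to separate scene and guide word with a comma, content is keyed by a `(scene, guide word)` tuple, and a missing scene falls back to a hypothetical default of "private scene".

```python
from typing import Dict, Optional, Tuple

Store = Dict[Tuple[str, str], bytes]  # (scene, guide word) -> recorded audio


def parse_query(transcript: str,
                default_scene: str = "private scene") -> Tuple[str, str]:
    """Split 'scene, guide word'; use default_scene if no scene is given."""
    scene, sep, guide_word = transcript.partition(",")
    if not sep:
        return default_scene, transcript.strip()
    return scene.strip().lower(), guide_word.strip()


def query_content(store: Store, transcript: str) -> Optional[bytes]:
    """Steps 501-502: extract query information and look up associated content."""
    return store.get(parse_query(transcript))


# Usage mirroring the example: the parents' recording is found in the private scene.
store: Store = {("private scene", "Snow White"): b"<parents' recording>"}
audio = query_content(store, "Private scene, Snow White")  # step 503 would play this
```

Including the scene in the key means the same guide word can map to different content in the shared and private scenes.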
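FIG. 6 is a schematic structural diagram of a smart speaker control device according to some embodiments of the present disclosure. The smart speaker is set up using the method involved in any one of the embodiments in FIG. 1 or FIG. 2. The control device includes an information extraction module 61, a query module 62, and a playback module 63.
The information extraction module 61 is configured to collect user voice information so that query information can be extracted from the collected user voice information. In some embodiments, the voice information used to extract the query information is the fifth user voice information.
The query module 62 is configured to use the guide word in the query information to query the custom setting content associated with the guide word.
The playback module 63 is configured to play the queried custom setting content.
In some embodiments, the query information also includes scene information. The queried custom setting content is also associated with the scene information.
FIG. 7 is a schematic structural diagram of a smart speaker control device according to other embodiments of the present disclosure. The device includes a memory 71, a processor 72, a communication interface 73, and a bus 74. FIG. 7 differs from FIG. 4 in that, in the embodiment shown in FIG. 7, the processor 72 executes, based on the instructions stored in the memory 71, the method involved in any of the embodiments in FIG. 6.
The present disclosure also provides a smart speaker. The smart speaker includes at least one of the smart speaker setting device involved in any one of the embodiments in FIG. 3 or FIG. 4 and the smart speaker control device involved in any one of the embodiments in FIG. 6 or FIG. 7.
The present disclosure also provides a computer-readable storage medium. The computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method involved in any of the embodiments in FIG. 5.
In some embodiments, the functional modules described above may be implemented as a general-purpose processor, a Programmable Logic Controller (PLC), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, or any appropriate combination thereof, for performing the functions described in the present disclosure.
The embodiments of the present disclosure have now been described in detail. Some details known in the art have not been described in order to avoid obscuring the concept of the present disclosure. From the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed here.
Although some specific embodiments of the present disclosure have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. Those skilled in the art should understand that the above embodiments may be modified, or some technical features may be equivalently replaced, without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.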

Claims (12)

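  1. A smart speaker control method, comprising:
    extracting instruction content from collected first user voice information;
    if the instruction content is a preset startup setting guide word, entering a custom setting mode;
    collecting second user voice information;
    using the content extracted from the second user voice information as a custom setting guide word;
    collecting third user voice information as custom setting content; and
    storing the custom setting guide word and the custom setting content in association.
  2. The method according to claim 1, wherein:
    the smart speaker control method further comprises: after the content extracted from the second user voice information is used as the custom setting guide word, detecting whether the custom setting guide word conflicts with an existing guide word; and
    in the case that the custom setting guide word does not conflict with an existing guide word, collecting the third user voice information as the custom setting content.
  3. The method according to claim 2, further comprising:
    in the case that the custom setting guide word conflicts with an existing guide word, collecting second user voice information again, so that the content extracted from the re-collected second user voice information is used as the custom setting guide word.
  4. The method according to claim 1, further comprising:
    after entering the custom setting mode, collecting fourth user voice information; and
    using the content extracted from the fourth user voice information as a scene instruction, and entering the corresponding scene according to the scene instruction, wherein the second user voice information is collected after the corresponding scene is entered.
  5. The method according to claim 1, wherein storing the custom setting guide word and the custom setting content in association comprises:
    storing the custom setting guide word and the custom setting content in association in a cloud server.
  6. The method according to any one of claims 1-5, further comprising:
    extracting query information from collected fifth user voice information;
    using the guide word in the query information to query the custom setting content associated with the guide word; and
    playing the queried custom setting content.
  7. The method according to claim 6, wherein the query information further comprises scene information, and the queried custom setting content is also associated with the scene information.
  8. A smart speaker control device, comprising:
    an instruction extraction module configured to extract instruction content from collected first user voice information;
    a mode control module configured to enter a custom setting mode if the instruction content is a preset startup setting guide word;
    a guide word collection module configured to collect second user voice information and use the content extracted from the second user voice information as a custom setting guide word;
    a content collection module configured to collect third user voice information as custom setting content; and
    a storage module configured to store the custom setting guide word and the custom setting content in association.
  9. The smart speaker control device according to claim 8, further comprising:
    an information extraction module configured to extract query information from collected fifth user voice information;
    a query module configured to use the guide word in the query information to query the custom setting content associated with the guide word; and
    a playback module configured to play the queried custom setting content.
  10. A smart speaker control device, comprising:
    a memory configured to store instructions; and
    a processor coupled to the memory, the processor being configured to execute, based on the instructions stored in the memory, the smart speaker control method according to any one of claims 1-7.
  11. A smart speaker, comprising the smart speaker control device according to any one of claims 8-10.
  12. A non-transitory computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method according to any one of claims 1-7.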
PCT/CN2020/117180 2019-11-26 2020-09-23 Smart speaker setting method and device, control method and device, and smart speaker WO2021103788A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911171447.4A CN111785265A (zh) 2019-11-26 2019-11-26 Smart speaker setting method and device, control method and device, and smart speaker
CN201911171447.4 2019-11-26

Publications (1)

Publication Number Publication Date
WO2021103788A1 (zh) 2021-06-03

Family

ID=72755753

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117180 WO2021103788A1 (zh) 2019-11-26 2020-09-23 Smart speaker setting method and device, control method and device, and smart speaker

Country Status (2)

Country Link
CN (1) CN111785265A (zh)
WO (1) WO2021103788A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150128185A1 (en) * 2012-05-16 2015-05-07 Tata Consultancy Services Limited System and method for personalization of an applicance by using context information
CN105404161A (zh) * 2015-11-02 2016-03-16 百度在线网络技术(北京)有限公司 智能语音交互方法和装置
CN106792044A (zh) * 2016-12-16 2017-05-31 Tcl集团股份有限公司 一种智能电视的语音控制方法和装置
CN108717853A (zh) * 2018-05-09 2018-10-30 深圳艾比仿生机器人科技有限公司 一种人机语音交互方法、装置及存储介质
CN108831469A (zh) * 2018-08-06 2018-11-16 珠海格力电器股份有限公司 语音命令定制方法、装置和设备及计算机存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4235645A3 (en) * 2016-07-06 2023-10-04 DRNC Holdings, Inc. System and method for customizing smart home speech interfaces using personalized speech profiles
CN109961780B (zh) * 2017-12-22 2024-02-02 深圳市优必选科技有限公司 一种人机交互方法、装置、服务器和存储介质
CN108877790A (zh) * 2018-05-21 2018-11-23 江西午诺科技有限公司 音箱控制方法、装置、可读存储介质及移动终端
CN110060682B (zh) * 2019-04-28 2021-10-22 Oppo广东移动通信有限公司 音箱控制方法和装置

Also Published As

Publication number Publication date
CN111785265A (zh) 2020-10-16

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20893732; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20893732; Country of ref document: EP; Kind code of ref document: A1)