CN110534094B - Voice interaction method, device and equipment - Google Patents

Voice interaction method, device and equipment

Info

Publication number: CN110534094B
Application number: CN201910701320.2A
Authority: CN (China)
Prior art keywords: scene, data, keyword, voice data, processed
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Other versions: CN110534094A
Inventor: 左亚军
Original and current assignee: Volkswagen Mobvoi Beijing Information Technology Co Ltd
Application CN201910701320.2A filed by Volkswagen Mobvoi Beijing Information Technology Co Ltd, with priority to CN201910701320.2A
Publication of application CN110534094A; application granted and published as CN110534094B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1822 Parsing for meaning understanding
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command
    • G10L 2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the invention disclose a voice interaction method, apparatus, and device, wherein the method includes: determining a scene simulator associated with a user's voice data, and control parameters of the scene simulator, by parsing the voice data; and controlling the scene simulator based on the determined control parameters to assist the voice data in scene simulation. In this scheme the speaker only needs to speak: the device acquires the voice data and, by parsing it, controls the scene simulator to accompany the narration with scene simulation, thereby assisting the speaker while consuming little manpower beyond language expression.

Description

Voice interaction method, device and equipment
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a voice interaction method, apparatus, and device.
Background
In some scenarios, a speaker tells a story, shares content, or presents a point of view to an audience. For example, a preschool teacher or a parent tells a fairy tale to a child to inspire the child's imagination; in a middle-school class, a teacher tells historical stories or anecdotes from the lives of celebrities to enrich students' knowledge; in a lecture scene, a speaker presents his or her own views to the audience to provoke thought or win support.
The speaker typically employs some aids to enhance the audience's sense of immersion, so that the audience feels present in the scene being described. For example, a kindergarten teacher may prepare props related to a story: when telling the story of the tortoise and the hare, a rabbit doll and a tortoise doll can be used to enhance the children's sense of immersion. While teaching, a teacher can show scenes to students more intuitively through PPT (PowerPoint, presentation software) to enhance the students' sense of immersion. Likewise, during a lecture, a speaker can show scenes to the audience more intuitively through PPT.
In the above solutions, the speaker needs to make related props in advance or prepare aids such as PPT; obviously, such solutions require considerable manual effort beyond the language expression itself.
Disclosure of Invention
In view of the above, the present invention provides a voice interaction method, apparatus, and device that assist the speaker while consuming little manpower beyond language expression.
Based on the above purpose, an embodiment of the present invention provides a voice interaction method, including:
acquiring voice data of a user;
determining a scene simulator associated with the voice data and control parameters of the scene simulator by analyzing the voice data;
controlling the scene simulator based on the determined control parameters to assist the speech data in scene simulation.
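For illustration only, the following is a minimal Python sketch of how these three steps could fit together; every name in it (speech_to_text, parse_keywords, SIMULATOR_MAP) is an assumption introduced for the sketch, not part of the claimed implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the claimed components.
def speech_to_text(voice_data: bytes) -> str:
    """Stand-in for an ASR engine transcribing the acquired voice data."""
    return "it got dark and the moon came out"

def parse_keywords(text: str) -> List[str]:
    """Stand-in for the semantic-analysis step that extracts keywords."""
    return [w for w in ("dark", "moon") if w in text]

# Correspondence between keywords and (scene simulator, control parameters).
SIMULATOR_MAP: Dict[str, Tuple[str, dict]] = {
    "dark": ("light_emitter", {"brightness": "lowest"}),
    "moon": ("projector", {"image": "moon"}),
}

def handle_utterance(voice_data: bytes) -> None:
    text = speech_to_text(voice_data)                # acquire voice data
    for keyword in parse_keywords(text):             # parse for keywords
        simulator, params = SIMULATOR_MAP[keyword]
        print(f"control {simulator} with {params}")  # control the scene simulator

handle_utterance(b"")
```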
Optionally, the determining a scene simulator associated with the voice data and a control parameter of the scene simulator by analyzing the voice data includes:
determining a keyword in the voice data as a keyword to be processed by analyzing the voice data;
and determining the scene simulator and the control parameters thereof corresponding to the keywords to be processed according to the corresponding relation between the acquired keywords and the scene simulator and the control parameters thereof.
Optionally, after the determining the keyword in the voice data by analyzing the voice data, as the keyword to be processed, the method further includes:
if the keywords to be processed are date information, acquiring associated data of the date information, wherein the associated data comprises any one or more of the following items: weather data, holiday data, user tag data;
and determining a scene simulator corresponding to the associated data and control parameters of the scene simulator.
Optionally, the controlling the scene simulator based on the determined control parameter to assist the voice data in scene simulation includes any one or more of the following steps:
controlling the sound player to play the sound corresponding to the keyword based on the determined control parameter of the sound player;
controlling the projector to project based on the determined control parameters of the projector so as to simulate a scene corresponding to the keyword;
adjusting a light emitter based on the determined light parameters to simulate light corresponding to the keyword;
and adjusting the light transmittance of the window glass based on the determined light transmittance parameters of the window glass so as to simulate light rays corresponding to the keywords.
Optionally, the method further includes:
after receiving a starting instruction sent by a user, determining a lecture theme of the user;
and acquiring a data packet corresponding to the theme, wherein the data packet comprises a corresponding relation between the keywords associated with the theme and the scene simulator and the control parameters thereof.
Optionally, the controlling the scene simulator based on the determined control parameter to assist the voice data in scene simulation includes:
if a plurality of keywords to be processed are determined and conflicts exist among the control parameters of the scene simulator corresponding to the keywords to be processed, determining the priority sequence among the keywords to be processed according to the preset priority of each keyword, or determining the priority sequence among the keywords to be processed through semantic analysis;
determining a scene simulator to be processed and control parameters thereof according to the determined priority sequence;
and controlling the to-be-processed scene simulator based on the determined control parameters to assist the voice data in scene simulation.
Based on the above object, an embodiment of the present invention further provides a voice interaction apparatus, including:
the first acquisition module is used for acquiring voice data of a user;
the first determination module is used for determining a scene simulator related to the voice data and control parameters of the scene simulator by analyzing the voice data;
and the control module is used for controlling the scene simulator based on the determined control parameters so as to assist the voice data in carrying out scene simulation.
Optionally, the first determining module is specifically configured to:
determining a keyword in the voice data as a keyword to be processed by analyzing the voice data;
and determining the scene simulator corresponding to the keywords to be processed and the control parameters thereof according to the corresponding relation between the acquired keywords and the scene simulator and the control parameters thereof.
Optionally, the apparatus further comprises:
a second obtaining module, configured to obtain associated data of the date information if the keyword to be processed is the date information, where the associated data includes any one or more of the following items: weather data, holiday data, user tag data;
and the second determining module is used for determining the scene simulator corresponding to the associated data and the control parameters of the scene simulator.
Optionally, the control module is specifically configured to execute any one or more of the following steps:
controlling the sound player to play the sound corresponding to the keyword based on the determined control parameter of the sound player;
controlling the projector to project based on the determined control parameters of the projector so as to simulate a scene corresponding to the keyword;
adjusting a light emitter based on the determined light parameters to simulate light corresponding to the keyword;
and adjusting the light transmittance of the window glass based on the determined light transmittance parameters of the window glass so as to simulate light rays corresponding to the keywords.
Optionally, the apparatus further comprises:
the third determining module is used for determining the lecture theme of the user after receiving the starting instruction sent by the user;
and the third acquisition module is used for acquiring a data packet corresponding to the theme, wherein the data packet comprises the corresponding relation between the keywords associated with the theme and the scene simulator and the control parameters thereof.
Optionally, the control module is further configured to:
if a plurality of keywords to be processed are determined and conflicts exist among the control parameters of the scene simulator corresponding to the keywords to be processed, determining the priority sequence among the keywords to be processed according to the preset priority of each keyword, or determining the priority sequence among the keywords to be processed through semantic analysis;
determining a scene simulator to be processed and control parameters thereof according to the determined priority sequence;
and controlling the to-be-processed scene simulator based on the determined control parameters to assist the voice data in scene simulation.
In view of the above object, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements any one of the above voice interaction methods when executing the program.
By applying the embodiment of the invention, the scene simulator related to the voice data and the control parameters of the scene simulator are determined by analyzing the voice data of the user; controlling a scene simulator based on the determined control parameters to assist the voice data in scene simulation; therefore, in the scheme, the speaker only needs to send out voice, the equipment can acquire the voice data, the scene simulator is controlled to assist the voice data to perform scene simulation by analyzing the voice data, the auxiliary effect of the speaker is achieved, and besides language expression, less manpower is consumed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in describing them are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first flowchart of a voice interaction method according to an embodiment of the present invention;
fig. 2 is a second flowchart of a voice interaction method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a voice interaction apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used merely to distinguish between two entities or parameters that share the same name. "First" and "second" are for convenience of description only, should not be construed as limiting the embodiments, and are not explained again in the following embodiments.
In order to solve the technical problem, embodiments of the present invention provide a voice interaction method, apparatus, and device, where the method and apparatus may be applied to vehicle-mounted devices, home equipment, teaching equipment, and the like, and are not limited specifically. First, a voice interaction method provided by an embodiment of the present invention is described below.
Fig. 1 is a first flowchart of a voice interaction method according to an embodiment of the present invention, including:
s101: voice data of a user is acquired.
For example, in a vehicle-mounted or home scene, when a parent tells a story to a child, the scheme can assist so as to enhance the child's sense of immersion; thus, in S101, the parent's voice data is acquired. As another example, in a teaching scene, when a teacher tells historical stories or celebrity anecdotes to students, the scheme can assist so as to enhance the students' sense of immersion; thus, the teacher's voice data is acquired in S101. The scheme can also be applied to a lecture scene or other scenes, in which the voice data of other speakers is acquired in S101; the specific application scene, speaker, and content are not limited.
The electronic device executing this scheme (the execution subject, hereinafter referred to as the electronic device) may have a built-in voice acquisition module through which it acquires the speaker's voice data. Alternatively, the electronic device may be communicatively connected to a separate voice acquisition device and acquire the speaker's voice data through that device.
In one embodiment, the electronic device executes the scheme only after the user starts the auxiliary mode, which reduces false triggering. For example, the user may start the auxiliary mode through a voice instruction, or the electronic device may present an interactive interface and the user starts the auxiliary mode by clicking a button in it.
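A minimal sketch of such gating follows; the trigger phrase and the class interface are assumptions made for illustration, not the patent's API.

```python
class AssistGate:
    """Gate that allows scene simulation only after the auxiliary mode is on."""

    def __init__(self) -> None:
        self.enabled = False

    def on_command(self, text: str) -> None:
        # A voice instruction or an interactive-interface button would set this.
        if "start auxiliary mode" in text.lower():
            self.enabled = True

    def should_process(self) -> bool:
        # Speech is parsed for scene simulation only once the mode is on,
        # which reduces false triggering.
        return self.enabled
```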
S102: by parsing the voice data, a scene simulator associated with the voice data, and control parameters of the scene simulator are determined.
For example, the scene simulator may include any one or more of the following: a sound player, a projector, a light emitter, window glass, an odor generator, an air volume controller, a humidity controller, and a temperature controller. These scene simulators can be used to assist the voice data in scene simulation.
In one embodiment, S102 may include: determining a keyword in the voice data as a keyword to be processed by analyzing the voice data; and determining the scene simulator corresponding to the keywords to be processed and the control parameters thereof according to the corresponding relation between the acquired keywords and the scene simulator and the control parameters thereof.
For example, keywords in the speech data may be identified through a semantic analysis algorithm. Alternatively, a keyword recognition model may be obtained by training in advance, and the keyword recognition model may be used to recognize a keyword in the speech data. For example, sample voice data and a corresponding keyword tag thereof may be obtained, the sample voice data is input into a neural network with a preset structure, the keyword tag is used as a supervision, and parameters in the neural network are iteratively adjusted until a convergence condition is met, so as to obtain a trained keyword recognition model.
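A minimal sketch of such a supervised training loop is shown below, assuming the sample voice data has already been converted to fixed-size feature vectors (for example, averaged MFCCs); the network shape, feature size, and label set are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sizes; a real system would use its own features and keyword set.
N_FEATURES, N_KEYWORDS = 40, 8
model = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(), nn.Linear(64, N_KEYWORDS))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(100, N_FEATURES)        # stand-in for featurized sample voice data
labels = torch.randint(0, N_KEYWORDS, (100,))  # stand-in for the keyword tags

for epoch in range(20):                        # in practice, iterate until convergence
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)    # keyword tags supervise the network
    loss.backward()
    optimizer.step()                           # iteratively adjust the parameters
```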
In one embodiment, a general database may be stored in advance, and the general database includes some corresponding relations between general keywords and the scene simulator and its control parameters. For example, the scene simulator corresponding to the keyword "raining" may be a sound player, and the corresponding control parameter may be a parameter related to playing a sound file of "raining". For another example, the scene simulator corresponding to the keyword "moon" may be a projector, and the control parameter may be a parameter related to the projection "moon".
In another embodiment, after receiving a starting instruction sent by a user, determining a lecture subject of the user; and acquiring a data packet corresponding to the theme, wherein the data packet comprises the corresponding relation between the keywords associated with the theme and the scene simulator and the control parameters thereof.
For example, the start instruction can be understood as the instruction by which the user starts the auxiliary mode, as described above. The user's lecture topic may be the name of the story to be told, a lecture subject, or the like, and is not specifically limited. In this embodiment, after receiving the start instruction sent by the user, the user's voice data may be acquired and the lecture topic determined by parsing the voice data.
For example, the spoken subject in the speech data may be recognized by a semantic analysis algorithm. Alternatively, a topic recognition model may be obtained by training in advance, and the topic recognition model may be used to recognize a topic of a lecture in speech data. For example, sample voice data and a corresponding topic label thereof may be obtained, the sample voice data is input into a neural network with a preset structure, and parameters in the neural network are iteratively adjusted by using the topic label as supervision until a convergence condition is met, so as to obtain a trained topic recognition model.
Taking the scene in which a parent tells a story to a child as an example, the parent may tell some classic fairy tales. After turning on the auxiliary mode, the parent may say to the child, "The story I am going to tell next is Snow White"; by parsing this voice data, the electronic device determines that the topic is "the story of Snow White". The electronic device can then acquire the data packet corresponding to "the story of Snow White", which contains the correspondence between the keywords associated with that story and the scene simulators and their control parameters.
This embodiment is suitable for telling known stories: the keywords appearing in a known story generally change little, so a corresponding data packet can be established in advance for each known story, containing the correspondence between the story's keywords and the scene simulators and their control parameters.
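A sketch of loading such a pre-built data packet follows, under the assumption that packets are stored as JSON files keyed by topic; the file layout and field names are invented for illustration.

```python
import json

def load_packet(topic: str) -> dict:
    """Load the keyword -> (simulator, control parameters) map for a known story."""
    with open(f"packets/{topic}.json", encoding="utf-8") as f:
        return json.load(f)

# e.g. packets/snow_white.json might contain:
# {"apple":  ["projector",    {"image": "red_apple.png"}],
#  "forest": ["sound_player", {"file": "birdsong.wav"}]}
```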
Alternatively, this embodiment is also applicable to scenes in which other known content, such as poems and prose, is recited, without specific limitation. When unknown stories, lectures, poems, prose, or other content are told, the scene simulator corresponding to the keyword to be processed and its control parameters can be determined through the general database of the preceding embodiment.
Still taking the scene in which a parent tells a story to a child as an example, the parent may make up new stories. While the parent tells a new story, the electronic device can acquire the parent's voice data in real time, determine the keywords in that voice data in real time, and query the general database to determine the scene simulator corresponding to each keyword and its control parameters.
S103: based on the determined control parameters, the scene simulator is controlled to assist the speech data in scene simulation.
As mentioned above, the scene simulator may include any one or more of the following: a sound player, a projector, a light emitter, window glass, an odor generator, an air volume controller, a humidity controller, and a temperature controller.
If the scene simulator includes a sound player, S103 includes: and controlling the sound player to play the sound corresponding to the keyword based on the determined control parameter of the sound player. If the scene simulator includes a projector, S103 includes: and controlling the projector to project based on the determined control parameters of the projector so as to simulate the scene corresponding to the keyword. If the scene simulator includes a light emitter, S103 includes: and adjusting the light emitter based on the determined light parameters to simulate the light corresponding to the keyword. If the scene simulator includes a window glass, S103 includes: and adjusting the light transmittance of the window glass based on the determined light transmittance parameters of the window glass so as to simulate light rays corresponding to the keywords.
If the scene simulator includes an odor generator, S103 includes: and controlling the odor generator to adjust the odor proportion based on the determined control parameters of the odor generator so as to generate the odor corresponding to the keyword. If the scene simulator includes an air volume controller, S103 includes: and controlling the air volume controller to generate the air volume corresponding to the keyword based on the determined control parameter of the air volume controller. If the scene simulator includes a humidity controller, S103 includes: and controlling the humidity controller to adjust the humidity in the current environment to the humidity range corresponding to the keyword based on the determined control parameter of the humidity controller. If the scene simulator includes a temperature controller, S103 includes: and controlling the temperature controller to adjust the temperature in the current environment to the temperature range corresponding to the keyword based on the determined control parameter of the temperature controller.
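A sketch of this per-simulator dispatch is given below; the printed calls are stand-ins for real device drivers, and the parameter names are assumptions made for the sketch.

```python
def control(simulator: str, params: dict, keyword: str) -> None:
    """Dispatch one control action to the named scene simulator (stubbed)."""
    if simulator == "sound_player":
        print(f"play {params['file']} for '{keyword}'")
    elif simulator == "projector":
        print(f"project {params['image']} to simulate '{keyword}'")
    elif simulator == "light_emitter":
        print(f"set brightness to {params['brightness']}")
    elif simulator == "window_glass":
        print(f"set light transmittance to {params['transmittance']}")
    elif simulator == "odor_generator":
        print(f"blend fragrances at ratio {params['ratio']}")
    elif simulator == "air_volume_controller":
        print(f"set air volume to {params['volume']}")
    elif simulator in ("humidity_controller", "temperature_controller"):
        print(f"adjust the environment to range {params['range']}")
```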
For example, the odor generator may be a fragrance-diffusing device. The device may contain several essences that can be blended into different fragrances, a nozzle that sprays the blended fragrance, and a fan that adjusts the speed at which the fragrance disperses; the amount or speed of dispersal can also be adjusted by adjusting the nozzle pressure. In one case, the odor ratios corresponding to various keywords can be stored in advance as control parameters of the odor generator, and the odor generator can be controlled to blend different fragrances according to those ratios.
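A sketch of such pre-stored odor ratios follows; the fragrance names and proportions are invented for illustration.

```python
ODOR_RATIOS = {
    "forest":  {"pine": 0.7, "moss": 0.3},
    "seaside": {"salt": 0.6, "seaweed": 0.4},
}

def blend_for(keyword: str) -> dict:
    """Return the pre-stored fragrance ratio used to drive the odor generator."""
    return ODOR_RATIOS.get(keyword, {})
```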
In one case, the window glass may be a glass whose light transmittance can be adjusted, for example, the window glass may be filled with a light-transmitting object whose light transmittance can be changed by changing the frequency of a voltage applied to the glass. For example, the transparent object may be a liquid crystal or other substances, and is not limited specifically.
For example, the window glass light transmission parameter corresponding to each keyword may be stored in advance, and the window glass light transmission parameter may be the frequency of the voltage applied to the window glass.
Or, in another case, the window glass can be electrochromic glass, and the coloring condition of the glass can be controlled by the magnitude of the applied voltage, so that the light transmittance of the glass is adjusted. In this case, the light transmission parameter of the window glass may be the magnitude of the voltage applied to the glass.
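The two glass variants can be sketched as simple parameter mappings; the functions and numeric ranges below are assumptions, since the real frequency-transmittance and voltage-coloring curves are device-specific.

```python
def lc_glass_drive_frequency(transmittance: float, f_min: float = 50.0,
                             f_max: float = 400.0) -> float:
    """Map a desired transmittance in [0, 1] to a drive frequency in Hz
    for the liquid-crystal-filled variant (frequency range assumed)."""
    return f_min + transmittance * (f_max - f_min)

def electrochromic_voltage(transmittance: float, v_max: float = 3.0) -> float:
    """Map a desired transmittance in [0, 1] to an applied voltage for the
    electrochromic variant: more voltage, deeper coloring, less light."""
    return v_max * (1.0 - transmittance)
```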
Suppose a parent telling a story to a child says "it got dark" and "the moon came out"; the parsed keywords are "dark" and "moon". Suppose the scene simulators corresponding to "dark" are a light emitter and the window glass, with the light emitter's control parameter being "lowest brightness" and the window glass's control parameter being "lowest light transmittance"; and suppose the scene simulator corresponding to "moon" is a projector whose control parameter is "project the moon". Then in S103 the light emitter may be dimmed to the lowest brightness, the window glass adjusted to the lowest light transmittance, and the projector controlled to project the shape of the moon.
In the above example, if the electronic apparatus is an in-vehicle apparatus, the light of the interior of the vehicle may be adjusted to have the lowest brightness, the rear window glass may be adjusted to have the lowest transmittance, and the projector may be controlled to project the shape of the moon onto the ceiling of the interior of the vehicle.
As another example, if the parent, telling the story to the child, says "the wolf is coming", the keyword is parsed as "wolf". Suppose the scene simulator corresponding to "wolf" is a sound player whose control parameter is "play a wolf-howl sound file"; then in S103 the sound player can be controlled to play the howling sound file.
If the speaker says "the sky is full of twinkling stars", the keyword is parsed as "twinkling stars". Suppose the scene simulators corresponding to "twinkling stars" are a small lamp at the driver's seat and the window glass, with the lamp's control parameter being "blink" and the window glass's control parameter being "lowest light transmittance"; then in S103 the small lamp may be controlled to blink and the window glass adjusted to the lowest light transmittance.
In one embodiment, after the keyword in the voice data has been determined by parsing and taken as the keyword to be processed, if the keyword to be processed is date information, associated data of the date information may be obtained, where the associated data includes any one or more of the following: weather data, holiday data, user tag data; the scene simulator corresponding to the associated data and its control parameters are then determined.
In one embodiment, geographic location information associated with the date information may also be obtained. In one case, the geographic location information may be obtained by parsing the voice data: for example, if the user says "that was July 20, 2019, in Beijing", the obtained date information is July 20, 2019, and the geographic location information is Beijing. In another case, the geographic location information associated with the date information may be determined from a user address acquired in advance. In yet another case, if the electronic device is a vehicle-mounted device, the vehicle's position obtained by GPS (Global Positioning System) may be used as the geographic location information associated with the date information.
Taking the case where the associated data includes weather data: assuming the date information is July 20, 2019, and the geographic location information is Beijing, the acquired weather data may be the Beijing weather for July 20, 2019. Suppose the acquired weather data is "light rain", the scene simulator corresponding to "light rain" is a sound player, and the corresponding control parameter is "play the sound file corresponding to light rain"; then in S103 the sound player may be controlled to play that sound file. Alternatively, the scene simulator corresponding to "light rain" may further include a humidity controller whose control parameter is the humidity range corresponding to "light rain"; in that case the humidity controller may be controlled in S103 to adjust the humidity of the current environment into that range. If the acquired weather data is a force 4-5 wind, the air volume controller may be controlled to generate a corresponding air volume; for example, the air volume may be reduced appropriately to avoid discomfort to the user.
Taking the case where the associated data includes holiday data: if the date information is June 1, 2019, the holiday data is Children's Day; the scene simulator corresponding to "Children's Day" may be a sound player whose control parameter is "play a Children's Day song", so the sound player may be controlled to play such a song in S103.
In some cases, people in different regions celebrate different festivals, and the geographic location information associated with the date information may be obtained to determine whether people at that location celebrate the corresponding festival. For example, if the date information is October 1, 2018, and the associated geographic location is China, it may be determined that the festival being celebrated is National Day.
Taking the case where the associated data includes user tag data: the user may mark special dates such as birthdays and wedding anniversaries. Assuming the user has marked November 1, 2019, as his or her birthday, and the parsed voice data includes "November 1, 2019", the associated data of that date is "birthday"; the scene simulator corresponding to "birthday" may be a sound player whose control parameter is "play a birthday song", so the sound player may be controlled to play the birthday song in S103.
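A sketch of this date-keyword handling follows; the holiday table, user marks, and weather stand-in are hypothetical values chosen to mirror the examples above.

```python
from datetime import date

HOLIDAYS = {(6, 1): "Children's Day", (10, 1): "National Day"}  # region-dependent
USER_MARKS = {date(2019, 11, 1): "birthday"}                    # user-marked dates

def fetch_weather(d: date, location: str) -> str:
    """Stand-in for a weather lookup keyed by date and geographic location."""
    return "light rain"

def associated_data(d: date, location: str) -> dict:
    data = {"weather": fetch_weather(d, location)}
    if (d.month, d.day) in HOLIDAYS:
        data["holiday"] = HOLIDAYS[(d.month, d.day)]
    if d in USER_MARKS:
        data["user_mark"] = USER_MARKS[d]
    return data

print(associated_data(date(2019, 11, 1), "Beijing"))
# {'weather': 'light rain', 'user_mark': 'birthday'}
```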
In one embodiment, S103 may include: if a plurality of keywords to be processed are determined and conflicts exist among the control parameters of the scene simulator corresponding to the keywords to be processed, determining the priority sequence among the keywords to be processed according to the preset priority of each keyword, or determining the priority sequence among the keywords to be processed through semantic analysis;
determining a scene simulator to be processed and control parameters thereof according to the determined priority sequence;
and controlling the scene simulator to be processed based on the determined control parameters so as to assist the voice data in scene simulation.
For example, if the voice data acquired in S101 is "the sun came out at night", the keywords to be processed obtained by parsing are "night" and "sun". Suppose the scene simulator corresponding to "night" is a light emitter whose control parameter is "brightness level A", where A denotes the lowest brightness level; and suppose the scene simulator corresponding to "sun" is the same light emitter with control parameter "brightness level B", where B denotes a brightness level other than the lowest. The control parameters of the scene simulators corresponding to the two keywords thus conflict, and the priority order between the keywords must be determined.
In one case, the priority order among the plurality of keywords to be processed may be determined according to a preset priority of each keyword. Assuming that "night" is preset to have a higher priority than "sun", the light emitter is adjusted to simulate "night" light based on "brightness level a".
In another case, the priority order among the multiple keywords to be processed may be determined through semantic parsing. Assuming that the voice data is semantically analyzed, and the night is simulated first and then the sun is simulated according to the context determination, the illuminator is adjusted based on the brightness level A to simulate the light of the night, and then the illuminator is adjusted based on the brightness level B to simulate the light of the sun.
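A sketch of the preset-priority variant of this rule follows; the priority values are invented, and semantic parsing could supply the ordering instead.

```python
from typing import Dict, List, Tuple

PRIORITY: Dict[str, int] = {"night": 2, "sun": 1}  # higher value is handled first

def resolve(pending: List[Tuple[str, str, dict]]) -> List[Tuple[str, dict]]:
    """pending holds (keyword, simulator, params); returns ordered control actions."""
    ordered = sorted(pending, key=lambda item: PRIORITY.get(item[0], 0), reverse=True)
    return [(simulator, params) for _, simulator, params in ordered]

actions = resolve([("sun", "light_emitter", {"brightness": "level B"}),
                   ("night", "light_emitter", {"brightness": "level A"})])
# "night" is simulated first (level A), then "sun" (level B)
```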
By applying this embodiment of the invention: first, the speaker only needs to speak, and the device can acquire the voice data and, by parsing it, control the scene simulator to assist the voice data in scene simulation, assisting the speaker while consuming little manpower beyond language expression. Second, when parents or other guardians drive children to and from school, the in-vehicle environment usually offers only music or radio; with this scheme, stories, essays, poems, and the like can be told to children, and the auxiliary effect of the vehicle-mounted device enhances the children's sense of immersion and adds interest. Third, the user can tell known stories, poems, prose, or other content, or compose new stories and other content, and the scheme can assist in both cases. Fourth, the scheme can simulate scenes along many dimensions, including sound, vision, light, glass transmittance, and date-related data, arousing greater audience interest and achieving a better assisted-narration effect.
Assuming the execution subject is a vehicle-mounted device, a voice acquisition device, a light emitter, a projector, and a sound player may be installed in advance at the rear seats of the vehicle. A specific embodiment is described below with reference to fig. 2:
s201: and after receiving a starting instruction sent by a user, starting the auxiliary mode.
For example, the user may start the auxiliary mode through a voice instruction, or the electronic device may present an interactive interface and the user starts the auxiliary mode by clicking a button in it. The vehicle-mounted device executes the subsequent steps only after the user has started the auxiliary mode, which reduces false triggering.
S202: the method comprises the steps of obtaining first voice data of a user, and determining a lecture theme of the user by analyzing the first voice data.
For ease of distinction, the first preset amount of voice data received by the vehicle-mounted device after the auxiliary mode is started is called first voice data, and the voice data received during the subsequent voice interaction is called second voice data.
The user's lecture topic may be the name of the story to be told, a lecture subject, or the like, and is not specifically limited. In this embodiment, after receiving the start instruction sent by the user, the user's first voice data may be acquired and the lecture topic determined by parsing the first voice data.
Taking the scene in which a parent tells a story to a child as an example, the parent may tell some classic fairy tales. After turning on the auxiliary mode, the parent may say to the child, "The story I am going to tell next is Snow White"; by parsing this voice data, the electronic device determines that the topic is "the story of Snow White".
For example, the spoken subject in the first speech data may be identified by a semantic analysis algorithm. Alternatively, a topic recognition model may be obtained by training in advance, and the topic recognition model may be used to recognize the lecture topic in the first speech data. For example, sample voice data and a corresponding topic label thereof may be obtained, the sample voice data is input into a neural network with a preset structure, and parameters in the neural network are iteratively adjusted by using the topic label as supervision until a convergence condition is met, so as to obtain a trained topic recognition model.
S203: and acquiring a data packet corresponding to the theme, wherein the data packet comprises the corresponding relation between the keywords associated with the theme and the scene simulator and the control parameters thereof.
Continuing with the above example, the vehicle-mounted device may obtain the data packet corresponding to "the story of Snow White", which contains the correspondence between the keywords associated with that story and the scene simulators and their control parameters.
For example, the scene simulator may include any one or more of the following: a sound player, a projector, a light emitter, window glass, an odor generator, an air volume controller, a humidity controller, and a temperature controller. These scene simulators can be used to assist the voice data in scene simulation.
This embodiment is suitable for telling known stories: the keywords appearing in a known story generally change little, so a corresponding data packet can be established in advance for each known story, containing the correspondence between the story's keywords and the scene simulators and their control parameters.
Alternatively, this embodiment is also applicable to scenes in which other known content, such as poems and prose, is recited, without specific limitation.
S204: and acquiring second voice data of the user, and determining a keyword in the second voice data as a keyword to be processed by analyzing the second voice data.
For example, keywords in the second speech data may be identified by a semantic analysis algorithm. Alternatively, a keyword recognition model may be obtained by training in advance, and the keyword recognition model may be used to recognize the keyword in the second speech data. For example, sample voice data and a corresponding keyword tag thereof may be obtained, the sample voice data is input into a neural network with a preset structure, the keyword tag is used as a supervision, and parameters in the neural network are iteratively adjusted until a convergence condition is met, so as to obtain a trained keyword recognition model.
S205: if the keywords to be processed are date information, acquiring associated data of the date information, wherein the associated data comprises any one or more of the following items: weather data, holiday data, user tagging data.
In one embodiment, geographic location information associated with the date information may also be obtained. In one case, the geographic location information may be obtained by parsing the voice data: for example, if the user says "that was July 20, 2019, in Beijing", the obtained date information is July 20, 2019, and the geographic location information is Beijing. In another case, the geographic location information associated with the date information may be determined from a user address acquired in advance. In yet another case, if the electronic device is a vehicle-mounted device, the vehicle's position obtained by GPS (Global Positioning System) may be used as the geographic location information associated with the date information.
Taking the case where the associated data includes weather data: assuming the date information is July 20, 2019, and the geographic location information is Beijing, the acquired weather data may be the Beijing weather for July 20, 2019.
Taking the case where the associated data includes holiday data: if the date information is June 1, 2019, the holiday data is Children's Day. In some cases, people in different regions celebrate different festivals, and the geographic location information associated with the date information may be obtained to determine whether people at that location celebrate the corresponding festival. For example, if the date information is October 1, 2018, and the associated geographic location is China, it may be determined that the festival being celebrated is National Day.
Taking the case where the associated data includes user tag data: the user may mark special dates such as birthdays and wedding anniversaries. Assuming the user has marked November 1, 2019, as his or her birthday, and the parsed voice data includes "November 1, 2019", the associated data of that date is "birthday".
S206: determining a scene simulator corresponding to the keywords to be processed and control parameters thereof according to the acquired data packet; and determining a scene simulator corresponding to the associated data and control parameters of the scene simulator.
The data packet is the data packet acquired in S203, and the data packet includes a correspondence between the keywords associated with the theme and the scene simulator and the control parameters thereof; the associated data in S206 is the associated data acquired in S205.
S207: based on the determined control parameters, the scene simulator is controlled to assist the speech data in scene simulation.
As mentioned above, the scene simulator may include any one or more of the following: a sound player, a projector, a light emitter, window glass, an odor generator, an air volume controller, a humidity controller, and a temperature controller.
If the scene simulator includes a sound player, S207 includes: and controlling the sound player to play the sound corresponding to the keyword based on the determined control parameter of the sound player. If the scene simulator includes a projector, S207 includes: and controlling the projector to project based on the determined control parameters of the projector so as to simulate the scene corresponding to the keyword. If the scene simulator includes a light emitter, S207 includes: and adjusting the light emitter based on the determined light parameters to simulate the light corresponding to the keyword. If the scene simulator includes a window glass, S207 includes: and adjusting the light transmittance of the window glass based on the determined light transmittance parameters of the window glass so as to simulate light rays corresponding to the keywords.
If the scene simulator includes the odor generator, S207 includes: and controlling the odor generator to adjust the odor proportion based on the determined control parameters of the odor generator so as to generate the odor corresponding to the keyword. If the scene simulator includes the air volume controller, S207 includes: and controlling the air volume controller to generate the air volume corresponding to the keyword based on the determined control parameter of the air volume controller. If the scene simulator includes a humidity controller, S207 includes: and controlling the humidity controller to adjust the humidity in the current environment to the humidity range corresponding to the keyword based on the determined control parameter of the humidity controller. If the scene simulator includes a temperature controller, S207 includes: and controlling the temperature controller to adjust the temperature in the current environment to the temperature range corresponding to the keyword based on the determined control parameter of the temperature controller.
For example, the odor generator may be a fragrance-diffusing device. The device may contain several essences that can be blended into different fragrances, a nozzle that sprays the blended fragrance, and a fan that adjusts the speed at which the fragrance disperses; the amount or speed of dispersal can also be adjusted by adjusting the nozzle pressure. In one case, the odor ratios corresponding to various keywords can be stored in advance as control parameters of the odor generator, and the odor generator can be controlled to blend different fragrances according to those ratios.
In one case, the window glass may be a glass whose light transmittance can be adjusted, for example, a light-transmitting object may be filled in the window glass, and the light transmittance of the light-transmitting object may be changed by changing the frequency of a voltage applied to the glass. For example, the transparent object may be a liquid crystal or other substances, and is not limited specifically.
For example, the window glass light transmission parameter corresponding to each keyword may be stored in advance, and the window glass light transmission parameter may be the frequency of the voltage applied to the window glass.
Or, in another case, the window glass can be electrochromic glass, and the coloring condition of the glass can be controlled by the magnitude of the applied voltage, so that the light transmittance of the glass is adjusted. In this case, the light transmission parameter of the window glass may be the magnitude of the voltage applied to the glass.
Suppose a parent telling a story to a child says "it got dark" and "the moon came out"; the parsed keywords are "dark" and "moon". Suppose the scene simulators corresponding to "dark" are a light emitter and the window glass, with the light emitter's control parameter being "lowest brightness" and the window glass's control parameter being "lowest light transmittance"; and suppose the scene simulator corresponding to "moon" is a projector whose control parameter is "project the moon". Then in S207 the light emitter may be dimmed to the lowest brightness, the window glass adjusted to the lowest light transmittance, and the projector controlled to project the shape of the moon.
In the above example, if the electronic apparatus is an in-vehicle apparatus, the light of the interior of the vehicle may be adjusted to have the lowest brightness, the rear window glass may be adjusted to have the lowest transmittance, and the projector may be controlled to project the shape of the moon onto the ceiling of the interior of the vehicle.
As another example, if the parent, telling the story to the child, says "the wolf is coming", the keyword is parsed as "wolf". Suppose the scene simulator corresponding to "wolf" is a sound player whose control parameter is "play a wolf-howl sound file"; then in S207 the sound player may be controlled to play the howling sound file.
If the speaker says "the sky is full of twinkling stars", the keyword is parsed as "twinkling stars". Suppose the scene simulators corresponding to "twinkling stars" are a small lamp at the driver's seat and the window glass, with the lamp's control parameter being "blink" and the window glass's control parameter being "lowest light transmittance"; then in S207 the small lamp may be controlled to blink and the window glass adjusted to the lowest light transmittance.
Continuing the above example, the handling of the associated data is described below:
Taking the case where the associated data includes weather data: suppose the acquired weather data is "light rain", the scene simulator corresponding to "light rain" is a sound player, and the corresponding control parameter is "play the sound file corresponding to light rain"; then in S207 the sound player may be controlled to play that sound file. Alternatively, the scene simulator corresponding to "light rain" may further include a humidity controller whose control parameter is the humidity range corresponding to "light rain"; in that case the humidity controller may be controlled in S207 to adjust the humidity of the current environment into that range. If the acquired weather data is a force 4-5 wind, the air volume controller may be controlled to generate a corresponding air volume; for example, the air volume may be reduced appropriately to avoid discomfort to the user.
Taking the case where the associated data includes holiday data: assuming the acquired holiday data is Children's Day, the scene simulator corresponding to "Children's Day" may be a sound player whose control parameter is "play a Children's Day song", so the sound player may be controlled to play such a song in S207.
Taking the case where the associated data includes user tag data: assuming the acquired user tag data is "birthday", the scene simulator corresponding to "birthday" may be a sound player whose control parameter is "play a birthday song", so the sound player may be controlled to play the birthday song in S207.
By applying this embodiment: first, the speaker only needs to speak, and the device can acquire the voice data and, by parsing it, control the scene simulator to assist the voice data in scene simulation, assisting the speaker while consuming little manpower beyond language expression. Second, when parents or other guardians drive children to and from school, the in-vehicle environment usually offers only music or radio; with this embodiment, stories, essays, poems, and the like can be told to children, and the auxiliary effect of the vehicle-mounted device enhances the children's sense of immersion and adds interest. Third, the simulated scene in the vehicle can be adjusted for different dates, achieving a better auxiliary effect. Fourth, the scheme can simulate scenes along many dimensions, including sound, vision, light, glass transmittance, and date-related data, arousing greater audience interest, achieving a better assisted-narration effect, and making drive time more enjoyable for parents and children.
Corresponding to the foregoing method embodiment, an embodiment of the present invention further provides a voice interaction apparatus, as shown in fig. 3, including:
a first obtaining module 301, configured to obtain voice data of a user;
a first determining module 302, configured to determine a scene simulator associated with the voice data and a control parameter of the scene simulator by parsing the voice data;
a control module 303, configured to control the scene simulator based on the determined control parameter, so as to assist the voice data in performing scene simulation.
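The three modules of fig. 3 might be wired together as sketched below; the class and method names are illustrative assumptions, not the patent's API.

```python
class VoiceInteractionApparatus:
    """Sketch of the fig. 3 apparatus: three cooperating modules."""

    def __init__(self, first_acquisition, first_determination, control):
        self.first_acquisition = first_acquisition      # module 301
        self.first_determination = first_determination  # module 302
        self.control = control                          # module 303

    def run_once(self) -> None:
        voice = self.first_acquisition.acquire()
        simulator, params = self.first_determination.determine(voice)
        self.control.control(simulator, params)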
As an embodiment, the first determining module 302 is specifically configured to:
determining a keyword in the voice data as a keyword to be processed by analyzing the voice data;
and determining the scene simulator corresponding to the keywords to be processed and the control parameters thereof according to the corresponding relation between the acquired keywords and the scene simulator and the control parameters thereof.
As an embodiment, the apparatus further comprises: a second obtaining module and a second determining module (not shown in the figure), wherein,
a second obtaining module, configured to obtain associated data of the date information if the keyword to be processed is date information, where the associated data includes any one or more of the following: weather data, holiday data, user tag data;
and the second determining module is used for determining the scene simulator corresponding to the associated data and the control parameters of the scene simulator.
As an embodiment, the control module 303 is specifically configured to perform any one or more of the following steps:
controlling the sound player to play the sound corresponding to the keyword based on the determined control parameter of the sound player;
controlling the projector to project based on the determined control parameters of the projector so as to simulate a scene corresponding to the keyword;
adjusting a light emitter based on the determined light parameters to simulate light corresponding to the keyword;
and adjusting the light transmittance of the window glass based on the determined light transmittance parameters of the window glass so as to simulate light rays corresponding to the keywords.
As an embodiment, the apparatus further comprises a third determining module and a third obtaining module (not shown in the figure; an example data packet is sketched below), wherein,
a third determining module, configured to determine the lecture theme of the user after receiving a start instruction sent by the user;
and a third obtaining module, configured to obtain a data packet corresponding to the theme, where the data packet includes correspondences between the keywords associated with the theme and the scene simulators with their control parameters.
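An example of what such a theme data packet could contain (the theme, keywords and parameters are all invented):

    # Invented example of a data packet for one lecture theme.
    SNOW_WHITE_PACKET = {
        "theme": "Snow White",
        "correspondence": {
            "forest": ("projector", {"scene": "dark_forest"}),
            "apple":  ("light",     {"color": "red", "brightness": 0.6}),
            "mirror": ("sound",     {"clip": "magic_chime.wav"}),
        },
    }

    def load_packet(theme, packets):
        """Third obtaining module: fetch the packet matching the determined theme."""
        return packets.get(theme)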
As an embodiment, the control module 303 is further configured to perform the following (a conflict-resolution sketch follows these steps):
if a plurality of keywords to be processed are determined and the control parameters of the scene simulators corresponding to those keywords conflict, determining a priority order among the keywords to be processed according to preset keyword priorities, or determining the priority order through semantic analysis;
determining the scene simulator to be processed and its control parameters according to the determined priority order;
and controlling the scene simulator to be processed based on the determined control parameters, so as to perform scene simulation that accompanies the voice data.
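A minimal sketch of this conflict rule, assuming invented preset priorities (semantic analysis, the alternative named above, is not shown):

    # Preset keyword priorities (invented); the higher value wins a conflict.
    PRIORITY = {"thunder": 2, "night": 1, "forest": 1}

    def resolve_conflicts(pending):
        """pending: (keyword, simulator, params) tuples; keep the highest-priority
        keyword per simulator when control parameters would conflict."""
        chosen = {}
        for keyword, simulator, params in pending:
            current = chosen.get(simulator)
            if current is None or PRIORITY.get(keyword, 0) > PRIORITY.get(current[0], 0):
                chosen[simulator] = (keyword, params)
        return [(sim, kw_p[1]) for sim, kw_p in chosen.items()]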
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
An embodiment of the present invention further provides an electronic device, as shown in fig. 4, which includes a memory 402, a processor 401, and a computer program stored on the memory 402 and executable on the processor 401, and when the processor 401 executes the computer program, any one of the voice interaction methods is implemented.
Embodiments of the present invention further provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any one of the above voice interaction methods.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the invention, features of the above embodiments, or of different embodiments, may also be combined, steps may be implemented in any order, and many other variations of the different aspects of the invention exist as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown within the provided figures, for simplicity of illustration and discussion and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to the implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., such specifics should be well within the purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (7)

1. A method of voice interaction, comprising:
acquiring first voice data of a user;
determining a lecture theme of a user by analyzing the first voice data;
acquiring a data packet corresponding to the theme, wherein the data packet comprises correspondences between keywords associated with the theme and scene simulators and control parameters thereof;
acquiring second voice data of a user, and determining a keyword in the second voice data as a keyword to be processed by analyzing the second voice data;
if the keywords to be processed are date information, acquiring associated data of the date information, wherein the associated data comprises any one or more of the following items: events corresponding to festivals and special dates marked by users;
determining a scene simulator corresponding to the associated data and control parameters of the scene simulator according to the acquired data packet;
controlling the scene simulator based on the determined control parameters, so as to perform scene simulation that accompanies the voice data.
2. The method of claim 1, wherein controlling the scene simulator based on the determined control parameters to perform scene simulation that accompanies the voice data comprises any one or more of:
controlling the sound player to play the sound corresponding to the keyword based on the determined control parameter of the sound player;
controlling the projector to project based on the determined control parameters of the projector so as to simulate a scene corresponding to the keyword;
adjusting a light emitter based on the determined light parameters to simulate light corresponding to the keyword;
and adjusting the light transmittance of the window glass based on the determined light transmittance parameters of the window glass so as to simulate light rays corresponding to the keywords.
3. The method of claim 1, wherein controlling the scene simulator based on the determined control parameters to perform scene simulation that accompanies the voice data comprises:
if a plurality of keywords to be processed are determined and the control parameters of the scene simulators corresponding to those keywords conflict, determining a priority order among the keywords to be processed according to preset keyword priorities, or determining the priority order through semantic analysis;
determining the scene simulator to be processed and its control parameters according to the determined priority order;
and controlling the scene simulator to be processed based on the determined control parameters, so as to perform scene simulation that accompanies the voice data.
4. A voice interaction apparatus, comprising:
the first acquisition module is used for acquiring first voice data of a user;
the first determining module is used for determining the lecture theme of the user by analyzing the first voice data;
the second acquisition module is used for acquiring a data packet corresponding to the theme, wherein the data packet comprises the corresponding relation between the keywords associated with the theme and the scene simulator and the control parameters thereof;
a third obtaining module, configured to obtain second voice data of a user, determine a keyword in the second voice data as a to-be-processed keyword by analyzing the second voice data, and if the to-be-processed keyword is date information, obtain associated data of the date information, where the associated data includes any one or more of the following: events corresponding to festivals and special dates marked by users;
the second determining module is used for determining a scene simulator corresponding to the associated data and control parameters of the scene simulator according to the acquired data packet;
and the control module is used for controlling the scene simulator based on the determined control parameters, so as to perform scene simulation that accompanies the voice data.
5. The apparatus of claim 4, wherein the control module is specifically configured to perform any one or more of the following steps:
controlling the sound player to play the sound corresponding to the keyword based on the determined control parameter of the sound player;
controlling the projector to project based on the determined control parameters of the projector so as to simulate a scene corresponding to the keyword;
adjusting a light emitter based on the determined light parameters to simulate light corresponding to the keyword;
and adjusting the light transmittance of the window glass based on the determined light transmittance parameters of the window glass so as to simulate light rays corresponding to the keywords.
6. The apparatus of claim 4, wherein the control module is further configured to:
if a plurality of keywords to be processed are determined and the control parameters of the scene simulators corresponding to those keywords conflict, determining a priority order among the keywords to be processed according to preset keyword priorities, or determining the priority order through semantic analysis;
determining the scene simulator to be processed and its control parameters according to the determined priority order;
and controlling the scene simulator to be processed based on the determined control parameters, so as to perform scene simulation that accompanies the voice data.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 3 when executing the program.
CN201910701320.2A 2019-07-31 2019-07-31 Voice interaction method, device and equipment Active CN110534094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910701320.2A CN110534094B (en) 2019-07-31 2019-07-31 Voice interaction method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910701320.2A CN110534094B (en) 2019-07-31 2019-07-31 Voice interaction method, device and equipment

Publications (2)

Publication Number Publication Date
CN110534094A CN110534094A (en) 2019-12-03
CN110534094B true CN110534094B (en) 2022-05-31

Family

ID=68661069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910701320.2A Active CN110534094B (en) 2019-07-31 2019-07-31 Voice interaction method, device and equipment

Country Status (1)

Country Link
CN (1) CN110534094B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143061B (en) * 2019-12-18 2024-02-23 海尔优家智能科技(北京)有限公司 Multi-device linkage control method, device and storage medium
CN111161739B (en) * 2019-12-28 2023-01-17 科大讯飞股份有限公司 Speech recognition method and related product
CN111653271B (en) * 2020-05-26 2023-09-05 大众问问(北京)信息科技有限公司 Sample data acquisition and model training method and device and computer equipment
CN113707144B (en) * 2021-08-24 2023-12-19 深圳市衡泰信科技有限公司 Control method and system of golf simulator

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003061014A (en) * 2001-08-20 2003-02-28 Canon Inc Digital television receiver, control method therefor, digital television reception system, and control program
WO2018104834A1 (en) * 2016-12-07 2018-06-14 Yogesh Chunilal Rathod Real-time, ephemeral, single mode, group & auto taking visual media, stories, auto status, following feed types, mass actions, suggested activities, ar media & platform
CN110435574A (en) * 2019-07-02 2019-11-12 大众问问(北京)信息科技有限公司 A kind of scenario simulation method, apparatus and mobile unit based on mobile unit
US10665030B1 (en) * 2019-01-14 2020-05-26 Adobe Inc. Visualizing natural language through 3D scenes in augmented reality

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100538823C (en) * 2006-07-13 2009-09-09 英业达股份有限公司 Language aided expression system and method
KR101199705B1 (en) * 2010-02-26 2012-11-08 이효재 System and Method for realizing experiential space
US20140164921A1 (en) * 2012-12-07 2014-06-12 Robert Salinas Methods and Systems of Augmented Reality on Mobile Devices
CN106371605B (en) * 2016-09-19 2018-03-30 腾讯科技(深圳)有限公司 Virtual reality scenario adjusting method and device
US10984595B2 (en) * 2017-01-13 2021-04-20 Samsung Electronics Co. Ltd Method and apparatus for providing guidance in a virtual environment
CN107122055A (en) * 2017-05-08 2017-09-01 佛山市神风航空科技有限公司 A kind of capsule cabin is gone sightseeing system
KR102126609B1 (en) * 2017-05-29 2020-06-24 하은영 Entertaining device for Reading and the driving method thereof
CN107463626A (en) * 2017-07-07 2017-12-12 深圳市科迈爱康科技有限公司 A kind of voice-control educational method, mobile terminal, system and storage medium
CN108877803B (en) * 2018-06-08 2020-03-27 百度在线网络技术(北京)有限公司 Method and apparatus for presenting information
CN109065055B (en) * 2018-09-13 2020-12-11 三星电子(中国)研发中心 Method, storage medium, and apparatus for generating AR content based on sound
CN109783675A (en) * 2018-12-13 2019-05-21 深圳壹账通智能科技有限公司 A kind of holographic projection methods and relevant device based on data processing
CN111078017A (en) * 2019-12-19 2020-04-28 珠海格力电器股份有限公司 Control method and device for constructing virtual scene, electronic equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003061014A (en) * 2001-08-20 2003-02-28 Canon Inc Digital television receiver, control method therefor, digital television reception system, and control program
WO2018104834A1 (en) * 2016-12-07 2018-06-14 Yogesh Chunilal Rathod Real-time, ephemeral, single mode, group & auto taking visual media, stories, auto status, following feed types, mass actions, suggested activities, ar media & platform
US10665030B1 (en) * 2019-01-14 2020-05-26 Adobe Inc. Visualizing natural language through 3D scenes in augmented reality
CN110435574A (en) * 2019-07-02 2019-11-12 大众问问(北京)信息科技有限公司 A kind of scenario simulation method, apparatus and mobile unit based on mobile unit

Also Published As

Publication number Publication date
CN110534094A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
CN110534094B (en) Voice interaction method, device and equipment
CN108563780B (en) Course content recommendation method and device
CN106933789B (en) Travel attack generation method and generation system
Mayer Multimedia learning
US10796689B2 (en) Voice processing methods and electronic devices
CN109275039B (en) Remote video interaction system and method
US20150010889A1 (en) Method for providing foreign language acquirement studying service based on context recognition using smart device
CN104538031A (en) Intelligent voice service development cloud platform and method
CN106200886A (en) A kind of intelligent movable toy manipulated alternately based on language and toy using method
US20120156659A1 (en) Foreign language learning method based on stimulation of long-term memory
CN109448467A (en) A kind of virtual image teacher teaching program request interaction systems
KR20190061191A (en) Speech recognition based training system and method for child language learning
CN116543082B (en) Digital person generation method and device and digital person generation system
CN107463626A (en) A kind of voice-control educational method, mobile terminal, system and storage medium
US20040248068A1 (en) Audio-visual method of teaching a foreign language
Abdullaeva The use of electronic resources in teaching english
Okon Language Communication and Efficacious Performing Arts Practice in Nigeria: Theatre and Film Experience
CN112165627B (en) Information processing method, device, storage medium, terminal and system
CN110930999A (en) Voice interaction method and device and vehicle
CN104698931A (en) Control method of man-machine interaction intelligent tellurion
CN204537616U (en) Human-computer interaction intelligent terrestrial globe
JP6231510B2 (en) Foreign language learning system
Faltis Arts‐Based Pedagogy for Teaching English Learners
Kadagidze Different types of listening materials
Seneff Web-based dialogue and translation games for spoken language learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant