CN115426553A - Intelligent sound box and display method thereof - Google Patents


Info

Publication number
CN115426553A
Authority
CN
China
Prior art keywords
sound box
intelligent sound
determining
audio stream
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110517035.2A
Other languages
Chinese (zh)
Inventor
孟卫明
王彦芳
王月岭
蒋鹏民
杜兆臣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Group Holding Co Ltd
Original Assignee
Hisense Group Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Group Holding Co Ltd
Priority to CN202110517035.2A
Publication of CN115426553A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00: Details of transducers, loudspeakers or microphones
    • H04R1/02: Casings; cabinets; supports therefor; mountings therein
    • H04R1/028: Casings; cabinets; supports therefor; mountings therein associated with devices performing functions other than acoustics, e.g. electric candles
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09F: DISPLAYING; ADVERTISING; SIGNS; LABELS OR NAME-PLATES; SEALS
    • G09F9/00: Indicating arrangements for variable information in which the information is built-up on a support by selection or combination of individual elements
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/24: Speech recognition using non-acoustical features
    • G10L15/25: Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses an intelligent sound box and a display method thereof, in which a virtual imaging body displayed inside the intelligent sound box is imaged outside it by means of a negative refractive index material, saving volume in the intelligent sound box and improving the user experience. The intelligent sound box includes a display screen located inside the box, a negative refractive index material located on the box, and a processor, where: the display screen is used for display; the negative refractive index material is used for refracting the content displayed by the display screen to the outside of the intelligent sound box for imaging; and the processor is configured to perform: determining the working scene of the intelligent sound box; determining the state of the virtual imaging body corresponding to the working scene according to a preset correspondence between working scenes and states of the virtual imaging body; and, with the display screen inside the intelligent sound box as a light source, having the negative refractive index material refract the state of the virtual imaging body displayed by the display screen to the outside of the intelligent sound box for imaging.

Description

Intelligent sound box and display method thereof
Technical Field
The invention relates to the technical field of intelligent sound boxes, in particular to an intelligent sound box and a display method thereof.
Background
The intelligent sound box is the product of upgrading the ordinary sound box. It is a tool with which household consumers access the internet by voice, for example to play songs, shop online, or check the weather forecast, and it can also serve as a medium through which a user controls smart home devices by voice, for example opening the curtains, setting the refrigerator temperature, or preheating the water heater. Existing intelligent sound boxes are mainly used for voice interaction and are weak at visual interaction.
Although there are existing schemes that combine an intelligent sound box with display technology and use a display screen for display, adding a display screen to the structure of the intelligent sound box forces the screen to be arranged separately from the sound-production and sound-pickup parts, which inevitably increases the volume of the intelligent sound box. Moreover, because the display screen occupies part of the original volume of the intelligent sound box, its sound-production part necessarily achieves a worse surround effect than that of a cylindrical sound box, giving the user a relatively poor experience.
Disclosure of Invention
The invention provides an intelligent sound box and a display method thereof, which image a virtual imaging body displayed inside the intelligent sound box to the outside using a negative refractive index material, rather than displaying via a screen arranged on the outside of the intelligent sound box, thereby saving volume in the intelligent sound box and improving the user experience.
In a first aspect, an embodiment of the present invention provides an intelligent sound box, including a display screen located inside the intelligent sound box, a negative refractive index material located on the intelligent sound box, and a processor, where:
the display screen is used for displaying;
the negative refractive index material is used for refracting the content displayed by the display screen to the outside of the intelligent sound box for imaging;
the processor is configured to perform:
determining a working scene of the intelligent sound box;
determining the state of a virtual imaging body corresponding to a preset working scene according to the corresponding relation between the preset working scene and the state of the virtual imaging body;
and the state of the virtual imaging body displayed by the display screen is refracted to the outside of the intelligent sound box by the negative refractive index material to be imaged by taking the display screen inside the intelligent sound box as a light source.
The intelligent sound box provided by the embodiment of the invention performs display outside the box through aerial virtual imaging: the virtual imaging body shown on a display screen embedded inside the intelligent sound box is projected into the air outside the box through the negative refractive index material and can present various states. This avoids the increase in volume caused by adding an external display screen to the intelligent sound box for display, and improves the user experience.
In some examples, the processor is configured to perform:
if the intelligent sound box does not receive a local audio stream, determining the working state of the intelligent sound box as its working scene; or
if the intelligent sound box receives a local audio stream in a non-standby state, determining the semantics of the local audio stream and determining the working scene of the intelligent sound box according to those semantics; or
if the intelligent sound box receives a local audio stream in a standby state, performing wake-up information recognition on the local audio stream and determining the working scene of the intelligent sound box according to whether the wake-up information is recognized.
In some examples, the processor is further specifically configured to perform:
and performing voiceprint recognition on the local audio stream, and determining user preference corresponding to the voiceprint recognition result so as to control the intelligent household equipment through the intelligent sound box according to a mode corresponding to the user preference.
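As an illustration of this voiceprint-to-preference step, a minimal sketch follows; the recognized speaker identifiers, the preference table, and its fields are all hypothetical stand-ins, not part of the disclosed system:

```python
# Illustrative routing from a voiceprint recognition result to the
# smart-home control mode matching that user's preferences.
# User IDs and preference fields are invented for this sketch.
USER_PREFS = {
    "user_a": {"light": "warm", "ac_temp": 26},
    "user_b": {"light": "cool", "ac_temp": 22},
}

def control_mode(voiceprint_id, prefs=USER_PREFS):
    """Return the control settings for the recognized speaker,
    falling back to neutral defaults for an unknown voiceprint."""
    return prefs.get(voiceprint_id, {"light": "neutral", "ac_temp": 24})
```

A real system would obtain `voiceprint_id` from a speaker-recognition model run on the local audio stream; the lookup itself is the only part sketched here.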
In some examples, after determining the state of the virtual imaging body corresponding to the working scene, the processor is further specifically configured to perform:
if the working scene is determined to be a broadcast dialogue scene, controlling the mouth shape of the virtual imaging body to be synchronous with the played chatting audio stream according to a lip synchronization algorithm; or
If the working scene is determined to be a scene for playing the non-music media assets, playing the acquired non-music media assets, and controlling the mouth shape of the virtual imaging body to be synchronous with the played non-music media assets according to a lip synchronization algorithm; or
And if the working scene is determined to be a music media asset playing scene, playing the acquired music media assets, determining dance actions corresponding to the played music media assets, and controlling limbs of the virtual imaging body to dance according to the dance actions.
In some examples, the processor is specifically configured to perform:
monitoring, according to a music beat algorithm, the beat points of the played music and determining the dance actions corresponding to the beat points at different moments; or
determining the dance actions corresponding to the played music according to a music-to-dance generation algorithm.
In a second aspect, an embodiment of the present invention provides a display method for a smart sound box, including:
determining a working scene of the intelligent sound box;
determining the state of a virtual imaging body corresponding to a preset working scene according to the corresponding relation between the preset working scene and the state of the virtual imaging body;
the method comprises the steps that a display screen inside the intelligent sound box is used as a light source, and the state of the virtual imaging body displayed by the display screen is refracted to the outside of the intelligent sound box through a negative refractive index material on the intelligent sound box to be imaged.
In some examples, the determining the working scenario of the smart speaker includes:
if the intelligent sound box does not receive a local audio stream, determining the working state of the intelligent sound box as its working scene; or
if the intelligent sound box receives a local audio stream in a non-standby state, determining the semantics of the local audio stream and determining the working scene according to those semantics; or
if the intelligent sound box receives a local audio stream in a standby state, performing wake-up information recognition on the local audio stream and determining the working scene according to whether the wake-up information is recognized.
In some examples, the performing of wake-up information recognition on the local audio stream further comprises:
and performing voiceprint recognition on the local audio stream, and determining user preferences corresponding to the voiceprint recognition result so as to control the intelligent household equipment through the intelligent sound box according to the mode corresponding to the user preferences.
In some examples, after determining the state of the virtual imaging body corresponding to the working scene, the method further comprises:
if the working scene is determined to be a broadcast dialogue scene, controlling the mouth shape of the virtual imaging body to be synchronous with the played chatting audio stream according to a lip synchronization algorithm; or
If the working scene is determined to be a scene for playing the non-music media assets, playing the acquired non-music media assets, and controlling the mouth shape of the virtual imaging body to be synchronous with the played non-music media assets according to a lip synchronization algorithm; or
And if the working scene is determined to be a music media asset playing scene, playing the acquired music media assets, determining dance actions corresponding to the played music media assets, and controlling limbs of the virtual imaging body to dance according to the dance actions.
In some examples, the determining of the dance actions corresponding to the played music includes:
monitoring, according to a music beat algorithm, the beat points of the played music and determining the dance actions corresponding to the beat points at different moments; or
determining the dance actions corresponding to the played music according to a music-to-dance generation algorithm.
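A toy sketch of the beat-driven branch above: beat timestamps (however obtained, e.g. from a beat-tracking algorithm) are mapped to dance moves by cycling through a fixed move list. The move names are purely illustrative:

```python
def dance_actions(beat_times, moves=("step", "spin", "wave", "clap")):
    """Assign one dance move per beat point by cycling through a
    fixed move list. `beat_times` are beat timestamps in seconds;
    returns (timestamp, move) pairs. A real implementation would
    derive `beat_times` from a music beat algorithm and drive the
    virtual imaging body's limb animation from the returned moves.
    """
    return [(t, moves[i % len(moves)]) for i, t in enumerate(beat_times)]
```

For example, three beats at 0.0 s, 0.5 s, and 1.0 s yield the first three moves in order, and the cycle repeats from the fifth beat onward.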
In a third aspect, an embodiment of the present invention further provides a display device for a smart speaker, including:
the scene determining unit is used for determining the working scene of the intelligent sound box;
the state determining unit is used for determining the state of the virtual imaging body corresponding to a preset working scene according to the corresponding relation between the working scene and the state of the virtual imaging body;
and the imaging display unit is used for, with the display screen inside the intelligent sound box as a light source, refracting the virtual imaging body displayed by the display screen to the outside of the intelligent sound box for imaging through the negative refractive index material on the intelligent sound box.
In some examples, the scene determining unit is specifically configured to:
if the intelligent sound box does not receive a local audio stream, determine the working state of the intelligent sound box as its working scene; or
if the intelligent sound box receives a local audio stream in a non-standby state, determine the semantics of the local audio stream and determine the working scene according to those semantics; or
if the intelligent sound box receives a local audio stream in a standby state, perform wake-up information recognition on the local audio stream and determine the working scene according to whether the wake-up information is recognized.
In some examples, the scene determining unit is further specifically configured to:
and performing voiceprint recognition on the local audio stream, and determining user preferences corresponding to the voiceprint recognition result so as to control the intelligent household equipment through the intelligent sound box according to the mode corresponding to the user preferences.
In some examples, after the state of the virtual imaging body corresponding to the working scene is determined, the control unit is further configured to:
if the working scene is determined to be a broadcast dialogue scene, controlling the mouth shape of the virtual imaging body to be synchronous with the played chatting audio stream according to a lip synchronization algorithm; or
If the working scene is determined to be a scene for playing the non-music media assets, playing the acquired non-music media assets, and controlling the mouth shape of the virtual imaging body to be synchronous with the played non-music media assets according to a lip synchronization algorithm; or
And if the working scene is determined to be a music media asset playing scene, playing the acquired music media assets, determining dance actions corresponding to the played music media assets, and controlling limbs of the virtual imaging body to dance according to the dance actions.
In some examples, the control unit is specifically configured to:
monitor, according to a music beat algorithm, the beat points of the played music and determine the dance actions corresponding to the beat points at different moments; or
determine the dance actions corresponding to the played music according to a music-to-dance generation algorithm.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, where the computer program is used to implement the steps of the method in the second aspect when executed by a processor.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic view of an intelligent sound box according to an embodiment of the present invention;
fig. 2 is a schematic view of a smart sound box displaying a virtual imaging volume according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating an imaging principle of a negative refractive index material according to an embodiment of the present invention;
fig. 4 is a flowchart illustrating an implementation of a method for controlling a virtual imaging volume according to an embodiment of the present invention;
fig. 5 is a flowchart illustrating an implementation of a virtual display method for an intelligent sound box according to an embodiment of the present invention;
fig. 6 is a flowchart illustrating an implementation of a method for waking up a main smart speaker according to an embodiment of the present invention;
fig. 7 is a flowchart illustrating an implementation of determining a main smart speaker according to an embodiment of the present invention;
fig. 8 is a flowchart illustrating an implementation of a wake-up method for another smart sound box according to an embodiment of the present invention;
fig. 9 is a flowchart illustrating an implementation of a display method for an intelligent sound box according to an embodiment of the present invention;
fig. 10 is a schematic view of a display device of an intelligent sound box according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The term "and/or" in the embodiments of the present invention describes an association relationship of associated objects, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The application scenario described in the embodiment of the present invention is for more clearly illustrating the technical solution of the embodiment of the present invention, and does not form a limitation on the technical solution provided in the embodiment of the present invention, and it can be known by a person skilled in the art that with the occurrence of a new application scenario, the technical solution provided in the embodiment of the present invention is also applicable to similar technical problems. In the description of the present invention, the meaning of "a plurality" is two or more, unless otherwise specified.
Existing intelligent sound boxes are mainly used for voice interaction and offer a poor visual-interaction experience. In the present application, the content displayed inside the intelligent sound box is projected to the outside of the box based on aerial projection display technology; displaying a virtual imaging body in this way improves the user experience and the technological feel of the product. Combining aerial projection display with the intelligent sound box also resolves the conflict between an external screen and the sound-production and sound-pickup structures, reducing the volume of the sound box while keeping its appearance attractive.
As shown in fig. 1, an intelligent sound box provided in an embodiment of the present invention includes a display screen 100 located inside the intelligent sound box, a negative refractive index material 101 located on the intelligent sound box, and a processor 102, where:
The display screen 100 is used for display. Unlike prior-art schemes that arrange a display screen on the outside of the intelligent sound box, the display mode of this embodiment does not increase the volume of the intelligent sound box: the internal screen is not viewed directly by the user, so it only needs to display, not to be directly viewable, and a screen of small volume can be embedded inside the box. This provides the display function without enlarging the intelligent sound box and without the screen conflicting with the sound-production and sound-pickup structures.
The negative refractive index material 101 is used for refracting the content displayed by the display screen to the outside of the intelligent sound box for imaging. The negative refractive index material in this embodiment enables an interactive aerial imaging technology: applying the light-field reconstruction principle, the material re-converges light scattered into the air to form a real image that needs no carrying medium, and combined with interaction technology it allows a person to interact directly with the aerial real image. The greatest advantage of interactive aerial imaging is that people can interact directly with the image in space without auxiliary display equipment such as VR headsets: the picture is presented directly in the air without any carrying medium and is imaged as a real image, which differs from the virtual image presented by the common "holographic projection" of the prior art.
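The re-convergence behaviour of such a material follows from Snell's law applied with a negative index of refraction; a minimal numeric sketch (an idealised slab with n = -1, purely for illustration and not a model of the actual material used) is:

```python
import math

def refraction_angle(theta_i_deg, n1, n2):
    """Snell's law: n1*sin(theta_i) = n2*sin(theta_t).

    With a negative-index medium (n2 < 0) the transmitted ray
    emerges on the SAME side of the surface normal as the incident
    ray, which is what lets a flat slab of such material re-converge
    diverging rays from a light source into a real image in air.
    """
    s = n1 * math.sin(math.radians(theta_i_deg)) / n2
    if abs(s) > 1:
        raise ValueError("no transmitted ray (total internal reflection)")
    return math.degrees(math.asin(s))

# Ideal slab with n = -1: a ray incident at 30 degrees is bent to -30,
# the mirror angle, so all rays from a point source re-cross at a point.
angle = refraction_angle(30.0, 1.0, -1.0)
```

A conventional medium (n2 > 0) merely bends the ray toward or away from the normal on the opposite side, which is why an ordinary window cannot form such an aerial real image.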
The processor 102 is configured to perform the steps of:
step 1, determining a working scene of the intelligent sound box; the working scene in this embodiment is used to represent the current working state or working mode of the smart speaker, where the working state or working mode may be determined without receiving the local audio stream, or may be determined according to the local audio stream after receiving the local audio stream, which is not limited in this embodiment. The working scenario in this embodiment includes, but is not limited to, one or more of the following: defaulting to a state after starting up; the state of the distribution network stage; a state of being awakened to wait for a command; broadcasting a conversation scene; playing non-music media asset scenes such as states of playing stories and poems; playing music media scenes, such as the state of playing songs; inquiring weather state; state where user commands are not understood or heard; controlling the state of the intelligent household appliance; state at shutdown.
Step 2, determining the state of the virtual imaging body corresponding to a preset working scene according to the corresponding relation between the working scene and the state of the virtual imaging body;
the embodiment provides a specific corresponding relationship between a work scene and a state of a virtual imaging body, the corresponding relationship in the embodiment is only a specific example, the method for protecting the virtual imaging body includes but is not limited to the example, and corresponding relationships between other work scenes and states of the virtual imaging body obtained based on the same principle all belong to the protection scope of the present invention. The virtual imaging body in this embodiment may be a virtual character, the wearing and the action of the virtual character under different working scenes are different, for convenience of description, the virtual character may be called harley, and the correspondence between the working scenes and the harley states is listed as follows:
In the default state after power-on, Harley stands and, at set intervals, performs a greeting gesture or kicks a ball. In the network-configuration stage, Harley wears an engineer's hat and operates a router showing a WIFI signal, i.e., Harley appears busy configuring the network. In the state of being awakened and waiting for a command, Harley performs a listening action, putting a hand to the ear. When songs are played, Harley dances, or plays a guitar or a drum kit. When stories or poems are played, Harley wears glasses and holds a book. When the weather is queried, Harley dresses for the current weather: wrapped in a padded jacket, wearing a windbreaker, or sweating and drinking a cold drink when it is too hot. When the user's command is not understood or not heard, Harley shows a puzzled expression. When an intelligent household appliance is controlled, Harley presents a homely image and takes out a remote control to perform a clicking action. At power-off, Harley performs a farewell action and then disappears.
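The correspondence above can be sketched as a simple lookup table; the scene keys and state strings below are illustrative paraphrases of the embodiment, not identifiers from the disclosure:

```python
# Illustrative work-scene -> avatar-state table for the example above.
SCENE_TO_STATE = {
    "default_after_boot": "standing; waves or kicks a ball at set intervals",
    "network_setup": "engineer hat; operating a router showing a WIFI signal",
    "awake_waiting_for_command": "hand cupped to ear, listening",
    "playing_music": "dancing, or playing guitar or drum kit",
    "playing_story_or_poem": "wearing glasses, holding a book",
    "weather_query": "outfit matching current weather (padded jacket, cold drink, ...)",
    "command_not_understood": "puzzled expression",
    "controlling_appliance": "holding a remote control, pressing buttons",
    "power_off": "farewell gesture, then disappears",
}

def avatar_state(scene: str) -> str:
    # Fall back to the default post-boot state for any unknown scene.
    return SCENE_TO_STATE.get(scene, SCENE_TO_STATE["default_after_boot"])
```

Keeping the correspondence in one table makes it easy to extend with further scenes, which matches the statement that the listed pairs are examples rather than an exhaustive set.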
And step 3, with the display screen inside the intelligent sound box as a light source, refracting the state of the virtual imaging body displayed by the display screen to the outside of the intelligent sound box through the negative refractive index material for imaging.
In this embodiment, the state of the virtual imaging body corresponding to each working scene is preset for the different working scenes, and the virtual imaging body may be displayed in a 3D virtual state based on a 3D rendering display technology (for example, Unity 3D). Note that the 3D virtual display in this embodiment differs from holographic projection display: here, a 3D virtual state is displayed on the screen inside the intelligent sound box and the currently displayed state is projected into the air outside the box for imaging. The imaged state appears to the human eye the same as the state shown on the internal screen, and the 3D virtual state can perform actions such as rotation according to received instructions.
In some embodiments, as shown in fig. 2, an implementation manner of displaying a virtual imaging volume by a smart sound box is provided in the embodiments of the present invention, wherein a negative refractive material 200 is located above a smart sound box 201 and is integrated with the smart sound box 201, and a virtual imaging volume 202 is located above the negative refractive material 200. As shown in fig. 3, this embodiment provides an imaging principle schematic diagram of a negative refractive index material, where the light source in fig. 3 is an internal display screen of an intelligent sound box in this embodiment, and may also be understood as all contents displayed by the internal display screen, and the imaging in fig. 3 is a virtual imaging body in this embodiment, and this embodiment may refract all contents displayed by the internal display screen outside the intelligent sound box through the negative refractive index material to perform imaging, so as to replace the display function of an external display screen.
In some embodiments, the present invention may determine the current working scenario of the smart sound box according to whether the local audio stream is received, where the manner of determining the working scenario of the smart sound box includes, but is not limited to, one of the following:
mode 1, if the smart sound box does not receive a local audio stream, determining the working state of the smart sound box as a working scene of the smart sound box;
in this embodiment, if the smart sound box does not receive the local audio stream, the current working state of the smart sound box is determined as a working scene, where the working state includes but is not limited to: the system comprises a starting-up state, a default state that a local audio stream is not received after starting up, a distribution network state, a distribution network completion state, a song playing state, a story or poem playing state, a weather inquiring state, a state of controlling an intelligent household appliance, a shutdown state and the like.
Mode 2, if the smart sound box receives a local audio stream in a non-standby state, determining semantics of the local audio stream, and determining a working scene of the smart sound box according to the semantics;
in this embodiment, if a local audio stream is received, the working scene is determined according to the semantics of the local audio stream, for example, if an audio stream for playing a song is received, the working scene is determined to be the song being played.
In some embodiments, the semantics of the local audio stream are determined by:
1) Converting the local audio stream by a local speech recognition algorithm;
2) Determining semantics of the converted local audio stream through a local semantic understanding algorithm or a cloud semantic understanding algorithm.
In implementation, the local audio stream is first converted by a local speech recognition algorithm into text that the computer can process. If the converted text can be parsed by the local semantic understanding algorithm, the semantics are determined locally; otherwise, the converted text is parsed by a cloud semantic understanding algorithm to determine the semantics. The local semantic understanding algorithm is stored on a local server or on the smart sound box itself, and the cloud semantic understanding algorithm is stored on a cloud server.
In some scenarios, for example when a local audio stream containing chat content or a command for controlling a smart home device is received, the converted text can be parsed with the local semantic understanding algorithm to determine the working scene, and further the state (such as clothing, motion, and the like) of the virtual imaging volume, so that the state is maintained or switched. In other scenarios, for example when local audio streams related to weather queries, song playing, stories, or poems are received, semantic parsing is performed with the cloud semantic understanding algorithm, after which the corresponding media resources are obtained from the cloud for playback.
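The local-first, cloud-fallback flow described above can be sketched as follows. All names here (the rule table, `cloud_parse`, `understand`) are illustrative placeholders standing in for the recognizer and the two semantic understanding algorithms, not the patent's actual implementation:

```python
# Hypothetical sketch: try the on-device semantic algorithm first,
# fall back to a (simulated) cloud service when it cannot parse.

LOCAL_RULES = {
    # chat and smart-home control are handled on-device
    "turn on the air conditioner": ("control", "aircon_on"),
    "hello": ("chat", "greeting"),
}

def cloud_parse(text):
    # Placeholder for a cloud semantic-understanding request;
    # media queries (weather, songs, stories, poems) land here.
    if "weather" in text:
        return ("media", "weather_query")
    if "play" in text:
        return ("media", "play_song")
    return None  # not understood even by the cloud

def understand(text):
    """Return (scene_type, intent) using local rules first, then the cloud."""
    if text in LOCAL_RULES:          # parsed locally
        return LOCAL_RULES[text]
    return cloud_parse(text)         # otherwise ask the cloud service
```

In this sketch, the text passed to `understand` is assumed to be the output of the local speech recognition step.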
Mode 3: if the smart sound box receives a local audio stream in a standby state, performing wake-up information recognition on the local audio stream and determining the working scene according to whether the wake-up information is recognized.
In some embodiments, to save power or to start up quickly, the smart sound box is generally configured with a standby mode (i.e., a standby state). In this mode the smart sound box is in a standby or sleep state and does not perform full semantic analysis of the local audio stream; it only performs wake-up information recognition. If the wake-up information is recognized, the working scene is determined to be an awakened, awaiting-command scene; otherwise, the working scene remains the standby scene.
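The three modes above reduce to a small decision function. The sketch below is a hypothetical illustration; the state names and the function signature are assumptions, not the patent's actual code:

```python
def determine_working_scene(received_audio, standby, current_state,
                            semantics=None, wake_word_detected=False):
    """Hypothetical sketch of the three scene-determination modes."""
    if not received_audio:
        # Mode 1: no local audio stream, the current working state
        # (e.g. "booting", "playing_song") is the working scene.
        return current_state
    if not standby:
        # Mode 2: audio received while awake, the scene follows
        # the semantics of the local audio stream.
        return semantics
    # Mode 3: audio received in standby, only wake-word recognition runs.
    return "awaiting_command" if wake_word_detected else "standby"
```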
In some implementation scenarios, while performing speech recognition or semantic understanding on the local audio stream, or while performing wake-up information recognition, voiceprint recognition may also be performed on the local audio stream, and the user preference corresponding to the voiceprint recognition result determined; if the local audio stream relates to controlling a smart home device, the device is then controlled by the smart sound box in the manner matching that user's preference. For example, family members A and B are present; after A wakes the smart sound box and says "turn on the air conditioner", the smart sound box receives the local audio stream "turn on the air conditioner", performs semantic analysis, turns the air conditioner on, and sets the mode and temperature that A prefers. The preferences of user A may be configured through the APP bound to A, or derived from A's historical interactions with the smart sound box.
In some embodiments, the state of the virtual imaging volume further includes a mouth shape and actions synchronized with the broadcast voice media. After determining the state of the virtual imaging volume corresponding to the working scene, this embodiment further includes one or more of the following display modes:
Mode 1: if the working scene is determined to be a broadcast dialogue scene, controlling the mouth shape of the virtual imaging volume to be synchronized with the played chat audio stream according to a lip synchronization algorithm.
In implementation, the broadcast dialogue scene may be a chat dialogue scene. After determining the state (clothing, action) of the virtual imaging volume corresponding to the broadcast dialogue scene, this embodiment further controls the mouth shape of the virtual imaging volume to be synchronized with the played chat audio stream based on a lip synchronization algorithm. That is, if the user chats with the smart sound box by voice, the smart sound box plays the chat audio stream while the mouth shape of the virtual imaging volume moves in step with the content being played.
Mode 2: if the working scene is determined to be a non-music media asset playing scene, playing the obtained non-music media assets and controlling the mouth shape of the virtual imaging volume to be synchronized with them according to a lip synchronization algorithm.
In implementation, a non-music media asset scene may be, for example, a weather query. In this embodiment, after determining the state (clothing, motion) of the virtual imaging volume corresponding to the non-music media asset scene, the lip synchronization algorithm is further used to synchronize the mouth shape of the virtual imaging volume with the played non-music media asset. In some examples, the scene further includes a special case: if, after speech recognition and semantic understanding, the smart sound box cannot obtain the semantics of the local audio stream, it plays a special audio prompt such as "Sorry, I did not catch your instruction, please say it again." That is, when the user inputs a local audio stream corresponding to a non-music media asset scene, the smart sound box switches the state of the virtual imaging volume to match that scene, and while playing the non-music media asset it controls the mouth shape of the virtual imaging volume to match the played content.
In some examples, the basic principle of the lip synchronization algorithm in this embodiment is to control the mouth movements of the virtual imaging volume by recognizing the initials and finals in the chat audio stream or the non-music media asset, together with the mouth-shape numbers corresponding to them. Optionally, after obtaining the chat audio stream or the non-music media asset, the initials and finals in it are recognized, and the mouth-shape number is determined from the mouth-shape change of each initial-final combination; the initials and finals in the audio stream may be recognized by a pre-trained deep learning algorithm. Optionally, a speech recognition technology is used to recognize the Chinese characters in the audio stream and obtain their pinyin directly, and the mouth shape corresponding to the pinyin is determined.
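The pinyin-to-mouth-shape idea can be sketched as a lookup table. The table values below are illustrative assumptions (the patent does not disclose its actual mouth-shape numbering), and a pre-split (initial, final) pair stands in for the recognizer output:

```python
# Hypothetical mapping from the leading vowel of a pinyin final to a
# mouth-shape number; the real table would be authored by an artist.
MOUTH_SHAPES = {
    "a": 1,   # wide open
    "o": 2,   # rounded
    "e": 3,   # half open
    "i": 4,   # spread
    "u": 5,   # pursed
}

def mouth_sequence(syllables):
    """Map each (initial, final) syllable to a mouth-shape number."""
    seq = []
    for initial, final in syllables:
        # the final (vowel part) dominates the visible mouth shape
        key = final[0] if final else ""
        seq.append(MOUTH_SHAPES.get(key, 0))  # 0 = neutral/closed
    return seq
```

For example, "ni hao" split into [("n", "i"), ("h", "ao")] would yield the shape sequence for a spread mouth followed by a wide-open one.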
Mode 3: if the working scene is determined to be a music media asset playing scene, playing the acquired music media assets, determining the dance actions corresponding to the played music, and controlling the limbs of the virtual imaging volume to dance accordingly.
In implementation, a music media asset scene may be, for example, song playing. In this embodiment, after determining the state (clothing, movement) of the virtual imaging volume corresponding to the music media asset scene, dance actions may be generated based on the played music, and the virtual imaging volume controlled to dance accordingly. For example, when the smart sound box receives the instruction "please play a song", the virtual imaging volume performs dance actions according to a preset choreography, or dances to the audio rhythm, or claps and stomps to the audio rhythm.
In some examples, the methods of determining dance actions include, but are not limited to, one or more of the following:
1) According to a music beat algorithm, monitoring the beat points of the played music and determining the dance actions corresponding to the beat points at different moments;
In implementation, the existing librosa library can be used in one thread to monitor the beat points (beat_times) of the played music in real time. The beat_times variable is used as a Linux shared variable; in another thread, when the current moment is detected to be a beat point, a clap or stomp signal is sent to the internal display screen so that the played audio stream and the virtual imaging volume stay synchronized.
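The two-thread arrangement above can be sketched as follows. Beat detection itself is assumed to have run already (e.g. via `librosa.beat.beat_track` followed by `librosa.frames_to_time`), so `beat_times` is passed in directly, and the playback thread is simulated here with a list of time ticks rather than a live clock:

```python
def due_beats(beat_times, t, already_sent, tol=0.05):
    """Indices of beat points that should fire at playback time t (seconds)."""
    return [i for i, bt in enumerate(beat_times)
            if i not in already_sent and abs(t - bt) <= tol]

def play_with_claps(beat_times, ticks, tol=0.05):
    """Simulated playback loop: collect the clap signals emitted per tick."""
    sent, signals = set(), []
    for t in ticks:                                   # stands in for the
        for i in due_beats(beat_times, t, sent, tol): # polling thread
            signals.append(("clap", beat_times[i]))   # sent to the display
            sent.add(i)                               # fire each beat once
    return signals
```

The `tol` tolerance and the "clap" signal name are assumptions; in the described device the loop would poll a real clock and write to the shared `beat_times` state instead.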
2) Determining the dance actions corresponding to the played music according to a music-to-dance generation algorithm.
In implementation, based on existing music-dance matching data, the music and dance are divided into sections, and feature vectors are extracted for the music sections and dance sections respectively; a Generative Adversarial Network (GAN) model based on pix2pix is then used to determine the dance sections corresponding to the feature vectors, and finally the dance actions are generated from the resulting dance sections.
In some examples of the embodiments of the present invention, a microphone or microphone array on the smart sound box picks up the local audio stream, performs noise reduction, and sends the stream to the smart sound box processor for speech recognition and semantic understanding. After the semantic information is parsed by the local or cloud semantic understanding algorithm, the corresponding audio stream is played according to the semantic information, and the state of the virtual imaging volume is switched accordingly. Optionally, a service interface may be provided to receive the local audio stream picked up by the smart sound box, and to receive the audio stream to be played in order to drive the speaker of the smart sound box. For example, the following service interface content represents semantic information instructing the virtual imaging volume to turn left:
Timestamp: 20210311160052
Type: 2
Data: "turning to the left"
Wherein, timestamp is a Timestamp and represents the time of picking up the current local audio stream; the Type is Type, belongs to enumeration value, and takes values of 1, 2 and 3.1 represents a system instruction, which is responsible for instructions of starting, resetting, standby and the like of a virtual imaging body; 2, semantic information is shown, type is filled in 2, the interface requests the semantic information for transmission, the semantic information is used for switching the preset image of the virtual imaging body, and whether the obtained audio stream needs to utilize a lip synchronization algorithm or a music beat algorithm to carry out lip synchronization or dance action output is set; and 3, the audio content is represented, and Data is transmitted as audio stream when the Type is 3, and the audio stream Data controls the lip or limb movement by performing corresponding processing in a lip synchronization algorithm or a music tempo algorithm.
In some examples, as shown in fig. 4, the present embodiment further provides a method for controlling a virtual imaging volume, where the method is implemented as follows:
step 400, determining semantics of a local audio stream received by the smart sound box;
step 401, determining a state of the virtual imaging volume according to the semantics, wherein the state includes a basic image of the virtual imaging volume;
step 402, judging whether the semantic is a broadcast dialogue semantic, if so, executing step 403, otherwise, executing step 404;
step 403, controlling the mouth shape of the virtual imaging body to be synchronous with the broadcasted conversation audio according to a lip synchronization algorithm;
and step 404, controlling the limbs of the virtual imaging volume to dance to the broadcast audio according to a music cooperation algorithm.
The music cooperation algorithm includes, but is not limited to, the music beat algorithm or the music-to-dance generation algorithm described above.
In the default power-on state, the smart sound box provided by this embodiment projects the virtual imaging volume in the air above the sound box in a standing posture, and can randomly present actions such as walking, turning its head, and adjusting its schoolbag. When the smart sound box receives an instruction such as "turn left", the virtual imaging volume executes a preset left-turn action corresponding to the instruction. When a user interacts with the smart sound box and the user instruction cannot be recognized after speech recognition and semantic understanding, the smart sound box may broadcast "Sorry, I did not catch your instruction, please say it again", with the mouth of the virtual imaging volume moving in step with the spoken content. When the smart sound box receives the instruction "please play a song", the virtual imaging volume may perform dance actions according to a preset choreography, or dance to the audio rhythm, or clap and stomp to the audio rhythm. When the user queries the weather, the virtual imaging volume can adjust its own clothing according to the weather conditions (rain, wind, haze, temperature, and so on) to match the actually broadcast weather content.
In some examples, as shown in fig. 5, an embodiment of the present invention provides a virtual display method for a smart sound box, where a specific implementation flow of the method is as follows:
step 500, the smart sound box receives a local audio stream in a standby state;
step 501, performing awakening information identification and voiceprint identification on a local audio stream;
step 502, after confirming that the wake-up information is recognized, waking the smart sound box;
step 503, receiving the local audio stream again;
step 504, performing voice recognition and semantic understanding on the local audio stream;
step 505, judging whether the semantics of the local audio stream can be determined according to a local semantic understanding algorithm, if so, executing step 506, otherwise, executing step 510;
step 506, judging whether the scene is a chat scene or not according to the semantics, if so, executing step 507, otherwise, executing step 509;
step 507, controlling the mouth shape of the virtual imaging body to be synchronous with the played chatting audio stream according to a lip synchronization algorithm;
step 508, determining that the chat has ended, and playing the chat-ending audio;
step 509, determining that the semantics are a control instruction, switching the state of the virtual imaging volume accordingly, and executing the corresponding control;
step 510, judging whether the semantics of the local audio stream can be determined according to a cloud semantic understanding algorithm, if so, executing step 511, otherwise, executing step 513;
step 511, determining the state of the virtual imaging body according to the semantics, and acquiring corresponding media resource according to the semantics;
and step 512, playing the acquired media resources, and determining to control the mouth shape or the action of the virtual imaging body according to a lip synchronization algorithm or a music cooperation algorithm according to the type of the media resources.
step 513, presenting the state of the virtual imaging volume corresponding to the "not understood" semantics, so as to convey to the user that the instruction was not understood.
In some examples, if there are multiple smart sound boxes, that is, when the wake-up information spoken by the user is received by several of them, this embodiment may further select one of the smart sound boxes to wake. The wake-up principle in this embodiment is nearest-wins: the smart sound box "closest" to the user is woken. As shown in fig. 6, when the method is applied to the main smart sound box, the wake-up is implemented as follows:
step 600, if a local audio stream is received in a standby state, performing wakeup information identification according to the received local audio stream;
step 601, after the wake-up information is recognized, generating the main smart sound box's own wake-up request and receiving the wake-up requests sent by the other smart sound boxes, where each wake-up request carries a parameter representing the quality of the wake-up information;
step 602, determining, from its own wake-up request and the received wake-up requests, the smart sound box that needs to be woken, and sending a wake-up approval request to that smart sound box to trigger it to switch from the standby state to the awake state.
In addition to sending the wake-up approval request to the smart sound box that needs to be woken, the method further includes: sending a wake-up cancellation request to the other smart sound boxes that do not need to be woken.
In some examples, the wake-up request includes, but is not limited to, some or all of the following information: sound loudness, wake-up quality score, and timestamp. In this embodiment, according to the quality parameter carried in each wake-up request, the main smart sound box selects the wake-up request with the best wake-up information quality from its own request and those of the other smart sound boxes, and thereby determines which smart sound box to wake.
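The arbitration step can be sketched as a comparison over the quality parameters listed above. The field names (`speaker_id`, `score`, `loudness`) and the tie-break order are illustrative assumptions:

```python
def pick_speaker(requests):
    """Choose the wake-up request with the best quality.

    Each request is assumed to look like
    {"speaker_id": ..., "score": ..., "loudness": ...};
    the wake-up quality score wins, with loudness as a tie-breaker.
    """
    best = max(requests, key=lambda r: (r["score"], r["loudness"]))
    return best["speaker_id"]
```

The main smart sound box would run this over its own request plus those received from the other boxes, then send the wake-up approval to the winner.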
In some examples, during the wake-up process, the main smart sound box may inform the user, through the state of the virtual imaging volume, which smart sound box has been selected to wake this time, so that the user can interact with that smart sound box more easily.
In some examples, the main smart sound box is determined as follows:
after power-on, each of the smart sound boxes generates a first random number; each smart sound box receives the second random numbers sent by the other smart sound boxes and compares its first random number with them; and each smart sound box determines, from the comparison result, whether it is the main smart sound box.
In some examples, as shown in fig. 7, taking two smart speakers as an example, the implementation flow of determining the main smart speaker is as follows:
step 700, powering on the intelligent sound box 1 and the intelligent sound box 2;
step 701, generating a random number 1 by the intelligent sound box 1, and generating a random number 2 by the intelligent sound box 2;
step 702, the smart sound box 1 broadcasts random number 1 on the local area network together with its own IP address, and the smart sound box 2 broadcasts random number 2 on the local area network together with its own IP address;
step 703, the smart sound box 1 receives the random number 2, and the smart sound box 2 receives the random number 1;
step 704, each smart sound box compares random number 1 with random number 2, and the smart sound box holding the larger random number is selected as the main smart sound box.
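The election in steps 700-704 can be sketched as follows; tie-breaking by IP address is an added assumption not stated in the text, included so the election is deterministic even if two boxes draw the same number:

```python
import random

def my_random():
    # Each box draws its number after power-on (32-bit range assumed).
    return random.randrange(1 << 32)

def elect_master(boxes):
    """boxes: {ip_address: random_number}; return the master's IP.

    The box that broadcast the largest random number wins; on a tie,
    the larger IP string wins (an illustrative tie-break assumption).
    """
    return max(boxes, key=lambda ip: (boxes[ip], ip))
```

Each box runs `elect_master` over the numbers it has broadcast and received, so all boxes agree on the same main smart sound box without any coordinator.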
In some examples, following the same wake-up principle as for the main smart sound box described above, as shown in fig. 8, this embodiment further provides a wake-up method applied to the other smart sound boxes, implemented as follows:
step 800, if a local audio stream is received in a standby state, performing awakening information identification according to the received local audio stream;
step 801, after the wake-up information is recognized, triggering generation of a wake-up request and sending it to the main smart sound box, so that the main smart sound box determines the smart sound box to be woken from the wake-up requests sent by this and the other smart sound boxes and sends a wake-up approval request to it, where the wake-up request carries a parameter representing the quality of the wake-up information;
step 802, if a wake-up approval request sent by the main smart sound box is received, switching from the standby state to the awake state.
In some examples, determining whether the smart speaker is the master smart speaker is by:
generating a first random number after the intelligent sound box is powered on;
and receiving second random numbers sent by other intelligent sound boxes, and determining whether the intelligent sound box is the main intelligent sound box or not according to a comparison result of the first random numbers and the second random numbers.
In some examples, after the smart sound box is switched from the standby state to the wake state, the method further includes:
and the state of the virtual imaging body displayed by the display screen inside the intelligent sound box is refracted to the outside of the intelligent sound box for imaging through the negative refractive index material on the intelligent sound box, wherein the state of the virtual imaging body is determined according to the awakening state of the intelligent sound box.
Embodiment 2. Based on the same inventive concept, an embodiment of the present invention further provides a display method for a smart sound box. Since this method corresponds to the smart sound box of the embodiments above and solves the problem on a similar principle, its implementation may refer to the implementation of the smart sound box, and repeated details are not repeated.
As shown in fig. 9, the implementation flow of the method is as follows:
step 900, determining a working scene of the intelligent sound box;
step 901, determining the state of a virtual imaging body corresponding to a preset working scene according to the corresponding relation between the working scene and the state of the virtual imaging body;
step 902, with a display screen inside the smart sound box as a light source, refracting the state of the virtual imaging object displayed by the display screen to the outside of the smart sound box through a negative refractive index material on the smart sound box for imaging.
In some examples, the determining the working scene of the smart speaker includes:
if the smart sound box does not receive the local audio stream, determining the working state of the smart sound box as a working scene of the smart sound box; or
If the smart sound box receives a local audio stream in a non-standby state, determining semantics of the local audio stream, and determining a working scene of the smart sound box according to the semantics; or
and if the intelligent sound box receives a local audio stream in a standby state, performing awakening information identification on the local audio stream, and determining the working scene of the intelligent sound box according to whether the awakening information is identified.
In some examples, the performing wake-up information recognition on the local audio stream further comprises:
and performing voiceprint recognition on the local audio stream, and determining user preferences corresponding to the voiceprint recognition result so as to control the intelligent household equipment through the intelligent sound box according to the mode corresponding to the user preferences.
In some examples, after determining the state of the virtual imaging volume corresponding to the work scene, further comprising:
if the working scene is determined to be a broadcast dialogue scene, controlling the mouth shape of the virtual imaging body to be synchronous with the played chatting audio stream according to a lip synchronization algorithm; or
If the working scene is determined to be a scene for playing the non-music media assets, playing the acquired non-music media assets, and controlling the mouth shape of the virtual imaging body to be synchronous with the played non-music media assets according to a lip synchronization algorithm; or
And if the working scene is determined to be a music media asset playing scene, playing the acquired music media assets, determining dance actions corresponding to the played music media assets, and controlling limbs of the virtual imaging body to dance according to the dance actions.
In some examples, the determining a dance action corresponding to the played music comprises:
according to a music beat algorithm, beat points of played music are monitored, and dance actions corresponding to the beat points at different moments are determined; or
and determining the dance action corresponding to the played music according to the music dance generation algorithm.
Embodiment 3, based on the same inventive concept, the embodiment of the present invention further provides a display device of an intelligent sound box, and as the device is the device of the method in the embodiment of the present invention, and the principle of the device to solve the problem is similar to that of the method, the implementation of the device may refer to the implementation of the method, and repeated details are not repeated.
As shown in fig. 10, the apparatus includes:
a scene determining unit 1000, configured to determine a working scene of the smart speaker;
a state determining unit 1001 configured to determine a state of a virtual imaging object corresponding to a preset working scene according to a correspondence between the working scene and the state of the virtual imaging object;
and the imaging display unit 1002 is used for taking the display screen inside the intelligent sound box as a light source, and refracting the state of the virtual imaging body displayed by the display screen into the outside of the intelligent sound box for imaging through a negative refractive index material on the intelligent sound box.
In some examples, the scene determining unit is specifically configured to:
if the smart sound box does not receive the local audio stream, determining the working state of the smart sound box as a working scene of the smart sound box; or
If the intelligent sound box receives a local audio stream in a non-standby state, determining the semantics of the local audio stream, and determining the working scene of the intelligent sound box according to the semantics; or
and if the intelligent sound box receives a local audio stream in a standby state, performing awakening information identification on the local audio stream, and determining the working scene of the intelligent sound box according to whether the awakening information is identified.
In some examples, the scene determining unit is further specifically configured to:
and performing voiceprint recognition on the local audio stream, and determining user preference corresponding to the voiceprint recognition result so as to control the intelligent household equipment through the intelligent sound box according to a mode corresponding to the user preference.
In some examples, after determining the state of the virtual imaging volume corresponding to the work scene, the control unit is further configured to:
if the working scene is determined to be a broadcast dialogue scene, controlling the mouth shape of the virtual imaging body to be synchronous with the played chatting audio stream according to a lip synchronization algorithm; or
If the working scene is determined to be a scene for playing the non-music media assets, playing the acquired non-music media assets, and controlling the mouth shape of the virtual imaging body to be synchronous with the played non-music media assets according to a lip synchronization algorithm; or
And if the working scene is determined to be a music media asset playing scene, playing the acquired music media assets, determining dance actions corresponding to the played music media assets, and controlling limbs of the virtual imaging body to dance according to the dance actions.
In some examples, the control unit is specifically configured to:
according to a music beat algorithm, beat points of played music are monitored, and dance actions corresponding to the beat points at different moments are determined; or
and determining the dance action corresponding to the played music according to the music dance generation algorithm.
Based on the same inventive concept, an embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, and the program, when executed by a processor, implements the steps of:
determining a working scene of the intelligent sound box;
determining the state of a virtual imaging body corresponding to a preset working scene according to the corresponding relation between the preset working scene and the state of the virtual imaging body;
the method comprises the steps that a display screen inside the intelligent sound box is used as a light source, and the state of the virtual imaging body displayed by the display screen is refracted to the outside of the intelligent sound box through a negative refractive index material on the intelligent sound box to be imaged.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A smart sound box, comprising a display screen located inside the smart sound box, a negative refractive index material located on the smart sound box, and a processor, wherein:
the display screen is configured to display content;
the negative refractive index material is configured to refract the content displayed by the display screen to the outside of the smart sound box for imaging; and
the processor is configured to perform:
determining a working scene of the smart sound box;
determining a state of a virtual imaging body corresponding to the working scene according to a preset correspondence between working scenes and states of the virtual imaging body; and
using the display screen inside the smart sound box as a light source, so that the state of the virtual imaging body displayed by the display screen is refracted to the outside of the smart sound box by the negative refractive index material for imaging.
2. The smart sound box of claim 1, wherein the processor is specifically configured to perform:
if the smart sound box receives no local audio stream, determining the working state of the smart sound box as the working scene of the smart sound box; or
if the smart sound box receives a local audio stream in a non-standby state, determining the semantics of the local audio stream and determining the working scene of the smart sound box according to the semantics; or
if the smart sound box receives a local audio stream in a standby state, performing wake-up information recognition on the local audio stream and determining the working scene of the smart sound box according to whether the wake-up information is recognized.
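The three branches of this scene-determination logic can be sketched as follows. This is a minimal illustration, not the patented implementation: the scene names, function signature, and wake word are all invented, and the transcript stands in for the output of a real speech-recognition pipeline.

```python
from enum import Enum, auto

class Scene(Enum):
    """Illustrative working scenes; the names are not from the patent."""
    IDLE = auto()         # no local audio: scene follows the box's own working state
    SEMANTIC = auto()     # non-standby + audio: scene derived from the utterance's semantics
    WAKE_UP = auto()      # standby + audio containing the wake-up information
    STAY_ASLEEP = auto()  # standby + audio without the wake-up information

def determine_scene(has_local_audio, standby, transcript=None, wake_word="hi box"):
    """Decision logic mirroring the three branches of claim 2."""
    if not has_local_audio:
        return Scene.IDLE          # branch 1: use the working state itself
    if not standby:
        return Scene.SEMANTIC      # branch 2: hand the transcript to semantic analysis
    # branch 3: in standby, only wake-up information recognition is attempted
    if transcript and wake_word in transcript.lower():
        return Scene.WAKE_UP
    return Scene.STAY_ASLEEP
```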
3. The smart sound box of claim 2, wherein the processor is further specifically configured to perform:
performing voiceprint recognition on the local audio stream, and determining a user preference corresponding to the voiceprint recognition result, so as to control smart home devices through the smart sound box in a mode corresponding to the user preference.
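A minimal sketch of the preference lookup that would follow voiceprint recognition. The speaker IDs, preference fields, and default values are all hypothetical; real voiceprint recognition compares audio embeddings, which is out of scope here — the point is only the mapping from a recognition result to a control mode.

```python
# Hypothetical lookup from a voiceprint-recognition result (a speaker ID)
# to that user's preferred smart-home control mode.
USER_PREFERENCES = {
    "speaker_001": {"lights": "warm", "ac_temp_c": 24},
    "speaker_002": {"lights": "cool", "ac_temp_c": 22},
}
DEFAULT_PREFERENCE = {"lights": "neutral", "ac_temp_c": 23}

def preference_for(speaker_id):
    """Return the control mode for the recognized speaker, or a default
    when the voiceprint matches no enrolled user."""
    return USER_PREFERENCES.get(speaker_id, DEFAULT_PREFERENCE)
```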
4. The smart sound box of claim 1, wherein after determining the state of the virtual imaging body corresponding to the working scene, the processor is further configured to perform:
if the working scene is determined to be a dialogue broadcasting scene, controlling the mouth shape of the virtual imaging body to be synchronized with the played chat audio stream according to a lip synchronization algorithm; or
if the working scene is determined to be a scene of playing non-music media assets, playing the acquired non-music media assets, and controlling the mouth shape of the virtual imaging body to be synchronized with the played non-music media assets according to a lip synchronization algorithm; or
if the working scene is determined to be a scene of playing music media assets, playing the acquired music media assets, determining dance actions corresponding to the played music, and controlling the limbs of the virtual imaging body to dance according to the dance actions.
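The per-scene behavior above amounts to a three-way dispatch. A sketch, with scene names and action labels invented for illustration; the lip-synchronization and dance algorithms are only named here, not implemented:

```python
def actions_for_scene(scene):
    """Map a working scene to the ordered behaviors listed in claim 4.
    'lip_sync' and 'dance' stand in for the lip-synchronization and
    dance-action algorithms the claim names."""
    if scene == "dialogue":
        return ["lip_sync:chat_audio"]
    if scene == "non_music_media":
        return ["play:media", "lip_sync:media_audio"]
    if scene == "music_media":
        return ["play:music", "generate_dance_actions", "dance:limbs"]
    raise ValueError("unknown working scene: %s" % scene)
```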
5. The smart sound box of claim 4, wherein the processor is specifically configured to perform:
monitoring beat points of the played music according to a music beat algorithm, and determining dance actions corresponding to the beat points at different moments; or
determining the dance actions corresponding to the played music according to a music-to-dance generation algorithm.
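A toy version of the first alternative: a naive energy-threshold beat detector over raw audio samples, with each detected beat point mapped to a dance action drawn cyclically from a fixed move set. The threshold, frame length, and move names are illustrative, and a production "music beat algorithm" would work on spectral features rather than raw energy.

```python
def beat_points(samples, frame_len=1024, threshold=1.5):
    """Flag a frame as a beat point when its energy exceeds `threshold`
    times a running average of previous frame energies."""
    beats, avg = [], None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if avg is not None and energy > threshold * avg:
            beats.append(i)  # record the beat's sample offset
        # exponential moving average of frame energy
        avg = energy if avg is None else 0.9 * avg + 0.1 * energy
    return beats

def dance_for_beats(beats, moves=("step", "spin", "wave")):
    """Assign a dance action to each beat point, cycling a fixed move set."""
    return [(t, moves[k % len(moves)]) for k, t in enumerate(beats)]
```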
6. A display method for a smart sound box, comprising:
determining a working scene of the smart sound box;
determining a state of a virtual imaging body corresponding to the working scene according to a preset correspondence between working scenes and states of the virtual imaging body; and
using the display screen inside the smart sound box as a light source, so that the state of the virtual imaging body displayed by the display screen is refracted to the outside of the smart sound box for imaging by a negative refractive index material on the smart sound box.
7. The method of claim 6, wherein determining the working scene of the smart sound box comprises:
if the smart sound box receives no local audio stream, determining the working state of the smart sound box as the working scene of the smart sound box; or
if the smart sound box receives a local audio stream in a non-standby state, determining the semantics of the local audio stream and determining the working scene of the smart sound box according to the semantics; or
if the smart sound box receives a local audio stream in a standby state, performing wake-up information recognition on the local audio stream and determining the working scene of the smart sound box according to whether the wake-up information is recognized.
8. The method of claim 7, wherein after performing wake-up information recognition on the local audio stream, the method further comprises:
performing voiceprint recognition on the local audio stream, and determining a user preference corresponding to the voiceprint recognition result, so as to control smart home devices through the smart sound box in a mode corresponding to the user preference.
9. The method of claim 6, wherein after determining the state of the virtual imaging body corresponding to the working scene, the method further comprises:
if the working scene is determined to be a dialogue broadcasting scene, controlling the mouth shape of the virtual imaging body to be synchronized with the played chat audio stream according to a lip synchronization algorithm; or
if the working scene is determined to be a scene of playing non-music media assets, playing the acquired non-music media assets, and controlling the mouth shape of the virtual imaging body to be synchronized with the played non-music media assets according to a lip synchronization algorithm; or
if the working scene is determined to be a scene of playing music media assets, playing the acquired music media assets, determining dance actions corresponding to the played music, and controlling the limbs of the virtual imaging body to dance according to the dance actions.
10. The method of claim 9, wherein determining the dance actions corresponding to the played music comprises:
monitoring beat points of the played music according to a music beat algorithm, and determining dance actions corresponding to the beat points at different moments; or
determining the dance actions corresponding to the played music according to a music-to-dance generation algorithm.
CN202110517035.2A 2021-05-12 2021-05-12 Intelligent sound box and display method thereof Pending CN115426553A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110517035.2A CN115426553A (en) 2021-05-12 2021-05-12 Intelligent sound box and display method thereof


Publications (1)

Publication Number Publication Date
CN115426553A true CN115426553A (en) 2022-12-02

Family

ID=84195432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110517035.2A Pending CN115426553A (en) 2021-05-12 2021-05-12 Intelligent sound box and display method thereof

Country Status (1)

Country Link
CN (1) CN115426553A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101587706A (en) * 2009-07-08 2009-11-25 沈阳蓝火炬软件有限公司 System and method for analyzing streaming-media real-time music beats and controlling dance
CN107765852A (en) * 2017-10-11 2018-03-06 北京光年无限科技有限公司 Multi-modal interaction processing method and system based on visual human
CN108052250A (en) * 2017-12-12 2018-05-18 北京光年无限科技有限公司 Virtual idol deductive data processing method and system based on multi-modal interaction
CN110060678A (en) * 2019-04-16 2019-07-26 深圳欧博思智能科技有限公司 A kind of virtual role control method and smart machine based on smart machine
CN110309470A (en) * 2019-05-14 2019-10-08 广东康云科技有限公司 A kind of virtual news main broadcaster system and its implementation based on air imaging
CN111081270A (en) * 2019-12-19 2020-04-28 大连即时智能科技有限公司 Real-time audio-driven virtual character mouth shape synchronous control method
CN211880554U (en) * 2020-04-19 2020-11-06 郭生文 Holographic aerial imaging device on AI intelligent sound box


Similar Documents

Publication Publication Date Title
CN106878820B (en) Live broadcast interaction method and device
JP6448971B2 (en) Interactive device
CN107340991B (en) Voice role switching method, device, equipment and storage medium
CN106804076B (en) A kind of lighting system of smart home
CN107340865A (en) Multi-modal virtual robot exchange method and system
CN110213613B (en) Image processing method, device and storage medium
CN111835986A (en) Video editing processing method and device and electronic equipment
WO2017141530A1 (en) Information processing device, information processing method and program
CN109166575A (en) Exchange method, device, smart machine and the storage medium of smart machine
WO2015198716A1 (en) Information processing apparatus, information processing method, and program
US11647261B2 (en) Electrical devices control based on media-content context
CN111754997B (en) Control device and operation method thereof, and voice interaction device and operation method thereof
CN109343695A (en) Exchange method and system based on visual human's behavioral standard
CN112735423A (en) Voice interaction method and device, electronic equipment and storage medium
CN112652041A (en) Virtual image generation method and device, storage medium and electronic equipment
CN115206306A (en) Voice interaction method, device, equipment and system
CN112463108B (en) Voice interaction processing method and device, electronic equipment and storage medium
JP2024521795A (en) Simulating crowd noise at live events with sentiment analysis of distributed inputs
CN104822095A (en) Composite beat special effect system and composite beat special effect processing method
CN110109377A (en) Control system and method of household appliance and air conditioner
CN106227323A (en) A kind of display packing and device
CN115426553A (en) Intelligent sound box and display method thereof
US20230353707A1 (en) Method for enabling synthetic autopilot video functions and for publishing a synthetic video feed as a virtual camera during a video call
US11627283B2 (en) Method for enabling synthetic autopilot video functions and for publishing a synthetic video feed as a virtual camera during a video call
CN109658924A (en) Conversation message processing method, device and smart machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination