WO2021196647A1 - Method, apparatus, and device for driving interactive objects, and storage medium - Google Patents

Method, apparatus, and device for driving interactive objects, and storage medium

Info

Publication number
WO2021196647A1
WO2021196647A1 (PCT/CN2020/129830; CN2020129830W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
control parameter
target data
interactive object
target
Prior art date
Application number
PCT/CN2020/129830
Other languages
English (en)
French (fr)
Inventor
孙林
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司 filed Critical 北京市商汤科技开发有限公司
Priority to JP2021549865A priority Critical patent/JP2022531056A/ja
Priority to SG11202109201XA priority patent/SG11202109201XA/en
Priority to KR1020217027681A priority patent/KR20210124306A/ko
Publication of WO2021196647A1 publication Critical patent/WO2021196647A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G06F 3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484: Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/04847: Interaction techniques to control parameter settings, e.g. interaction with sliders or dials

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to a method, apparatus, and device for driving interactive objects, and a storage medium.
  • Human-computer interaction is mostly performed as follows: the user provides input via keys, touch, or voice, and the device responds by presenting images, text, or virtual characters on a display screen.
  • At present, virtual characters are mostly obtained by improving on voice assistants, and the interaction between users and virtual characters remains superficial.
  • the embodiments of the present disclosure provide a driving solution for interactive objects.
  • A method for driving an interactive object includes: acquiring sound driving data of the interactive object displayed by a display device; obtaining, based on target data contained in the sound driving data, a control parameter sequence of a set action of the interactive object that matches the target data; and controlling the interactive object to perform the set action according to the obtained control parameter sequence.
  • the method further includes: controlling the display device to output voice according to the voice information corresponding to the sound-driven data, and/or displaying text according to the text information corresponding to the sound-driven data .
  • Controlling the interactive object to perform the set action according to the obtained control parameter sequence includes: determining the voice information corresponding to the target data; obtaining time information of outputting the voice information; determining the execution time of the set action corresponding to the target data according to the time information; and controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data.
  • The control parameter sequence includes one or more groups of control parameters. Controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data includes: invoking each group of control parameters in the control parameter sequence at a set rate, so that the interactive object displays the posture corresponding to each group of control parameters.
  • The control parameter sequence includes one or more groups of control parameters, and controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data includes: determining the invocation rate of the control parameter sequence according to the execution time; and invoking each group of control parameters in the control parameter sequence at that rate, so that the interactive object outputs the posture corresponding to each group of control parameters.
  • Controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data includes: starting to invoke the control parameter sequence corresponding to the target data at a set time before the voice information corresponding to the target data is output, so that the interactive object starts to perform the set action.
  • The sound driving data contains multiple target data, and controlling the interactive object to perform the set action according to the obtained control parameter sequence includes: in response to detecting that adjacent target data among the multiple target data overlap, controlling the interactive object to perform the set action according to the control parameter sequence corresponding to the target data that comes first in word order.
  • The sound driving data contains multiple target data, and controlling the interactive object to perform the set action according to the control parameter sequence corresponding to the target data includes: in response to detecting that the control parameter sequences corresponding to adjacent target data among the multiple target data overlap in execution time, fusing the overlapping parts of the control parameter sequences corresponding to the adjacent target data.
  • Obtaining, based on the target data contained in the sound driving data, the control parameter sequence of the set action of the interactive object that matches the target data includes: in response to the sound driving data including audio data, performing voice recognition on the audio data and determining the target data contained in the audio data according to the recognized voice content; and in response to the sound driving data including text data, determining the target data contained in the text data according to the text content contained in the text data.
  • The sound driving data includes syllable data, and obtaining, based on the target data contained in the sound driving data, the control parameter sequence of the set action of the interactive object that matches the target data includes: determining whether the syllable data contained in the sound driving data matches target syllable data, wherein the target syllable data belongs to one pre-divided syllable type, one syllable type corresponds to one set mouth shape, and one set mouth shape is provided with a corresponding control parameter sequence; and in response to the syllable data matching the target syllable data, obtaining, based on the syllable type to which the matched target syllable data belongs, the control parameter sequence of the set mouth shape corresponding to the matched target syllable data.
  • The method further includes: acquiring first data other than the target data in the sound driving data; acquiring acoustic features of the first data; acquiring posture control parameters matching the acoustic features; and controlling the posture of the interactive object according to the posture control parameters.
  • An apparatus for driving an interactive object includes: a first acquisition unit for acquiring sound driving data of an interactive object displayed by a display device; a second acquisition unit for obtaining, based on the target data contained in the sound driving data, a control parameter sequence of the set action of the interactive object that matches the target data; and a driving unit for controlling the interactive object to perform the set action according to the obtained control parameter sequence.
  • The apparatus further includes an output unit for controlling the display device to output voice according to the voice information corresponding to the sound driving data, and/or for displaying text according to the text information corresponding to the sound driving data.
  • The driving unit is specifically configured to: determine the voice information corresponding to the target data; obtain the time information of outputting the voice information; determine the execution time of the set action corresponding to the target data according to the time information; and control, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data.
  • The control parameter sequence includes one or more groups of control parameters; when controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: invoke each group of control parameters in the control parameter sequence at a set rate, so that the interactive object displays the posture corresponding to each group of control parameters.
  • The control parameter sequence includes one or more groups of control parameters; when controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: determine the invocation rate of the control parameter sequence according to the execution time; and invoke each group of control parameters in the control parameter sequence at that rate, so that the interactive object outputs the posture corresponding to each group of control parameters.
  • The control parameter sequence includes one or more groups of control parameters; when controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: start invoking the control parameter sequence corresponding to the target data at a set time before the voice information corresponding to the target data is output, so that the interactive object starts to perform the set action.
  • The sound driving data contains multiple target data, and the driving unit is specifically configured to: in response to detecting that adjacent target data among the multiple target data overlap, control the interactive object to perform the set action according to the control parameter sequence corresponding to the target data that comes first in word order.
  • The sound driving data contains multiple target data, and the driving unit is specifically configured to: in response to detecting that the control parameter sequences corresponding to adjacent target data among the multiple target data overlap in execution time, fuse the overlapping parts of the control parameter sequences corresponding to the adjacent target data.
  • The second acquisition unit is specifically configured to: in response to the sound driving data including audio data, perform voice recognition on the audio data and determine the target data contained in the audio data according to the voice content contained in the audio data; and in response to the sound driving data including text data, determine the target data contained in the text data according to the text content contained in the text data.
  • The sound driving data includes syllable data, and the second acquisition unit is specifically configured to: determine whether the syllable data contained in the sound driving data matches target syllable data, wherein the target syllable data belongs to one pre-divided syllable type, one syllable type corresponds to one set mouth shape, and one set mouth shape is provided with a corresponding control parameter sequence; and in response to the syllable data matching the target syllable data, obtain, based on the syllable type to which the matched target syllable data belongs, the control parameter sequence of the set mouth shape corresponding to the matched target syllable data.
  • The apparatus further includes a posture control unit configured to: acquire first data other than the target data in the sound driving data; acquire acoustic features of the first data; acquire posture control parameters matching the acoustic features of the first data; and control the posture of the interactive object according to the posture control parameters.
  • An electronic device includes a memory and a processor; the memory is used to store computer instructions executable on the processor, and the processor is used to implement, when executing the computer instructions, the method for driving interactive objects described in any of the embodiments provided in the present disclosure.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method for driving an interactive object according to any one of the embodiments provided in the present disclosure is realized.
  • According to the method, apparatus, device, and computer-readable storage medium for driving an interactive object of one or more embodiments of the present disclosure, control parameters of a set action of the interactive object that matches at least one target data contained in the sound driving data of the interactive object displayed by a display device are obtained to control the actions of the interactive object displayed by the display device, so that the interactive object can make the action corresponding to the target data contained in the sound driving data; the speaking state of the interactive object is thus natural and vivid, which enhances the interactive experience of the target object.
  • FIG. 1 is a schematic diagram of a display device in a method for driving an interactive object according to an embodiment of the present disclosure
  • Fig. 2 is a flowchart of a method for driving interactive objects proposed according to an embodiment of the present disclosure
  • Fig. 3 is a flowchart of a method for driving interactive objects proposed according to an embodiment of the present disclosure
  • Fig. 4 is a flowchart of a method for driving interactive objects proposed according to an embodiment of the present disclosure
  • Fig. 5 is a schematic structural diagram of a driving device for interactive objects proposed according to an embodiment of the present disclosure
  • Fig. 6 is a schematic structural diagram of an electronic device proposed according to an embodiment of the present disclosure.
  • At least one embodiment of the present disclosure provides a method for driving interactive objects.
  • the driving method may be executed by electronic devices such as a terminal device or a server.
  • The terminal device may be a fixed terminal or a mobile terminal, such as a mobile phone, a tablet computer, a game console, a desktop computer, an advertising machine, an all-in-one machine, or a vehicle-mounted terminal.
  • the server includes a local server or a cloud server, etc., and the method can also be implemented by a processor calling computer-readable instructions stored in a memory.
  • The interactive object may be any interactive object capable of interacting with the target object; it may be a virtual character, or a virtual animal, virtual item, cartoon figure, or other virtual image capable of realizing interactive functions, and the virtual image may be presented in either 2D or 3D form, which is not limited in the present disclosure.
  • the target object may be a user, a robot, or other smart devices.
  • the interaction manner between the interaction object and the target object may be an active interaction manner or a passive interaction manner.
  • The target object can express a demand by making gestures or body movements, thereby triggering the interactive object to interact with it in an active interaction manner.
  • the interactive object may actively greet the target object, prompt the target object to make an action, etc., so that the target object interacts with the interactive object in a passive manner.
  • The interactive object may be displayed through an electronic device; the electronic device may also be a TV, an all-in-one machine with a display function, a projector, a virtual reality (VR) device, an augmented reality (AR) device, etc.; the present disclosure does not limit the specific form of the electronic device.
  • Fig. 1 shows a display device proposed according to an embodiment of the present disclosure.
  • the display device has a display screen, which can display a stereoscopic picture on the display screen to present a virtual scene and interactive objects.
  • the interactive objects displayed on the display screen in Figure 1 are virtual cartoon characters.
  • The electronic device described in the present disclosure may include a built-in display or be integrated with the above-mentioned display device; through the display or the display device, a stereoscopic picture may be shown to present a virtual scene and interactive objects. In other embodiments, the electronic device described in the present disclosure may not include a built-in display, and the content to be displayed may be sent over a wired or wireless connection to an external display, which then presents the virtual scene and the interactive objects.
  • In some embodiments, in response to the electronic device receiving sound driving data for driving the interactive object to output voice, the interactive object may utter a specified voice to the target object.
  • Sound driving data can be generated according to the actions, expressions, identity, preferences, etc. of the target object around the electronic device, so as to drive the interactive object to respond by uttering a specified voice, thereby providing an anthropomorphic service for the target object.
  • During the interaction, there are cases where, while the interactive object is driven to utter the specified voice according to the sound driving data, it cannot be driven to make facial movements synchronized with that voice, so that the interactive object appears stiff and unnatural when speaking, which affects the target object's interactive experience.
  • an embodiment of the present disclosure proposes a driving method for an interactive object, so as to improve the experience of the target object interacting with the interactive object.
  • FIG. 2 shows a flowchart of a method for driving an interactive object according to an embodiment of the present disclosure. As shown in FIG. 2, the method includes steps 201 to 203.
  • step 201 the sound-driven data of the interactive object displayed by the display device is obtained.
  • the sound driving data may include audio data (voice data), text data, and so on.
  • The sound driving data may be driving data generated by the electronic device according to the actions, expressions, identity, preferences, etc. of the target object interacting with the interactive object, or may be obtained directly by the electronic device, for example sound driving data retrieved from internal memory.
  • the present disclosure does not limit the acquisition method of the sound-driven data.
  • In step 202, based on the target data contained in the sound driving data, a control parameter sequence of a set action of the interactive object that matches the target data is obtained; the control parameter sequence includes one or more groups of control parameters.
  • The target data is data that has been matched with a set action in advance, and the set action is realized under the control of a corresponding control parameter sequence; the target data therefore matches the control parameter sequence of the set action.
  • The target data may be set keywords, words, sentences, and so on. Taking the keyword "wave" as an example: when the sound driving data contains text data, the target data corresponding to "wave" is the text data of "wave"; and/or when the sound driving data contains audio or syllable data, the target data corresponding to "wave" is the voice data of "wave". When the sound driving data matches the above target data, it can be determined that the sound driving data contains the target data.
  • The set action can be realized by a general unit animation. The unit animation may contain a sequence of image frames, each of which corresponds to one posture of the interactive object; the change of posture between the image frames causes the interactive object to perform the set action.
  • The posture of the interactive object in one image frame can be realized by one group of control parameters, for example a group of control parameters formed by the displacements of multiple bone points. Therefore, using a control parameter sequence formed by multiple groups of control parameters to drive the posture change of the interactive object makes it possible to control the interactive object to perform the set action.
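  • The following sketch is illustrative only and is not taken from the patent: it assumes a group of control parameters can be modeled as per-bone-point displacements and a control parameter sequence as an ordered list of such groups; all names (ControlParamSequence, apply_frame, the bone names) are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# One group of control parameters: a displacement (dx, dy, dz) per bone point.
ControlParams = Dict[str, Tuple[float, float, float]]

@dataclass
class ControlParamSequence:
    """Control parameter sequence of one set action (e.g. 'wave')."""
    action_name: str
    frames: List[ControlParams]   # one group of control parameters per image frame

def apply_frame(skeleton: Dict[str, Tuple[float, float, float]],
                params: ControlParams) -> Dict[str, Tuple[float, float, float]]:
    """Return the posture obtained by offsetting each bone point by its displacement."""
    posture = dict(skeleton)
    for bone, (dx, dy, dz) in params.items():
        x, y, z = posture[bone]
        posture[bone] = (x + dx, y + dy, z + dz)
    return posture

# A toy two-frame 'wave' animation that lifts and tilts the right hand.
wave = ControlParamSequence("wave", frames=[
    {"right_hand": (0.00, 0.10, 0.0), "right_forearm": (0.00, 0.05, 0.0)},
    {"right_hand": (0.05, 0.20, 0.0), "right_forearm": (0.02, 0.10, 0.0)},
])

rest_pose = {"right_hand": (0.3, 1.0, 0.0), "right_forearm": (0.3, 1.2, 0.0)}
for group in wave.frames:        # driving the object = playing the groups in order
    print(apply_frame(rest_pose, group))
```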
  • The target data may include target syllable data; the target syllable data corresponds to control parameters of a set mouth shape; one piece of target syllable data belongs to one pre-divided syllable type; one syllable type corresponds to one set mouth shape; and one set mouth shape is provided with a corresponding control parameter sequence.
  • Syllable data is a phonetic unit formed by combining at least one phoneme, and includes syllable data of alphabetically spelled (pinyin-style) languages as well as syllable data of non-alphabetic languages (for example, Chinese).
  • a syllable type refers to syllable data whose pronunciation actions are consistent or basically consistent.
  • One syllable type may correspond to one action of the interactive object. Specifically, one syllable type may correspond to one set mouth shape used when the interactive object speaks, that is, to one pronunciation action, so that syllable data of the same type can be matched to the control parameter sequence of the same set mouth shape. For example, syllable data of the type "ma", "man", "mang": because the pronunciation actions of such syllables are basically the same, they can be regarded as the same type and can all correspond to the control parameter sequence of the "open mouth" shape used when the interactive object speaks. In this way, when such target syllable data is detected in the sound driving data, the interactive object can be controlled to make the corresponding mouth shape according to the control parameter sequence of the mouth shape matched by the target syllable data.
  • Further, with multiple types of syllable data, control parameter sequences of multiple different mouth shapes can be matched, and these control parameter sequences can then be used to control the mouth-shape changes of the interactive object, so that it achieves an anthropomorphic speaking state.
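  • As a minimal illustration of the syllable-type matching described above (the table contents, type names, and parameter values below are invented, not taken from the patent), a lookup can map each syllable type to the control parameter sequence of one set mouth shape:

```python
# Hypothetical mapping: syllable type -> syllables in that type and the control
# parameter sequence of its set mouth shape (values are placeholders).
MOUTH_SHAPE_TABLE = {
    "open_mouth": {
        "syllables": {"ma", "man", "mang"},   # pronounced with basically the same action
        "control_sequence": [{"jaw": 0.6, "lips": 0.8}, {"jaw": 0.3, "lips": 0.5}],
    },
    "closed_round": {
        "syllables": {"bo", "po", "mo"},
        "control_sequence": [{"jaw": 0.2, "lips": 0.1}],
    },
}

def match_mouth_shape(syllable: str):
    """Return (type, control parameter sequence) of the set mouth shape whose
    syllable type contains the given syllable, or None if nothing matches."""
    for shape, entry in MOUTH_SHAPE_TABLE.items():
        if syllable in entry["syllables"]:
            return shape, entry["control_sequence"]
    return None

print(match_mouth_shape("man"))   # -> ('open_mouth', [...])
```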
  • step 203 the interactive object is controlled to execute the setting action according to the obtained control parameter sequence.
  • For each of the one or more target data contained in the sound driving data, a corresponding control parameter sequence of the set action can be obtained. Controlling the action of the interactive object according to the obtained control parameter sequences realizes the set action corresponding to each target data in the sound driving data.
  • In the embodiments of the present disclosure, the control parameter sequence of the set action of the interactive object that matches the target data is obtained to control the action of the interactive object displayed by the display device, which enables the interactive object to make the action corresponding to the target data contained in the sound driving data; the speaking state of the interactive object is thus natural and vivid, and the interactive experience of the target object is improved.
  • Fig. 3 shows a flowchart of a method for driving interactive objects according to an embodiment of the present disclosure. As shown in Fig. 3, the method further includes:
  • In step 204, the display device is controlled to output voice according to the voice information corresponding to the sound driving data; or the display device is controlled to output voice according to the voice information corresponding to the sound driving data, and text is additionally displayed according to the text information corresponding to the sound driving data.
  • While the display device is controlled to output the voice corresponding to the sound driving data, the interactive object is controlled in turn to perform the corresponding actions according to the control parameter sequences matched by the respective target data in the sound driving data, so that the interactive object makes actions matching the content of the sound while outputting the voice; the speaking state of the interactive object is thus natural and vivid, and the interactive experience of the target object is improved.
  • The text corresponding to the sound driving data may also be displayed on the display device while the voice is output, with the interactive object controlled to perform the corresponding actions, so that the interactive object acts according to the content of the sound and text while outputting voice and displaying text; the expressive state of the interactive object is thus natural and vivid, which improves the interactive experience of the target object.
  • In this way, an image frame sequence corresponding to variable content can be formed, which improves the efficiency of driving the interactive object.
  • Moreover, target data can be added or modified as needed to cope with changing content, which facilitates maintenance and updating of the driving system.
  • the method is applied to a server, including a local server or a cloud server, etc.
  • The server processes the sound driving data of the interactive object to generate posture parameter values of the interactive object, and renders according to the posture parameter values using a three-dimensional or two-dimensional rendering engine to obtain a response animation of the interactive object.
  • The server may send the response animation to the terminal device for display so as to respond to the target object, or send the response animation to the cloud so that the terminal device can obtain it from the cloud and respond to the target object.
  • the posture parameter value may also be sent to the terminal, so that the terminal completes the process of rendering, generating a response animation, and performing display.
  • In some embodiments, the method is applied to a terminal device, which processes the sound driving data of the interactive object to generate posture parameter values of the interactive object, renders according to the posture parameter values using a 3D or 2D rendering engine to obtain a response animation of the interactive object, and displays the response animation to respond to the target object.
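  • A rough sketch of this server/terminal split is shown below; it is not from the patent, the PostureFrame structure and both functions are invented, and the server side is stubbed rather than actually processing sound driving data.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class PostureFrame:
    """Posture parameter values for one rendered frame (names are illustrative)."""
    timestamp_s: float
    bone_params: Dict[str, float]

def server_side(sound_driving_data: str) -> List[PostureFrame]:
    """Server: turn sound driving data into posture parameter values (stubbed here)."""
    return [PostureFrame(0.00, {"jaw": 0.6}), PostureFrame(0.04, {"jaw": 0.3})]

def terminal_side(frames: List[PostureFrame]) -> None:
    """Terminal: render each frame with a 2D/3D engine (printing stands in for rendering)."""
    for f in frames:
        print(f"render at {f.timestamp_s:.2f}s with {f.bone_params}")

terminal_side(server_side("你好"))
```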
  • In some embodiments, the voice content contained in audio data may be obtained by performing voice recognition on the sound driving data, and the target data contained in the audio data can then be determined; by matching the voice content with the target data, the target data contained in the sound driving data can be determined.
  • the target data included in the text data is determined based on the text content included in the text data.
  • When the sound driving data includes syllable data, the sound driving data is split to obtain at least one piece of syllable data.
  • Priorities can be set for different splitting methods, and the syllable data combination obtained by the higher-priority splitting method is used as the splitting result.
  • the split syllable data is matched with the target syllable data.
  • In response to the syllable data matching the target syllable data of any syllable type, it can be determined that the syllable data matches the target syllable data, and it can be determined that the sound driving data contains the target data.
  • For example, the target syllable data may include syllable data of the "ma", "man", and "mang" type; in response to the sound driving data containing syllable data that matches any of "ma", "man", or "mang", it is determined that the sound driving data contains the target syllable data.
  • The control parameter sequence of the set mouth shape corresponding to the target syllable data is then obtained, and the interactive object is controlled to make the corresponding mouth shape.
  • the mouth shape change of the interactive object can be controlled according to the control parameter sequence of the mouth shape corresponding to the sound-driven data, so that the interactive object realizes a anthropomorphic speaking state.
  • The syllable data obtained by splitting may be multiple pieces of syllable data. For each piece, it can be checked whether it matches certain target syllable data; when it does, the control parameter sequence of the set mouth shape corresponding to that target syllable data is obtained.
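  • The sketch below assumes one simple splitting method (greedy longest match against a syllable inventory) and an invented target-syllable set; the inventory, function names, and example string are all hypothetical, and a real system could try several splitting methods and keep the result of the highest-priority one, as noted above.

```python
SYLLABLE_INVENTORY = {"ma", "man", "mang", "ni", "hao", "tian", "qi"}
TARGET_SYLLABLES = {"ma", "man", "mang"}     # all belong to the "open mouth" type

def split_syllables(pinyin: str) -> list:
    """Greedy longest-match split of a pinyin string into syllable data."""
    out, i = [], 0
    while i < len(pinyin):
        for size in range(min(5, len(pinyin) - i), 0, -1):
            piece = pinyin[i:i + size]
            if piece in SYLLABLE_INVENTORY:
                out.append(piece)
                i += size
                break
        else:
            i += 1                            # skip characters that form no syllable
    return out

def matched_targets(pinyin: str) -> list:
    """Syllables in the sound driving data that match the target syllable data."""
    return [s for s in split_syllables(pinyin) if s in TARGET_SYLLABLES]

print(split_syllables("nihaoma"))             # ['ni', 'hao', 'ma']
print(matched_targets("nihaoma"))             # ['ma'] -> drive the "open mouth" shape
```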
  • step 203 further includes:
  • Step 2031 Determine the voice information corresponding to the target data
  • Step 2032 Obtain time information for outputting the voice information
  • Step 2033 Determine the execution time of the set action corresponding to the target data according to the time information.
  • Step 2034 According to the execution time, control the interactive object to execute the setting action with the control parameter sequence corresponding to the target data.
  • The time information of outputting the voice information corresponding to the target data can be determined, for example the time at which the output of that voice information starts, the time at which it ends, and its duration.
  • The execution time of the set action corresponding to the target data may be determined according to the time information, and, within the execution time or within a certain range of the execution time, the interactive object is controlled to perform the set action with the control parameter sequence corresponding to the target data.
  • The duration of outputting the voice according to the sound driving data is the same as or similar to the duration of controlling the interactive object to perform the successive set actions according to the multiple control parameter sequences; and, for each target data, the duration of outputting the corresponding voice is likewise consistent with or similar to the duration of controlling the interactive object to perform the set action according to the corresponding control parameter sequence, so that the time when the interactive object speaks and the time when it performs the action match, thereby synchronizing and coordinating the voice and actions of the interactive object.
  • In some embodiments, each group of control parameters in the control parameter sequence may be invoked at a set rate, so that the interactive object displays the posture corresponding to each group; that is, the control parameter sequence corresponding to each target data is always executed at a constant speed.
  • In some embodiments, the invocation rate of the control parameter sequence corresponding to the target data is determined according to the execution time of the set action corresponding to the target data, and each group of control parameters in the control parameter sequence corresponding to the target data is invoked at that rate, so that the interactive object displays the posture corresponding to each group of control parameters.
  • the call rate of the control parameter sequence determines the rate at which the interactive object performs actions. For example, when the control parameter sequence is called at a higher speed, the posture of the interactive object changes relatively fast, so the set action can be completed in a shorter time.
  • In this way, the time taken to perform the set action can be adjusted, for example compressed or stretched, according to the time when the voice of the target data is output, so that the time when the interactive object performs the set action matches the time when the voice of the target data is output, thereby synchronizing and coordinating the voice and actions of the interactive object.
  • The control parameter sequence corresponding to the target data may also start to be invoked at a set time before the voice is output according to the phonemes corresponding to the target data, so that the interactive object starts to perform the set action corresponding to the control parameter sequence.
  • That is, shortly before the voice corresponding to the target data starts to be output, the control parameter sequence corresponding to the target data starts to be invoked so that the interactive object begins to perform the set action, which is more consistent with how a real person speaks.
  • the speech of the interactive object is more natural and vivid, which improves the interactive experience of the target object.
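  • A minimal sketch of this scheduling, assuming the execution time and a small lead time are known; the function names and numeric values are invented for the example and do not come from the patent.

```python
def call_rate(num_groups: int, execution_time_s: float) -> float:
    """Groups of control parameters to invoke per second so that the whole sequence
    finishes within the time the voice of the target data is being output
    (compressing or stretching the set action as needed)."""
    return num_groups / execution_time_s

def schedule(sequence, speech_start_s: float, speech_duration_s: float, lead_s: float = 0.1):
    """Return (timestamp, group) pairs; invocation may begin a set lead time
    before the corresponding voice starts."""
    rate = call_rate(len(sequence), speech_duration_s)
    start = speech_start_s - lead_s
    return [(start + i / rate, group) for i, group in enumerate(sequence)]

seq = [{"jaw": 0.6}, {"jaw": 0.4}, {"jaw": 0.2}, {"jaw": 0.0}]
for t, group in schedule(seq, speech_start_s=2.0, speech_duration_s=0.8):
    print(f"{t:.2f}s -> {group}")
```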
  • In response to detecting that adjacent target data overlap, the interactive object may be controlled to perform the corresponding set action according to the control parameter sequence of the target data that comes first in word order (that is, in the natural order of the received sound driving data), and the later target data that overlaps with it is ignored.
  • Each target data contained in the sound driving data may be stored in the form of an array, with each target data as one element. It should be noted that, since morphemes can be combined in different ways to obtain different target data, adjacent target data among the multiple target data may overlap. For example, when the text corresponding to the sound driving data is "天气真好" ("the weather is really good"), the corresponding target data are: 1. "天", 2. "天气", 3. "真好". The adjacent target data 1 and 2 contain the common morpheme "天", and target data 1 and 2 can match the same specified action, for example pointing upward with a finger.
  • In this case, the target data that appears first may be given a higher priority than the one that follows. In the above "天气真好" example, "天" (day) has a higher priority than "天气" (weather); therefore the interactive object is controlled to perform the set action according to the control parameter sequence of the set action corresponding to "天", the remaining morpheme "气" is ignored (that is, the target data "天气" overlapping with "天" is ignored), and matching then continues directly with "真好" (really good).
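  • One way this word-order priority could be realized is the small sketch below; it is illustrative only, the function name and the (start, end) match representation are assumptions, and earlier matches simply block later overlapping ones.

```python
def resolve_overlaps(matches):
    """matches: list of (start_index, end_index, target_data) found in the text,
    listed in word order. Earlier matches win; any later match that overlaps an
    already accepted one is ignored."""
    accepted, covered_until = [], -1
    for start, end, target in sorted(matches, key=lambda m: m[0]):
        if start > covered_until:
            accepted.append(target)
            covered_until = end
    return accepted

# "天气真好": candidate target data 1. "天" (0-0), 2. "天气" (0-1), 3. "真好" (2-3)
print(resolve_overlaps([(0, 0, "天"), (0, 1, "天气"), (2, 3, "真好")]))
# -> ['天', '真好']  ("天气" overlaps the earlier "天" and is dropped)
```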
  • In some embodiments, in response to the control parameter sequences corresponding to adjacent target data overlapping in execution time, the overlapping parts of those control parameter sequences may be fused.
  • the overlapping parts of the control parameter sequences may be averaged or weighted averaged to achieve the fusion of the overlapping control parameter sequences.
  • Alternatively, an interpolation method can be used: from a certain frame of the previous action onward (for example, from the N-th group of control parameters n of the first control parameter sequence corresponding to that action), interpolation toward the next action is performed over the transition time until the transition coincides with the first frame of the next action (for example, a frame is found at which the first group of control parameters 1 of the second control parameter sequence corresponding to the next action equals the control parameters n, or the next action is inserted at that frame, so that the total execution time of the two actions after the interpolated transition equals the playback or display time of the corresponding voice data or text data); all frames after that frame in the previous action are then ignored and the next action is executed directly, thereby realizing the fusion of the overlapping control parameter sequences.
  • the actions of the interactive objects can be smoothly transitioned, so that the actions of the interactive objects are smooth and natural, and the interactive experience of the target object is improved.
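  • As one possible reading of the averaging-based fusion (a sketch, not the patent's implementation; the function name, cross-fade weights, and toy sequences are all invented), the overlapping tail of the first sequence can be blended with the head of the second by weighted averaging:

```python
def fuse(seq_a, seq_b, overlap: int):
    """Fuse two control parameter sequences whose execution times overlap by
    'overlap' frames: the tail of seq_a is blended with the head of seq_b by
    weighted averaging (a simple cross-fade), then the rest of seq_b follows."""
    fused = list(seq_a[:-overlap])
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)           # weight shifts gradually toward seq_b
        a, b = seq_a[len(seq_a) - overlap + i], seq_b[i]
        fused.append({k: (1 - w) * a.get(k, 0.0) + w * b.get(k, 0.0)
                      for k in set(a) | set(b)})
    fused.extend(seq_b[overlap:])
    return fused

nod = [{"head": 0.0}, {"head": 0.3}, {"head": 0.6}]
wave = [{"hand": 0.5, "head": 0.6}, {"hand": 1.0, "head": 0.0}]
print(fuse(nod, wave, overlap=1))   # smooth hand-over between the two actions
```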
  • For the data other than each target data in the sound driving data (referred to as first data, for example), posture control parameters matching the acoustic features of the first data can be obtained, and the posture of the interactive object is controlled according to those posture control parameters.
  • For example, a sequence of voice frames contained in the first data may be acquired, acoustic features corresponding to at least one voice frame may be obtained, and the posture control parameters of the interactive object corresponding to those acoustic features, such as a posture control vector, may be acquired to control the posture of the interactive object.
  • Alternatively, the acoustic features corresponding to the phonemes may be obtained according to the phonemes corresponding to the morphemes in the text data, and the posture control parameters of the interactive object corresponding to those acoustic features, such as a posture control vector, are used to control the posture of the interactive object.
  • The acoustic feature may be a feature related to speech emotion, such as a fundamental frequency feature, a formant feature, Mel-frequency cepstral coefficients (MFCC), and so on.
  • While voice is output and/or text is displayed according to the first data, the posture of the interactive object is controlled according to the posture parameter values, so that the gesture made by the interactive object is synchronized with the output voice and/or text, giving the target object the feeling that the interactive object is speaking.
  • Since the posture control vector is related to the acoustic features of the output sound, driving the interactive object according to the posture control vector gives its expressions and body movements emotional character, making the speaking process of the interactive object more natural and vivid and thereby improving the interactive experience of the target object.
  • the sound-driven data includes at least one target data, and first data other than the target data.
  • For the first data, the posture control parameters are determined according to the acoustic features of the first data to control the posture of the interactive object; for the target data, the interactive object is controlled to make the set action according to the control parameter sequence of the set action that matches the target data.
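  • The sketch below only illustrates the general idea of mapping acoustic features to a posture control vector; it computes crude hand-rolled per-frame features (RMS energy and zero-crossing rate) rather than the fundamental-frequency, formant, or MFCC features named above, and the mapping coefficients and parameter names are invented.

```python
import math

def frame_features(samples, frame_len=400, hop=200):
    """Rough per-frame acoustic features: RMS energy and zero-crossing rate."""
    feats = []
    for start in range(0, max(len(samples) - frame_len, 0) + 1, hop):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / len(frame)
        feats.append((rms, zcr))
    return feats

def posture_control_vector(rms, zcr):
    """Toy mapping from acoustic features to a posture control vector: louder speech
    opens gestures wider, 'sharper' speech raises the eyebrows a little."""
    return {"arm_spread": min(1.0, 4.0 * rms), "eyebrow_raise": min(1.0, 2.0 * zcr)}

speech = [math.sin(2 * math.pi * 220 * t / 16000) * 0.3 for t in range(1600)]
for rms, zcr in frame_features(speech):
    print(posture_control_vector(rms, zcr))
```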
  • FIG. 5 shows a schematic structural diagram of an interactive object driving apparatus according to at least one embodiment of the present disclosure.
  • The apparatus may include: a first acquisition unit 301 configured to acquire sound driving data of an interactive object displayed by a display device; a second acquisition unit 302 configured to obtain, based on the target data contained in the sound driving data, a control parameter sequence of the set action of the interactive object that matches the target data; and a driving unit 303 configured to control the interactive object to perform the set action according to the obtained control parameter sequence.
  • the device further includes an output unit for controlling the display device to output voice according to the voice information corresponding to the sound-driven data, and/or displaying text according to the text information corresponding to the sound-driven data .
  • The driving unit is specifically configured to: determine the voice information corresponding to the target data; obtain time information of outputting the voice information; determine the execution time of the set action corresponding to the target data according to the time information; and control, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data.
  • The control parameter sequence includes one or more groups of control parameters; when controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: invoke each group of control parameters in the control parameter sequence at a set rate, so that the interactive object displays the posture corresponding to each group of control parameters.
  • The control parameter sequence includes one or more groups of control parameters; when controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: determine the invocation rate of the control parameter sequence according to the execution time, and invoke each group of control parameters in the control parameter sequence at that rate, so that the interactive object outputs the posture corresponding to each group of control parameters.
  • The control parameter sequence includes one or more groups of control parameters; when controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: start invoking the control parameter sequence corresponding to the target data at a set time before the voice information corresponding to the target data is output, so that the interactive object starts to perform the set action.
  • The sound driving data contains multiple target data, and the driving unit is specifically configured to: in response to detecting that adjacent target data among the multiple target data overlap, control the interactive object to perform the set action according to the control parameter sequence corresponding to the target data that comes first in word order.
  • The sound driving data contains multiple target data, and the driving unit is specifically configured to: in response to detecting that the control parameter sequences corresponding to adjacent target data among the multiple target data overlap in execution time, fuse the overlapping parts of the control parameter sequences corresponding to the adjacent target data.
  • The second acquisition unit is specifically configured to: in response to the sound driving data including audio data, perform voice recognition on the audio data and determine the target data contained in the audio data according to the recognized voice content; and in response to the sound driving data including text data, determine the target data contained in the text data according to the text content contained in the text data.
  • The target data includes target syllable data, and the second acquisition unit is specifically configured to: determine whether the syllable data contained in the sound driving data matches the target syllable data, wherein the target syllable data belongs to one pre-divided syllable type, one syllable type corresponds to one set mouth shape, and one set mouth shape is provided with a corresponding control parameter sequence; and in response to the syllable data matching the target syllable data, obtain, based on the syllable type to which the matched target syllable data belongs, the control parameter sequence of the set mouth shape corresponding to the matched target syllable data.
  • The apparatus further includes a posture control unit configured to: obtain first data other than the target data in the sound driving data; obtain acoustic features of the first data; obtain posture control parameters matching the acoustic features; and control the posture of the interactive object according to the posture control parameters.
  • At least one embodiment of this specification also provides an electronic device. As shown in FIG. 6, the device includes a memory and a processor; the memory is used to store computer instructions executable on the processor, and the processor is used to implement, when executing the computer instructions, the method for driving interactive objects described in any embodiment of the present disclosure. At least one embodiment of this specification also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method for driving an interactive object according to any embodiment of the present disclosure is realized.
  • One or more embodiments of this specification can be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • the embodiments of the subject and functional operations described in this specification can be implemented in the following: digital electronic circuits, tangible computer software or firmware, computer hardware including the structures disclosed in this specification and structural equivalents thereof, or among them A combination of one or more.
  • The embodiments of the subject matter described in this specification can be implemented as one or more computer programs, that is, as one or more modules of computer program instructions encoded on a tangible non-transitory program carrier to be executed by a data processing apparatus or to control the operation of the data processing apparatus.
  • Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information and transmit it to a suitable receiver apparatus for execution by a data processing apparatus.
  • the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • the processing and logic flow described in this specification can be executed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating according to input data and generating output.
  • the processing and logic flow can also be executed by a dedicated logic circuit, such as FPGA (Field Programmable Gate Array) or ASIC (Application Specific Integrated Circuit), and the device can also be implemented as a dedicated logic circuit.
  • Computers suitable for executing computer programs include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit.
  • the central processing unit will receive instructions and data from a read-only memory and/or a random access memory.
  • the basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • Generally, the computer will also include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or the computer will be operatively coupled to such mass storage devices to receive data from them, transmit data to them, or both.
  • However, a computer does not have to have such devices.
  • The computer can be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name just a few.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by or incorporated into a dedicated logic circuit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method, apparatus, and device for driving an interactive object, and a storage medium. The method includes: acquiring sound driving data of an interactive object displayed by a display device (201); obtaining, based on target data contained in the sound driving data, a control parameter sequence of a set action of the interactive object that matches the target data (202); and controlling the interactive object to perform the set action according to the obtained control parameter sequence (203).

Description

Method, apparatus, and device for driving interactive objects, and storage medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a method, apparatus, and device for driving interactive objects, and a storage medium.
Background
Human-computer interaction is mostly performed as follows: the user provides input via keys, touch, or voice, and the device responds by presenting images, text, or virtual characters on a display screen. At present, virtual characters are mostly obtained by improving on voice assistants, and the interaction between users and virtual characters remains superficial.
Summary of the Invention
The embodiments of the present disclosure provide a driving solution for interactive objects.
According to one aspect of the present disclosure, a method for driving an interactive object is provided. The method includes: acquiring sound driving data of an interactive object displayed by a display device; obtaining, based on target data contained in the sound driving data, a control parameter sequence of a set action of the interactive object that matches the target data; and controlling the interactive object to perform the set action according to the obtained control parameter sequence.
In combination with any implementation provided by the present disclosure, the method further includes: controlling the display device to output speech according to voice information corresponding to the sound driving data, and/or displaying text according to text information corresponding to the sound driving data.
In combination with any implementation provided by the present disclosure, controlling the interactive object to perform the set action according to the obtained control parameter sequence includes: determining the voice information corresponding to the target data; obtaining time information of outputting the voice information; determining, according to the time information, the execution time of the set action corresponding to the target data; and controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data.
In combination with any implementation provided by the present disclosure, the control parameter sequence includes one or more groups of control parameters, and controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data includes: invoking each group of control parameters in the control parameter sequence at a set rate, so that the interactive object displays the posture corresponding to each group of control parameters.
In combination with any implementation provided by the present disclosure, the control parameter sequence includes one or more groups of control parameters, and controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data includes: determining the invocation rate of the control parameter sequence according to the execution time; and invoking each group of control parameters in the control parameter sequence at the invocation rate, so that the interactive object outputs the posture corresponding to each group of control parameters.
In combination with any implementation provided by the present disclosure, controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data includes: starting to invoke the control parameter sequence corresponding to the target data at a set time before the voice information corresponding to the target data is output, so that the interactive object starts to perform the set action.
In combination with any implementation provided by the present disclosure, the sound driving data contains multiple target data, and controlling the interactive object to perform the set action according to the obtained control parameter sequence includes: in response to detecting that adjacent target data among the multiple target data overlap, controlling the interactive object to perform the set action according to the control parameter sequence corresponding to the target data that comes first in word order.
In combination with any implementation provided by the present disclosure, the sound driving data contains multiple target data, and controlling the interactive object to perform the set action according to the control parameter sequence corresponding to the target data includes: in response to detecting that the control parameter sequences corresponding to adjacent target data among the multiple target data overlap in execution time, fusing the overlapping parts of the control parameter sequences corresponding to the adjacent target data.
In combination with any implementation provided by the present disclosure, obtaining, based on the target data contained in the sound driving data, the control parameter sequence of the set action of the interactive object that matches the target data includes: in response to the sound driving data including audio data, performing speech recognition on the audio data and determining the target data contained in the audio data according to the recognized speech content; and in response to the sound driving data including text data, determining the target data contained in the text data according to the text content contained in the text data.
In combination with any implementation provided by the present disclosure, the sound driving data includes syllable data, and obtaining, based on the target data contained in the sound driving data, the control parameter sequence of the set action of the interactive object that matches the target data includes: determining whether the syllable data contained in the sound driving data matches target syllable data, wherein the target syllable data belongs to one pre-divided syllable type, one syllable type corresponds to one set mouth shape, and one set mouth shape is provided with a corresponding control parameter sequence; and in response to the syllable data matching the target syllable data, obtaining, based on the syllable type to which the matched target syllable data belongs, the control parameter sequence of the set mouth shape corresponding to the matched target syllable data.
In combination with any implementation provided by the present disclosure, the method further includes: acquiring first data other than the target data in the sound driving data; acquiring acoustic features of the first data; acquiring posture control parameters matching the acoustic features; and controlling the posture of the interactive object according to the posture control parameters.
According to one aspect of the present disclosure, an apparatus for driving an interactive object is provided. The apparatus includes: a first acquisition unit, configured to acquire sound driving data of an interactive object displayed by a display device; a second acquisition unit, configured to obtain, based on target data contained in the sound driving data, a control parameter sequence of a set action of the interactive object that matches the target data; and a driving unit, configured to control the interactive object to perform the set action according to the obtained control parameter sequence.
In combination with any implementation provided by the present disclosure, the apparatus further includes an output unit, configured to control the display device to output speech according to voice information corresponding to the sound driving data, and/or to display text according to text information corresponding to the sound driving data.
In combination with any implementation provided by the present disclosure, the driving unit is specifically configured to: determine the voice information corresponding to the target data; obtain time information of outputting the voice information; determine, according to the time information, the execution time of the set action corresponding to the target data; and control, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data.
In combination with any implementation provided by the present disclosure, the control parameter sequence includes one or more groups of control parameters; when controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: invoke each group of control parameters in the control parameter sequence at a set rate, so that the interactive object displays the posture corresponding to each group of control parameters.
In combination with any implementation provided by the present disclosure, the control parameter sequence includes one or more groups of control parameters; when controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: determine the invocation rate of the control parameter sequence according to the execution time; and invoke each group of control parameters in the control parameter sequence at the invocation rate, so that the interactive object outputs the posture corresponding to each group of control parameters.
In combination with any implementation provided by the present disclosure, the control parameter sequence includes one or more groups of control parameters; when controlling, according to the execution time, the interactive object to perform the set action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: start invoking the control parameter sequence corresponding to the target data at a set time before the voice information corresponding to the target data is output, so that the interactive object starts to perform the set action.
In combination with any implementation provided by the present disclosure, the sound driving data contains multiple target data, and the driving unit is specifically configured to: in response to detecting that adjacent target data among the multiple target data overlap, control the interactive object to perform the set action according to the control parameter sequence corresponding to the target data that comes first in word order.
In combination with any implementation provided by the present disclosure, the sound driving data contains multiple target data, and the driving unit is specifically configured to: in response to detecting that the control parameter sequences corresponding to adjacent target data among the multiple target data overlap in execution time, fuse the overlapping parts of the control parameter sequences corresponding to the adjacent target data.
In combination with any implementation provided by the present disclosure, the second acquisition unit is specifically configured to: in response to the sound driving data including audio data, perform speech recognition on the audio data and determine the target data contained in the audio data according to the speech content contained in the audio data; and in response to the sound driving data including text data, determine the target data contained in the text data according to the text content contained in the text data.
In combination with any implementation provided by the present disclosure, the sound driving data includes syllable data, and the second acquisition unit is specifically configured to: determine whether the syllable data contained in the sound driving data matches target syllable data, wherein the target syllable data belongs to one pre-divided syllable type, one syllable type corresponds to one set mouth shape, and one set mouth shape is provided with a corresponding control parameter sequence; and in response to the syllable data matching the target syllable data, obtain, based on the syllable type to which the matched target syllable data belongs, the control parameter sequence of the set mouth shape corresponding to the matched target syllable data.
In combination with any implementation provided by the present disclosure, the apparatus further includes a posture control unit, configured to: acquire first data other than the target data in the sound driving data; acquire acoustic features of the first data; acquire posture control parameters matching the acoustic features of the first data; and control the posture of the interactive object according to the posture control parameters.
According to one aspect of the present disclosure, an electronic device is provided. The device includes a memory and a processor; the memory is configured to store computer instructions executable on the processor, and the processor is configured to implement, when executing the computer instructions, the method for driving interactive objects described in any implementation provided by the present disclosure.
According to one aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the method for driving interactive objects described in any implementation provided by the present disclosure is implemented.
According to the method, apparatus, and device for driving an interactive object and the computer-readable storage medium of one or more embodiments of the present disclosure, control parameters of a set action of the interactive object that matches at least one target data contained in the sound driving data of the interactive object displayed by a display device are obtained to control the actions of the interactive object displayed by the display device, so that the interactive object can make actions corresponding to the target data contained in the sound driving data; the speaking state of the interactive object is thus natural and vivid, and the interactive experience of the target object is improved.
附图说明
图1是根据本公开实施例提出的交互对象的驱动方法中显示设备的示意图;
图2是根据本公开实施例提出的交互对象的驱动方法的流程图;
图3是根据本公开实施例提出的交互对象的驱动方法的流程图;
图4是根据本公开实施例提出的交互对象的驱动方法的流程图;
图5是根据本公开实施例提出的交互对象的驱动装置的结构示意图;
图6是根据本公开实施例提出的电子设备的结构示意图。
具体实施方式
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所述的、本公开的一些方面相一致的装置和方法的例子。
本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中术语“至少一种”表示多种中的任意一种或多种中的至少两种的任意组合,例如,包括A、B、C中的至少一种,可以表示包括从A、B和C构成的集合中选择的任意一个或多个元素。
本公开至少一个实施例提供了一种交互对象的驱动方法,所述驱动方法可以由终端设备或服务器等电子设备执行,所述终端设备可以是固定终端或移动终端,例如手机、平板电脑、游戏机、台式机、广告机、一体机、车载终端等等,所述服务器包括本地服务器或云端服务器等,所述方法还可以通过处理器调用存储器中存储的计算机可读指令的方式来实现。
在本公开实施例中，交互对象可以是任意一种能够与目标对象进行交互的交互对象，其可以是虚拟人物，还可以是虚拟动物、虚拟物品、卡通形象等等其他能够实现交互功能的虚拟形象，虚拟形象的展现形式既可以是2D形式，也可以是3D形式，本公开对此并不限定。所述目标对象可以是用户，也可以是机器人，还可以是其他智能设备。所述交互对象和所述目标对象之间的交互方式可以是主动交互方式，也可以是被动交互方式。一示例中，目标对象可以通过做出手势或者肢体动作来发出需求，通过主动交互的方式来触发交互对象与其交互。另一示例中，交互对象可以通过主动打招呼、提示目标对象做出动作等方式，使得目标对象采用被动方式与交互对象进行交互。
所述交互对象可以通过电子设备进行展示,所述电子设备还可以是电视机、带有显示功能的一体机、投影仪、虚拟现实(Virtual Reality,VR)设备、增强现实(Augmented Reality,AR)设备等,本公开并不限定电子设备的具体形式。
图1示出根据本公开实施例提出的显示设备。如图1所示,该显示设备具有显示屏,其可以在显示屏上显示立体画面,以呈现出虚拟场景以及交互对象。例如图1中显示屏显示的交互对象有虚拟卡通人物。
本公开中所述的电子设备可以包括内置的显示器或与上述显示设备集成为一体,通过显示器或显示设备,可以显示立体画面,以呈现出虚拟场景以及交互对象。在另一些实施例中,本公开中所述的电子设备还可以不包括内置的显示器,所需显示的内容可以通过有线或无线的连接通知外接的显示器呈现出虚拟场景以及交互对象。
在一些实施例中，响应于电子设备接收到用于驱动交互对象输出语音的声音驱动数据，交互对象可以对目标对象发出指定语音。可以根据电子设备周边目标对象的动作、表情、身份、偏好等，生成声音驱动数据，以驱动交互对象通过发出指定语音进行回应，从而为目标对象提供拟人化的服务。在交互对象与目标对象的交互过程中，存在根据该声音驱动数据驱动交互对象发出指定语音的同时，无法驱动所述交互对象做出与该指定语音同步的面部动作的情况，使得交互对象在发出语音时呆板、不自然，影响了目标对象的交互体验。基于此，本公开实施例提出一种交互对象的驱动方法，以提升目标对象与交互对象进行交互的体验。
图2示出根据本公开实施例的交互对象的驱动方法的流程图,如图2所示,所述方法包括步骤201~步骤203。
在步骤201中,获取显示设备展示的交互对象的声音驱动数据。
在本公开实施例中，所述声音驱动数据可以包括音频数据（语音数据）、文本数据等等。所述声音驱动数据可以是电子设备根据与交互对象进行交互的目标对象的动作、表情、身份、偏好等生成的驱动数据，也可以是电子设备直接获取的，比如从内部存储器调用的声音驱动数据等。本公开对于该声音驱动数据的获取方式不进行限制。
在步骤202中,基于所述声音驱动数据中所包含的目标数据,获取与所述目标数据匹配的交互对象的设定动作的控制参数序列,所述控制参数序列包括一组或多组控制参数。
在本公开实施例中,目标数据为预先匹配了设定动作的数据,而所述设定动作通过相应的控制参数序列进行控制而实现,因而所述目标数据与所述设定动作的控制参数序列匹配。所述目标数据可以是设置的关键字、词、句等等。以关键词为“挥手”为例,在所述声音驱动数据中包含了文本数据时,“挥手”对应的目标数据为“挥手”的文本数据,和/或在所述声音驱动数据中包含了音频或音节数据时,“挥手”对应的目标数据为“挥手”的语音数据。在所述声音驱动数据匹配到上述目标数据时,则可以确定所述声音驱动数据中包含了目标数据。
所述设定动作可以利用通用的单元动画实现,该单元动画可以包含图像帧序列,该序列中的每个图像帧对应于所述交互对象的一个姿态,通过图像帧之间对应的姿态的变化即可以使交互对象实现设定动作。其中,一个图像帧中交互对象姿态可以通过一组控制参数实现,例如多个骨骼点的位移形成的一组控制参数。因此,利用多组控制参数形成的控制参数序列来控制交互对象的姿态变化,能够控制交互对象实现设定动作。
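为便于理解上述控制参数序列与单元动画的对应关系，下面给出一个示意性的Python代码草图：以若干骨骼点的位移作为一组控制参数，按帧依次应用即可呈现一个设定动作。其中的类名、函数名与具体数值均为示例假设，并非本公开实施例的实际实现。
```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# 一组控制参数：若干骨骼点名称到位移 (dx, dy) 的映射（仅为示例结构）
ControlParams = Dict[str, Tuple[float, float]]

@dataclass
class SetAction:
    """设定动作：由控制参数序列（多组控制参数）构成，对应一段单元动画。"""
    name: str
    param_sequence: List[ControlParams]

# 假设的"挥手"动作控制参数序列：每个元素对应单元动画中的一个图像帧
WAVE_ACTION = SetAction(
    name="wave",
    param_sequence=[
        {"right_wrist": (0.0, 0.2), "right_elbow": (0.0, 0.1)},
        {"right_wrist": (0.1, 0.3), "right_elbow": (0.05, 0.15)},
        {"right_wrist": (-0.1, 0.3), "right_elbow": (-0.05, 0.15)},
    ],
)

def drive_action(action: SetAction) -> None:
    """依次应用每组控制参数，使交互对象展示对应姿态，从而完成设定动作。"""
    for frame_idx, params in enumerate(action.param_sequence):
        # 此处以打印代替真实的渲染调用（假设的接口）
        print(f"frame {frame_idx}: apply {params}")

drive_action(WAVE_ACTION)
```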
在一些实施例中,所述目标数据可以包括目标音节数据,所述目标音节数据与设定嘴型的控制参数相对应,一种目标音节数据属于预先划分好的一种音节类型,所述一种音节类型对应于一种设定嘴型,一种设定嘴型设置有对应的控制参数序列。
其中,音节数据是由至少一个音素组合形成的语音单位,所述音节数据包括拼音语言的音节数据,和非拼音语言(例如,汉语)的音节数据。一种音节类型是指发音动作一致或者基本一致的音节数据,一种音节类型可与交互对象的一种动作对应,具体的,一种音节类型可与交互对象说话时的一种设定的嘴型对应,即与一种发音动作对应,这样,同种类型的音节数据可以匹配设定的同种嘴型的控制参数序列,比如,拼音“ma”、“man”、“mang”这类型的音节数据,由于这类音节数据的发音动作基本一致,故可以视为同一类型,均可对应交互对象说话时“嘴巴张开”的嘴型的控制参数序列,这样,在检测到声音驱动数据中包括此类目标音节数据时,可根据该目标音节数据所匹配的嘴型的控制参数序列来控制交互对象做出对应的嘴型。进而,通过多种类型的音节数据,可匹配出多个不同类型的嘴型的控制参数序列,进而可以利用所述多个控制参数序列来控制交互对象的嘴型变化,控制交互对象实现拟人的说话状态。
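下面用一段示意性的Python代码草图说明音节类型、设定嘴型与控制参数序列之间的匹配关系；其中的音节类型划分、嘴型名称与参数数值均为示例假设，并非本公开实施例的实际实现。
```python
from typing import Dict, List, Optional

# 假设的音节类型划分：发音动作基本一致的音节归为一类
SYLLABLE_TYPES: Dict[str, List[str]] = {
    "open_mouth": ["ma", "man", "mang"],   # 对应"嘴巴张开"的设定嘴型
    "pout":       ["bo", "po", "mo"],      # 对应"嘴巴嘟起"的设定嘴型（示例）
}

# 每种设定嘴型设置有对应的控制参数序列（数值仅为示意）
MOUTH_PARAM_SEQUENCES: Dict[str, List[Dict[str, float]]] = {
    "open_mouth": [{"jaw_open": 0.2}, {"jaw_open": 0.6}, {"jaw_open": 0.3}],
    "pout":       [{"lip_pucker": 0.4}, {"lip_pucker": 0.7}, {"lip_pucker": 0.2}],
}

def match_mouth_sequence(syllable: str) -> Optional[List[Dict[str, float]]]:
    """若音节数据与某一音节类型的目标音节数据匹配，返回该类型设定嘴型的控制参数序列。"""
    for mouth_type, target_syllables in SYLLABLE_TYPES.items():
        if syllable in target_syllables:
            return MOUTH_PARAM_SEQUENCES[mouth_type]
    return None  # 未匹配到目标音节数据

print(match_mouth_sequence("man"))   # 命中"open_mouth"类型，返回对应嘴型的控制参数序列
print(match_mouth_sequence("xxx"))   # None，不属于任何目标音节类型
```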
在步骤203中,根据所获得的控制参数序列控制所述交互对象执行所述设定动作。
对于所述声音驱动数据中所包含的一个或多个目标数据，均可以获得相应的设定动作的控制参数序列。根据所获得的控制参数序列控制所述交互对象的动作，即可以实现所述声音驱动数据中各个目标数据对应的设定动作。
在本公开实施例中,根据显示设备展示的交互对象的声音驱动数据中包含的目标数据,获取与所述目标数据匹配的交互对象的设定动作的控制参数序列,以控制所述显示设备展示的交互对象的动作,使得交互对象可以做出声音驱动数据中所包含的目标数据对应的动作,从而使交互对象说话的状态自然生动,提升了目标对象的交互体验。
图3示出了根据本公开实施例的交互对象的驱动方法的流程图,如图3所示,所述方法还包括:
步骤204，根据所述声音驱动数据对应的语音信息控制所述显示设备输出语音；或者，在根据所述语音信息控制所述显示设备输出语音的同时，根据所述声音驱动数据对应的文本信息展示文本。
在控制显示设备输出声音驱动数据对应的语音的同时,根据所述声音驱动数据中各个目标数据匹配的控制参数序列,依次控制所述交互对象执行相应的动作,使得交互对象能够在输出语音的同时,根据声音所包含的内容做出动作,从而使交互对象说话的状态自然生动,提升了目标对象的交互体验。
还可以在控制显示设备输出声音驱动数据对应的语音的同时,在所述显示设备展示所述声音驱动数据对应的文本,再根据所述声音驱动数据中各个目标数据匹配的控制参数序列,依次控制所述交互对象执行相应的动作,使得交互对象能够在输出语音、展示文本的同时,根据声音、文本所包含的内容做出动作,从而使交互对象表达的状态自然生动,提升了目标对象的交互体验。
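作为一种可能的实现思路，下面的Python代码草图示意了“输出语音的同时按语序依次调用各目标数据匹配的控制参数序列”的主流程；其中的语音输出接口与渲染接口均为假设的占位实现。
```python
import time
from typing import Dict, List

def start_speech_output(text: str) -> None:
    """假设的语音输出接口：此处仅以打印代替真实的 TTS 播放。"""
    print(f"[TTS] start speaking: {text}")

def apply_params(params: Dict[str, float]) -> None:
    """假设的渲染接口：把一组控制参数应用到交互对象上。"""
    print(f"[render] {params}")

def drive_with_speech(text: str,
                      matched_sequences: List[List[Dict[str, float]]],
                      frame_interval: float = 0.04) -> None:
    """边输出语音，边按语序依次调用各目标数据匹配的控制参数序列。"""
    start_speech_output(text)
    for sequence in matched_sequences:          # 按目标数据出现的先后顺序
        for params in sequence:                 # 逐组调用控制参数
            apply_params(params)
            time.sleep(frame_interval)          # 以固定帧间隔驱动姿态变化

drive_with_speech(
    "大家好，欢迎来到展厅",
    matched_sequences=[[{"right_wrist_y": 0.2}, {"right_wrist_y": 0.4}]],
)
```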
在本公开实施例中，由于只需要针对指定动作设置控制参数序列，即可以组成可变内容对应的图像帧序列，提高了交互对象的驱动效率。此外，目标数据可以根据需要进行增加或者修改，以应对变化的内容，便于对驱动系统的维护和更新。
在一些实施例中，所述方法应用于服务器，包括本地服务器或云端服务器等，所述服务器对于交互对象的声音驱动数据进行处理，生成所述交互对象的姿态参数值，并根据所述姿态参数值利用三维或二维渲染引擎进行渲染，得到所述交互对象的回应动画。所述服务器可以将所述回应动画发送至终端设备进行展示来对目标对象进行回应，还可以将所述回应动画发送至云端，以使终端设备能够从云端获取所述回应动画来对目标对象进行回应。在服务器生成所述交互对象的姿态参数值后，还可以将所述姿态参数值发送至终端，以使终端完成渲染、生成回应动画、进行展示的过程。
在一些实施例中,所述方法应用于终端设备,所述终端设备对于交互对象的声音驱动数据进行处理,生成所述交互对象的姿态参数值,并根据所述姿态参数值利用三维或二维渲染引擎进行渲染,得到所述交互对象的回应动画,所述终端可以展示所述回应动画以对目标对象进行回应。
响应于声音驱动数据包括音频数据,可以通过对声音驱动数据进行语音识别,获得所述音频数据所包含的语音内容,并确定所述音频数据所包含的目标数据。通过将语音内容与目标数据进行匹配,可以确定所述声音驱动数据中所包含的目标数据。
响应于声音驱动数据包括文本数据,根据所述文本数据所包含的文本内容,确定所述文本数据所包含的目标数据。
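下面给出一个示意性的Python代码草图，说明根据声音驱动数据的类型（音频或文本）确定其中所包含目标数据的分支处理；其中的语音识别函数仅为假设的占位，真实实现应调用实际的语音识别服务。
```python
from typing import List, Union

TARGET_KEYWORDS = ["挥手", "鞠躬", "点赞"]   # 预先匹配了设定动作的目标数据（示例）

def fake_asr(audio_bytes: bytes) -> str:
    """假设的语音识别接口：真实实现应调用 ASR 服务，这里固定返回示例文本。"""
    return "现在向大家挥手致意"

def find_target_data(driving_data: Union[bytes, str]) -> List[str]:
    """响应于声音驱动数据为音频则先做语音识别，为文本则直接按文本内容匹配目标数据。"""
    if isinstance(driving_data, bytes):          # 音频数据
        content = fake_asr(driving_data)
    else:                                        # 文本数据
        content = driving_data
    return [kw for kw in TARGET_KEYWORDS if kw in content]

print(find_target_data("先鞠躬，再挥手"))       # ['挥手', '鞠躬']，按关键词表顺序返回
print(find_target_data(b"\x00\x01"))            # 经过（假设的）语音识别后匹配到 ['挥手']
```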
在一些实施例中,在所述声音驱动数据包括音节数据的情况下,对所述声音驱动数据进行拆分得到至少一个音节数据。本领域技术人员应当理解,对于声音驱动数据的拆分方式可以不止一种,不同的拆分方式可以得到不同的音节数据组合,可以通过对不同的拆分方式设置优先级,将优先级高的拆分方式所得到的音节数据组合作为拆分结果。
将拆分得到的音节数据与目标音节数据进行匹配,响应于所述音节数据与任一音节类型的目标音节数据匹配,则可以确定所述音节数据与目标音节数据匹配,进而可以确定所述声音驱动数据包含所述目标数据。例如,目标音节数据可以包括“ma”、“man”、“mang”类型的音节数据,响应于所述声音驱动数据包含与“ma”、“man”、“mang”中的任一个匹配的音节数据,则确定所述声音驱动数据包含所述目标音节数据。
在所述声音驱动数据包含目标音节数据的情况下,根据所述目标音节数据所属的音节类型,获取与所述目标音节数据对应的设定嘴型的控制参数序列,控制交互对象做出对应的嘴型。通过上述方式,根据声音驱动数据所对应的嘴型的控制参数序列能够控制所述交互对象的嘴型变化,从而使交互对象实现拟人的说话状态。
拆分得到音节数据可以是多个音节数据。可以针对多个音节数据中的每个音节数据,查找该音节数据是否与某一目标音节数据匹配,当该音节数据与某一目标音节数据匹配时,获取与该目标音节数据对应的设定嘴型的控制参数序列。
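以下Python代码草图示意了对声音驱动数据进行音节拆分、按优先级选取拆分结果并与目标音节数据匹配的过程；其中的拆分规则与目标音节集合均为示例假设。
```python
from typing import List

TARGET_SYLLABLES = {"ma", "man", "mang"}        # 目标音节数据（同一音节类型，示例）

def split_candidates(pinyin: str) -> List[List[str]]:
    """假设的拆分接口：返回按优先级从高到低排列的若干拆分方式。"""
    if pinyin == "manga":                       # 示例：两种可能的拆分
        return [["man", "ga"], ["ma", "n", "ga"]]
    return [[pinyin]]

def split_syllables(pinyin: str) -> List[str]:
    """取优先级最高的拆分方式作为拆分结果。"""
    return split_candidates(pinyin)[0]

def matched_target_syllables(pinyin: str) -> List[str]:
    """逐个检查拆分得到的音节数据是否与目标音节数据匹配。"""
    return [s for s in split_syllables(pinyin) if s in TARGET_SYLLABLES]

print(split_syllables("manga"))            # ['man', 'ga']，采用优先级高的拆分方式
print(matched_target_syllables("manga"))   # ['man']，命中目标音节数据
```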
在一些实施例中,如图4所示,步骤203进一步包括:
步骤2031:确定所述目标数据对应的语音信息;
步骤2032:获取输出所述语音信息的时间信息;
步骤2033:根据所述时间信息确定所述目标数据对应的设定动作的执行时间;以及
步骤2034:根据所述执行时间,以所述目标数据对应的控制参数序列控制所述交互对象执行所述设定动作。
在根据所述声音驱动数据对应的语音信息控制所述显示设备输出语音的情况下，可以确定输出目标数据所对应的语音信息的时间信息，例如开始输出所述目标数据对应的语音信息的时间、结束输出的时间以及持续时间。可以根据所述时间信息确定所述目标数据对应的设定动作的执行时间，在所述执行时间内，或者在执行时间的一定范围内，以所述目标数据对应的控制参数序列控制所述交互对象执行所述设定动作。
在本公开实施例中,根据声音驱动数据输出语音的持续时间,与根据多个控制参数序列控制交互对象执行连续设定动作的持续时间,是一致的或者相近的;并且对于每个目标数据,输出对应的语音的持续时间,与根据对应的控制参数序列控制交互对象执行设定动作的持续时间,也是一致的或者相近的,以使交互对象说话的时间与进行动作的时间是匹配的,从而使交互对象的语音和动作同步、协调。
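下面的Python代码草图示意了由目标数据对应语音的时间信息推导设定动作执行时间的一种可能计算方式；其中动作相对语音提前开始的时长（lead_time）为假设参数。
```python
from dataclasses import dataclass

@dataclass
class SpeechTiming:
    """目标数据对应语音的时间信息：开始输出时间、结束输出时间（单位：秒）。"""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start

def action_execution_window(timing: SpeechTiming,
                            lead_time: float = 0.1) -> SpeechTiming:
    """根据语音时间信息确定设定动作的执行时间：
    这里假设动作略早于语音开始（提前 lead_time），并与语音同时结束。"""
    return SpeechTiming(start=max(0.0, timing.start - lead_time), end=timing.end)

speech = SpeechTiming(start=2.0, end=3.2)          # "挥手"一词的语音输出区间（示例）
window = action_execution_window(speech)
print(window, window.duration)                     # 动作执行区间约为 1.9s ~ 3.2s
```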
在一些实施例中,可以以设定速率调用所述控制参数序列中的每组控制参数,使所述交互对象展示与每组控制参数对应的姿态。也即,始终以恒定的速度来执行各个目标数据所对应的控制参数序列。
在目标数据对应的音素数目较少,而目标数据所匹配的设定动作的控制参数序列较长的情况下,也即在交互对象说出目标数据的时间较短,而执行动作的时间较长的情况下,可以在输出语音结束的同时,也停止调用该控制参数序列,停止执行该设定动作。并且,对于该设定动作执行结束的姿态,与下一指定动作开始执行的姿态,进行平滑的过渡,以使所述交互对象的动作流畅、自然,提高目标对象的交互感受。
在一些实施例中,对于每个目标数据,根据该目标数据对应的设定动作的执行时间,确定该目标数据对应的控制参数序列的调用速率,并以所述调用速率调用该目标数据对应的控制参数序列中的每组控制参数,使所述交互对象展示与每组控制参数对应的姿态。
在执行时间较短时，控制参数序列的调用速率相对较高；反之则较低。而控制参数序列的调用速率决定了交互对象执行动作的速率。例如，在以较高的速度调用控制参数序列的情况下，交互对象的姿态变化速度也相应较快，因而可以在较短的时间里完成设定动作。
在一些实施例中,可以根据输出目标数据的语音的时间对执行设定动作的时间进行调整,例如进行压缩或扩展,使得交互对象执行设定动作的时间与输出目标数据的语音的时间是匹配的,从而使交互对象的语音和动作同步、协调。
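作为示意，下面的Python代码草图给出了根据执行时间确定控制参数序列调用速率的计算方式：调用速率等于控制参数组数除以执行时长，从而实现对动作时间的压缩或扩展；具体数值仅为示例。
```python
from typing import Dict, List

def call_rate(param_sequence: List[Dict[str, float]],
              execution_time: float) -> float:
    """调用速率（组/秒）= 控制参数组数 / 设定动作的执行时间。"""
    if execution_time <= 0:
        raise ValueError("execution_time must be positive")
    return len(param_sequence) / execution_time

sequence = [{"jaw_open": v} for v in (0.1, 0.4, 0.6, 0.3, 0.1)]   # 5 组控制参数（示例）

print(call_rate(sequence, execution_time=1.0))   # 5.0 组/秒
print(call_rate(sequence, execution_time=0.5))   # 10.0 组/秒：执行时间短则调用速率高
```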
在一个示例中,可以在根据所述目标数据对应的音素输出语音之前的设定时间,开始调用所述目标数据对应的控制参数序列,使所述交互对象开始执行与控制参数序列对应的设定动作。
例如,在交互对象开始输出目标数据对应的语音之前的极短时间,例如0.1秒,开始调用目标数据对应的控制参数序列,使交互对象开始执行设定动作,更加符合真实人物说话的状态,使交互对象的说话更加自然、生动,提高了目标对象的交互体验。
在一些实施例中,在检测到多个目标数据中相邻目标数据存在重叠的情况下,可以根据基于语序(即,接收到的声音驱动数据的自然排列顺序)排列在前的目标数据对应的控制参数序列控制所述交互对象执行对应的设定动作,并忽略与该目标数据重叠的排列在后的目标数据。
可以将所述声音驱动数据所包含的各个目标数据以数组的形式进行存储,每个目标数据为其中的元素。应当注意的是,由于语素之间可以通过不同的方式进行组合,而得到不同的目标数据,因此,多个目标数据中相邻的两个目标数据之间可能存在重叠部分。例如,在声音驱动数据对应的文本是“天气真好”的情况下,其所对应的目标数据分别为:1、天,2、天气,3、真好。对于相邻目标数据1和2,它们之间包含了共同的语素“天”,并且目标数据1和2可以匹配相同的指定动作,例如用手指指向上方。
可以通过为各个目标数据分别设置优先级,根据优先级来确定执行重叠的目标数据中的哪一个。
在一个示例中，可以将首先出现的目标数据的优先级设置为高于后面的目标数据。针对以上“天气真好”的示例，“天”的优先级高于“天气”，因此，根据“天”所对应的设定动作的控制参数序列来控制所述交互对象执行设定动作，并忽略余下的语素“气”（即忽略与目标数据“天”重叠的目标数据“天气”），接下来直接匹配“真好”。
在本公开实施例中,通过对于相邻目标数据重叠的情况设置匹配规则,可以避免交互对象重复执行设定动作。
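针对上述“天气真好”的示例，下面的Python代码草图示意了按语序保留排列在前的目标数据、忽略与其重叠的后续目标数据的一种处理方式；其中以文本起止位置表示目标数据的区间，仅为示例假设。
```python
from typing import List, Tuple

# 每个目标数据记录其在文本中的起止位置（左闭右开区间）
Match = Tuple[str, int, int]

def resolve_overlaps(matches: List[Match]) -> List[Match]:
    """按出现位置排序后，丢弃与已保留目标数据重叠的后续目标数据。"""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda x: x[1]):
        if not kept or m[1] >= kept[-1][2]:   # 与上一个保留的目标数据不重叠
            kept.append(m)
    return kept

# "天气真好" 中匹配到的目标数据："天"(0,1)、"天气"(0,2)、"真好"(2,4)
matches = [("天", 0, 1), ("天气", 0, 2), ("真好", 2, 4)]
print(resolve_overlaps(matches))   # [('天', 0, 1), ('真好', 2, 4)]，"天气"被忽略
```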
在一些实施例中，在检测到所述多个目标数据中相邻目标数据对应的控制参数序列在执行时间上重叠的情况下，可以对所述相邻目标数据对应的控制参数序列的重叠部分进行融合。
在一个实施例中,可以将控制参数序列的重叠部分进行平均或者加权平均,以实现重叠的控制参数序列的融合。
在另一实施例中,可以利用插值的方法,将上一个动作的某一帧(例如,该动作对应的第一控制参数序列的第N组控制参数n),按照过渡时间向下一个动作插值过渡,直到过渡到与下一个动作中第一帧开始重合(例如,找到下一个动作对应的第二控制参数序列中的第1组控制参数1与所述控制参数n相同,或者,将下一动作***到所述某一帧处,使得经过插值过渡后两个动作的总执行时间与相应的语音数据/文本数据的播放或显示时间相同),则忽略上一个动作中某一帧之后的所有帧,直接执行下一个动作,从而实现了重叠的控制参数序列的融合。
通过对所述相邻目标数据对应的控制参数序列的重叠部分进行融合,使得交互对象的动作之间可以平滑过渡,以使所述交互对象的动作流畅、自然,提高目标对象的交互感受。
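下面的Python代码草图示意了对相邻目标数据对应控制参数序列的重叠部分进行加权平均融合的一种可能做法；权重随时间线性偏向下一动作，该权重策略与数值均为示例假设，前述的插值过渡方式同样可行。
```python
from typing import Dict, List

ParamFrame = Dict[str, float]

def fuse_overlap(prev_seq: List[ParamFrame],
                 next_seq: List[ParamFrame],
                 overlap: int) -> List[ParamFrame]:
    """对 prev_seq 末尾与 next_seq 开头长度为 overlap 的重叠部分做线性加权平均，
    使上一动作平滑过渡到下一动作。"""
    fused: List[ParamFrame] = list(prev_seq[:-overlap]) if overlap else list(prev_seq)
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)            # 权重随时间逐渐偏向下一动作
        a, b = prev_seq[len(prev_seq) - overlap + i], next_seq[i]
        keys = set(a) | set(b)
        fused.append({k: (1 - w) * a.get(k, 0.0) + w * b.get(k, 0.0) for k in keys})
    fused.extend(next_seq[overlap:])
    return fused

prev_action = [{"arm": 0.0}, {"arm": 0.5}, {"arm": 1.0}]
next_action = [{"arm": 0.8}, {"arm": 0.4}, {"arm": 0.0}]
print(fuse_overlap(prev_action, next_action, overlap=2))   # 重叠两帧被融合为过渡姿态
```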
在一些实施例中，对于所述声音驱动数据中，各个目标数据以外的其他数据，例如将其称为第一数据，可以获取与所述第一数据的声学特征匹配的姿态控制参数，并根据所述姿态控制参数控制所述交互对象的姿态。
响应于所述声音驱动数据包括音频数据,可以获取所述第一数据包含的语音帧序列,并获取至少一个语音帧对应的声学特征,根据所述声学特征对应的所述交互对象的姿态控制参数,例如姿态控制向量,来控制所述交互对象的姿态。
响应于所述声音驱动数据包括文本数据,可以根据文本数据中的语素所对应的音素,获取所述音素对应的声学特征,根据所述声学特征对应的所述交互对象的姿态控制参数,例如姿态控制向量,来控制所述交互对象的姿态。
在本公开实施例中，声学特征可以是与语音情感相关的特征，例如基频特征、共振峰特征、梅尔频率倒谱系数（Mel Frequency Cepstral Coefficient，MFCC）等等。
由于所述姿态控制参数值是与所述语音段的语音帧序列匹配的，因此在根据所述第一数据输出的语音和/或展示的文本，与根据所述姿态参数值控制交互对象的姿态是同步进行的情况下，交互对象所做出的姿态与输出的语音和/或文本是同步的，给目标对象以所述交互对象正在说话的感觉。并且由于所述姿态控制向量是与输出声音的声学特征相关的，根据所述姿态控制向量进行驱动使得交互对象的表情和肢体动作具有了情感因素，使得交互对象的说话过程更加自然、生动，从而提高了目标对象的交互体验。
在一些实施例中,所述声音驱动数据包括至少一个目标数据,以及所述目标数据以外的第一数据。对于所述第一数据,根据所述第一数据的声学特征来确定姿态控制参数,以控制所述交互对象的姿态;对于所述目标数据,则根据与所述目标数据匹配的设定动作的控制参数序列,控制所述交互对象做出所述设定动作。
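下面给出一个示意性的Python代码草图，说明对第一数据提取声学特征并映射为姿态控制参数的简化流程；这里以语音帧的均方根能量代替真实实现中可能使用的MFCC、基频、共振峰等特征，特征到姿态参数的映射方式也仅为假设。
```python
import math
from typing import Dict, List

def frame_energy(samples: List[float]) -> float:
    """一个极简的声学特征：语音帧的均方根能量（真实实现可用 MFCC、基频、共振峰等）。"""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

def posture_params_from_feature(energy: float) -> Dict[str, float]:
    """假设的特征到姿态控制参数的映射：能量越大，头部/身体的摆动幅度越大。"""
    amplitude = min(1.0, energy * 2.0)
    return {"head_nod": 0.3 * amplitude, "body_sway": 0.2 * amplitude}

# 第一数据（目标数据以外的部分）的若干语音帧（数值仅为示意）
speech_frames = [[0.1, -0.2, 0.15], [0.4, -0.5, 0.45], [0.05, -0.05, 0.04]]

for idx, frame in enumerate(speech_frames):
    params = posture_params_from_feature(frame_energy(frame))
    print(f"frame {idx}: {params}")
```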
图5示出根据本公开至少一个实施例的交互对象的驱动装置的结构示意图,如图5所示,该装置可以包括:第一获取单元301,用于获取显示设备展示的交互对象的声音驱动数据;第二获取单元302,用于基于所述声音驱动数据中所包含的目标数据,获取与所述目标数据匹配的交互对象的设定动作的控制参数序列;驱动单元303,用于根据所获得的控制参数序列控制所述交互对象执行所述设定动作。
在一些实施例中,所述装置还包括输出单元,用于根据所述声音驱动数据对应的语音信息控制所述显示设备输出语音,和/或,根据所述声音驱动数据对应的文本信息展示文本。
在一些实施例中,所述驱动单元具体用于:确定所述目标数据对应的语音信息;获取输出所述语音信息的时间信息;根据所述时间信息确定所述目标数据对应的设定动作的执行时间;根据所述执行时间,以所述目标数据对应的控制参数序列控制所述交互对象执行所述设定动作。
在一些实施例中,所述控制参数序列包括一组或多组控制参数;所述驱动单元在用于根据所述执行时间,以所述目标数据对应的控制参数序列控制所述交互对象执行所述设定动作时,具体用于:以设定速率调用所述控制参数序列中的每组控制参数,使所述交互对象展示与每组控制参数对应的姿态。
在一些实施例中,所述控制参数序列包括一组或多组控制参数;所述驱动单元在用于根据所述执行时间,以所述目标数据对应的控制参数序列控制所述交互对象执行所述设定动作时,具体用于:根据所述执行时间,确定所述控制参数序列的调用速率;以所述调用速率调用所述控制参数序列中的每组控制参数,使所述交互对象输出与每组控制参数对应的姿态。
在一些实施例中，所述控制参数序列包括一组或多组控制参数；所述驱动单元在用于根据所述执行时间，以所述目标数据对应的控制参数序列控制所述交互对象执行所述设定动作时，具体用于：在输出所述目标数据对应的语音信息之前的设定时间，开始调用所述目标数据对应的控制参数序列，使所述交互对象开始执行所述设定动作。
在一些实施例中，所述声音驱动数据包含多个目标数据，所述驱动单元具体用于：响应于检测到所述多个目标数据中相邻目标数据存在重叠，根据基于语序排列在前的目标数据对应的控制参数序列控制所述交互对象执行所述设定动作。
在一些实施例中,所述声音驱动数据包含多个目标数据,所述驱动单元具体用于:响应于检测到所述多个目标数据中相邻目标数据对应的控制参数序列在执行时间上重叠,对所述相邻目标数据对应的控制参数序列的重叠部分进行融合。
在一些实施例中,所述第二获取单元具体用于:响应于所述声音驱动数据包括音频数据,对所述音频数据进行语音识别,根据识别出的语音内容,确定所述音频数据所包含的目标数据;响应于所述声音驱动数据包括文本数据,根据所述文本数据所包含的文本内容,确定所述文本数据所包含的目标数据。
在一些实施例中,所述目标数据包括目标音节数据,所述第二获取单元具体用于:确定所述声音驱动数据所包含的音节数据是否与目标音节数据相匹配,其中,所述目标音节数据属于预先划分好的一种音节类型,一种音节类型对应于一种设定嘴型,一种设定嘴型设置有对应的控制参数序列;响应于所述音节数据与所述目标音节数据相匹配,基于匹配的所述目标音节数据所属的音节类型,获取与匹配的所述目标音节数据对应的设定嘴型的控制参数序列。
在一些实施例中,所述装置还包括姿态控制单元,用于:获取所述声音驱动数据中目标数据以外的第一数据;获取所述第一数据的声学特征;获取与所述声学特征匹配的姿态控制参数;根据所述姿态控制参数控制所述交互对象的姿态。
本说明书至少一个实施例还提供了一种电子设备,如图6所示,所述设备包括存储器、处理器,存储器用于存储可在处理器上运行的计算机指令,处理器用于在执行所述计算机指令时实现本公开任一实施例所述的交互对象的驱动方法。本说明书至少一个实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现本公开任一实施例所述的交互对象的驱动方法。
本领域技术人员应明白，本说明书一个或多个实施例可提供为方法、系统或计算机程序产品。因此，本说明书一个或多个实施例可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且，本说明书一个或多个实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质（包括但不限于磁盘存储器、CD-ROM、光学存储器等）上实施的计算机程序产品的形式。
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于数据处理设备实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
上述对本说明书特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记载的行为或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中,多任务处理和并行处理也是可以的或者可能是有利的。
本说明书中描述的主题及功能操作的实施例可以在以下中实现:数字电子电路、有形体现的计算机软件或固件、包括本说明书中公开的结构及其结构性等同物的计算机硬件、或者它们中的一个或多个的组合。本说明书中描述的主题的实施例可以实现为一个或多个计算机程序,即编码在有形非暂时性程序载体上以被数据处理装置执行或控制数据处理装置的操作的计算机程序指令中的一个或多个模块。可替代地或附加地,程序指令可以被编码在人工生成的传播信号上,例如机器生成的电、光或电磁信号,该信号被生成以将信息编码并传输到合适的接收机装置以由数据处理装置执行。计算机存储介质可以是机器可读存储设备、机器可读存储基板、随机或串行存取存储器设备、或它们中的一个或多个的组合。
本说明书中描述的处理及逻辑流程可以由执行一个或多个计算机程序的一个或多个可编程计算机执行,以通过根据输入数据进行操作并生成输出来执行相应的功能。所述处理及逻辑流程还可以由专用逻辑电路—例如FPGA(现场可编程门阵列)或ASIC(专用集成电路)来执行,并且装置也可以实现为专用逻辑电路。
适合用于执行计算机程序的计算机包括，例如通用和/或专用微处理器，或任何其他类型的中央处理单元。通常，中央处理单元将从只读存储器和/或随机存取存储器接收指令和数据。计算机的基本组件包括用于实施或执行指令的中央处理单元以及用于存储指令和数据的一个或多个存储器设备。通常，计算机还将包括用于存储数据的一个或多个大容量存储设备，例如磁盘、磁光盘或光盘等，或者计算机将可操作地与此大容量存储设备耦接以从其接收数据或向其传送数据，抑或两种情况兼而有之。然而，计算机不是必须具有这样的设备。此外，计算机可以嵌入在另一设备中，例如移动电话、个人数字助理（PDA）、移动音频或视频播放器、游戏操纵台、全球定位系统（GPS）接收机、或例如通用串行总线（USB）闪存驱动器的便携式存储设备，仅举几例。
适合于存储计算机程序指令和数据的计算机可读介质包括所有形式的非易失性存储器、媒介和存储器设备,例如包括半导体存储器设备(例如EPROM、EEPROM和闪存设备)、磁盘(例如内部硬盘或可移动盘)、磁光盘以及CD ROM和DVD-ROM盘。处理器和存储器可由专用逻辑电路补充或并入专用逻辑电路中。
虽然本说明书包含许多具体实施细节,但是这些不应被解释为限制任何发明的范围或所要求保护的范围,而是主要用于描述特定发明的具体实施例的特征。本说明书内在多个实施例中描述的某些特征也可以在单个实施例中被组合实施。另一方面,在单个实施例中描述的各种特征也可以在多个实施例中分开实施或以任何合适的子组合来实施。此外,虽然特征可以如上所述在某些组合中起作用并且甚至最初如此要求保护,但是来自所要求保护的组合中的一个或多个特征在一些情况下可以从该组合中去除,并且所要求保护的组合可以指向子组合或子组合的变型。
类似地，虽然在附图中以特定顺序描绘了操作，但是这不应被理解为要求这些操作以所示的特定顺序执行或顺次执行、或者要求所有例示的操作被执行，以实现期望的结果。在某些情况下，多任务和并行处理可能是有利的。此外，上述实施例中的各种系统模块和组件的分离不应被理解为在所有实施例中均需要这样的分离，并且应当理解，所描述的程序组件和系统通常可以一起集成在单个软件产品中，或者封装成多个软件产品。
由此,主题的特定实施例已被描述。其他实施例在所附权利要求书的范围以内。在某些情况下,权利要求书中记载的动作可以以不同的顺序执行并且仍实现期望的结果。此外,附图中描绘的处理并非必需所示的特定顺序或顺次顺序,以实现期望的结果。在某些实现中,多任务和并行处理可能是有利的。
以上所述仅为本说明书一个或多个实施例的较佳实施例而已,并不用以限制本说明书一个或多个实施例,凡在本说明书一个或多个实施例的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本说明书一个或多个实施例保护的范围之内。

Claims (20)

  1. 一种交互对象的驱动方法,包括:
    获取显示设备展示的交互对象的声音驱动数据;
    基于所述声音驱动数据中所包含的目标数据,获取与所述目标数据匹配的交互对象的设定动作的控制参数序列;
    根据所获得的控制参数序列控制所述交互对象执行所述设定动作。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    根据所述声音驱动数据对应的语音信息控制所述显示设备输出语音,和/或,根据所述声音驱动数据对应的文本信息展示文本。
  3. 根据权利要求1或2所述的方法,其特征在于,所述根据所获得的控制参数序列控制所述交互对象执行所述设定动作,包括:
    确定所述目标数据对应的语音信息;
    获取输出所述语音信息的时间信息;
    根据所述时间信息确定所述目标数据对应的设定动作的执行时间;
    根据所述执行时间,以所述目标数据对应的控制参数序列控制所述交互对象执行所述设定动作。
  4. 根据权利要求3所述的方法,其特征在于,所述控制参数序列包括一组或多组控制参数,所述根据所述执行时间,以所述目标数据对应的控制参数序列控制所述交互对象执行所述设定动作,包括:
    以设定速率调用所述控制参数序列中的每组控制参数,使所述交互对象展示与每组控制参数对应的姿态。
  5. 根据权利要求3所述的方法,其特征在于,所述控制参数序列包括一组或多组控制参数,所述根据所述执行时间,以所述目标数据对应的控制参数序列控制所述交互对象执行所述设定动作,包括:
    根据所述执行时间,确定所述控制参数序列的调用速率;
    以所述调用速率调用所述控制参数序列中的每组控制参数,使所述交互对象输出与每组控制参数对应的姿态。
  6. 根据权利要求3所述的方法,其特征在于,所述根据所述执行时间,以所述目标数据对应的控制参数序列控制所述交互对象执行所述设定动作,包括:
    在输出所述目标数据对应的语音信息之前的设定时间,开始调用所述目标数据对应的控制参数序列,使所述交互对象开始执行所述设定动作。
  7. 根据权利要求1至6任一项所述的方法,其特征在于,所述声音驱动数据包含多个目标数据,所述根据所获得的控制参数序列控制所述交互对象执行所述设定动作,包括:
    响应于检测到所述多个目标数据中相邻目标数据存在重叠,根据基于语序排列在前的目标数据对应的控制参数序列控制所述交互对象执行所述设定动作。
  8. 根据权利要求1至6任一项所述的方法,其特征在于,所述声音驱动数据包含多个目标数据,所述根据所述目标数据对应的控制参数序列控制所述交互对象执行所述设定动作,包括:
    响应于检测到所述多个目标数据中相邻目标数据对应的控制参数序列在执行时间上重叠,对所述相邻目标数据对应的控制参数序列的重叠部分进行融合。
  9. 根据权利要求1至8任一项所述的方法,其特征在于,所述基于所述声音驱动数据中所包含的目标数据,获取与所述目标数据匹配的交互对象的设定动作的控制参数序列,包括:
    响应于所述声音驱动数据包括音频数据,对所述音频数据进行语音识别,根据所识别出的语音内容,确定所述音频数据所包含的目标数据;
    响应于所述声音驱动数据包括文本数据,根据所述文本数据所包含的文本内容,确定所述文本数据所包含的目标数据。
  10. 根据权利要求1至9任一项所述的方法,其特征在于,所述声音驱动数据包括音节数据,
    所述基于所述声音驱动数据中所包含的目标数据,获取与所述目标数据匹配的交互对象的设定动作的控制参数序列,包括:
    确定所述声音驱动数据所包含的音节数据是否与目标音节数据相匹配，其中，所述目标音节数据属于预先划分好的一种音节类型，一种音节类型对应于一种设定嘴型，一种设定嘴型设置有对应的控制参数序列；
    响应于所述音节数据与所述目标音节数据相匹配,基于匹配的所述目标音节数据所属的音节类型,获取与匹配的所述目标音节数据对应的设定嘴型的控制参数序列。
  11. 根据权利要求1至10任一项所述的方法,其特征在于,所述方法还包括:
    获取所述声音驱动数据中目标数据以外的第一数据;
    获取所述第一数据的声学特征;
    获取与所述声学特征匹配的姿态控制参数;
    根据所述姿态控制参数控制所述交互对象的姿态。
  12. 一种交互对象的驱动装置,包括:
    第一获取单元,用于获取显示设备展示的交互对象的声音驱动数据;
    第二获取单元,用于基于所述声音驱动数据中所包含的目标数据,获取与所述目标数据匹配的交互对象的设定动作的控制参数序列;
    驱动单元,用于根据所获得的控制参数序列控制所述交互对象执行所述设定动作。
  13. 根据权利要求12所述的装置,其特征在于,所述装置还包括输出单元,用于根据所述声音驱动数据对应的语音信息控制所述显示设备输出语音,和/或,根据所述声音驱动数据对应的文本信息展示文本。
  14. 根据权利要求12或13所述的装置,其特征在于,所述驱动单元具体用于:
    确定所述目标数据对应的语音信息;
    获取输出所述语音信息的时间信息;
    根据所述时间信息确定所述目标数据对应的设定动作的执行时间;
    根据所述执行时间,以所述目标数据对应的控制参数序列控制所述交互对象执行所述设定动作。
  15. 根据权利要求14所述的装置,其特征在于,所述控制参数序列包括一组或多组控制参数;所述驱动单元在用于根据所述执行时间,以所述目标数据对应的控制参数序列控制所述交互对象执行所述设定动作时,具体用于:
    以设定速率调用所述控制参数序列中的每组控制参数,使所述交互对象展示与每组控制参数对应的姿态;或,
    根据所述执行时间,确定所述控制参数序列的调用速率;
    以所述调用速率调用所述控制参数序列中的每组控制参数,使所述交互对象输出与每组控制参数对应的姿态;或,
    在输出所述目标数据对应的语音信息之前的设定时间,开始调用所述目标数据对应的控制参数序列,使所述交互对象开始执行所述设定动作。
  16. 根据权利要求12至15任一项所述的装置,其特征在于,所述声音驱动数据包含多个目标数据,所述驱动单元具体用于:
    响应于检测到所述多个目标数据中相邻目标数据存在重叠,根据基于语序排列在前的目标数据对应的控制参数序列控制所述交互对象执行所述设定动作;或,
    响应于检测到所述多个目标数据中相邻目标数据对应的控制参数序列在执行时间上重叠,对所述相邻目标数据对应的控制参数序列的重叠部分进行融合。
  17. 根据权利要求12至16任一项所述的装置,其特征在于,所述声音驱动数据包括音节数据,
    所述第二获取单元具体用于:
    确定所述声音驱动数据所包含的音节数据是否与目标音节数据相匹配,其中,所述目标音节数据属于预先划分好的一种音节类型,一种音节类型对应于一种设定嘴型,一种设定嘴型设置有对应的控制参数序列;
    响应于所述音节数据与所述目标音节数据相匹配,基于匹配的所述目标音节数据所属的音节类型,获取与匹配的所述目标音节数据对应的设定嘴型的控制参数序列。
  18. 根据权利要求12至17任一项所述的装置,其特征在于,所述装置还包括姿态控制单元,用于:
    获取所述声音驱动数据中目标数据以外的第一数据;
    获取所述第一数据的声学特征;
    获取与所述声学特征匹配的姿态控制参数;
    根据所述姿态控制参数控制所述交互对象的姿态。
  19. 一种电子设备，其特征在于，所述设备包括存储器、处理器，所述存储器用于存储可在处理器上运行的计算机指令，所述处理器用于在执行所述计算机指令时实现权利要求1至11任一项所述的方法。
  20. 一种计算机可读存储介质，其上存储有计算机程序，其特征在于，所述程序被处理器执行时实现权利要求1至11任一项所述的方法。
PCT/CN2020/129830 2020-03-31 2020-11-18 交互对象的驱动方法、装置、设备以及存储介质 WO2021196647A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2021549865A JP2022531056A (ja) 2020-03-31 2020-11-18 インタラクティブ対象の駆動方法、装置、デバイス、及び記録媒体
SG11202109201XA SG11202109201XA (en) 2020-03-31 2020-11-18 Methods, apparatuses, electronic devices and storage media for driving an interactive object
KR1020217027681A KR20210124306A (ko) 2020-03-31 2020-11-18 인터랙티브 대상의 구동 방법, 장치, 디바이스 및 기록 매체

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010245772.7A CN111459451A (zh) 2020-03-31 2020-03-31 交互对象的驱动方法、装置、设备以及存储介质
CN202010245772.7 2020-03-31

Publications (1)

Publication Number Publication Date
WO2021196647A1 true WO2021196647A1 (zh) 2021-10-07

Family

ID=71683496

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/129830 WO2021196647A1 (zh) 2020-03-31 2020-11-18 交互对象的驱动方法、装置、设备以及存储介质

Country Status (6)

Country Link
JP (1) JP2022531056A (zh)
KR (1) KR20210124306A (zh)
CN (1) CN111459451A (zh)
SG (1) SG11202109201XA (zh)
TW (1) TWI759039B (zh)
WO (1) WO2021196647A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111459451A (zh) * 2020-03-31 2020-07-28 北京市商汤科技开发有限公司 交互对象的驱动方法、装置、设备以及存储介质

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7827034B1 (en) * 2002-11-27 2010-11-02 Totalsynch, Llc Text-derived speech animation tool
US10630751B2 (en) * 2016-12-30 2020-04-21 Google Llc Sequence dependent data message consolidation in a voice activated computer network environment
KR20140052155A (ko) * 2012-10-19 2014-05-07 삼성전자주식회사 디스플레이 장치, 디스플레이 장치 제어 방법 및 디스플레이 장치의 제어를 위한 정보처리장치
JP5936588B2 (ja) * 2013-09-30 2016-06-22 Necパーソナルコンピュータ株式会社 情報処理装置、制御方法、及びプログラム
WO2015116151A1 (en) * 2014-01-31 2015-08-06 Hewlett-Packard Development Company, L.P. Voice input command
JP2015166890A (ja) * 2014-03-03 2015-09-24 ソニー株式会社 情報処理装置、情報処理システム、情報処理方法及びプログラム
EP3371778A4 (en) * 2015-11-06 2019-06-26 Mursion, Inc. CONTROL SYSTEM FOR VIRTUAL FIGURES
CN106056989B (zh) * 2016-06-23 2018-10-16 广东小天才科技有限公司 一种语言学习方法及装置、终端设备
KR20190100428A (ko) * 2016-07-19 2019-08-28 게이트박스 가부시키가이샤 화상 표시장치, 화제 선택 방법, 화제 선택 프로그램, 화상 표시 방법 및 화상 표시 프로그램
CN106873773B (zh) * 2017-01-09 2021-02-05 北京奇虎科技有限公司 机器人交互控制方法、服务器和机器人
CN107340859B (zh) * 2017-06-14 2021-04-06 北京光年无限科技有限公司 多模态虚拟机器人的多模态交互方法和系统
CN107861626A (zh) * 2017-12-06 2018-03-30 北京光年无限科技有限公司 一种虚拟形象被唤醒的方法及系统
TWI658377B (zh) * 2018-02-08 2019-05-01 佳綸生技股份有限公司 機器人輔助互動系統及其方法
CN108942919B (zh) * 2018-05-28 2021-03-30 北京光年无限科技有限公司 一种基于虚拟人的交互方法及系统
CN110176284A (zh) * 2019-05-21 2019-08-27 杭州师范大学 一种基于虚拟现实的言语失用症康复训练方法
JP2019212325A (ja) * 2019-08-22 2019-12-12 株式会社Novera 情報処理装置、ミラーデバイス、プログラム
CN110815258B (zh) * 2019-10-30 2023-03-31 华南理工大学 基于电磁力反馈和增强现实的机器人遥操作系统和方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06348297A (ja) * 1993-06-10 1994-12-22 Osaka Gas Co Ltd 発音練習装置
CN110853614A (zh) * 2018-08-03 2020-02-28 Tcl集团股份有限公司 虚拟对象口型驱动方法、装置及终端设备
CN109599113A (zh) * 2019-01-22 2019-04-09 北京百度网讯科技有限公司 用于处理信息的方法和装置
CN110413841A (zh) * 2019-06-13 2019-11-05 深圳追一科技有限公司 多态交互方法、装置、系统、电子设备及存储介质
CN111459451A (zh) * 2020-03-31 2020-07-28 北京市商汤科技开发有限公司 交互对象的驱动方法、装置、设备以及存储介质

Also Published As

Publication number Publication date
CN111459451A (zh) 2020-07-28
SG11202109201XA (en) 2021-11-29
KR20210124306A (ko) 2021-10-14
TW202138987A (zh) 2021-10-16
TWI759039B (zh) 2022-03-21
JP2022531056A (ja) 2022-07-06

Similar Documents

Publication Publication Date Title
WO2021169431A1 (zh) 交互方法、装置、电子设备以及存储介质
TWI766499B (zh) 互動物件的驅動方法、裝置、設備以及儲存媒體
EP3612878B1 (en) Multimodal task execution and text editing for a wearable system
WO2021196646A1 (zh) 交互对象的驱动方法、装置、设备以及存储介质
TWI760015B (zh) 互動物件的驅動方法、裝置、設備以及儲存媒體
WO2021196644A1 (zh) 交互对象的驱动方法、装置、设备以及存储介质
JP7193015B2 (ja) コミュニケーション支援プログラム、コミュニケーション支援方法、コミュニケーション支援システム、端末装置及び非言語表現プログラム
EP3142359A1 (en) Display device and video call performing method therefor
US10388325B1 (en) Non-disruptive NUI command
CN110162598B (zh) 一种数据处理方法和装置、一种用于数据处理的装置
WO2021232876A1 (zh) 实时驱动虚拟人的方法、装置、电子设备及介质
JP2024513640A (ja) 仮想対象のアクション処理方法およびその装置、コンピュータプログラム
WO2022222572A1 (zh) 交互对象的驱动方法、装置、设备以及存储介质
WO2021196647A1 (zh) 交互对象的驱动方法、装置、设备以及存储介质
TW202248994A (zh) 互動對象驅動和音素處理方法、設備以及儲存媒體
CN110166844B (zh) 一种数据处理方法和装置、一种用于数据处理的装置

Legal Events

Date Code Title Description
ENP Entry into the national phase (Ref document number: 2021549865; Country of ref document: JP; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 20217027681; Country of ref document: KR; Kind code of ref document: A)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20929643; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20929643; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 521430719; Country of ref document: SA)