TW202132967A - Interaction methods, apparatuses thereof, electronic devices and computer readable storage media - Google Patents


Info

Publication number
TW202132967A
Authority
TW
Taiwan
Prior art keywords
response
client
interactive object
interactive
text
Prior art date
Application number
TW109145727A
Other languages
Chinese (zh)
Other versions
TWI778477B (en)
Inventor
張子隆
孫林
路露
Original Assignee
大陸商北京市商湯科技開發有限公司
Priority date
Filing date
Publication date
Application filed by 大陸商北京市商湯科技開發有限公司
Publication of TW202132967A
Application granted
Publication of TWI778477B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services communicating with other users, e.g. chatting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video


Abstract

The present disclosure provides interaction methods and apparatuses, electronic devices, and computer-readable storage media. A method includes: receiving a first message from a client; obtaining, based on instruction content included in the first message, driving data matching the instruction content; and controlling, by using the driving data, a display interface of the client to play a response animation of an interactive object.

Description

Interaction method, apparatus, electronic device, and storage medium

The present disclosure relates to the field of computer technology, and in particular to an interaction method, an apparatus, an electronic device, and a storage medium.

With the rapid development of the Internet, live streaming has become an important means of information dissemination. Because different viewers watch live streams during different time periods, a human anchor cannot stream 24 hours a day to meet the needs of all viewers. Using a digital human for live streaming can solve this problem; however, the technology for interaction between a digital-human anchor and viewers still needs to be researched and developed.

According to an aspect of the present disclosure, an interaction method is provided. The method includes: receiving a first message from a client; obtaining, based on instruction content included in the first message, driving data matching the instruction content; and using the driving data to control a display interface of the client to play a response animation of an interactive object.

In combination with any implementation provided by the present disclosure, obtaining the driving data matching the instruction content based on the instruction content included in the first message includes: obtaining response content for the instruction content, the response content including a response text; and obtaining, based on at least one target text contained in the response text, control parameters for a set action of the interactive object matching the target text.
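The target-text matching above might look like the following minimal sketch, in which a response text is scanned for known target texts that map to preset-action control parameters. The names `ACTION_LIBRARY` and `find_action_params`, and all parameter values, are illustrative assumptions; the patent does not specify data structures.

```python
# Illustrative sketch only: ACTION_LIBRARY maps a target text to control
# parameters of a set action of the interactive object (hypothetical names).
ACTION_LIBRARY = {
    "hello": {"action": "wave", "duration_s": 1.2},
    "thanks": {"action": "bow", "duration_s": 1.0},
}

def find_action_params(response_text: str) -> list:
    """Scan the response text for known target texts and collect the
    control parameters of the matching set actions, in order."""
    matched = []
    for target, ctrl in ACTION_LIBRARY.items():
        if target in response_text:
            matched.append({"target_text": target, **ctrl})
    return matched

print(find_action_params("hello, and thanks for watching"))
```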

In combination with any implementation provided by the present disclosure, obtaining the driving data matching the instruction content based on the instruction content included in the first message includes: obtaining response content for the instruction content, the response content including a phoneme sequence; and obtaining control parameters of the interactive object matching the phoneme sequence.

In combination with any implementation provided by the present disclosure, the control parameters of the interactive object include a posture control vector of at least one local region, and obtaining the control parameters of the interactive object matching the phoneme sequence includes: performing feature encoding on the phoneme sequence to obtain a first encoding sequence corresponding to the phoneme sequence; obtaining, according to the first encoding sequence, a feature code corresponding to at least one phoneme; and obtaining a posture control vector of at least one local region of the interactive object corresponding to the feature code.
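The phoneme-to-posture pipeline above can be sketched as follows. This is a toy stand-in, not the patent's implementation: the real system would presumably use a trained model, whereas here a one-hot encoding plays the role of the feature encoding and a fixed rule plays the role of the feature-code-to-posture mapping; the phoneme set and region names are assumptions.

```python
PHONEMES = ["sil", "a", "o", "e", "i", "u"]  # hypothetical phoneme inventory

def encode_phoneme_sequence(seq):
    """Feature-encode the phoneme sequence (one-hot here as a stand-in)
    to obtain the 'first encoding sequence'."""
    return [[1 if p == q else 0 for q in PHONEMES] for p in seq]

def posture_control_vectors(feature_code):
    """Map one phoneme's feature code to posture control vectors for
    local regions of the interactive object (regions are hypothetical)."""
    mouth_open = 0.0 if feature_code[0] else 0.8  # index 0 is silence
    return {"mouth": [mouth_open, 0.1], "brows": [0.05]}

first_encoding = encode_phoneme_sequence(["sil", "a", "o"])
vectors = [posture_control_vectors(code) for code in first_encoding]
print(vectors[0]["mouth"][0], vectors[1]["mouth"][0])  # silence closed, vowel open
```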

In combination with any implementation provided by the present disclosure, the method further includes: sending instruction information including the response content to the client, so that the client displays the response content based on the instruction information.

In combination with any implementation provided by the present disclosure, using the driving data to control the client to play the response animation of the interactive object on the display interface includes: sending the driving data of the interactive object to the client, so that the client generates a response animation according to the driving data, and controlling the client to play the response animation on the display interface; or adjusting virtual model parameters of the interactive object based on the driving data, generating the response animation of the interactive object with a rendering engine based on the adjusted virtual model parameters, and sending the response animation to the client.
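The two delivery strategies above (client-side rendering from driving data, versus server-side rendering of the finished animation) can be sketched as a simple dispatch. Function names, message shapes, and the model parameter are illustrative assumptions, not an API defined by the patent.

```python
def deliver_response(driving_data, client_id, render_on_server, renderer=None):
    """Dispatch between the two strategies described above."""
    if not render_on_server:
        # Strategy 1: send the driving data and let the client generate
        # and play the response animation itself.
        return {"to": client_id, "type": "driving_data", "payload": driving_data}
    # Strategy 2: adjust the virtual model parameters on the server,
    # render the response animation there, and send the finished animation.
    model_params = {"mouth_open": driving_data.get("mouth_open", 0.0)}
    render = renderer or (lambda p: {"frames": [p]})  # stand-in renderer
    return {"to": client_id, "type": "animation", "payload": render(model_params)}

print(deliver_response({"mouth_open": 0.5}, "client-1", render_on_server=True)["type"])
```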

According to an aspect of the present disclosure, an interaction method is provided. The method includes: in response to a user input operation at a client, sending a first message including instruction content to a server; and playing a response animation of an interactive object on a display interface of the client based on a second message with which the server responds to the first message.

In combination with any implementation provided by the present disclosure, the instruction content includes text content, and the method further includes: displaying the text content in the client, and/or playing an audio file corresponding to the text content.

In combination with any implementation provided by the present disclosure, displaying the text content in the client includes: generating barrage (bullet-screen comment) information for the text content; and displaying the barrage information on the display interface of the client.

In combination with any implementation provided by the present disclosure, the second message includes a response text for the instruction content, and the method further includes: displaying the response text on the display interface of the client, and/or determining and playing an audio file corresponding to the response text.

In combination with any implementation provided by the present disclosure, the second message includes driving data of the interactive object, and playing the response animation of the interactive object on the display interface of the client based on the second message with which the server responds to the first message includes: adjusting virtual model parameters of the interactive object based on the driving data; and generating, with a rendering engine and based on the adjusted virtual model parameters, the response animation of the interactive object and displaying it on the display interface of the client. The driving data includes control parameters for the interactive object matching a phoneme sequence corresponding to the response text, and/or control parameters for a set action of the interactive object matching at least one target text contained in the response text.

In combination with any implementation provided by the present disclosure, the second message includes a response animation made by the interactive object to the instruction content.

In combination with any implementation provided by the present disclosure, the user input operation includes the user making a corresponding human body posture by following a body-movement picture displayed on the display interface. In response to the user input operation from the client, a user behavior image including the human body posture is acquired; human body posture information in the user behavior image is recognized; and based on the human body posture information, the interactive object displayed on the display interface is driven to respond.

In combination with any implementation provided by the present disclosure, driving the interactive object displayed on the display interface to respond based on the human body posture information includes: determining a matching degree between the human body posture information and the human body posture in the body-movement picture; and driving the interactive object displayed on the display interface to respond based on the matching degree.

In combination with any implementation provided by the present disclosure, driving the interactive object to respond based on the matching degree includes: when the matching degree reaches a set condition, instructing the interactive object displayed on the display interface to make a first response, the first response including displaying a body movement and/or a voice prompt indicating that the posture is qualified, and displaying the next body-movement picture; and when the matching degree does not reach the set condition, instructing the interactive object displayed on the display interface to make a second response, the second response including displaying a body movement and/or a voice prompt indicating that the posture is unqualified, and keeping the current body-movement picture displayed.
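The matching-degree branching above reduces to a small decision function. The threshold value, feedback labels, and the idea of tracking the picture index as a step counter are all illustrative assumptions for the sketch.

```python
def respond_to_posture(match_degree, current_step, threshold=0.8):
    """Drive the interactive object's response from the matching degree;
    threshold and feedback labels are illustrative assumptions."""
    if match_degree >= threshold:  # set condition reached
        return {"response": "first",
                "feedback": ["posture_qualified_gesture", "voice_prompt"],
                "next_picture": current_step + 1}  # show the next picture
    return {"response": "second",
            "feedback": ["posture_unqualified_gesture", "voice_prompt"],
            "next_picture": current_step}  # keep the current picture

print(respond_to_posture(0.92, current_step=3)["next_picture"])
```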

According to an aspect of the present disclosure, an interaction apparatus is provided. The apparatus includes: a receiving unit configured to receive a first message from a client; an obtaining unit configured to obtain, based on instruction content included in the first message, driving data matching the instruction content; and a driving unit configured to use the driving data to control the display interface of the client to play the response animation of the interactive object.

In combination with any implementation provided by the present disclosure, the obtaining unit is specifically configured to: obtain response content for the instruction content, the response content including a response text; and obtain, based on at least one target text contained in the response text, control parameters for a set action of the interactive object matching the target text.

In combination with any implementation provided by the present disclosure, the obtaining unit is configured to: obtain response content for the instruction content, the response content including a phoneme sequence; and obtain control parameters of the interactive object matching the phoneme sequence.

In combination with any implementation provided by the present disclosure, the control parameters of the interactive object include a posture control vector of at least one local region, and when obtaining the control parameters of the interactive object matching the phoneme sequence, the obtaining unit is configured to: perform feature encoding on the phoneme sequence to obtain a first encoding sequence corresponding to the phoneme sequence; obtain, according to the first encoding sequence, a feature code corresponding to at least one phoneme; and obtain a posture control vector of at least one local region of the interactive object corresponding to the feature code.

In combination with any implementation provided by the present disclosure, the apparatus further includes a sending unit configured to send, to the client, instruction information including the response content for the instruction content, so that the client displays the response content based on the instruction information.

In combination with any implementation provided by the present disclosure, the driving unit is configured to: send the driving data of the interactive object to the client, so that the client generates a response animation according to the driving data, and control the client to play the response animation on the display interface; or adjust two-dimensional or three-dimensional virtual model parameters of the interactive object based on the driving data, generate the response animation of the interactive object with a rendering engine based on the adjusted two-dimensional or three-dimensional virtual model parameters, and send the response animation to the client.

According to an aspect of the present disclosure, an interaction apparatus is provided. The apparatus includes: a sending unit configured to send, in response to a user input operation at a client, a first message including instruction content to a server; and a playing unit configured to play a response animation of the interactive object on the display interface of the client based on a second message with which the server responds to the first message.

In combination with any implementation provided by the present disclosure, the instruction content includes text content, and the apparatus further includes a first display unit configured to display the text content on the display interface of the client, and/or determine and play an audio file corresponding to the text content.

In combination with any implementation provided by the present disclosure, when displaying the text content in the client, the first display unit is specifically configured to: generate barrage (bullet-screen comment) information for the text content; and display the barrage information on the display interface of the client.

In combination with any implementation provided by the present disclosure, the second message includes a response text for the instruction content, and the apparatus further includes a second display unit configured to display the response text on the display interface of the client, and/or determine and play an audio file corresponding to the response text.

In combination with any implementation provided by the present disclosure, the second message includes driving data of the interactive object, and the playing unit is configured to: adjust virtual model parameters of the interactive object based on the driving data; and generate, with a rendering engine and based on the adjusted virtual model parameters, the response animation of the interactive object and display it on the display interface of the client. The driving data includes control parameters for the interactive object matching a phoneme sequence corresponding to the response text for the instruction content, and/or control parameters for a set action of the interactive object matching at least one target text contained in the response text.

In combination with any implementation provided by the present disclosure, the second message includes a response animation made by the interactive object to the instruction content.

In combination with any implementation provided by the present disclosure, the user input operation includes the user making a corresponding human body posture by following a body-movement picture displayed on the display interface, and the generating unit is configured to: acquire a user behavior image including the human body posture; recognize human body posture information in the user behavior image; and drive, based on the human body posture information, the interactive object displayed on the display interface to respond.

In combination with any implementation provided by the present disclosure, the generating unit is specifically configured to: determine a matching degree between the human body posture information and the human body posture in the body-movement picture; and drive the interactive object displayed on the display interface to respond based on the matching degree.

In combination with any implementation provided by the present disclosure, the generating unit is specifically configured to: when the matching degree reaches a set condition, instruct the interactive object displayed on the display interface to make a first response, the first response including displaying a body movement and/or a voice prompt indicating that the posture is qualified, and displaying the next body-movement picture; and when the matching degree does not reach the set condition, instruct the interactive object displayed on the display interface to make a second response, the second response including displaying a body movement and/or a voice prompt indicating that the posture is unqualified, and keeping the current body-movement picture displayed.

According to an aspect of the present disclosure, an electronic device is provided. The device includes a memory and a processor, where the memory is configured to store computer instructions executable on the processor, and the processor is configured to implement, when executing the computer instructions, the interaction method proposed in any implementation of the present disclosure.

According to an aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, where the program, when executed by a processor, implements the interaction method proposed in any implementation of the present disclosure.

Exemplary embodiments are described in detail here, and examples thereof are shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; on the contrary, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may indicate three cases: A alone, both A and B, and B alone. In addition, the term "at least one" herein indicates any one of multiple items, or any combination of at least two of multiple items; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.

With a digital human as the anchor, live streaming can be carried out during any time period, and uninterrupted 24-hour streaming can be realized, meeting different viewers' needs regarding when to watch a live stream. As the object with which users interact during a live stream, how the digital human gives timely feedback to questions raised by users, and how it interacts with users in a vivid and natural way, are problems that urgently need to be solved.

In view of this, the present disclosure proposes an interaction solution that can be applied to any scenario involving interaction with a virtual interactive object, such as live streaming.

The interaction method proposed in the embodiments of the present disclosure can be applied to a terminal device or a server. The terminal device may be, for example, an electronic device on which a client is installed, such as a mobile phone or a tablet computer; the present disclosure does not limit the form of the terminal device. The client is, for example, a live-video client, including a live-streaming video client, a motion-sensing interaction client, and so on. The server may be any server capable of providing the processing capabilities required for the interactive object.

The interactive object may be any object capable of interacting with a user. It may be a virtual character, a virtual animal, a virtual item, a cartoon figure, or any other virtual figure capable of implementing interactive functions. The interactive object may be built on a two-dimensional virtual model or on a three-dimensional virtual model, and is obtained by rendering the two-dimensional or three-dimensional virtual model. The user may be a real person, a robot, or another smart device. The interaction between the interactive object and the user may be active or passive.

For example, in a live-video scenario, the display interface of the client may display an animation of the interactive object, and the user may perform input operations in the client of the terminal device, such as entering text, entering voice, triggering an action, or pressing a button, to interact with the interactive object.

FIG. 1 shows a flowchart of an interaction method according to at least one embodiment of the present disclosure; the interaction method can be applied to the server side. As shown in FIG. 1, the method includes steps 101 to 103.

In step 101, a first message from a client is received.

For example, the instruction content carried in the first message may include information input by the user through an input operation performed on the client; the user's input operations include entering text, entering voice, triggering an action, pressing a button, and so on. The input information may be sent by the client to the server; or, when the client sends the input information to the server, the input information may be displayed directly on the client. The form of the instruction content carried in the first message includes, but is not limited to, text, voice, images (for example, emoticons or action images), video, and so on. The specific form of the first message is related to the application scenario. For example, in a live-video scenario, the client may be a client that supports watching live video; the first message may be sent after the client captures text content entered by the user on the display interface, in which case the instruction content carried in the first message is, for example, the entered text content, which may be displayed on the display interface in the form of a barrage comment. For another example, in a motion-sensing interaction scenario, the first message may be sent after the client captures a user behavior image, in which case the instruction content carried in the first message is, for example, the captured user behavior image. Of course, in specific implementations the present disclosure does not limit the sending mechanism of the first message or the form of the instruction content carried in it.
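One possible shape for such a first message is sketched below; all field names are assumptions for illustration, since the disclosure deliberately leaves the message format open. A live-stream text comment and a motion-sensing behavior image would travel as the same message type, differing only in the content-type field.

```python
from dataclasses import dataclass, field

@dataclass
class FirstMessage:
    """Illustrative shape of a 'first message'; field names are assumptions."""
    client_id: str
    content_type: str        # e.g. "text", "voice", "image", "video"
    instruction_content: object
    meta: dict = field(default_factory=dict)

text_msg = FirstMessage("viewer-1", "text", "when does the show start?")
image_msg = FirstMessage("viewer-2", "image", b"\x89PNG...",
                         {"scene": "motion-sensing"})
print(text_msg.content_type, image_msg.content_type)
```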

In step 102, driving data matching the instruction content is obtained based on the instruction content included in the first message.

Exemplarily, the driving data includes one or more of sound driving data, expression driving data, and action driving data. In one embodiment, the driving data may be pre-stored on the server or on another associated business server; after the first message from the client is received, a search may be performed on the server or the associated business server according to the instruction content to obtain driving data matching the instruction content. In another embodiment, the driving data may be generated according to the instruction content, for example, by inputting the instruction content into a pre-trained deep learning model to predict the driving data corresponding to the instruction content.
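As a minimal, hypothetical sketch of the two alternatives above (retrieval of pre-stored driving data versus generation from the instruction content), a dispatcher might look like the following; all names and values are illustrative and not part of the disclosure:

```python
# Hypothetical sketch: look the instruction content up in a pre-stored
# table of driving data, and fall back to generating driving data when
# no entry matches.
PRESTORED_DRIVING_DATA = {
    "how to wash hands": {"sound": "wash_hands.wav", "action": "demo_wash"},
}

def generate_driving_data(instruction):
    # Stand-in for a pre-trained deep-learning model that predicts
    # driving data from the instruction content.
    return {"sound": None, "action": "default_talk"}

def get_driving_data(instruction):
    key = instruction.strip().lower()
    if key in PRESTORED_DRIVING_DATA:          # retrieval path
        return PRESTORED_DRIVING_DATA[key]
    return generate_driving_data(instruction)  # generation path
```

Whether the retrieval path or the generation path is used for a given deployment is a design choice; the sketch simply shows that both can sit behind one lookup function.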

In step 103, the driving data is used to control the display interface of the client to play a response animation of the interactive object.

In the embodiments of the present disclosure, the interactive object is obtained by rendering a virtual model, such as a two-dimensional or three-dimensional virtual model. The virtual model may be custom-generated, or obtained by converting an image or a video of a character. The embodiments of the present disclosure do not limit the manner in which the virtual model is generated.

The response animation may be generated according to the driving data. By controlling the display interface of the client, for example a live-streaming interface, to play the response animation of the interactive object, the response of the interactive object to the first message from the client can be displayed. The response includes outputting a segment of speech, and/or making certain actions, expressions, and so on.

In the embodiments of the present disclosure, the server receives the first message from the client, obtains matching driving data according to the instruction content contained in the first message, and uses the driving data to control the display interface of the client to play the response animation of the interactive object, thereby displaying the response of the interactive object. In this way, the interactive object can give timely feedback on the user's instruction content, realizing timely interaction with the user.

FIG. 2 is an exemplary illustration of applying the interaction method proposed in at least one embodiment of the present disclosure to a live-streaming process. As shown in FIG. 2, the interactive object is a three-dimensional virtual character with the appearance of a doctor. The display interface of the client can show the three-dimensional virtual character hosting a live stream, and the user on the client can input instruction content on the display interface to send a first message carrying the instruction content. Accordingly, after receiving the first message from the client, the server can recognize the instruction content, for example "how to wash hands", then obtain matching driving data according to the instruction content and, based on the driving data, control the client to display the three-dimensional virtual character's response to the instruction "how to wash hands". For example, the three-dimensional virtual character is controlled to output speech corresponding to "how to wash hands" while making actions and/or expressions that match the output speech.

In some embodiments, the instruction content includes text content. The response content for the instruction content may be obtained as follows: based on a natural language processing (NLP) algorithm, the linguistic intention expressed by the text content is recognized, and response content matching the linguistic intention is obtained.

In some embodiments, a pre-trained neural network model for natural language processing may be used to process the text content, such as a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory network (LSTM), and so on. The text content included in the first message is input into the above neural network model, and the linguistic intention represented by the text content is classified, thereby determining the category of linguistic intention expressed by the text content.
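To illustrate the classify-then-respond flow without a trained model, the following toy stand-in scores intent classes by keyword overlap; a real system would use a trained CNN/RNN/LSTM classifier as described above, and the class names and keyword sets here are pure assumptions:

```python
# Toy stand-in for a trained NLP intent classifier; intent classes and
# keyword sets are hypothetical.
INTENT_KEYWORDS = {
    "ask_hygiene": {"wash", "hands", "clean"},
    "greeting": {"hello", "hi"},
}

def classify_intent(text):
    tokens = set(text.lower().split())
    # Pick the intent class whose keyword set overlaps the input most.
    best, best_score = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(tokens & keywords)
        if score > best_score:
            best, best_score = intent, score
    return best
```

The returned intent category then keys the database lookup for matching response content.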

Since the text content included in the first message may carry multiple layers of meaning, using a natural language processing algorithm makes it possible to identify the intention the user actually wants to express, so that the content the user really wants to obtain can be fed back directly, improving the user's interactive experience.

In some embodiments, response content matching and conforming to the linguistic intention may be looked up in a preset database according to the linguistic intention. Further, the server may generate, based on the response content, driving data for causing the interactive object to express the response content. The database may be deployed on the server or in the cloud, which is not limited in the present disclosure.

When the linguistic intention is recognized, the server may extract from the text content the parameters related to the linguistic intention, that is, the entities. For example, the entities may be determined by means of word segmentation, information extraction, and the like. Within the data corresponding to the linguistic intention category, the entities can be used to further determine the response text that conforms to the linguistic intention. Those skilled in the art should understand that the above manner is only an example; other manners may also be used to obtain the response text matching the linguistic intention, which is not limited in the present disclosure.

In some embodiments, the server may generate voice driving data according to the response content; the voice driving data includes, for example, the phoneme sequence corresponding to the response text contained in the response content. By generating the speech corresponding to the phoneme sequence and controlling the client to output the speech, the interactive object can be made to output speech that expresses the content represented by the response text.

In some embodiments, the server may generate action driving data according to the response content, so that the interactive object performs an action expressing the response content.

In one example, when the response content includes a response text, the action driving data may be generated according to the response content as follows: based on at least one target text contained in the response text, control parameters of a set action of the interactive object matching the target text are obtained.

The target text may be a set keyword, key phrase, key sentence, and so on. Taking the keyword "wash hands" as an example, if the response text contains "wash hands", it can be determined that the response text contains the target text. A matching set action may be configured in advance for each target text, and each set action may be realized through a sequence of control-parameter groups; for example, the displacements of multiple bone points form one group of control parameters, and the control-parameter sequence formed by multiple such groups is used to adjust the model parameters of the interactive object, so that the interactive object performs the set action.
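The keyword matching and control-parameter lookup described above can be sketched as follows; the target texts, bone-point names, and displacement values are all hypothetical:

```python
# Hypothetical mapping from target texts to set actions; each set action
# is a sequence of control-parameter groups (here, per-frame bone-point
# displacements).
SET_ACTIONS = {
    "wash hands": [
        {"left_wrist": (0.0, 0.1), "right_wrist": (0.0, 0.1)},   # frame 1
        {"left_wrist": (0.1, 0.0), "right_wrist": (-0.1, 0.0)},  # frame 2
    ],
}

def match_set_actions(response_text):
    """Return the control-parameter sequence of every target text
    contained in the response text."""
    text = response_text.lower()
    return {t: params for t, params in SET_ACTIONS.items() if t in text}
```

Applying each group of parameters in turn to the virtual model's bone points is what plays the set action back as motion.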

In the embodiments of the present disclosure, by having the interactive object respond to the first message in the form of an action, the user can obtain an intuitive and vivid response to the first message, which improves the user's interactive experience.

In some embodiments, the voice information corresponding to the target text may be determined; the time information for outputting the voice information is obtained; the execution time of the set action corresponding to the target text is determined according to the time information; and, according to the execution time, the interactive object is controlled to perform the set action with the control parameters corresponding to the target text.

When the client is controlled to output speech according to the phoneme sequence corresponding to the response text, the time information of outputting the speech corresponding to the target text can be determined, for example, the time at which output of the speech corresponding to the target text starts, the time at which it ends, and its duration. The execution time of the set action corresponding to the target text can be determined according to the time information, and within the execution time, or within a certain range of the execution time, the interactive object is controlled to perform the set action with the control parameters corresponding to the target text.
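A minimal sketch of deriving the set action's execution window from the speech timing, assuming the start time and duration of each phoneme of the response speech are known (the timings below are made up):

```python
# Sketch: derive the execution window of a set action from the speech
# timing of its target text. Phoneme timings (seconds) are hypothetical.
def action_window(phoneme_times, start_idx, end_idx):
    """phoneme_times: list of (start, duration) per phoneme of the
    response speech; the target text spans phonemes start_idx..end_idx."""
    start = phoneme_times[start_idx][0]
    end = phoneme_times[end_idx][0] + phoneme_times[end_idx][1]
    return start, end - start   # start time and duration of the action

times = [(0.0, 0.2), (0.2, 0.3), (0.5, 0.25), (0.75, 0.25)]
start, duration = action_window(times, 1, 2)  # target text = phonemes 1..2
```

Scheduling the set action over this window is what keeps the action duration consistent with the speech duration, as described next.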

In the embodiments of the present disclosure, for each target text, the duration of outputting the corresponding speech is consistent with, or close to, the duration of the action controlled according to the corresponding control parameters, so that the time at which the interactive object outputs the speech corresponding to the target text matches the time at which it performs the action. The speech and actions of the interactive object are thereby synchronized and coordinated, giving the user the feeling that the interactive object is responding during the live stream, and improving the user's experience of interacting with the host during the live stream.

In some embodiments, posture driving data may be generated according to the response text, so that the client displays a posture of the interactive object matching the speech corresponding to the response text, for example, making corresponding expressions and actions.

In one example, the response content may also include a phoneme sequence; alternatively, when the response content includes a response text, the phoneme sequence corresponding to the response text may be extracted. After the response content including the phoneme sequence is obtained, the control parameters for the interactive object matching the phoneme sequence can be obtained. The control parameters of the interactive object include a posture control vector of at least one local region, and obtaining the control parameters for the interactive object matching the phoneme sequence includes: performing feature encoding on the phoneme sequence to obtain a first coding sequence corresponding to the phoneme sequence; obtaining, according to the first coding sequence, a feature code corresponding to at least one phoneme; and obtaining the posture control vector of at least one local region of the interactive object corresponding to the feature code.

In some embodiments, by controlling the client to play the speech corresponding to the response text while displaying a response animation of the interactive object's posture matching the speech, the response of the interactive object becomes more anthropomorphic, vivid, and natural, improving the user's interactive experience.

In embodiments where the control parameters of the interactive object include the posture control vector of at least one local region, the posture control vector may be obtained in the following manner.

First, feature encoding is performed on the phoneme sequence corresponding to the response text to obtain the coding sequence corresponding to the phoneme sequence. Here, to distinguish it from coding sequences mentioned later, the coding sequence corresponding to the phoneme sequence of the text data is referred to as the first coding sequence.

For the multiple kinds of phonemes contained in the phoneme sequence, a sub-coding sequence corresponding to each kind of phoneme is generated.

In one example, it is detected whether a first phoneme corresponds to each time point, the first phoneme being any one of the multiple kinds of phonemes. The code value at a time point where the first phoneme is present is set to a first value, and the code value at a time point where the first phoneme is absent is set to a second value; after the code values at the respective time points are assigned, the sub-coding sequence corresponding to the first phoneme is obtained. For example, the code value may be set to 1 at time points where the first phoneme is present and to 0 at time points where it is absent. Those skilled in the art should understand that the above setting of code values is only an example; the code values may also be set to other values, which is not limited in the present disclosure.
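The per-phoneme 0/1 coding described above can be sketched as follows, assuming the time axis is sampled at discrete points (the phoneme timeline below is illustrative):

```python
# Sketch of the per-phoneme sub-coding sequence: at each sampled time
# point the code value is 1 where that phoneme is present, else 0.
def sub_coding_sequence(phoneme_at_time, phoneme):
    """phoneme_at_time: the phoneme present at each discrete time point
    (None for silence); returns the 0/1 sequence for `phoneme`."""
    return [1 if p == phoneme else 0 for p in phoneme_at_time]

timeline = ["j", "j", "i1", "i1", None, "ie4"]
seq_j = sub_coding_sequence(timeline, "j")    # [1, 1, 0, 0, 0, 0]
```

Stacking one such sequence per kind of phoneme yields the first coding sequence.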

Afterwards, the first coding sequence corresponding to the phoneme sequence is obtained according to the sub-coding sequences respectively corresponding to the multiple kinds of phonemes.

In one example, for the sub-coding sequence corresponding to the first phoneme, a Gaussian filter may be used to perform a Gaussian convolution operation on the temporally consecutive values of the first phoneme, so as to filter the matrix corresponding to the feature encoding and smooth the transitional motion of the mouth region at each phoneme transition.

FIG. 3 shows a flowchart of a method for obtaining posture control vectors proposed by at least one embodiment of the present disclosure. As shown in FIG. 3, the phoneme sequence 310 contains phonemes j, i1, j, and ie4 (for brevity, only some of the phonemes are shown). For each kind of phoneme j, i1, and ie4, the sub-coding sequences 321, 322, and 323 respectively corresponding to these phonemes are obtained. In each sub-coding sequence, the code value at times where the phoneme is present (with seconds (s) as the time unit in FIG. 3) is the first value (for example, 1), and the code value at times where the phoneme is absent is the second value (for example, 0). Taking the sub-coding sequence 321 as an example, its value is the first value at times where phoneme j is present in the phoneme sequence 310, and the second value at times where phoneme j is absent. All the sub-coding sequences together constitute the first coding sequence 320.

Next, a feature code corresponding to at least one phoneme is obtained according to the first coding sequence.

According to the code values of the sub-coding sequences 321, 322, and 323 corresponding to phonemes j, i1, and ie4 respectively, and the durations of the corresponding phonemes in these three sub-coding sequences, that is, the duration of j in the sub-coding sequence 321, the duration of i1 in the sub-coding sequence 322, and the duration of ie4 in the sub-coding sequence 323, the feature information of the sub-coding sequences 321, 322, and 323 can be obtained.

In one example, a Gaussian filter may be used to perform a Gaussian convolution operation on the temporally consecutive values of phonemes j, i1, and ie4 in the sub-coding sequences 321, 322, and 323 respectively, so as to smooth the feature encoding and obtain the smoothed first coding sequence 330. That is, the Gaussian filter performs a Gaussian convolution operation on the temporally consecutive 0-1 values of each phoneme, so that in each coding sequence the transition of code values from the second value to the first value, or from the first value to the second value, becomes smooth. For example, besides 0 and 1, the values of the coding sequence also take intermediate values, such as 0.2, 0.3, and so on; the posture control vectors obtained from these intermediate values make the action transitions and expression changes of the interactive character gentler and more natural, improving the interactive experience of the target object.
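The Gaussian smoothing step can be sketched with a hand-rolled discrete Gaussian kernel; the kernel radius and sigma are hypothetical choices, and zero padding is assumed at the sequence boundaries:

```python
import math

# Sketch: smooth a 0/1 sub-coding sequence with a discrete Gaussian
# kernel so transitions take intermediate values between 0 and 1.
def gaussian_smooth(seq, sigma=1.0, radius=2):
    kernel = [math.exp(-(i * i) / (2 * sigma * sigma))
              for i in range(-radius, radius + 1)]
    norm = sum(kernel)
    kernel = [k / norm for k in kernel]      # normalized Gaussian weights
    out = []
    for i in range(len(seq)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + j - radius
            if 0 <= idx < len(seq):          # zero-pad outside the sequence
                acc += k * seq[idx]
        out.append(acc)
    return out

smoothed = gaussian_smooth([0, 0, 1, 1, 0, 0])
```

After smoothing, the values at the 0-to-1 and 1-to-0 transitions fall strictly between the two original code values, which is what produces the gentler posture changes described above.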

In some embodiments, the feature code corresponding to at least one phoneme may be obtained by sliding a window over the first coding sequence, where the first coding sequence may be the coding sequence after the Gaussian convolution operation.

A time window of a set length is slid over the coding sequence with a set step size, and the feature codes within the time window are taken as the feature code of the corresponding at least one phoneme. After the sliding is completed, a second coding sequence can be obtained from the multiple feature codes thus obtained. As shown in FIG. 3, by sliding a time window of a set length over the first coding sequence 320, or over the smoothed first coding sequence 330, feature code 1, feature code 2, feature code 3, and so on are obtained in turn. After the first coding sequence is traversed, feature codes 1, 2, 3, ..., M are obtained, thereby yielding the second coding sequence 340. Here M is a positive integer whose value is determined by the length of the first coding sequence, the length of the time window, and the step size by which the time window slides.
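The sliding-window extraction of feature codes, and the resulting count M, can be sketched as follows (the window length and step size are illustrative):

```python
# Sketch: slide a fixed-length window over the first coding sequence;
# the contents of each window position form one feature code.
def sliding_feature_codes(first_coding_seq, window, step):
    codes = []
    for start in range(0, len(first_coding_seq) - window + 1, step):
        codes.append(first_coding_seq[start:start + window])
    return codes   # M = len(codes) feature codes

seq = list(range(10))
codes = sliding_feature_codes(seq, window=4, step=2)
# M = (10 - 4) // 2 + 1 = 4 feature codes
```

This matches the relation stated above: M follows from the sequence length, the window length, and the step size.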

According to feature codes 1, 2, 3, ..., M, the corresponding posture control vectors 1, 2, 3, ..., M can be obtained respectively, thereby obtaining the sequence 350 of posture control vectors.

The sequence 350 of posture control vectors and the second coding sequence 340 are aligned in time. Since each feature code in the second coding sequence is obtained according to at least one phoneme in the phoneme sequence, each feature vector in the sequence 350 of posture control vectors is likewise obtained according to at least one phoneme in the phoneme sequence. While the phoneme sequence corresponding to the text data is played, the interactive object is driven to perform actions according to the sequence of posture control vectors; that is, the interactive object can be driven to produce the sound corresponding to the text content while making actions synchronized with that sound, giving the target object the feeling that the interactive object is speaking and improving the interactive experience of the target object.

Assuming that output of the feature codes starts at a set moment of the first time window, the posture control vectors before that set moment may be set to default values; that is, when the phoneme sequence has just begun to play, the interactive object performs a default action, and after the set moment the sequence of posture control vectors obtained from the first coding sequence begins to drive the interactive object's actions. Taking FIG. 3 as an example, output of feature code 1 starts at time t0, and the default posture control vector applies before time t0.

In some embodiments, when the time interval between phonemes in the phoneme sequence is greater than a set threshold, the interactive object is driven to perform an action according to a set posture control vector of the local region. That is, when there is a long pause in the interactive character's speech, the interactive object is driven to perform a set action. For example, during a long pause in the output sound, the interactive character can be made to smile or sway its body slightly, so as to avoid the interactive character standing expressionless through the pause. This makes the speaking process of the interactive object natural and fluent, improving the interactive experience of the target object.
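The pause-handling rule above reduces to a simple threshold check; the threshold value and pose names below are hypothetical:

```python
# Sketch: when the gap between consecutive phonemes exceeds a threshold,
# drive the interactive object with a set ("idle") posture control vector
# instead of the speech-driven one.
IDLE_POSE = "smile_idle"

def pose_for_gap(prev_end, next_start, speaking_pose, threshold=0.5):
    gap = next_start - prev_end          # silence between phonemes, seconds
    return IDLE_POSE if gap > threshold else speaking_pose
```

The set pose would itself be a posture control vector (or a short sequence of them) prepared in advance, such as a smile or a slight body sway.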

In some embodiments, for at least one target text contained in the response text, the control parameters of the set action of the interactive object matching the at least one target text are obtained to drive the interactive object to perform the set action; for the response content other than the at least one target text, the control parameters of the interactive object may be obtained according to the phonemes corresponding to that response content, so as to drive the interactive object to make postures, such as expressions and actions, that match the pronunciation of the response content.

Taking the live-streaming process shown in FIG. 2 as an example, when the received first message contains the text content "how to wash hands", a natural language processing algorithm can recognize that the user's linguistic intention is "asking how to wash hands". By searching a preset database, content answering how to wash hands can be obtained and used as the response text. By generating action driving data, sound driving data, and posture driving data according to the response text, the interactive object can answer the question "how to wash hands" by voice while making expressions and actions matching the pronunciation, and simultaneously demonstrate how to wash hands with body movements.

In some embodiments, indication information including the response text may also be sent to the client, so that the client displays the response text based on the indication information.

For example, for the response text answering the question "how to wash hands", the indication information containing the response text may be sent to the client so that the indication message is displayed on the client in the form of text, enabling the user to receive the information conveyed by the interactive object more accurately.

In some embodiments, the virtual model corresponding to the interactive object (which may be either a two-dimensional or a three-dimensional virtual model) may be stored on the client. In this case, the driving data of the interactive object may be sent to the client so that the client generates the response animation according to the driving data, and the client is controlled to play the response animation. For example, the client may be controlled to adjust the virtual model parameters of the interactive object according to the control parameters contained in the driving data and, based on the adjusted virtual model parameters, to generate the response animation of the interactive object with a rendering engine and play the response animation in response to the first message. When the virtual model is two-dimensional, the virtual model parameters are two-dimensional virtual model parameters; when the virtual model is three-dimensional, the virtual model parameters are three-dimensional virtual model parameters. For another example, the server may determine, based on the driving data, a control instruction for controlling the response manner of the interactive object, and send the control instruction to the client, so that the client displays a picture of the responding interactive object based on the control instruction.

When the data volume of the virtual model of the interactive object is small and its performance footprint on the client is low, sending the driving data to the client so that the client generates the response animation according to the driving data makes it convenient and flexible to display the picture of the responding interactive object.

In some embodiments, the virtual model corresponding to the interactive object is stored on the server side or in the cloud. In this case, the virtual model parameters of the interactive object may be adjusted based on the driving data; based on the adjusted virtual model parameters, the response animation of the interactive object, which shows the actions or expressions of the interactive object, is generated with a rendering engine and sent to the client. Realizing the response of the interactive object by sending the response animation to the client can avoid stuttering caused by rendering on the client and allows a high-quality response animation to be displayed on the client, improving the user's interactive experience.

FIG. 4 shows a flowchart of another interaction method according to at least one embodiment of the present disclosure. This interaction method may be applied to a client. The method includes steps 401 to 402.

In step 401, in response to a user input operation on the client, a first message including instruction content is sent to the server.

Exemplarily, the user input operation includes a text input operation, a voice input operation, an action-triggered operation, a key-triggered operation, and so on. In response to the user input operation, a first message is sent to the server, and the instruction content carried in the first message includes, but is not limited to, one or more of text, voice, images (for example, emoticons or action images), video, and so on. For example, in a live video streaming scenario, the client may be one that supports watching live video; the first message may be sent after the client collects the text content entered by the user on the display interface, in which case the instruction content carried in the first message is, for example, the entered text content, and the instruction content may be displayed on the display interface in the form of a bullet comment. For another example, in a somatosensory interaction scenario, the first message may be sent after the client collects an image of the user's behavior, in which case the instruction content carried in the first message is, for example, the collected user-behavior image. Of course, in specific implementations, the present disclosure does not limit the sending mechanism of the first message or the form of the instruction content carried in it.

在步驟402中,基於所述伺服器對所述第一消息回應的第二消息,在所述客戶端的顯示介面播放所述互動物件的回應動畫。In step 402, based on the second message that the server responds to the first message, the response animation of the interactive object is played on the display interface of the client.

所述第二消息為所述伺服器響應於所述第一消息所包含的指示內容所生成的,用於使所述客戶端顯示對所述指示內容做出的回應的互動物件。The second message is generated by the server in response to the instruction content included in the first message, and is used to make the client display an interactive object that responds to the instruction content.

在本公開實施例中,所述互動物件為對虛擬模型諸如二維或三維虛擬模型渲染得到的。所述虛擬模型可以是自定義生成的,也可以對一角色的圖像或視訊進行轉換而得到的。本公開實施例對於虛擬模型的生成方式不進行限制。In the embodiments of the present disclosure, the interactive object is obtained by rendering a virtual model, such as a two-dimensional or three-dimensional virtual model. The virtual model may be custom-generated, or obtained by converting an image or video of a character. The embodiments of the present disclosure do not limit the way the virtual model is generated.

在本公開實施例中,通過根據使用者輸入操作向伺服器發送包括指示內容的第一消息,基於所述伺服器響應於所述第一消息回應的第二消息,在客戶端中顯示互動物件對所述指示內容做出的回應,可以使互動物件可以對於使用者的指示內容進行及時反饋,實現與使用者的及時互動。In the embodiments of the present disclosure, a first message including instruction content is sent to the server according to a user input operation, and, based on a second message with which the server responds to the first message, the client displays the interactive object's response to the instruction content. This enables the interactive object to give timely feedback on the user's instruction content, realizing timely interaction with the user.

在一些實施例中,所述指示內容包括文本內容;所述方法還包括:在所述客戶端的顯示介面中顯示所述文本內容,和/或,確定並播放所述文本內容對應的音訊文件。也即,可以在客戶端顯示使用者輸入的文本內容;還可以在客戶端播放所述文本內容對應的音訊文件,輸出所述文本內容對應的語音。In some embodiments, the instruction content includes text content; the method further includes: displaying the text content on a display interface of the client, and/or determining and playing an audio file corresponding to the text content. That is, the text content input by the user can be displayed on the client; the audio file corresponding to the text content can also be played on the client, and the voice corresponding to the text content can be output.

在一些實施例中,所述在所述客戶端中顯示所述文本內容,包括:生成所述文本內容的彈幕資訊;在所述客戶端的顯示介面中顯示所述彈幕資訊。In some embodiments, the displaying the text content in the client includes: generating barrage information of the text content; and displaying the barrage information on a display interface of the client.

在視訊直播場景下,對於使用者輸入的文本內容,可以生成對應的彈幕資訊,並在客戶端的顯示介面顯示所述彈幕資訊。以圖2為例,在使用者在客戶端的直播互動介面輸入“如何洗手”的情況下,在顯示介面可以顯示該文本內容對應的彈幕資訊“如何洗手”。In the live video scenario, for the text content input by the user, corresponding barrage information can be generated, and the barrage information can be displayed on the display interface of the client. Taking Figure 2 as an example, in the case where the user inputs "how to wash hands" on the live interactive interface of the client, the barrage information "how to wash hands" corresponding to the text content can be displayed on the display interface.

在一些實施例中,所述第二消息中包括針對所述指示內容的應答文本;所述方法還包括:在所述客戶端的顯示介面中顯示所述應答文本,和/或,確定並播放所述應答文本對應的音訊文件。In some embodiments, the second message includes a response text for the instruction content; the method further includes: displaying the response text on the display interface of the client, and/or determining and playing the audio file corresponding to the response text.

所述指示內容的應答文本可以通過以下方式獲得:識別所述文本內容所表達的語言意圖,並從預設的數據庫中查找與所述語言意圖匹配的應答文本。具體方法參見上述實施例所述,在此不再贅述。The response text for the instruction content may be obtained in the following manner: recognizing the language intent expressed by the text content, and looking up a response text matching the language intent in a preset database. For the specific method, refer to the description in the foregoing embodiments, which will not be repeated here.
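A minimal sketch of this intent-matching step, assuming a toy keyword-based intent recognizer and a small preset database (both are illustrative stand-ins, not the actual algorithm or data of the disclosure):

```python
def recognize_intent(text: str) -> str:
    """Stand-in for the language-intent recognition step: map the
    user's text content to an intent label by keyword matching."""
    text = text.lower()
    if "wash hands" in text or "洗手" in text:
        return "ask_handwashing_steps"
    return "unknown"

# Hypothetical preset database mapping intent labels to response texts.
RESPONSE_DB = {
    "ask_handwashing_steps": "Wet your hands, apply soap, scrub for 20 seconds, rinse and dry.",
    "unknown": "Sorry, I did not understand that.",
}

def lookup_response(text: str) -> str:
    """Recognize the language intent, then look up the matching
    response text in the preset database."""
    return RESPONSE_DB[recognize_intent(text)]
```

A real implementation would replace the keyword check with a natural language processing model and the dictionary with a content service library.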

以視訊直播場景為例,在顯示介面可以同樣以彈幕資訊的形式,顯示對於使用者的彈幕資訊進行回復的應答文本;並且可以在顯示介面播放所述應答文本對應的音訊文件,也即輸出所述應答文本對應的語音,從而可以對使用者的彈幕資訊進行精准、直觀的回復,提升使用者的互動體驗。Taking the live video scenario as an example, the display interface may likewise display, in the form of barrage information, the response text that replies to the user's barrage; and the audio file corresponding to the response text may be played on the display interface, that is, the voice corresponding to the response text is output. In this way, the user's barrage information can be answered accurately and intuitively, improving the user's interactive experience.

在一些實施例中,所述第二消息中包括與所述應答文本對應的音素序列匹配的所述互動物件的控制參數,和/或,與所述應答文本中所包含的至少一個目標文本匹配的所述互動物件的設定動作的控制參數;所述基於所述伺服器對所述第一消息回應的第二消息,在所述客戶端的顯示介面中播放所述互動物件的回應動畫,包括:基於所述驅動數據,調整所述互動物件的虛擬模型參數;基於調整後的虛擬模型參數,利用渲染引擎生成所述互動物件的回應動畫,並顯示在所述客戶端的顯示介面中。其中,生成與所述應答文本對應的音素序列匹配的所述互動物件的控制參數,以及生成與所述應答文本中所包含的至少一個目標文本匹配的所述互動物件的設定動作的控制參數的具體方法參見上述實施例所述,在此不再贅述。In some embodiments, the second message includes control parameters of the interactive object matching the phoneme sequence corresponding to the response text, and/or control parameters of a set action of the interactive object matching at least one target text contained in the response text. Playing the response animation of the interactive object on the display interface of the client based on the second message with which the server responds to the first message includes: adjusting virtual model parameters of the interactive object based on the driving data; and, based on the adjusted virtual model parameters, generating the response animation of the interactive object with a rendering engine and displaying it on the display interface of the client. For the specific methods of generating the control parameters of the interactive object matching the phoneme sequence corresponding to the response text, and of generating the control parameters of the set action of the interactive object matching at least one target text contained in the response text, refer to the foregoing embodiments, which will not be repeated here.
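The client-side playback step above can be sketched as follows, under the assumption that the driving data arrives as a list of per-frame control parameters (here, hypothetical blendshape-style weights; the real parameter format is not specified) and that the rendering engine is represented by a callback:

```python
def apply_control_parameters(model_params: dict, frame_params: dict) -> dict:
    """Merge one frame of control parameters into the current
    virtual model parameter state."""
    updated = dict(model_params)
    updated.update(frame_params)
    return updated

def play_response_animation(driving_data: list, render) -> list:
    """Adjust the virtual model parameters frame by frame and hand each
    adjusted state to the rendering engine (the `render` callback)."""
    model_params = {"mouth_open": 0.0, "smile": 0.0}  # illustrative rest pose
    frames = []
    for frame_params in driving_data:
        model_params = apply_control_parameters(model_params, frame_params)
        frames.append(render(model_params))
    return frames

# Toy driving data for a two-frame mouth movement; the render callback
# here simply snapshots the parameters instead of drawing pixels.
driving_data = [{"mouth_open": 0.8}, {"mouth_open": 0.1, "smile": 0.3}]
frames = play_response_animation(driving_data, render=lambda p: dict(p))
```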

在互動物件的虛擬模型的數據量較小,對於客戶端的性能佔用不高的情況下,所述客戶端獲取所述驅動數據,並根據所述驅動數據生成回應動畫,從而可以方便靈活地顯示進行回應的互動物件的畫面。When the data volume of the virtual model of the interactive object is small and its performance footprint on the client is low, the client obtains the driving data and generates the response animation according to the driving data, so that the picture of the interactive object making the response can be displayed conveniently and flexibly.

在一些實施例中,所述第二消息還包括所述互動物件對所述指示內容做出的回應動畫;所述基於所述伺服器對所述第一消息回應的第二消息,在所述客戶端的顯示介面中播放所述互動物件的回應動畫,包括:在所述客戶端的顯示介面中顯示所述回應動畫。In some embodiments, the second message further includes a response animation made by the interactive object to the instruction content; playing the response animation of the interactive object on the display interface of the client based on the second message with which the server responds to the first message includes: displaying the response animation on the display interface of the client.

在一些實施例中,所述互動物件對應的虛擬模型儲存於伺服器端或雲端。在這種情況下,可以在伺服器端或雲端生成回應動畫。生成回應動畫的具體方式參見上述實施例,在此不再贅述。In some embodiments, the virtual model corresponding to the interactive object is stored on the server or in the cloud. In this case, the response animation can be generated on the server side or in the cloud. For the specific method of generating the response animation, refer to the above-mentioned embodiment, which will not be repeated here.

通過將所述回應動畫發送至客戶端來實現所述互動物件的回應,可以避免客戶端進行渲染導致的卡頓,並且能夠在客戶端顯示高質量的回應動畫,提升了使用者的互動體驗。Realizing the interactive object's response by sending the response animation to the client avoids stuttering caused by rendering on the client, and allows a high-quality response animation to be displayed on the client, improving the user's interactive experience.

在一些實施例中,所述使用者的輸入操作包括,所述使用者跟隨所述顯示介面中顯示的肢體操作畫面做出相應的人體姿態;該情況下,響應於來自客戶端的使用者輸入操作,所述方法還包括:獲取包括所述人體姿態的使用者行為圖像;識別所述使用者行為圖像中的人體姿態資訊,基於所述人體姿態資訊,驅使所述顯示介面顯示的互動物件進行回應。In some embodiments, the user's input operation includes the user following a body-operation picture displayed on the display interface to make a corresponding human body posture; in this case, in response to the user input operation from the client, the method further includes: acquiring a user behavior image including the human body posture; recognizing human body posture information in the user behavior image; and, based on the human body posture information, driving the interactive object displayed on the display interface to respond.

在一些實施例中,所述基於所述人體姿態資訊,驅使所述顯示介面顯示的互動物件進行回應,包括:確定所述人體姿態資訊與所述肢體操作畫面中的人體姿態的匹配度;基於所述匹配度,驅動所述顯示介面顯示的互動物件進行回應。In some embodiments, driving the interactive object displayed on the display interface to respond based on the human body posture information includes: determining the matching degree between the human body posture information and the human body posture in the body-operation picture; and, based on the matching degree, driving the interactive object displayed on the display interface to respond.

在一些實施例中,所述基於所述匹配度,驅動所述互動物件進行回應,包括:在所述匹配度達到設定條件的情況下,指示所述顯示介面顯示的互動物件做出第一回應,其中所述第一回應包括顯示姿態合格的肢體動作和/或語音提示;以及顯示下一個肢體操作畫面;在所述匹配度未達到設定條件的情況下,指示所述顯示介面顯示的互動物件做出第二回應,其中所述第二回應包括顯示姿態未合格的肢體動作和/或語音提示;以及保持顯示當前的肢體操作畫面。In some embodiments, driving the interactive object to respond based on the matching degree includes: when the matching degree reaches a set condition, instructing the interactive object displayed on the display interface to make a first response, where the first response includes displaying a body action and/or a voice prompt indicating the posture is qualified, and displaying the next body-operation picture; and, when the matching degree does not reach the set condition, instructing the interactive object displayed on the display interface to make a second response, where the second response includes displaying a body action and/or a voice prompt indicating the posture is not qualified, and keeping the current body-operation picture displayed.
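The first/second response branching above can be sketched as follows (the 0.75 threshold and the response fields are illustrative assumptions; the disclosure only requires some set condition on the matching degree):

```python
def respond_to_pose(match_score: float, threshold: float = 0.75) -> dict:
    """Decide the interactive object's response from the pose matching
    degree, assumed here to be a value in [0, 1]."""
    if match_score >= threshold:
        return {
            "response": "first",      # posture qualified
            "feedback": "well done",  # congratulatory action / voice prompt
            "next_screen": True,      # advance to the next body-operation picture
        }
    return {
        "response": "second",         # posture not qualified
        "feedback": "try again",      # corrective action / voice prompt
        "next_screen": False,         # keep showing the current picture
    }
```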

範例性的,以下為本公開實施例應用在視訊直播平臺的場景下的一些實施例:Exemplarily, the following are some embodiments in which the embodiments of the present disclosure are applied in the scenario of a live video platform:

在一些實施例中,所接收的來自客戶端的第一消息是直播平臺傳送的使用者彈幕文本。In some embodiments, the first message received from the client is a user barrage text transmitted by the live broadcast platform.

在一些實施例中,通過自然語言處理算法分析彈幕的意圖後,得到對應的回答,之後通過互動物件播報所述回答的內容。並且,還可以通過互動物件顯示所述回答的內容對應的動作。In some embodiments, after analyzing the intention of the barrage by a natural language processing algorithm, a corresponding answer is obtained, and then the content of the answer is broadcast through an interactive object. In addition, the action corresponding to the content of the answer can also be displayed through an interactive object.

在一些實施例中,直接集成客戶端的自然語言處理能力,對所述第一消息包括的指示內容進行自然語言處理,得到與所述指示內容的語言意圖匹配的、符合所述語言意圖的應答文本,並將所輸出的所述應答文本對應的文字直接提供給互動物件進行播放。In some embodiments, the natural language processing capability is integrated directly on the client: natural language processing is performed on the instruction content included in the first message to obtain a response text that matches and conforms to the language intent of the instruction content, and the text corresponding to the output response text is provided directly to the interactive object for broadcast.

在一些實施例中,互動物件可以模仿使用者的說話內容。例如,對於使用者通過客戶端輸入的語音,通過將所述語音轉換成文本,並根據語音獲取所述使用者的聲音特徵,並基於所述聲音特徵輸出文本對應的語音,即能夠實現互動物件模仿使用者的說話內容。In some embodiments, the interactive object can imitate the user's speech. For example, for the voice input by the user through the client, the voice is converted into text, the user's voice features are acquired from the voice, and the voice corresponding to the text is output based on those voice features, so that the interactive object imitates what the user said.
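The mimic pipeline above can be sketched as three wired-together stages. Real speech recognition, voice-feature extraction, and speech synthesis are out of scope here, so placeholder callables stand in for them and only the data flow is shown:

```python
def mimic_user_speech(audio, asr, extract_voice_features, synthesize):
    """Convert user audio to text, extract the user's voice features,
    and synthesize the same text in a voice shaped by those features."""
    text = asr(audio)                       # speech -> text
    features = extract_voice_features(audio)  # speech -> voice features
    return synthesize(text, features)       # text + features -> mimic speech

# Toy stand-ins for the three stages, just to exercise the flow.
result = mimic_user_speech(
    audio={"samples": [0.1, 0.2], "pitch": "high"},
    asr=lambda a: "hello there",
    extract_voice_features=lambda a: {"pitch": a["pitch"]},
    synthesize=lambda text, feats: f"[{feats['pitch']}-pitch voice] {text}",
)
```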

在一些實施例中,互動物件還可以根據自然語言處理返回的內容進行頁面顯示,可按照預先設計的需顯示的內容,以及互動方式顯示UI內容進行顯示,從而使回應內容的顯示更加醒目,吸引使用者的注意力。In some embodiments, the interactive object can also display a page according to the content returned by natural language processing, presenting the content to be displayed and the UI content in a pre-designed interactive manner, so that the display of the response content is more eye-catching and attracts the user's attention.

在上述實施例中可以直播實時互動,直播過程中,使用者可與互動物件進行實時互動,得到反饋。還可以不間斷直播,還可以自動生產視訊內容,是一種新的電視直播方式。In the above embodiments, real-time interaction can be live-streamed: during the live broadcast, the user can interact with the interactive object in real time and get feedback. The broadcast can also run without interruption, and video content can be produced automatically, which is a new way of live TV broadcasting.

範例性的,互動物件可以表現為三維形式的數位人。數位人將人工智能(Artificial Intelligence, AI)仿真動畫生成能力與自然語言理解能力相結合,可以像真人一樣聲型並茂和使用者進行交流。數位人可以根據回答內容生成相應的嘴形、表情、眼神及全身動作,最終輸出高質量、音視訊同步的語音和多維動畫內容,將完整的數位人形象自然地呈現給使用者。Exemplarily, the interactive object may be presented as a digital human in three-dimensional form. Digital humans combine artificial intelligence (AI) simulation animation generation capabilities with natural language understanding capabilities, and can communicate with users with lifelike voice and appearance, like a real person. A digital human can generate corresponding mouth shapes, expressions, eye movements, and full-body actions based on the answer content, and finally output high-quality, audio-video-synchronized speech and multi-dimensional animation content, naturally presenting a complete digital human image to the user.

在一些實施例中,可以快速對接不同知識領域的內容服務庫,高效應用到更多行業,同時還可針對不同場景需要,提供超寫實、卡通等多種風格的數位人形象,支持通過人臉識別、手勢識別等AI技術與使用者進行智能互動。例如,超寫實風格的數位人可打造銀行、營業廳、服務大廳的智能前臺,與客戶進行真實有效的觸達,提高服務質量和客戶滿意度。In some embodiments, content service libraries in different knowledge domains can be quickly connected, allowing efficient application to more industries. Meanwhile, digital human images in multiple styles, such as hyper-realistic and cartoon, can be provided for different scenario needs, and AI technologies such as face recognition and gesture recognition are supported for intelligent interaction with users. For example, hyper-realistic digital humans can serve as intelligent front desks in banks, business halls, and service halls, reaching customers in a real and effective way and improving service quality and customer satisfaction.

在一些實施例中,卡通風格的數位人可應用於以趣味互動為導向的場景,如線下商超中的智能引導員,或者是智能教練、虛擬教師等,達到顧客引流、激發興趣、強化教學效果等目的。In some embodiments, cartoon-style digital humans can be applied to scenarios oriented toward fun interaction, such as intelligent guides in offline supermarkets, or intelligent coaches and virtual teachers, for purposes such as attracting customers, stimulating interest, and strengthening teaching effects.

本公開至少一個實施例還提供了一種互動裝置,可應用於伺服器。如圖5所示,所述裝置50包括:接收單元501,用於接收來自客戶端的第一消息;獲取單元502,用於基於所述第一消息包括的指示內容,獲取與所述指示內容匹配的驅動數據;驅動單元503,用於利用所述驅動數據,控制所述客戶端的顯示介面中播放所述互動物件的回應動畫。At least one embodiment of the present disclosure further provides an interactive apparatus, which can be applied to a server. As shown in FIG. 5, the apparatus 50 includes: a receiving unit 501, configured to receive a first message from a client; an acquiring unit 502, configured to acquire, based on the instruction content included in the first message, driving data matching the instruction content; and a driving unit 503, configured to use the driving data to control the display interface of the client to play a response animation of the interactive object.

在一些實施例中,獲取單元502用於:獲取針對所述指示內容的應答內容,所述應答內容包括應答文本;基於所述應答文本中所包含的至少一個目標文本,獲取與所述目標文本匹配的互動物件的設定動作的控制參數。In some embodiments, the acquiring unit 502 is configured to: acquire response content for the instruction content, the response content including a response text; and, based on at least one target text contained in the response text, acquire control parameters of a set action of the interactive object matching the target text.

在一些實施例中,獲取單元502用於:獲取針對所述指示內容的應答內容,所述應答內容包括音素序列;獲取與所述音素序列匹配的所述互動物件的控制參數。In some embodiments, the obtaining unit 502 is configured to: obtain response content for the indication content, the response content including a phoneme sequence; and obtain control parameters of the interactive object matching the phoneme sequence.

在一些實施例中,所述互動物件的控制參數包括至少一個局部區域的姿態控制向量,所述獲取單元502獲取與所述音素序列匹配的互動物件的控制參數時,用於:對所述音素序列進行特徵編碼,獲得所述音素序列對應的第一編碼序列;根據所述第一編碼序列,獲取至少一個音素對應的特徵編碼;獲取所述特徵編碼對應的所述互動物件的至少一個局部區域的姿態控制向量。In some embodiments, the control parameters of the interactive object include a posture control vector of at least one local region. When acquiring the control parameters of the interactive object matching the phoneme sequence, the acquiring unit 502 is configured to: perform feature encoding on the phoneme sequence to obtain a first coding sequence corresponding to the phoneme sequence; obtain, according to the first coding sequence, a feature code corresponding to at least one phoneme; and obtain a posture control vector of at least one local region of the interactive object corresponding to the feature code.
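A toy sketch of this server-side lookup, with an illustrative phoneme encoding table and per-code mouth-region pose vectors (the real encoding, tables, and vector dimensions are not specified by the disclosure and are assumptions here):

```python
# Hypothetical feature-encoding table: one feature code per phoneme.
PHONEME_CODES = {"h": 0, "e": 1, "l": 2, "o": 3}

# Hypothetical mapping from feature code to a pose control vector for
# one local region (e.g. the mouth region), with made-up values.
POSE_VECTORS = {
    0: [0.2, 0.0], 1: [0.6, 0.1], 2: [0.3, 0.2], 3: [0.8, 0.0],
}

def encode_phonemes(phonemes: list) -> list:
    """First coding sequence: one feature code per phoneme."""
    return [PHONEME_CODES[p] for p in phonemes]

def pose_control_vectors(phonemes: list) -> list:
    """Map each phoneme's feature code to its local-region
    posture control vector."""
    return [POSE_VECTORS[code] for code in encode_phonemes(phonemes)]

vectors = pose_control_vectors(["h", "e", "l", "l", "o"])
```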

在一些實施例中,所述裝置還包括發送單元,用於向所述客戶端發送包括針對所述指示內容的應答內容的指示資訊,以使所述客戶端基於所述指示資訊顯示所述應答內容。In some embodiments, the apparatus further includes a sending unit, configured to send to the client indication information including response content for the instruction content, so that the client displays the response content based on the indication information.

在一些實施例中,驅動單元503用於:將所述互動物件的驅動數據發送至所述客戶端,以使所述客戶端根據驅動數據生成回應動畫;控制所述客戶端在顯示介面中播放所述回應動畫;或者,基於所述驅動數據,調整所述互動物件的二維或三維虛擬模型參數;基於調整後的二維或三維虛擬模型參數,利用渲染引擎生成所述互動物件的回應動畫,並向所述客戶端發送所述回應動畫。In some embodiments, the driving unit 503 is configured to: send the driving data of the interactive object to the client so that the client generates a response animation according to the driving data, and control the client to play the response animation on the display interface; or, adjust two-dimensional or three-dimensional virtual model parameters of the interactive object based on the driving data, generate the response animation of the interactive object with a rendering engine based on the adjusted two-dimensional or three-dimensional virtual model parameters, and send the response animation to the client.

本公開至少一個實施例還提供了另一種互動裝置,可應用於客戶端。如圖6所示,所述裝置60包括:發送單元601,用於響應於來自客戶端的使用者輸入操作,向伺服器發送包括指示內容的第一消息;播放單元602,用於基於所述伺服器對所述第一消息回應的第二消息,在所述客戶端的顯示介面播放所述互動物件的回應動畫。所述互動物件為通過虛擬模型諸如二維或三維虛擬模型渲染得到的。At least one embodiment of the present disclosure further provides another interactive apparatus, which can be applied to a client. As shown in FIG. 6, the apparatus 60 includes: a sending unit 601, configured to send, in response to a user input operation from the client, a first message including instruction content to the server; and a playing unit 602, configured to play, on the display interface of the client, a response animation of the interactive object based on a second message with which the server responds to the first message. The interactive object is rendered from a virtual model such as a two-dimensional or three-dimensional virtual model.

在一些實施例中,所述指示內容包括文本內容;所述裝置還包括第一顯示單元,用於在所述客戶端的顯示介面中顯示所述文本內容,和/或,確定並播放所述文本內容對應的音訊文件。In some embodiments, the instruction content includes text content; the apparatus further includes a first display unit, configured to display the text content on the display interface of the client, and/or determine and play the audio file corresponding to the text content.

在一些實施例中,所述第一顯示單元在用於在所述客戶端中顯示所述文本內容時,具體用於:生成所述文本內容的彈幕資訊;在所述客戶端的顯示介面中顯示所述彈幕資訊。In some embodiments, when displaying the text content in the client, the first display unit is specifically configured to: generate barrage information of the text content; and display the barrage information on the display interface of the client.

在一些實施例中,所述第二消息中包括針對所述指示內容的應答文本;所述裝置還包括第二顯示單元,用於在所述客戶端的顯示介面中顯示所述應答文本,和/或,確定並播放所述應答文本對應的音訊文件。In some embodiments, the second message includes a response text for the instruction content; the apparatus further includes a second display unit, configured to display the response text on the display interface of the client, and/or determine and play the audio file corresponding to the response text.

在一些實施例中,所述第二消息中包括所述互動物件的驅動數據;所述播放單元602用於:基於所述驅動數據,調整所述互動物件的虛擬模型參數;基於調整後的虛擬模型參數,利用渲染引擎生成所述互動物件的回應動畫,並顯示在所述客戶端的顯示介面中;其中,所述驅動數據包括與針對所述指示內容的應答文本對應的音素序列匹配的用於所述互動物件的控制參數,和/或,與所述應答文本中所包含的至少一個目標文本匹配的所述互動物件的設定動作的控制參數。In some embodiments, the second message includes driving data of the interactive object; the playing unit 602 is configured to: adjust virtual model parameters of the interactive object based on the driving data; and, based on the adjusted virtual model parameters, generate the response animation of the interactive object with a rendering engine and display it on the display interface of the client. The driving data includes control parameters of the interactive object matching the phoneme sequence corresponding to the response text for the instruction content, and/or control parameters of a set action of the interactive object matching at least one target text contained in the response text.

在一些實施例中,所述第二消息包括所述互動物件對所述指示內容做出的回應動畫。In some embodiments, the second message includes a response animation of the interactive object to the instruction content.

在一些實施例中,所述使用者的輸入操作包括,所述使用者跟隨所述顯示介面中顯示的肢體操作畫面做出相應的人體姿態;生成單元601用於:獲取包括所述人體姿態的使用者行為圖像;識別所述使用者行為圖像中的人體姿態資訊,基於所述人體姿態資訊,驅使所述顯示介面顯示的互動物件進行回應。In some embodiments, the user's input operation includes the user following a body-operation picture displayed on the display interface to make a corresponding human body posture; the generating unit 601 is configured to: acquire a user behavior image including the human body posture; recognize human body posture information in the user behavior image; and, based on the human body posture information, drive the interactive object displayed on the display interface to respond.

在一些實施例中,生成單元601具體用於:確定所述人體姿態資訊與所述肢體操作畫面中的人體姿態的匹配度;基於所述匹配度,驅動所述顯示介面顯示的互動物件進行回應。In some embodiments, the generating unit 601 is specifically configured to: determine the matching degree between the human body posture information and the human body posture in the body-operation picture; and, based on the matching degree, drive the interactive object displayed on the display interface to respond.

在一些實施例中,生成單元601具體用於:在所述匹配度達到設定條件的情況下,指示所述顯示介面顯示的互動物件做出第一回應,其中所述第一回應包括顯示姿態合格的肢體動作和/或語音提示;以及顯示下一個肢體操作畫面;在所述匹配度未達到設定條件的情況下,指示所述顯示介面顯示的互動物件做出第二回應,其中所述第二回應包括顯示姿態未合格的肢體動作和/或語音提示;以及保持顯示當前的肢體操作畫面。In some embodiments, the generating unit 601 is specifically configured to: when the matching degree reaches a set condition, instruct the interactive object displayed on the display interface to make a first response, where the first response includes displaying a body action and/or a voice prompt indicating the posture is qualified, and displaying the next body-operation picture; and, when the matching degree does not reach the set condition, instruct the interactive object displayed on the display interface to make a second response, where the second response includes displaying a body action and/or a voice prompt indicating the posture is not qualified, and keeping the current body-operation picture displayed.

本公開至少一個實施例還提供了一種電子設備,如圖7所示,電子設備70包括記憶體701和處理器702,所述記憶體701用於儲存可在處理器702上運行的計算機指令,所述處理器702用於在執行所述計算機指令時實現本公開涉及伺服器實施例所述的互動方法。At least one embodiment of the present disclosure also provides an electronic device. As shown in FIG. 7, the electronic device 70 includes a memory 701 and a processor 702. The memory 701 is used to store computer instructions that can run on the processor 702, The processor 702 is configured to implement the interactive method described in the server embodiments of the present disclosure when the computer instructions are executed.

本說明書至少一個實施例還提供了一種計算機可讀儲存媒體,其上儲存有計算機程式,所述程式被處理器702執行時實現本公開涉及伺服器實施例所述的互動方法。At least one embodiment of this specification further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by the processor 702, the interactive method described in the server-side embodiments of the present disclosure is implemented.

本公開至少一個實施例還提供了一種電子設備,如圖8所示,電子設備80包括記憶體801和處理器802,所述記憶體801用於儲存可在處理器802上運行的計算機指令,所述處理器802用於在執行所述計算機指令時實現本公開涉及客戶端實施例所述的互動方法。At least one embodiment of the present disclosure also provides an electronic device. As shown in FIG. 8, the electronic device 80 includes a memory 801 and a processor 802. The memory 801 is used to store computer instructions that can run on the processor 802. The processor 802 is configured to implement the interactive method described in the embodiments of the present disclosure related to the client when the computer instructions are executed.

本說明書至少一個實施例還提供了一種計算機可讀儲存媒體,其上儲存有計算機程式,所述程式被處理器802執行時實現本公開涉及客戶端實施例所述的互動方法。At least one embodiment of this specification further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by the processor 802, the interactive method described in the client-side embodiments of the present disclosure is implemented.

本領域技術人員應明白,本說明書一個或多個實施例可提供為方法、系統或計算機程式產品。因此,本說明書一個或多個實施例可採用完全硬體實施例、完全軟體實施例或結合軟體和硬體方面的實施例的形式。而且,本說明書一個或多個實施例可採用在一個或多個其中包含有計算機可用程式代碼的計算機可用儲存媒體(包括但不限於磁碟記憶體、CD-ROM、光學記憶體等)上實施的計算機程式產品的形式。Those skilled in the art should understand that one or more embodiments of this specification can be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of this specification may adopt the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of this specification can be implemented on one or more computer-usable storage media (including but not limited to magnetic disk memory, CD-ROM, optical memory, etc.) containing computer-usable program codes. In the form of a computer program product.

本說明書中的各個實施例均採用遞進的方式描述,各個實施例之間相同相似的部分互相參見即可,每個實施例重點說明的都是與其他實施例的不同之處。尤其,對於數據處理設備實施例而言,由於其基本相似於方法實施例,所以描述的比較簡單,相關之處參見方法實施例的部分說明即可。The various embodiments in this specification are described in a progressive manner, and the same or similar parts between the various embodiments can be referred to each other, and each embodiment focuses on the difference from other embodiments. In particular, as for the data processing device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for related parts, please refer to the part of the description of the method embodiment.

上述對本說明書特定實施例進行了描述。其他實施例在所附請求項的範圍內。在一些情況下,在請求項中記載的行為或步驟可以按照不同於實施例中的順序來執行並且仍然可以實現期望的結果。另外,在附圖中描繪的過程不一定要求繪示的特定順序或者連續順序才能實現期望的結果。在某些實施方式中,多任務處理和並行處理也是可以的或者可能是有利的。The foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve desired results. In addition, the processes depicted in the drawings do not necessarily require the specific order shown, or sequential order, to achieve desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

本說明書中描述的主題及功能操作的實施例可以在以下中實現:數位電子電路、有形體現的計算機軟體或韌體、包括本說明書中公開的結構及其結構性等同物的計算機硬體、或者它們中的一個或多個的組合。本說明書中描述的主題的實施例可以實現為一個或多個計算機程式,即編碼在有形非暫時性程式載體上以被數據處理裝置執行或控制數據處理裝置的操作的計算機程式指令中的一個或多個模組。可替代地或附加地,程式指令可以被編碼在人工生成的傳播訊號上,例如機器生成的電、光或電磁訊號,該訊號被生成以將資訊編碼並傳輸到合適的接收機裝置以由數據處理裝置執行。計算機儲存媒體可以是機器可讀儲存設備、機器可讀儲存基板、隨機或序列存取記憶體設備、或它們中的一個或多個的組合。The embodiments of the subject matter and functional operations described in this specification can be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this specification and their structural equivalents, or a combination of one or more of them. The embodiments of the subject matter described in this specification can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

本說明書中描述的處理及邏輯流程可以由執行一個或多個計算機程式的一個或多個可編程計算機執行,以通過根據輸入數據進行操作並生成輸出來執行相應的功能。所述處理及邏輯流程還可以由專用邏輯電路—例如FPGA(現場可編程門陣列)或ASIC(專用積體電路)來執行,並且裝置也可以實現為專用邏輯電路。The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform the corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus can also be implemented as special-purpose logic circuitry.

適合用於執行計算機程式的計算機包括,例如通用和/或專用微處理器,或任何其他類型的中央處理單元。通常,中央處理單元將從只讀記憶體和/或隨機存取記憶體接收指令和數據。計算機的基本組件包括用於實施或執行指令的中央處理單元以及用於儲存指令和數據的一個或多個記憶體設備。通常,計算機還將包括用於儲存數據的一個或多個大容量儲存設備,例如磁碟、磁光碟或光碟等,或者計算機將可操作地與此大容量儲存設備耦接以從其接收數據或向其傳送數據,抑或兩種情況兼而有之。然而,計算機不是必須具有這樣的設備。此外,計算機可以嵌入在另一設備中,例如移動電話、個人數位助理(PDA)、移動音訊或視訊播放器、遊戲操縱臺、全球定位系統(GPS)接收機、或例如通用序列匯流排(USB)快閃記憶體驅動器的便攜式儲存設備,僅舉幾例。Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit. Generally, the central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical discs, or the computer will be operatively coupled to such a mass storage device to receive data from it, transfer data to it, or both. However, a computer need not have such devices. In addition, a computer can be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.

適合於儲存計算機程式指令和數據的計算機可讀媒體包括所有形式的非揮發性記憶體、媒體和記憶體設備,例如包括半導體記憶體設備(例如EPROM、EEPROM和快閃記憶體設備)、磁碟(例如內部硬碟或可移動碟)、磁光碟以及CD ROM和DVD-ROM。處理器和記憶體可由專用邏輯電路補充或併入專用邏輯電路中。Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM discs. The processor and the memory can be supplemented by, or incorporated in, special-purpose logic circuitry.

雖然本說明書包含許多具體實施細節,但是這些不應被解釋為限制任何發明的範圍或所要求保護的範圍,而是主要用於描述特定發明的具體實施例的特徵。本說明書內在多個實施例中描述的某些特徵也可以在單個實施例中被組合實施。另一方面,在單個實施例中描述的各種特徵也可以在多個實施例中分開實施或以任何合適的子組合來實施。此外,雖然特徵可以如上所述在某些組合中起作用並且甚至最初如此要求保護,但是來自所要求保護的組合中的一個或多個特徵在一些情況下可以從該組合中去除,並且所要求保護的組合可以指向子組合或子組合的變型。Although this specification contains many specific implementation details, these should not be construed as limiting the scope of any invention or of what may be claimed, but rather as describing features of specific embodiments of particular inventions. Certain features described in this specification in the context of multiple embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination. Moreover, although features may act in certain combinations as described above and may even initially be claimed as such, one or more features from a claimed combination may in some cases be removed from the combination, and the claimed combination may be directed to a sub-combination or a variant of a sub-combination.

類似地，雖然在附圖中以特定順序描繪了操作，但是這不應被理解為要求這些操作以所示的特定順序執行或順次執行、或者要求所有例示的操作被執行，以實現期望的結果。在某些情況下，多任務和並行處理可能是有利的。此外，上述實施例中的各種系統模組和組件的分離不應被理解為在所有實施例中均需要這樣的分離，並且應當理解，所描述的程式組件和系統通常可以一起集成在單個軟體產品中，或者封裝成多個軟體產品。Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown, or in sequential order, or that all illustrated operations be performed, to achieve desired results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

由此，主題的特定實施例已被描述。其他實施例在所附請求項的範圍以內。在某些情況下，請求項中記載的動作可以以不同的順序執行並且仍實現期望的結果。此外，附圖中描繪的處理並非必需所示的特定順序或順次順序，以實現期望的結果。在某些實現中，多任務和並行處理可能是有利的。Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

以上所述僅為本說明書一個或多個實施例的一些實施例而已，並不用以限制本說明書一個或多個實施例，凡在本說明書一個或多個實施例的精神和原則之內，所做的任何修改、等同替換、改進等，均應包含在本說明書一個或多個實施例保護的範圍之內。The foregoing descriptions are merely some embodiments of one or more embodiments of this specification and are not intended to limit them. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of one or more embodiments of this specification shall fall within the scope of protection of one or more embodiments of this specification.

101:接收來自客戶端的第一消息的步驟
102:基於所述第一消息包括的指示內容，獲取與所述指示內容匹配的驅動數據的步驟
103:利用所述驅動數據，控制所述客戶端的顯示介面播放所述互動物件的回應動畫的步驟
401:響應於來自客戶端的使用者輸入操作，向伺服器發送包括指示內容的第一消息的步驟
402:基於所述伺服器對所述第一消息回應的第二消息，在所述客戶端的顯示介面播放所述互動物件的回應動畫的步驟
501:接收單元
502:獲取單元
503:驅動單元
50:互動裝置
601:發送單元
602:播放單元
60:互動裝置
702:處理器
701:記憶體
70:電子設備
101: Step of receiving a first message from a client
102: Step of obtaining driving data matching the instruction content based on the instruction content included in the first message
103: Step of controlling, with the driving data, the display interface of the client to play the response animation of the interactive object
401: Step of sending a first message including instruction content to a server in response to a user input operation at the client
402: Step of playing the response animation of the interactive object on the display interface of the client based on a second message with which the server responds to the first message
501: Receiving unit
502: Acquisition unit
503: Driving unit
50: Interaction apparatus
601: Sending unit
602: Playing unit
60: Interaction apparatus
702: Processor
701: Memory
70: Electronic device

圖1繪示根據本公開至少一個實施例的一種互動方法的流程圖。
圖2繪示本公開至少一個實施例所提出的互動方法應用於直播過程的示意圖。
圖3繪示了本公開至少一個實施例提出的獲得姿態控制向量的方法流程圖。
圖4繪示根據本公開至少一個實施例的另一種互動方法的流程圖。
圖5繪示根據本公開至少一個實施例的一種互動裝置的結構示意圖。
圖6繪示根據本公開至少一個實施例的另一種互動裝置的結構示意圖。
圖7繪示根據本公開至少一個實施例的一種電子設備的結構示意圖。
圖8繪示根據本公開至少一個實施例的另一種電子設備的結構示意圖。
Fig. 1 is a flowchart of an interaction method according to at least one embodiment of the present disclosure.
Fig. 2 is a schematic diagram of the interaction method proposed by at least one embodiment of the present disclosure as applied to a live-streaming process.
Fig. 3 is a flowchart of a method for obtaining posture control vectors proposed by at least one embodiment of the present disclosure.
Fig. 4 is a flowchart of another interaction method according to at least one embodiment of the present disclosure.
Fig. 5 is a schematic structural diagram of an interaction apparatus according to at least one embodiment of the present disclosure.
Fig. 6 is a schematic structural diagram of another interaction apparatus according to at least one embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of an electronic device according to at least one embodiment of the present disclosure.
Fig. 8 is a schematic structural diagram of another electronic device according to at least one embodiment of the present disclosure.

Claims (13)

一種互動方法，包括： 接收來自客戶端的第一消息； 基於所述第一消息包括的指示內容，獲取與所述指示內容匹配的驅動數據； 利用所述驅動數據，控制所述客戶端的顯示介面播放互動物件的回應動畫。An interaction method, comprising: receiving a first message from a client; obtaining, based on instruction content included in the first message, driving data matching the instruction content; and controlling, with the driving data, a display interface of the client to play a response animation of an interactive object. 如請求項1所述的互動方法，其中， 所述基於所述第一消息包括的指示內容，獲取與所述指示內容匹配的驅動數據，包括： 獲取針對所述指示內容的應答內容，所述應答內容包括應答文本，並基於所述應答文本中所包含的至少一個目標文本，獲取與所述目標文本匹配的互動物件的設定動作的控制參數；和/或， 所述基於所述第一消息包括的指示內容，獲取與所述指示內容匹配的驅動數據，包括： 獲取針對所述指示內容的應答內容，所述應答內容包括音素序列，並獲取與所述音素序列匹配的所述互動物件的控制參數。The interaction method according to claim 1, wherein obtaining the driving data matching the instruction content based on the instruction content included in the first message comprises: obtaining response content for the instruction content, the response content comprising a response text, and obtaining, based on at least one target text contained in the response text, control parameters of a preset action of an interactive object matching the target text; and/or obtaining response content for the instruction content, the response content comprising a phoneme sequence, and obtaining control parameters of the interactive object matching the phoneme sequence. 
如請求項2所述的互動方法，其中，所述互動物件的控制參數包括至少一個局部區域的姿態控制向量，所述獲取與所述音素序列匹配的互動物件的控制參數，包括： 對所述音素序列進行特徵編碼，獲得所述音素序列對應的第一編碼序列； 根據所述第一編碼序列，獲取至少一個音素對應的特徵編碼； 獲取所述特徵編碼對應的所述互動物件的至少一個局部區域的姿態控制向量。The interaction method according to claim 2, wherein the control parameters of the interactive object comprise a posture control vector of at least one local region, and obtaining the control parameters of the interactive object matching the phoneme sequence comprises: performing feature encoding on the phoneme sequence to obtain a first encoding sequence corresponding to the phoneme sequence; obtaining, according to the first encoding sequence, a feature code corresponding to at least one phoneme; and obtaining a posture control vector of at least one local region of the interactive object corresponding to the feature code. 如請求項1所述的互動方法，其中，所述利用所述驅動數據，控制所述客戶端在顯示介面中播放所述互動物件的回應動畫，包括： 將所述互動物件的驅動數據發送至所述客戶端，以使所述客戶端根據驅動數據生成回應動畫；控制所述客戶端在顯示介面中播放所述回應動畫； 或者，基於所述驅動數據，調整所述互動物件的虛擬模型參數；基於調整後的虛擬模型參數，利用渲染引擎生成所述互動物件的回應動畫，並向所述客戶端發送所述回應動畫。The interaction method according to claim 1, wherein controlling, with the driving data, the client to play the response animation of the interactive object on the display interface comprises: sending the driving data of the interactive object to the client so that the client generates the response animation according to the driving data, and controlling the client to play the response animation on the display interface; or, adjusting virtual model parameters of the interactive object based on the driving data, generating the response animation of the interactive object with a rendering engine based on the adjusted virtual model parameters, and sending the response animation to the client. 
一種互動方法，包括： 響應於來自客戶端的使用者輸入操作，向伺服器發送包括指示內容的第一消息； 基於所述伺服器對所述第一消息回應的第二消息，在所述客戶端的顯示介面播放所述互動物件的回應動畫。An interaction method, comprising: in response to a user input operation at a client, sending a first message including instruction content to a server; and playing, based on a second message with which the server responds to the first message, a response animation of an interactive object on a display interface of the client. 如請求項5所述的互動方法，其中，所述指示內容包括文本內容； 所述方法還包括：在所述客戶端中顯示所述文本內容，和/或，播放所述文本內容對應的音訊文件； 其中，所述在所述客戶端中顯示所述文本內容，包括：生成所述文本內容的彈幕資訊；在所述客戶端的顯示介面顯示所述彈幕資訊。The interaction method according to claim 5, wherein the instruction content comprises text content; the method further comprises: displaying the text content in the client, and/or playing an audio file corresponding to the text content; wherein displaying the text content in the client comprises: generating barrage (bullet-screen) information of the text content, and displaying the barrage information on the display interface of the client. 如請求項5所述的互動方法，其中，所述第二消息中包括針對所述指示內容的應答文本；所述方法還包括： 在所述客戶端的顯示介面中顯示所述應答文本，和/或，確定並播放所述應答文本對應的音訊文件。The interaction method according to claim 5, wherein the second message includes a response text to the instruction content; the method further comprises: displaying the response text on the display interface of the client, and/or determining and playing an audio file corresponding to the response text. 
如請求項5至7任一所述的互動方法，其中，所述第二消息中包括所述互動物件的驅動數據； 所述基於所述伺服器對所述第一消息回應的第二消息，在所述客戶端的顯示介面中播放所述互動物件的回應動畫，包括： 基於所述驅動數據，調整所述互動物件的虛擬模型參數； 基於調整後的虛擬模型參數，利用渲染引擎生成所述互動物件的回應動畫，並顯示在所述客戶端的顯示介面中； 其中，所述驅動數據包括與所述應答文本對應的音素序列匹配的用於所述互動物件的控制參數，和/或，與所述應答文本中所包含的至少一個目標文本匹配的用於所述互動物件的設定動作的控制參數。The interaction method according to any one of claims 5 to 7, wherein the second message includes driving data of the interactive object; playing the response animation of the interactive object on the display interface of the client based on the second message with which the server responds to the first message comprises: adjusting virtual model parameters of the interactive object based on the driving data; and generating, based on the adjusted virtual model parameters, the response animation of the interactive object with a rendering engine and displaying it on the display interface of the client; wherein the driving data includes control parameters for the interactive object matching a phoneme sequence corresponding to the response text, and/or control parameters for a preset action of the interactive object matching at least one target text contained in the response text. 如請求項5所述的互動方法，其中， 所述使用者的輸入操作包括，所述使用者跟隨所述顯示介面中顯示的肢體操作畫面做出相應的人體姿態； 響應於來自客戶端的使用者輸入操作，所述方法還包括： 獲取包括所述人體姿態的使用者行為圖像； 識別所述使用者行為圖像中的人體姿態資訊； 基於所述人體姿態資訊，驅使所述顯示介面顯示的互動物件進行回應。The interaction method according to claim 5, wherein the user input operation comprises the user making a corresponding human body posture by following a body movement picture displayed on the display interface; and, in response to the user input operation at the client, the method further comprises: acquiring a user behavior image including the human body posture; recognizing human body posture information in the user behavior image; and driving, based on the human body posture information, the interactive object displayed on the display interface to respond. 
如請求項9所述的互動方法，其中，所述基於所述人體姿態資訊，驅使所述顯示介面顯示的互動物件進行回應，包括： 確定所述人體姿態資訊與所述肢體操作畫面中的人體姿態的匹配度； 基於所述匹配度，驅動所述顯示介面顯示的互動物件進行回應。The interaction method according to claim 9, wherein driving the interactive object displayed on the display interface to respond based on the human body posture information comprises: determining a degree of matching between the human body posture information and the human body posture in the body movement picture; and driving, based on the degree of matching, the interactive object displayed on the display interface to respond. 如請求項10所述的互動方法，其中，所述基於所述匹配度，驅動所述互動物件進行回應，包括： 在所述匹配度達到設定條件的情況下，指示所述顯示介面顯示的互動物件做出第一回應，其中所述第一回應包括顯示姿態合格的肢體動作和/或語音提示；以及顯示下一個肢體操作畫面； 在所述匹配度未達到設定條件的情況下，指示所述顯示介面顯示的互動物件做出第二回應，其中所述第二回應包括顯示姿態未合格的肢體動作和/或語音提示；以及保持顯示當前的肢體操作畫面。The interaction method according to claim 10, wherein driving the interactive object to respond based on the degree of matching comprises: in a case where the degree of matching reaches a set condition, instructing the interactive object displayed on the display interface to make a first response, wherein the first response includes showing a body movement and/or a voice prompt indicating that the posture is qualified, and displaying the next body movement picture; and in a case where the degree of matching does not reach the set condition, instructing the interactive object displayed on the display interface to make a second response, wherein the second response includes showing a body movement and/or a voice prompt indicating that the posture is unqualified, and keeping the current body movement picture displayed. 一種電子設備，所述設備包括記憶體、處理器，所述記憶體用於儲存在所述處理器上可運行的計算機指令，所述處理器用於在執行所述計算機指令時實現請求項1至4中任一項所述的互動方法，或者，所述處理器用於在執行所述計算機指令時實現請求項5至11中任一項所述的互動方法。An electronic device, comprising a memory and a processor, wherein the memory is configured to store computer instructions runnable on the processor, and the processor is configured to implement, when executing the computer instructions, the interaction method according to any one of claims 1 to 4, or the interaction method according to any one of claims 5 to 11. 一種計算機可讀儲存媒體，其上儲存有計算機程式，所述計算機程式被處理器執行時實現請求項1至4中任一項所述的互動方法，或者，所述計算機程式被處理器執行時實現請求項5至11中任一項所述的互動方法。A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the interaction method according to any one of claims 1 to 4, or the interaction method according to any one of claims 5 to 11.
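The client/server flow recited in the claims above (a first message carrying instruction content, driving data matched on the server, and a second message that drives a response animation on the client) can be sketched in a few lines. This is an illustrative sketch only: the names (`InteractionServer`, `Client`, `get_driving_data`, `ACTION_LIBRARY`), the keyword-based matching from target texts to preset-action control parameters, and the dictionary message format are all assumptions for the example, not the patented implementation.

```python
# Assumed library mapping target texts to control parameters of preset actions.
ACTION_LIBRARY = {
    "hello": {"action": "wave", "duration_s": 1.2},
    "thanks": {"action": "bow", "duration_s": 1.5},
}

def get_driving_data(instruction: str) -> dict:
    """Obtain driving data matching the instruction content (claims 1-2)."""
    response_text = f"Reply to: {instruction}"  # stand-in for a dialogue engine
    params = [v for k, v in ACTION_LIBRARY.items() if k in instruction.lower()]
    return {"response_text": response_text, "control_params": params}

class InteractionServer:
    def handle_first_message(self, first_message: dict) -> dict:
        # The second message carries the driving data back to the client,
        # which would adjust virtual model parameters and render the animation.
        driving_data = get_driving_data(first_message["instruction"])
        return {"type": "second_message", "driving_data": driving_data}

class Client:
    def __init__(self, server: InteractionServer):
        self.server = server
        self.played = []  # actions "played" in place of a rendering engine

    def send_instruction(self, text: str) -> None:
        second = self.server.handle_first_message(
            {"type": "first_message", "instruction": text})
        for params in second["driving_data"]["control_params"]:
            self.played.append(params["action"])

client = Client(InteractionServer())
client.send_instruction("Hello everyone")
print(client.played)  # -> ['wave']
```

In a real system the server-side rendering variant of claim 4 would send back rendered animation frames instead of control parameters; the sketch shows only the client-rendered branch.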
TW109145727A 2020-02-27 2020-12-23 Interaction methods, apparatuses thereof, electronic devices and computer readable storage media TWI778477B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010125701 2020-02-27
CN202010125701.3 2020-02-27
CN202010362562.6A CN111541908A (en) 2020-02-27 2020-04-30 Interaction method, device, equipment and storage medium
CN202010362562.6 2020-04-30

Publications (2)

Publication Number Publication Date
TW202132967A true TW202132967A (en) 2021-09-01
TWI778477B TWI778477B (en) 2022-09-21

Family

ID=71980272

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109145727A TWI778477B (en) 2020-02-27 2020-12-23 Interaction methods, apparatuses thereof, electronic devices and computer readable storage media

Country Status (6)

Country Link
JP (1) JP2022524944A (en)
KR (1) KR20210110620A (en)
CN (1) CN111541908A (en)
SG (1) SG11202109192QA (en)
TW (1) TWI778477B (en)
WO (1) WO2021169431A1 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111541908A (en) * 2020-02-27 2020-08-14 北京市商汤科技开发有限公司 Interaction method, device, equipment and storage medium
CN111459450A (en) * 2020-03-31 2020-07-28 北京市商汤科技开发有限公司 Interactive object driving method, device, equipment and storage medium
CN112954401A (en) * 2020-08-19 2021-06-11 赵蒙 Model determination method based on video interaction service and big data platform
CN112633110B (en) * 2020-12-16 2024-02-13 中国联合网络通信集团有限公司 Data processing method and device
CN113766253A (en) * 2021-01-04 2021-12-07 北京沃东天骏信息技术有限公司 Live broadcast method, device, equipment and storage medium based on virtual anchor
CN113810729B (en) * 2021-09-16 2024-02-02 中国平安人寿保险股份有限公司 Live atmosphere special effect matching method, device, equipment and medium
CN113849117A (en) * 2021-10-18 2021-12-28 深圳追一科技有限公司 Interaction method, interaction device, computer equipment and computer-readable storage medium
CN113867538A (en) * 2021-10-18 2021-12-31 深圳追一科技有限公司 Interaction method, interaction device, computer equipment and computer-readable storage medium
US20230127495A1 (en) * 2021-10-22 2023-04-27 Lemon Inc. System and method for animated emoji recording and playback
CN114241132B (en) * 2021-12-16 2023-07-21 北京字跳网络技术有限公司 Scene content display control method and device, computer equipment and storage medium
CN114363685A (en) * 2021-12-20 2022-04-15 咪咕文化科技有限公司 Video interaction method and device, computing equipment and computer storage medium
CN114302241A (en) * 2021-12-30 2022-04-08 阿里巴巴(中国)有限公司 Virtual live broadcast service pushing method and device
CN114401438B (en) * 2021-12-31 2022-12-09 魔珐(上海)信息科技有限公司 Video generation method and device for virtual digital person, storage medium and terminal
CN115086693A (en) * 2022-05-07 2022-09-20 北京达佳互联信息技术有限公司 Virtual object interaction method and device, electronic equipment and storage medium
CN117813579A (en) * 2022-07-29 2024-04-02 京东方科技集团股份有限公司 Model control method, device, equipment, system and computer storage medium
CN118113384A (en) * 2022-11-29 2024-05-31 腾讯科技(深圳)有限公司 Animation processing method and related equipment
CN118118719A (en) * 2022-11-30 2024-05-31 北京字跳网络技术有限公司 Dynamic playing method and device, electronic equipment and storage medium
CN116168134B (en) * 2022-12-28 2024-01-02 北京百度网讯科技有限公司 Digital person control method, digital person control device, electronic equipment and storage medium
CN116668796B (en) * 2023-07-03 2024-01-23 佛山市炫新智能科技有限公司 Interactive artificial live broadcast information management system
CN116527956B (en) * 2023-07-03 2023-08-22 世优(北京)科技有限公司 Virtual object live broadcast method, device and system based on target event triggering
CN116824010B (en) * 2023-07-04 2024-03-26 安徽建筑大学 Feedback type multiterminal animation design online interaction method and system

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006330958A (en) * 2005-05-25 2006-12-07 Oki Electric Ind Co Ltd Image composition device, communication terminal using the same, and image communication system and chat server in the system
JP2016038601A (en) * 2014-08-05 2016-03-22 日本放送協会 Cg character interaction device and cg character interaction program
CN104637482B (en) * 2015-01-19 2015-12-09 孔繁泽 A kind of audio recognition method, device, system and language exchange system
CN104866101B (en) * 2015-05-27 2018-04-27 世优(北京)科技有限公司 The real-time interactive control method and device of virtual objects
CN105094315B (en) * 2015-06-25 2018-03-06 百度在线网络技术(北京)有限公司 The method and apparatus of human-machine intelligence's chat based on artificial intelligence
WO2017189559A1 (en) * 2016-04-26 2017-11-02 Taechyon Robotics Corporation Multiple interactive personalities robot
US10546229B2 (en) * 2016-06-02 2020-01-28 Kodak Alaris Inc. System and method for predictive curation, production infrastructure, and personal content assistant
CN106056989B (en) * 2016-06-23 2018-10-16 广东小天才科技有限公司 Language learning method and device and terminal equipment
CN106878820B (en) * 2016-12-09 2020-10-16 北京小米移动软件有限公司 Live broadcast interaction method and device
CN107329990A (en) * 2017-06-06 2017-11-07 北京光年无限科技有限公司 A kind of mood output intent and dialogue interactive system for virtual robot
CN109388297B (en) * 2017-08-10 2021-10-22 腾讯科技(深圳)有限公司 Expression display method and device, computer readable storage medium and terminal
WO2019060889A1 (en) * 2017-09-25 2019-03-28 Ventana 3D, Llc Artificial intelligence (a) character system capable of natural verbal and visual interactions with a human
CN107784355A (en) * 2017-10-26 2018-03-09 北京光年无限科技有限公司 The multi-modal interaction data processing method of visual human and system
US10635665B2 (en) * 2017-12-21 2020-04-28 Disney Enterprises, Inc. Systems and methods to facilitate bi-directional artificial intelligence communications
CN108810561A (en) * 2018-06-21 2018-11-13 珠海金山网络游戏科技有限公司 A kind of three-dimensional idol live broadcasting method and device based on artificial intelligence
CN109120985B (en) * 2018-10-11 2021-07-23 广州虎牙信息科技有限公司 Image display method and device in live broadcast and storage medium
CN109491564A (en) * 2018-10-18 2019-03-19 深圳前海达闼云端智能科技有限公司 Interaction method and device of virtual robot, storage medium and electronic equipment
CN110298906B (en) * 2019-06-28 2023-08-11 北京百度网讯科技有限公司 Method and device for generating information
CN110634483B (en) * 2019-09-03 2021-06-18 北京达佳互联信息技术有限公司 Man-machine interaction method and device, electronic equipment and storage medium
CN111541908A (en) * 2020-02-27 2020-08-14 北京市商汤科技开发有限公司 Interaction method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111541908A (en) 2020-08-14
WO2021169431A1 (en) 2021-09-02
JP2022524944A (en) 2022-05-11
TWI778477B (en) 2022-09-21
KR20210110620A (en) 2021-09-08
SG11202109192QA (en) 2021-10-28

Similar Documents

Publication Publication Date Title
TWI778477B (en) Interaction methods, apparatuses thereof, electronic devices and computer readable storage media
JP6902683B2 (en) Virtual robot interaction methods, devices, storage media and electronic devices
US10825221B1 (en) Music driven human dancing video synthesis
TWI766499B (en) Method and apparatus for driving interactive object, device and storage medium
US10210002B2 (en) Method and apparatus of processing expression information in instant communication
JP7227395B2 (en) Interactive object driving method, apparatus, device, and storage medium
CN111459454B (en) Interactive object driving method, device, equipment and storage medium
WO2021196644A1 (en) Method, apparatus and device for driving interactive object, and storage medium
CN111538456A (en) Human-computer interaction method, device, terminal and storage medium based on virtual image
CN114173188B (en) Video generation method, electronic device, storage medium and digital person server
JP2023103335A (en) Computer program, server device, terminal device, and display method
CN113689879A (en) Method, device, electronic equipment and medium for driving virtual human in real time
CN115953521A (en) Remote digital human rendering method, device and system
CN113314104B (en) Interactive object driving and phoneme processing method, device, equipment and storage medium
TWI759039B (en) Methdos and apparatuses for driving interaction object, devices and storage media
Pham et al. Learning continuous facial actions from speech for real-time animation
CN112632262A (en) Conversation method, conversation device, computer equipment and storage medium
CN118250523A (en) Digital human video generation method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent