TW201913300A - Human-computer interaction method and human-computer interaction system - Google Patents

Human-computer interaction method and human-computer interaction system

Info

Publication number
TW201913300A
Authority
TW
Taiwan
Prior art keywords
voice
specified object
virtual character
human-computer interaction
Prior art date
Application number
TW107122724A
Other languages
Chinese (zh)
Other versions
TWI681317B (en)
Inventor
田善晉
Original Assignee
英華達股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 英華達股份有限公司
Publication of TW201913300A
Application granted
Publication of TWI681317B

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

A human-computer interaction method and a human-computer interaction system are provided. The human-computer interaction method includes the following steps. One or more shape features of a specified object are obtained from a picture and/or a video containing the specified object. A virtual character is created using the shape features. One or more voice features of the specified object are obtained from audio and/or the video containing the specified object. A voice instruction of a user is recognized, and a reply statement corresponding to the voice instruction is looked up in a local and/or cloud database. The virtual character then interacts with the user by delivering the reply statement with the voice features of the specified object.

Description

Human-computer interaction method and system

The present invention relates to the field of human-computer interaction, and in particular to a human-computer interaction method and system.

Smart devices are currently developing rapidly, and human-computer interaction has become a key research focus. Existing smart devices implement human-computer interaction through intelligent applications such as Siri. However, these virtual assistants provide only voice feedback, and the intonation, rhythm, and accent of the voice are fixed and do not vary.

In addition, in the current social context, the demand for smart companion robots keeps growing. Whether accompanying the elderly or children, interaction and feedback through a single voice alone is far from sufficient for the person being accompanied.

To overcome the above shortcomings of the prior art, the present invention provides a human-computer interaction method and system that realize visualized interaction with a virtual character.

According to one aspect of the present invention, a human-computer interaction method is provided, comprising: acquiring at least one shape feature of a specified object from pictures and/or video data containing the specified object; generating a virtual character using the shape feature; acquiring at least one voice feature of the specified object from audio and/or video data containing the specified object; recognizing a voice instruction from a user and querying a local or cloud database for the reply statement corresponding to the voice instruction; and having the virtual character interact with the user using the voice feature of the specified object and the reply statement.

Optionally, one or more dialogues are identified from the audio and/or video data containing the specified object, each dialogue comprising a voice instruction and a reply statement, and at least one of the dialogues is stored in the local or cloud database in association with the virtual character. The dialogues may be sorted by frequency of occurrence, and the N most frequent dialogues stored in the local or cloud database in association with the virtual character, where N is an integer greater than 0. Each dialogue may further include a voice feature, with different voice instructions corresponding to different voice features.

Optionally, the method further comprises: recognizing the user's voice instruction and querying for the corresponding reply action, the virtual character performing the reply action during interaction. At least one dialogue is identified from video data containing the specified object, each dialogue comprising a voice instruction, a reply statement, and a reply action, and one or more of the dialogues are stored in the local or cloud database in association with the virtual character. Displaying the virtual character may further comprise displaying it in a virtual scene. At least one dialogue is identified from the video data containing the specified object, each dialogue comprising a voice instruction, a reply statement, and scene features forming the virtual scene, and at least one dialogue is stored in the local or cloud database in association with the virtual character. The scene features include one or more of time, place, and weather.

Optionally, the shape features and voice features of the virtual character are augmented via updated picture, audio, and video data. The shape features include at least one of gender, age, body proportions, clothing style, hairstyle, and facial features. The voice features include at least one of intonation, rhythm, and accent.

The present invention also provides a human-computer interaction system, comprising: an analysis module configured to acquire at least one shape feature of a specified object from pictures and/or video data containing the specified object, generate a virtual character using the shape feature, and acquire one or more voice features of the specified object from audio and/or video data containing the specified object; a display module configured to display the virtual character, the virtual character having the shape features of the specified object; and a voice processing module configured to recognize a voice instruction in the user's voice input, query a local or cloud database for the reply statement corresponding to the voice instruction, and have the virtual character interact with the user using the voice features of the specified object and the reply statement.

Compared with the prior art, the present invention has the following advantages:

(1) The shape features and voice features of a specified object are identified from picture, audio, and video data, realizing holographic projection of, and conversational feedback from, a virtual character corresponding to the specified object.

(2) In addition to the general-purpose dialogue material in the local or cloud database, the reply statements, reply actions, and related virtual scenes with which the virtual character answers voice instructions can also be identified from picture, audio, and video data, making the virtual character resemble the specified object more closely.

(3) The shape features and voice features of the virtual character can be updated and refined through repeatedly input picture, audio, video, or text data.

For a better understanding of the above and other aspects of the present invention, embodiments are described in detail below with reference to the accompanying drawings:

Embodiments will now be described more fully with reference to the accompanying drawings. The embodiments can, however, be implemented in many forms and should not be construed as limited to those set forth herein. In the drawings, the same reference numerals denote the same or similar structures, and their repeated description is omitted.

Several embodiments of the present invention are described below with reference to the drawings.

Referring first to FIG. 1, which shows a flowchart of a human-computer interaction method according to an embodiment of the present invention, the method comprises the following steps:

Step S101: Acquire one or more shape features of the specified object from pictures and/or video data containing the specified object.

Specifically, the shape features may include one or more of gender, age, body proportions, clothing style, hairstyle, and facial features. Features that cannot be identified from the picture and/or video data can be entered manually by the user, or templates can be offered for the user to choose from.

In some variations, a template library may be provided. Parameters of a shape feature are identified from the pictures and/or video data containing the specified object and matched against the template library, and the template with the highest similarity is taken as the final shape feature. For example, for the eyes, parameters of the specified object's eyes can be identified from the picture and/or video data, including width, height, the ratio of width to face length, the ratio of height to face length, and the height difference between the inner and outer eyelids. Based on these parameters, a corresponding eye type can be matched in the template library, such as peach-blossom eyes, phoenix eyes, sleepy phoenix eyes, willow-leaf eyes, almond eyes, fox eyes, round "copper-bell" eyes, dragon eyes, slanted phoenix eyes, fawn eyes, and so on. The template library's data for that eye type is then used as the shape feature.
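
As a rough illustration of this template-matching step, the sketch below scores each eye template by Euclidean distance over the measured parameters and keeps the closest one. The template names, parameter values, and distance measure are assumptions for illustration, not taken from the patent.

```python
import math

# Hypothetical eye templates: each parameter vector is
# (width / face length, height / face length, inner-outer eyelid height difference).
EYE_TEMPLATES = {
    "almond": (0.22, 0.08, 0.010),
    "phoenix": (0.24, 0.07, 0.015),
    "round": (0.20, 0.10, 0.005),
}

def match_eye_template(measured, templates=EYE_TEMPLATES):
    """Return the template whose parameters are closest to the measured ones."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(templates, key=lambda name: distance(measured, templates[name]))

# Parameters measured from a picture of the specified object.
print(match_eye_template((0.21, 0.09, 0.007)))  # -> "round"
```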

Step S102: Perform 3D character modeling using the shape features to generate a virtual character.

Specifically, character modeling is performed in 3D modeling software using the shape features that were identified, entered, or selected as described above, generating a virtual character that has the shape features of the specified object.

Step S103: Acquire one or more voice features of the specified object from audio and/or video data containing the specified object.

Specifically, the voice features include one or more of intonation, rhythm, and accent. Further, a speech waveform of the specified object can be extracted from audio and/or video data containing the specified object, and voice features such as intonation, rhythm, and accent are derived from the frequency, intensity, and amplitude information in that waveform. In particular, intonation can be determined from the frequencies in the waveform; rhythm can be determined from the time difference between one crossing of a set amplitude threshold and the next; and accent can be determined by template matching against different accents.
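
A minimal sketch of these two measurements, assuming a fixed amplitude threshold for rhythm and an autocorrelation estimate for intonation (the patent does not prescribe a specific algorithm):

```python
import numpy as np

def estimate_pitch(samples, sample_rate):
    """Rough fundamental-frequency (intonation) estimate via autocorrelation."""
    samples = samples - np.mean(samples)
    corr = np.correlate(samples, samples, mode="full")[len(samples) - 1:]
    min_lag = sample_rate // 500          # ignore pitches above 500 Hz
    max_lag = sample_rate // 50           # ignore pitches below 50 Hz
    lag = min_lag + int(np.argmax(corr[min_lag:max_lag]))
    return sample_rate / lag

def estimate_rhythm(samples, sample_rate, threshold=0.5):
    """Rhythm as the mean time between upward crossings of an amplitude threshold."""
    peak = float(np.max(np.abs(samples)))
    if peak == 0.0:
        return None
    above = np.abs(samples) / peak > threshold
    crossings = np.where(above[1:] & ~above[:-1])[0]
    if len(crossings) < 2:
        return None
    return float(np.mean(np.diff(crossings))) / sample_rate

rate = 16000
t = np.arange(rate) / rate
print(round(estimate_pitch(np.sin(2 * np.pi * 200 * t), rate)))  # ~200
```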

Further, the shape features and voice features of the virtual character can be extended or updated with newly input picture, audio, and video data.

Step S104: Holographically project the virtual character, which has the shape features corresponding to the specified object.

Step S105: Recognize the user's voice instruction and query a local or cloud database for the reply statement corresponding to the voice instruction.

In some embodiments, general-purpose reply statements are stored in the local or cloud database. In some variations, one or more dialogues can be identified from the audio and/or video data of steps S101 and S103, each dialogue comprising a voice instruction and a reply statement, and one or more dialogues are stored in the local or cloud database in association with the virtual character. Preferably, the dialogues are sorted by frequency of occurrence, and the N most frequent dialogues are stored in the local or cloud database in association with the virtual character, where N is an integer greater than 0. In this way, conversations likely to occur when interacting with the virtual character are stored directly from the data used to build the character, which makes the virtual character closer to the specified object.
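
A sketch of the top-N selection; representing each dialogue as a (voice instruction, reply statement) pair is an assumption made for illustration:

```python
from collections import Counter

def top_n_dialogues(dialogues, n):
    """Keep the n most frequent (voice_instruction, reply_statement) pairs."""
    return [pair for pair, _ in Counter(dialogues).most_common(n)]

dialogues = [
    ("how are you", "fine, thanks"),
    ("good morning", "morning!"),
    ("how are you", "fine, thanks"),
]
print(top_n_dialogues(dialogues, n=1))  # -> [("how are you", "fine, thanks")]
```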

Step S106: The virtual character plays back the reply statement with the voice features of the specified object to interact with the user.

In one variation, the dialogues identified from the audio and/or video data of steps S101 and S103 may also include voice features, with different voice instructions corresponding to different voice features. For example, some voice instructions correspond to voice features with a faster rhythm and higher pitch, and the virtual character should play the reply statement at that faster rhythm and higher pitch to convey a happy mood; other voice instructions correspond to voice features with a slower rhythm and lower pitch, and the virtual character should play the reply statement at that slower rhythm and lower pitch to convey a downcast mood.

In another variation, the virtual character can also perform a reply action while playing back the reply statement. Reply actions can include movements, facial expressions, and the like. Specifically, when the user's voice instruction is recognized, the local or cloud database is queried for both the reply statement and the reply action corresponding to the voice instruction. The virtual character performs the reply action with the shape features of the specified object and plays back the reply statement with the voice features of the specified object to interact with the user. One or more dialogues can be identified from the audio and/or video data of steps S101 and S103, each dialogue comprising a voice instruction, a reply statement, and a reply action, and stored in the local or cloud database in association with the virtual character, making the virtual character more lifelike when playing the reply statement.

In another variation, when the virtual character is holographically projected, it can also be placed in a virtual scene. The scene features of the virtual scene may include one or more of time, place, and weather, and may be specified by the user or generated and changed automatically.

Specifically, when the user's voice instruction is recognized, the local or cloud database can also be queried for the reply statement and the virtual scene corresponding to the voice instruction. The virtual character is placed in the virtual scene and plays back the reply statement with the voice features of the specified object to interact with the user. Similar to the above embodiments, this variation can identify one or more dialogues from video data containing the specified object, each dialogue comprising a voice instruction, a reply statement, and scene features forming a virtual scene, and store one or more of the dialogues in the local or cloud database in association with the virtual character. Further, when a word related to time, place, or weather appears in the sentences recognized from the video data of the specified object, that word can be used as a scene feature of the virtual scene. Scene features can also be determined from the video data by identifying light direction, brightness, or environmental objects (for example, buildings, furniture, roads, and other objects from which an approximate location can be judged). In other variations, the user can also enter the user's relationship to the specified object, for example, friend, family, and so on.
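
As one way to pull scene features out of the recognized sentences, the sketch below tags a sentence with any time, place, or weather words it contains; the keyword lists are assumptions for illustration:

```python
SCENE_KEYWORDS = {
    "time": {"morning", "noon", "evening", "night"},
    "weather": {"sunny", "rainy", "snowy", "cloudy"},
    "place": {"park", "kitchen", "school", "beach"},
}

def extract_scene_features(sentence):
    """Map each scene dimension to the keywords found in the sentence."""
    words = set(sentence.lower().split())
    return {dim: sorted(words & kws)
            for dim, kws in SCENE_KEYWORDS.items() if words & kws}

print(extract_scene_features("We walked in the park one rainy evening"))
# -> {'time': ['evening'], 'weather': ['rainy'], 'place': ['park']}
```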

In yet another variation of the above embodiments, the above steps are applied in a game scenario. Specifically, the user first turns on a first device (for example, a computer) and runs a first application on it; the first application may be a game. The game may have multiple characters, and the user can select one or more of them to control. While a character is being controlled, its runtime parameters change, for example, the amount of gold the character has collected, the weapons it has equipped, and the attribute values changed by equipping those weapons. While the user runs the first application on the first device, a second application is opened on a second device; the second application is associated with the first application and fetches its data in real time. Specifically, when the user opens the second application, it can obtain the character the user has currently selected in the first application and treat that character as the specified object. The second application can obtain the audio and video data of the specified object from the first application to generate a virtual character and display it on the second device. The user can issue voice instructions to the virtual character, for example asking how much gold the specified object currently has, how the specified object ranks within the team by gold, the specified object's current attack power, or the attack power of a character on the opposing team. The reply statements for these voice instructions can fetch the runtime parameters of the specified object from the first application in real time and report them to the user. The user can thus hold a real-time conversation during the game to obtain live data, without performing extra operations in the first application that would interfere with controlling the character. This is only one specific application scenario of the present invention; those skilled in the art can implement further variations, and the present invention is not limited thereto.

Referring now to FIG. 2, which shows a flowchart for creating or updating a virtual character according to an embodiment of the present invention. The user starts configuring a virtual character (step S201). A new virtual character is created (step S202). The name the user enters for the virtual character is obtained; the feature data subsequently stored for that character can be filed under this name (step S204). Pictures and audio/video data of the specified object can be captured through input devices such as a microphone or camera, or obtained by file transfer (step S205). The shape features and voice features of the specified object are obtained by analyzing the pictures and audio/video data containing it (step S206). A virtual character is built from these shape and voice features so that it has the shape features and voice features of the specified object (step S207). The shape features and voice features obtained by the analysis are stored in the local or cloud database in association with the virtual character (for example, with the virtual character's name) (step S208).
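
A minimal sketch of how the analyzed features might be stored under the character's name and later extended (steps S207 to S208); the dictionary-based storage and field names are assumptions:

```python
def save_character_features(name, shape_features, voice_features, db):
    """Store or merge analyzed features under the character's name."""
    record = db.setdefault(name, {"shape": {}, "voice": {}})
    record["shape"].update(shape_features)   # new analyses extend earlier ones
    record["voice"].update(voice_features)
    return record

db = {}
save_character_features("grandpa", {"eye_type": "round"}, {"accent": "southern"}, db)
save_character_features("grandpa", {"hairstyle": "short"}, {"rhythm": "slow"}, db)
print(db["grandpa"])
# {'shape': {'eye_type': 'round', 'hairstyle': 'short'},
#  'voice': {'accent': 'southern', 'rhythm': 'slow'}}
```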

In addition, after step S201, step S203 can instead be performed to modify an existing virtual character. After step S203, steps S204 to S208 are performed to update or extend the shape features and voice features of the existing virtual character.

FIG. 3 shows a flowchart for projecting a virtual character according to an embodiment of the present invention. First, a list of virtual characters is displayed (step S301). It will be appreciated that the characters in the list may have been created locally or created by other users and uploaded to the cloud database. The user selects a virtual character from the list, and the selection is obtained (step S302). It is then determined whether the virtual character exists in the local database (step S303). If so, step S304 is executed to fetch the character's data from the local database. If not, step S305 is executed to fetch the character's data from the cloud database. Then, step S306 is executed to project the virtual character by holographic projection, and the conversation starts in step S307.
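
A minimal sketch of the local-first lookup in steps S303 to S305, assuming dictionary-like storage interfaces:

```python
def load_character(name, local_db, cloud_db):
    """Fetch a virtual character's data locally if present, otherwise from the cloud."""
    data = local_db.get(name)       # steps S303/S304
    if data is None:
        data = cloud_db.get(name)   # step S305
    return data

local_db = {"grandma": {"eye_type": "round", "accent": "southern"}}
cloud_db = {"coach": {"eye_type": "phoenix", "accent": "northern"}}
print(load_character("coach", local_db, cloud_db))  # found in the cloud database
```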

Referring now to FIG. 4, which shows a flowchart for interacting with a virtual character according to an embodiment of the present invention. First, in step S401, the user's voice input is received. In step S402, a voice instruction is recognized from the voice input; the voice instruction may be a sentence spoken by the user. Then, in step S403, it is determined whether a reply statement for the voice instruction exists in the local database. If so, step S404 is executed to fetch the reply statement from the local database. If not, step S405 is executed to search the cloud database for the voice instruction to obtain a reply statement. Specifically, the query can first look up the individual words of the voice instruction; if there are multiple results, each result is compared with the complete sentence of the voice instruction and the closest one is selected. If nothing is found, "Sorry, I didn't understand" or "Sorry, I don't know how to answer that" can be used as the reply statement. Step S406 is then executed so that the virtual character plays the reply statement with the appropriate voice features to complete the dialogue and interaction with the user.
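
A sketch of the word-level query followed by whole-sentence comparison and the fallback reply; the word-index structure and the similarity measure are assumptions:

```python
from difflib import SequenceMatcher

FALLBACK = "Sorry, I didn't understand."

def find_reply(instruction, word_index):
    """word_index maps a word to candidate (stored_sentence, reply) pairs."""
    candidates = []
    for word in instruction.lower().split():
        candidates.extend(word_index.get(word, []))
    if not candidates:
        return FALLBACK
    # Compare each candidate's stored sentence with the complete instruction.
    _, reply = max(
        candidates,
        key=lambda c: SequenceMatcher(None, c[0], instruction.lower()).ratio(),
    )
    return reply

index = {"weather": [("how is the weather today", "It's sunny out.")]}
print(find_reply("What's the weather like?", index))  # -> "It's sunny out."
```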

Referring now to FIG. 5, which shows a schematic diagram of a human-computer interaction system according to an embodiment of the present invention. The human-computer interaction system 500 includes an analysis module 501, a display module 503, and a voice processing module 505.

The analysis module 501 is configured to acquire one or more shape features of the specified object from pictures and/or video data containing the specified object, to generate a virtual character using the shape features, and to acquire one or more voice features of the specified object from audio and/or video data containing the specified object. The display module 503 is configured to display the virtual character, which has the shape features of the specified object. The voice processing module 505 is configured to recognize a voice instruction in the user's voice input, query a local or cloud database for the reply statement corresponding to the voice instruction, and have the virtual character play back the reply statement with the voice features of the specified object to interact with the user.

The human-computer interaction system 500 can further include one or more of a movement module 502, a network communication module 504, a sensor module 506, and a local storage module 507. The movement module 502 can control the system 500 to move. The network communication module 504 handles communication between the system 500 and the cloud database. The sensor module 506 can include a distance sensor, a temperature sensor, a camera lens, and the like, adding further functions to the system 500. The local storage module 507 can serve as the local database storing virtual character information and conversational data.

The human-computer interaction system 500 provided by the present invention can serve as a companion robot with a chat function. The chat robot described here can display the virtual character through the display module 503, for example by projecting it as a flat or stereoscopic image. Before using the companion robot, the user can set the target person's appearance, voice, and other characteristics, so that different users have different chat experiences at different times. Such emotional companion robots have wide application: they can accompany elderly people living alone in place of their children, and can even recreate the voice and presence of relatives or friends who have passed away. The application scenarios of the present invention are not limited to these.

Compared with the prior art, the present invention has the following advantages:

(1) The shape features and voice features of a specified object are identified from picture, audio, and video data, realizing holographic projection of, and conversational feedback from, a virtual character corresponding to the specified object.

(2) In addition to the general-purpose dialogue material in the local or cloud database, the reply statements, reply actions, and related virtual scenes with which the virtual character answers voice instructions can also be identified from picture, audio, and video data, making the virtual character resemble the specified object more closely.

(3) The shape features and voice features of the virtual character can be updated and refined through repeatedly input picture, audio, video, or text data.

In summary, although the present invention has been disclosed through the embodiments above, they are not intended to limit it. Those of ordinary skill in the art to which the present invention belongs may make various changes and refinements without departing from the spirit and scope of the present invention. The scope of protection of the present invention is therefore defined by the appended claims.

S101~S106, S201~S208, S301~S307, S401~S406‧‧‧process steps

500‧‧‧human-computer interaction system

501‧‧‧analysis module

502‧‧‧movement module

503‧‧‧display module

504‧‧‧network communication module

505‧‧‧voice processing module

506‧‧‧sensor module

507‧‧‧local storage module

The above and other features and advantages of the present invention will become more apparent from the detailed description of the exemplary embodiments with reference to the accompanying drawings. FIG. 1 shows a flowchart of a human-computer interaction method according to an embodiment of the present invention. FIG. 2 shows a flowchart for creating or updating a virtual character according to an embodiment of the present invention. FIG. 3 shows a flowchart for projecting a virtual character according to an embodiment of the present invention. FIG. 4 shows a flowchart for interacting with a virtual character according to an embodiment of the present invention. FIG. 5 shows a schematic diagram of a human-computer interaction system according to an embodiment of the present invention.

Claims (10)

1. A human-computer interaction method, comprising: acquiring one or more shape features of a specified object from pictures and/or video data containing the specified object; generating a virtual character using the shape features; acquiring one or more voice features of the specified object from audio and/or the video data containing the specified object; recognizing a voice instruction of a user and querying a local or cloud database for a reply statement corresponding to the voice instruction; and displaying the virtual character and interacting with the user using the reply statement, wherein the virtual character has at least one of the shape features and at least one of the voice features of the specified object.

2. The human-computer interaction method of claim 1, further comprising: identifying one or more dialogues from the audio and/or the video data of the specified object, each dialogue comprising the voice instruction and the reply statement, and storing the N most frequently occurring dialogues in the local or cloud database in association with the virtual character, N being an integer greater than 0.

3. The human-computer interaction method of claim 2, wherein each dialogue further comprises the voice feature, so as to correspond to a different voice instruction.

4. The human-computer interaction method of claim 1, further comprising: recognizing the voice instruction of the user and querying the local or cloud database for the reply statement and a reply action corresponding to the voice instruction; and the virtual character performing the reply action with the shape features of the specified object and playing back the reply statement with the voice features of the specified object to interact with the user.

5. The human-computer interaction method of claim 1, wherein displaying the virtual character comprises: displaying the virtual character so that it is located in a virtual scene; recognizing the voice instruction of the user and querying the local or cloud database for the reply statement corresponding to the voice instruction and the virtual scene; and the virtual character, located in the virtual scene, playing back the reply statement with the voice features of the specified object to interact with the user.

6. The human-computer interaction method of claim 5, further comprising: identifying one or more dialogues from the video data of the specified object, each dialogue comprising the voice instruction, the reply statement, and scene features forming the virtual scene, and storing one or more of the dialogues in the local or cloud database in association with the virtual character.

7. The human-computer interaction method of claim 6, wherein the scene features comprise one or more of time, place, and weather.

8. The human-computer interaction method of any one of claims 1 to 7, wherein the shape features and voice features of the virtual character are extended or updated via updated picture, audio, and video data.

9. The human-computer interaction method of any one of claims 1 to 7, wherein the voice features comprise one or more of intonation, rhythm, and accent.

10. A human-computer interaction system, comprising: an analysis module configured to acquire at least one shape feature of a specified object from pictures and/or video data containing the specified object, generate a virtual character using the shape feature, and acquire one or more voice features of the specified object from audio and/or the video data containing the specified object; a display module configured to display the virtual character, the virtual character having the shape features of the specified object; and a voice processing module configured to recognize a voice instruction in a user's voice input, query a local or cloud database for a reply statement corresponding to the voice instruction, and cause the virtual character to play back the reply statement with the voice features of the specified object to interact with the user.
TW107122724A 2017-08-17 2018-07-02 Human-computer interaction method and human-computer interaction system TWI681317B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201710708585.6 2017-08-17
CN201710708585.6A CN107562195A (en) 2017-08-17 2017-08-17 Man-machine interaction method and system
CN201710708585.6 2017-08-17

Publications (2)

Publication Number Publication Date
TW201913300A true TW201913300A (en) 2019-04-01
TWI681317B TWI681317B (en) 2020-01-01

Family

ID=60976207

Family Applications (1)

Application Number Title Priority Date Filing Date
TW107122724A TWI681317B (en) 2017-08-17 2018-07-02 Human-computer interaction method and human-computer interaction system

Country Status (2)

Country Link
CN (1) CN107562195A (en)
TW (1) TWI681317B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108762486A (en) * 2018-04-26 2018-11-06 上海蓝眸多媒体科技有限公司 A kind of multimedia intelligent interactive device
CN109448737B (en) * 2018-08-30 2020-09-01 百度在线网络技术(北京)有限公司 Method and device for creating virtual image, electronic equipment and storage medium
CN109445579A (en) * 2018-10-16 2019-03-08 翟红鹰 Virtual image exchange method, terminal and readable storage medium storing program for executing based on block chain
JP2020067785A (en) * 2018-10-24 2020-04-30 本田技研工業株式会社 Control device, agent apparatus, and program
CN109377797A (en) * 2018-11-08 2019-02-22 北京葡萄智学科技有限公司 Virtual portrait teaching method and device
CN111290729A (en) * 2018-12-07 2020-06-16 阿里巴巴集团控股有限公司 Man-machine interaction method, device and system
CN111292737A (en) * 2018-12-07 2020-06-16 阿里巴巴集团控股有限公司 Voice interaction and voice awakening detection method, device, equipment and storage medium
CN109830304A (en) * 2018-12-19 2019-05-31 中南大学湘雅三医院 Aged health management system based on emotional affection audio-video
CN109977217A (en) * 2019-04-09 2019-07-05 莫雨潜 A kind of method, system and the storage medium of intelligence augmentative communication
CN110588550A (en) * 2019-09-08 2019-12-20 一汽轿车股份有限公司 Man-machine interaction system based on artificial intelligence
CN110737335B (en) * 2019-10-11 2021-03-23 深圳追一科技有限公司 Interaction method and device of robot, electronic equipment and storage medium
CN111273990A (en) * 2020-01-21 2020-06-12 腾讯科技(深圳)有限公司 Information interaction method and device, computer equipment and storage medium
CN111292743B (en) * 2020-01-22 2023-09-26 北京小米松果电子有限公司 Voice interaction method and device and electronic equipment
CN111477224A (en) * 2020-03-23 2020-07-31 一汽奔腾轿车有限公司 Human-vehicle virtual interaction system
CN111739201A (en) * 2020-06-24 2020-10-02 上海商汤临港智能科技有限公司 Vehicle interaction method and device, electronic equipment, storage medium and vehicle
CN111785246B (en) * 2020-06-30 2024-06-18 联想(北京)有限公司 Virtual character voice processing method and device and computer equipment

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI448960B (en) * 2009-11-04 2014-08-11 Univ Ishou Interactive navigation system
TWI434227B (en) * 2009-12-29 2014-04-11 Ind Tech Res Inst Animation generation system and method
US9329469B2 (en) * 2011-02-17 2016-05-03 Microsoft Technology Licensing, Llc Providing an interactive experience using a 3D depth camera and a 3D projector
CN103500244A (en) * 2013-09-06 2014-01-08 雷路德 Virtual friend conversational system and method thereof
CN103488291B (en) * 2013-09-09 2017-05-24 北京诺亦腾科技有限公司 Immersion virtual reality system based on motion capture
US11106273B2 (en) * 2015-10-30 2021-08-31 Ostendo Technologies, Inc. System and methods for on-body gestural interfaces and projection displays
CN105425953B (en) * 2015-11-02 2018-07-17 小天才科技有限公司 A kind of method and system of human-computer interaction
CN106681479A (en) * 2015-11-05 2017-05-17 丰唐物联技术(深圳)有限公司 User interaction method and system based on virtual reality
CN106125903B (en) * 2016-04-24 2021-11-16 林云帆 Multi-person interaction system and method
CN106127552B (en) * 2016-06-23 2019-12-13 北京理工大学 Virtual scene display method, device and system
CN106775198A (en) * 2016-11-15 2017-05-31 捷开通讯(深圳)有限公司 A kind of method and device for realizing accompanying based on mixed reality technology

Also Published As

Publication number Publication date
CN107562195A (en) 2018-01-09
TWI681317B (en) 2020-01-01
