WO2020024692A1 - Human-computer interaction method and device - Google Patents

Human-computer interaction method and device

Info

Publication number
WO2020024692A1
WO2020024692A1 (PCT/CN2019/089209)
Authority
WO
WIPO (PCT)
Prior art keywords
image
action instruction
action
terminal device
sender
Prior art date
Application number
PCT/CN2019/089209
Other languages
English (en)
French (fr)
Inventor
荣涛
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2020024692A1 publication Critical patent/WO2020024692A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/012Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Definitions

  • This specification relates to the field of computer technology, and in particular, to a method and device for human-computer interaction.
  • Augmented reality (AR) technology enhances the user's perception of the real world with information provided by a computer system. It applies virtual information to the real world and superimposes computer-generated virtual objects, scenes, or system prompts onto the real scene, thereby augmenting reality and achieving a sensory experience beyond it.
  • Virtual reality (VR) uses simulation and computation to produce a three-dimensional virtual world identical or similar to a real scene. Users can play games, carry out activities, or perform specific operations in this virtual world as if they were in the real one, receiving a full range of simulated visual, auditory, and tactile experiences.
  • Mixed reality (MR) technology covers both augmented reality and augmented virtuality, and refers to a new visual environment created by merging the real and virtual worlds, in which physical and virtual objects (i.e., digital objects) coexist and interact in real time.
  • At present, AR, VR, and MR technologies are still in the development stage, and the human-computer interaction technologies related to them are not yet mature, so it is necessary to provide a human-computer interaction solution.
  • The embodiments of this specification provide a human-computer interaction method and device for implementing human-computer interaction.
  • In a first aspect, a human-computer interaction method is provided, including: acquiring an image used to instruct a terminal device to perform an action; determining a matching action instruction based on image features of the image; and, in response to the action instruction, performing an operation that matches the action instruction.
  • In a second aspect, a human-computer interaction method applied at a receiver is provided, including: receiving an action instruction from a sender; and, in response to the action instruction, displaying an effect corresponding to the action instruction, where the effect corresponding to the action instruction includes at least one of the following: a processing effect on the sender's avatar on the terminal device and/or a processing effect on the receiver's avatar; a processing effect on the color of the message border used to communicate with the sender; screen vibration or inversion; or playback of a video or animation.
  • In a third aspect, a human-computer interaction device is provided, including: an image acquisition module that acquires an image used to instruct a terminal device to perform an action; an action instruction determination module that determines a matching action instruction based on image features of the image; and an execution module that, in response to the action instruction, performs an operation matching the action instruction.
  • In a fourth aspect, a human-computer interaction device is provided, including: a receiving module that receives an action instruction from a sender; and an effect display module that, in response to the action instruction, displays an effect corresponding to the action instruction, where the effect corresponding to the action instruction includes at least one of the following: a processing effect on the sender's avatar on the terminal device and/or a processing effect on the receiver's avatar; a processing effect on the color of the message border used to communicate with the sender; screen vibration or inversion; or playback of a video or animation.
  • In a fifth aspect, an electronic device is provided, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the following operations: acquiring an image used to instruct the terminal device to perform an action; determining a matching action instruction based on image features of the image; and, in response to the action instruction, performing an operation that matches the action instruction.
  • In a sixth aspect, an electronic device is provided, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the following operations: receiving an action instruction from a sender; and, in response to the action instruction, displaying an effect corresponding to the action instruction, where the effect corresponding to the action instruction includes at least one of the following: a processing effect on the sender's avatar on the terminal device and/or a processing effect on the receiver's avatar on the terminal device; a processing effect on the color of the message border used to communicate with the sender; screen vibration or inversion; or playback of a video or animation.
  • In a seventh aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program performs the following operations: acquiring an image used to instruct a terminal device to perform an action; determining a matching action instruction based on image features of the image; and, in response to the action instruction, performing an operation that matches the action instruction.
  • In an eighth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program performs the following operations: receiving an action instruction from a sender; and, in response to the action instruction, displaying an effect corresponding to the action instruction, where the effect corresponding to the action instruction includes at least one of the following: a processing effect on the sender's avatar on the terminal device and/or a processing effect on the receiver's avatar on the terminal device; a processing effect on the color of the message border used to communicate with the sender; screen vibration or inversion; or playback of a video or animation.
  • The at least one technical solution adopted in the embodiments of this specification can achieve the following beneficial effects: a matching action instruction is determined based on the image features of the acquired image, and an operation matching the action instruction is performed in response to the action instruction, thereby achieving human-computer interaction based on the acquired image.
  • FIG. 1 is a schematic flowchart of a human-computer interaction method according to an embodiment of the present specification
  • FIG. 2 is a schematic flowchart of a human-computer interaction method according to another embodiment of the present specification.
  • FIG. 3 is a schematic diagram of a display interface in the embodiment shown in FIG. 2;
  • FIG. 4 is a schematic flowchart of a human-computer interaction method according to another embodiment of the present specification.
  • FIG. 5 is a schematic diagram of a display interface in the embodiment shown in FIG. 4;
  • FIG. 6 is a schematic flowchart of a human-computer interaction method according to another embodiment of the present specification.
  • FIG. 7 is a schematic diagram of a display interface in the embodiment shown in FIG. 6;
  • FIG. 8 is a schematic diagram of an initial interface of a human-computer interaction method according to an embodiment of the present specification.
  • FIG. 9 is another schematic diagram of an initial interface of a human-computer interaction method according to an embodiment of the present specification.
  • FIG. 10 is a schematic flowchart of a human-computer interaction method according to a further embodiment of the present specification.
  • FIG. 11 is a schematic diagram of a display interface in the embodiment shown in FIG. 10;
  • FIG. 12 is a schematic structural diagram of a human-computer interaction device according to an embodiment of the present specification.
  • FIG. 13 is a schematic structural diagram of a human-computer interaction device according to another embodiment of the present specification.
  • FIG. 14 is a schematic diagram of effects that can be achieved by various embodiments of this specification.
  • FIG. 15 is a schematic diagram of a hardware structure of an electronic device implementing various embodiments of the present specification.
  • an embodiment of the present specification provides a human-computer interaction method 100 including the following steps:
  • S102 Acquire an image used to instruct the terminal device to perform an action.
  • The image acquired in the embodiments of this specification for instructing the terminal device to perform an action may be a gesture image, a face image, a full-body image of the user, a partial image of the user's body, or the like; this specification does not specifically limit it.
  • the image acquired in the embodiment of the present specification may be a single image or a multi-frame image in a captured video stream.
  • the acquired image in this step may be an image of a single user or an image of multiple users.
  • In this step, the image may be acquired from multiple images stored in advance, or it may be captured in real time. If the images are stored in advance, step S102 can obtain an image from them, for example the image selected by the user. If the image is instead captured in real time, step S102 may capture the image through an image sensor or the like of the terminal device.
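  • As an illustration only (not part of the patent), the following minimal Python/OpenCV sketch shows what step S102 might look like; the function name, the stored-image list, and the camera index are all hypothetical assumptions.

```python
# Minimal sketch of step S102: either reuse a pre-stored image or capture one live.
# `stored_images` and `user_choice` are hypothetical names, not from the patent.
import cv2

def acquire_instruction_image(stored_images, user_choice=None, use_camera=False):
    """Return an image used to instruct the terminal device to perform an action."""
    if not use_camera:
        # Case 1: pick one of the images stored in advance (e.g. the one the user selected).
        return stored_images[user_choice or 0]
    # Case 2: capture a frame in real time from the device's image sensor.
    cap = cv2.VideoCapture(0)   # default camera; an infrared camera would use its own index
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("camera capture failed")
    return frame
```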
  • S104 Determine a matching action instruction based on the image characteristics of the image.
  • The image features in this step correspond to the acquired image and may specifically be extracted from it. For example, if a gesture image is acquired, the image features here may be gesture features; if the acquired image is a face image, the image features here may be face features; if the acquired image is a human body image, the image features here may be posture or action features of the human body, and so on.
  • Before this embodiment is executed, a mapping table between image features and action instructions may be established in advance, so that step S104 can determine the matching action instruction directly by looking up the table.
  • Optionally, in different application scenarios the same image feature may correspond to different action instructions. Therefore, before this embodiment is executed, separate mapping tables between image features and action instructions may also be established for different scenarios, and this embodiment can then be executed in a determined scenario; for example, in a scenario selected by the user, in a scenario obtained from an AR scan, in a preset VR environment, or in a preset MR environment, and so on.
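  • A minimal sketch of such a per-scenario lookup table follows, assuming a Python client; the scenario names, feature labels, and instruction strings are illustrative only and not taken from the patent.

```python
# Sketch of a per-scenario mapping table from image features to action instructions.
ACTION_TABLE = {
    "fighting_game": {"fist": "punch", "open_palm": "palm_strike"},
    "ar_chat":       {"fist": "send_bump", "open_palm": "send_high_five"},
}

def match_action_instruction(scenario, image_feature):
    """Look up the action instruction matching an image feature in a given scenario."""
    try:
        return ACTION_TABLE[scenario][image_feature]
    except KeyError:
        return None  # no matching instruction in this scenario
```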
  • S106 In response to the action instruction, perform an operation matching the action instruction.
  • In this step, for example in a single-device augmented reality human-computer interaction scenario, a rendering instruction may be generated based on the action instruction, and the target object related to the action instruction is then rendered.
  • In addition, in a chat scenario between a sender and a receiver, while rendering the target object related to the action instruction, the action instruction may also be sent to the receiver, so that the receiver generates a rendering instruction based on the action instruction and renders the target object related to the action instruction. At the same time, the target object displayed with augmented reality is also displayed on the sender side.
  • the aforementioned target objects may specifically be augmented reality scenes, virtual reality scenes, mixed reality scenes, etc.
  • the display effects and related display technologies mentioned in the embodiments of the present specification may be implemented based on the OpenCV vision library.
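  • Since the text notes that the display effects may be implemented with the OpenCV vision library, the following is a minimal sketch of overlaying a virtual target object (a BGRA sprite) onto a camera frame; it is only an illustration of one possible "render instruction", not the patented renderer, and the sprite/placement parameters are assumptions.

```python
import cv2
import numpy as np

def render_target_object(frame, sprite_rgba, x, y):
    """Alpha-blend a BGRA sprite onto a BGR camera frame at (x, y).

    The caller is assumed to keep the sprite fully inside the frame.
    """
    h, w = sprite_rgba.shape[:2]
    roi = frame[y:y + h, x:x + w]
    alpha = sprite_rgba[:, :, 3:4].astype(np.float32) / 255.0
    blended = alpha * sprite_rgba[:, :, :3] + (1.0 - alpha) * roi
    frame[y:y + h, x:x + w] = blended.astype(np.uint8)
    return frame
```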
  • Sending the action instruction to the receiver may specifically mean sending the action instruction to a server, which then forwards it to the receiver; alternatively, in a client-to-client scenario without a server, the sender can send the action instruction directly to the receiver.
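  • As a hedged illustration of the forwarding path just described, the sketch below sends the action instruction as a small JSON message over a TCP socket; the payload format, host, and port are assumptions, and the peer may be either a relay server or the receiver itself.

```python
import json
import socket

def send_action_instruction(instruction, peer_host, peer_port):
    """Send an action instruction as a small JSON message.

    `peer_host`/`peer_port` may point at a relay server (which forwards the
    instruction to the receiver) or, in a serverless client-to-client setup,
    directly at the receiver. Addresses and payload format are illustrative.
    """
    payload = json.dumps({"type": "action_instruction",
                          "instruction": instruction}).encode("utf-8")
    with socket.create_connection((peer_host, peer_port), timeout=5) as conn:
        conn.sendall(payload)
```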
  • The human-computer interaction method provided in the embodiments of this specification determines a matching action instruction based on the image features of the acquired image and performs an operation matching the action instruction in response to it, thereby realizing human-computer interaction based on the acquired image.
  • the embodiments of the present specification can also be applied in scenarios such as AR, VR, and MR.
  • As shown in FIG. 2 and FIG. 3, another embodiment of this specification provides a human-computer interaction method 200, which includes the following steps:
  • S202 Obtain a selected gesture image, a face image, or a human body image in response to a user's selection operation on the displayed preset image.
  • As shown in the application interface diagram of FIG. 3, this embodiment may display multiple gesture images on the display interface in advance (see the boxes under the text "Gesture Selection" on the right side of FIG. 3). When the user clicks to select one of the gesture images, that gesture image is obtained in this step.
  • Optionally, multiple facial expression images, human action posture images, and the like can also be displayed in advance; when the user makes a selection, this step obtains the corresponding facial expression image or human action image.
  • The gesture images displayed in advance may include a left-hand gesture image; a right-hand gesture image; a gesture image of a one-handed fist or closed fingers; a gesture image of an open hand or extended fingers; a "love" gesture image with the middle and ring fingers closed and the other fingers spread out; and so on.
  • the aforementioned facial expression image displayed in advance may be a smiling expression image, a sad expression image, a crying expression image, and the like.
  • The human action posture images displayed in advance may be a posture image of a person bowing at 90 degrees, a posture image of a person standing at attention, and the like.
  • S204 Determine an action instruction based on the image characteristics of the selected image in a preset scene.
  • Before this embodiment is executed, the correspondence between the above images and image features may be stored in advance, so that the image features can be determined directly from the image selected by the user; for example, if the gesture image selected by the user is an image of a one-handed fist, the gesture feature may be a feature representing a one-handed fist.
  • Before this embodiment is executed, a mapping table between image features and action instructions may be established in advance, so that step S204 can determine the matching action instruction directly by looking up the table.
  • Optionally, in different application scenarios the same image feature may correspond to different action instructions. Therefore, before this embodiment is executed, separate mapping tables between image features and action instructions may also be established for different scenarios, and this embodiment can then be executed in a determined scenario; for example, in a scenario selected by the user, in a scenario obtained from an AR scan, in a preset VR scene, or in a preset MR scene, and so on. In this way, a scene image may also be acquired in advance before this embodiment is executed, and the embodiment is then executed in the acquired scene.
  • When determining an action instruction based on the image features in this step, the current application scenario may be determined first, and then the action instruction corresponding to the image features acquired in the current application scenario is determined; for example, in a single-player fighting-game scenario, the gesture feature of a one-handed fist can determine a punch action instruction.
  • S206 In response to the action instruction, perform an operation matching the action instruction. In this step, performing an operation matching the action instruction may specifically mean generating a rendering instruction based on the action instruction and rendering the target object related to the action instruction; for example, the box to the left of the pre-displayed gesture images in FIG. 3 shows the augmented reality, virtual reality, or mixed reality target object, which may be an augmented reality, virtual reality, or mixed reality scene image.
  • After this step, the action instruction may also be sent to the receiver, so that the receiver generates a rendering instruction based on the action instruction and renders the target object related to the action instruction.
  • Sending the action instruction to the receiver may specifically mean sending the action instruction to a server, which then forwards it to the receiver; alternatively, in a client-to-client scenario without a server, the sender can send the action instruction directly to the receiver.
  • The interaction method provided in the embodiments of this specification determines a matching action instruction based on the image features of the acquired image and performs an operation matching the action instruction in response to it, thereby realizing human-computer interaction based on the acquired image.
  • In addition, in the embodiments of this specification a plurality of gesture images, face images, or human body images are stored in advance, which makes it easy for users to select quickly and improves the user experience.
  • Optionally, the order of the gesture images displayed in advance in the display interface shown in FIG. 3, or the display order of the face images or human body images in other embodiments, may be sorted based on the user's historical usage frequency; for example, if the user selects the one-handed fist gesture image most frequently, that gesture image is displayed first, which further facilitates user selection and improves the user experience.
  • It should be noted that the above embodiments can also be applied in scenarios where multiple devices and multiple users interact. For example, in step S202 the gesture images selected by users A, B, and C from the multiple displayed gesture images are acquired; in steps S204 and S206, in a preset scenario in which A, B, and C interact with each other, the image features of the gesture images each user selected are sent to users A, B, and C. At the same time, each terminal device can capture each user's gesture image in real time; if it matches the pre-selected image features to a sufficient degree, subsequent logical operations are performed (see the matching sketch below). For example, the scene selected by the terminal devices of A, B, and C is an ancient temple with a stone door ahead; when the devices recognize a forward pushing motion of the hand, the stone door slowly opens, and so on.
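  • The "sufficient degree of fit" check above could, for instance, be a similarity comparison between the live gesture feature vector and the pre-selected one; the cosine-similarity measure and threshold in this sketch are assumptions, not values from the patent.

```python
import numpy as np

FIT_THRESHOLD = 0.9   # assumed "degree of fit" required before subsequent logic runs

def matches_preselected(live_feature, preselected_feature, threshold=FIT_THRESHOLD):
    """Return True when the live gesture feature is close enough to the pre-selected one."""
    a = np.asarray(live_feature, dtype=np.float32)
    b = np.asarray(preselected_feature, dtype=np.float32)
    cos_sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return cos_sim >= threshold

# e.g. when every device reports a match for the "push forward" gesture,
# the shared scene logic could start the stone-door opening animation.
```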
  • In the embodiments shown in FIG. 2 and FIG. 3, gesture images, face images, or human body images are displayed in advance. Considering that the number of displayed images is limited and the content of the pre-displayed images is not rich enough, in order to further increase the number and richness of images, enhance user interaction, and make interaction more fun, as shown in FIG. 4 and FIG. 5, another embodiment of this specification provides a human-computer interaction method 400, including the following steps:
  • S402 Acquire an image feature, where the image feature includes at least one of the following: a gesture image feature, a face image feature, a human image feature, and an action feature.
  • This embodiment can be applied on a terminal device that includes components for capturing images. Taking a terminal device running an augmented reality application as an example, these components may include an infrared camera or the like; after an image is captured, image features are extracted from it.
  • The above action features include, for example, punching, waving, palm-strike, running, standing still, head-shaking, and nodding features.
  • Optionally, before this embodiment is executed, the application scenario may be identified in advance; for example, it may include a scenario in which a sender and a receiver chat with each other, an online fighting-game scenario, a scenario in which multiple terminal devices chat and interact with each other, and so on.
  • When acquiring image features in this step, for example gesture features, a gesture feature classification model may be used. The input of the gesture feature classification model may be the captured gesture image (or a preprocessed gesture image, described in the next paragraph), and the output may be the gesture features. The gesture feature classification model can be generated through machine learning based on algorithms such as Support Vector Machine (SVM), Convolutional Neural Network (CNN), or DL.
  • To improve the accuracy of gesture feature recognition, this step may optionally preprocess the captured gesture image to remove noise. Specifically, the preprocessing of the gesture image may include, but is not limited to, image enhancement, image binarization, grayscale conversion, and denoising.
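  • The following is a minimal sketch, under stated assumptions, of the preprocessing steps just listed feeding an SVM-based gesture classifier; the 64x64 input size, the Otsu threshold, the RBF kernel, and the label names are illustrative choices, not the patent's implementation.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def preprocess_gesture(image_bgr, size=(64, 64)):
    """Grayscale, enhance, denoise, binarize, and resize a gesture image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)                       # simple image enhancement
    gray = cv2.GaussianBlur(gray, (5, 5), 0)            # denoising
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.resize(binary, size).flatten() / 255.0   # flat feature vector

def train_gesture_classifier(training_images, labels):
    """Toy training loop: images and labels (e.g. "fist", "open_palm") are assumed given."""
    X = np.stack([preprocess_gesture(img) for img in training_images])
    clf = SVC(kernel="rbf")
    clf.fit(X, labels)
    return clf
```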
  • a gesture image, a face image, a human body image, and an action image may be collected in advance, and then gesture image features, face image features, human image features, and motion features are extracted based on the collected images.
  • Optionally, this embodiment may also determine whether to perform image preprocessing, or which image preprocessing method to use, according to the accuracy requirements of the image features and the performance requirements (such as response speed). For example, in an online fighting-game scenario with high response speed requirements, the gesture image may not be preprocessed; in a scenario with high gesture accuracy requirements, the captured image may be preprocessed.
  • S404 Determine a matching action instruction based on the image feature and the additional dynamic feature selected by the user in a preset scene.
  • a scene image may be obtained in advance, and this embodiment is executed under the obtained scene.
  • When determining a matching action instruction based on the image features and the additional dynamic feature selected by the user in this step, the current application scenario may be determined first, and then the action instruction corresponding to the image features and the selected additional dynamic feature in that scenario is determined. For example, in a single-player fighting-game scenario, based on the gesture feature of a one-handed fist and the dynamic feature of an additional fireball selected by the user, a punch-plus-fireball action instruction can be determined. As shown in the application interface diagram of FIG. 5, this embodiment may display multiple additional dynamic effects on the display interface in advance (see the circles under the text "Additional Dynamic Effects" on the right side of FIG. 5); when the user clicks to select one of the additional dynamic effects, this step can determine the action instruction based on the gesture feature and the additional dynamic effect feature.
  • the selected additional dynamic feature corresponds to the acquired image.
  • In other embodiments, if face features are acquired, multiple face-related additional dynamic effects may also be displayed in advance on the display interface for the user to select; when the user makes a selection, an additional dynamic feature is generated to enhance the displayed face effect.
  • In other embodiments, if human body image features or action features are acquired, multiple additional dynamic effects related to the human body or actions may also be displayed in advance on the display interface for the user to select; when the user makes a selection, an additional dynamic feature is generated.
  • For example, if the gesture feature acquired in step S402 represents a one-handed fist and no additional dynamic effect (or feature) is selected, the action instruction determined in this step only represents a punch; if the additional "snowball" effect is selected, the action instruction determined in this step may be an action instruction with a cool effect that combines punching with launching a snowball.
  • S406 In response to the action instruction, perform an operation matching the action instruction. In this step, performing an operation matching the action instruction may specifically mean generating a rendering instruction based on the action instruction and rendering the target object related to the action instruction; for example, the box on the left side of FIG. 5 shows the augmented reality, virtual reality, or mixed reality target object, which may be an augmented reality, virtual reality, or mixed reality scene image.
  • This embodiment may also send the action instruction to the receiver, so that the receiver generates a rendering instruction based on the action instruction and renders the target object related to the action instruction; of course, the sender can likewise display the augmented reality target object.
  • The interaction method provided in the embodiments of this specification acquires image features, determines an action instruction based on the image features and the additional dynamic feature selected by the user, and responds to the action instruction, realizing human-computer interaction based on the acquired image features.
  • In addition, this embodiment acquires gesture image features, face image features, human body image features, action features, and the like from images captured in real time; compared with a limited number of pre-stored images, the image features that can be acquired are richer and more diverse.
  • In addition, in the embodiments of this specification additional dynamic effects are stored in advance for the user to select, so that the user can select them quickly, generate cooler special effects, and enjoy a better experience.
  • Optionally, the order of the additional dynamic effects displayed in advance in the display interface shown in FIG. 5, or the display order of the additional dynamic effects for face features or human body features in other embodiments, may be sorted based on the user's historical usage frequency; for example, if the user selects "fireball" most frequently, referring to FIG. 5, the "fireball" additional dynamic effect is displayed first, which further facilitates user selection and improves the user experience.
  • As shown in FIG. 6 and FIG. 7, another embodiment of this specification provides a human-computer interaction method 600, including the following steps:
  • S602 Acquire a scene feature selected by a user.
  • The scene feature in this embodiment is illustrated in the application interface diagram of FIG. 7. This embodiment may display multiple preset scenes on the display interface in advance, for example the "Avatar" scene shown in FIG. 7, with subsequent scenes shown schematically as "***"; when the user clicks to select one of the scenes, this step obtains the corresponding scene feature.
  • In addition, the application interface of FIG. 7 also includes a "more" button, which displays more preset scenes when the user clicks it.
  • S604 Determine an action instruction based on the scene feature and the acquired image features, where the image features include at least one of the following: gesture image features, face image features, human body image features, and action features.
  • This embodiment can be applied on a terminal device that includes components for capturing images. Taking a terminal device running an augmented reality application as an example, these components may include an infrared camera or the like, and image features are extracted from the captured images; for the specific extraction process, refer to the embodiment shown in FIG. 4. The following takes the acquisition of face features as an example.
  • When acquiring face features, a face feature classification model may be used. The input of the face feature classification model may be the captured face image (or a preprocessed face image, described in the next paragraph), and the output may be the face features. The face feature classification model can be generated through machine learning based on algorithms such as Support Vector Machine (SVM), Convolutional Neural Network (CNN), or DL.
  • To improve recognition accuracy, this step may also preprocess the captured face image to remove noise. Specifically, the preprocessing of the face image may include, but is not limited to, image enhancement, image binarization, grayscale conversion, and denoising.
  • When determining an action instruction based on the scene feature and the acquired image features in this step, the image features and the scene feature may be fused; for example, the face features are fused with the scene feature to generate an action instruction for fusing the face into the scene. Specifically, a face area is reserved in the scene selected by the user, and the user's face features are fused into and displayed in the reserved face area, realizing seamless integration of the user's face with the selected scene and producing the effect that the user is actually in that scene; for example, the user appears in the picture, and the face of the character in the scene becomes the user's face.
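  • A minimal sketch of blending the user's face into the reserved face region of the selected scene is shown below, using OpenCV's seamlessClone; the reserved-region coordinates are assumed to be known in advance, and this is only one way such a fusion could be done.

```python
import cv2
import numpy as np

def fuse_face_into_scene(scene_bgr, face_bgr, reserved_rect):
    """Blend the user's face into the scene's reserved face area (x, y, w, h assumed known)."""
    x, y, w, h = reserved_rect
    face_resized = cv2.resize(face_bgr, (w, h))
    mask = np.full((h, w), 255, dtype=np.uint8)     # blend the whole face patch
    center = (x + w // 2, y + h // 2)               # must lie far enough inside the scene
    # seamlessClone smooths the boundary so the face looks like part of the scene
    return cv2.seamlessClone(face_resized, scene_bgr, mask, center, cv2.NORMAL_CLONE)
```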
  • This embodiment is particularly applicable to application scenarios such as group photos, artistic photo stickers, artistic modeling, and cosplay.
  • In this step, performing an operation matching the action instruction may specifically mean generating a rendering instruction based on the action instruction and rendering the target object related to the action instruction; or sending the action instruction to the receiver so that the receiver generates a rendering instruction based on the action instruction and renders the target object related to the action instruction, finally displaying the augmented reality, virtual reality, or mixed reality target object.
  • In addition, a message carrying the face features and the scene feature may also be sent to the receiver, and the receiver may obtain the receiver's own face features, thereby fusing the sender's face features, the receiver's face features, and the scene selected by the sender, which helps improve the user experience.
  • The interaction method provided in the embodiments of this specification acquires image features and a scene feature, determines an action instruction based on them, and responds to the action instruction, realizing the fusion of image features with various preset scenes and improving the user experience.
  • In addition, different preset scenes are stored in advance for the user to choose from, so that the acquired image can take on different appearances in different scenes, which adds fun and improves the user experience.
  • Optionally, this embodiment may also save the augmented reality, virtual reality, or mixed reality target objects displayed above, which is convenient for later use by the user.
  • Specifically, a third-party camera device may be asked to record, from the outside, the augmented reality, virtual reality, or mixed reality view displayed on the screen of the current terminal device, thereby indirectly saving that view; in this way, the augmented reality, virtual reality, or mixed reality views that the user needs to store can be captured flexibly.
  • Alternatively, the augmented reality, virtual reality, or mixed reality view that the user sees on the display screen can be captured and saved as a screenshot. This approach can not only capture and store all of the augmented reality, virtual reality, or mixed reality content displayed on the screen, but also selectively store such views according to the user's needs.
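  • As a hedged illustration of the screenshot-style saving just described, the sketch below writes either the whole rendered frame or only a user-chosen region of it to disk; the region tuple and output filename are assumptions.

```python
import cv2

def save_view_screenshot(rendered_frame, out_path="ar_view.png", roi=None):
    """Save all of the displayed AR/VR/MR frame, or only a selected region of it."""
    if roi is not None:                 # roi = (x, y, w, h) chosen by the user
        x, y, w, h = roi
        rendered_frame = rendered_frame[y:y + h, x:x + w]
    cv2.imwrite(out_path, rendered_frame)
    return out_path
```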
  • For the foregoing embodiments of this specification, the initial display interface can refer to FIG. 8 and FIG. 9.
  • A ** Card entry is provided in the chat interface, as shown in FIG. 8, where the ** Card can be an AR Card, an MR Card, a VR Card, or the like.
  • Optionally, a ** Card option may pop up in the message interface for the user to select and use, thereby improving the user experience.
  • FIG. 8 and FIG. 9 only schematically show a trigger execution mode.
  • The methods described in the foregoing embodiments may also be triggered in other ways, such as automatic execution when the terminal device is shaken, or execution upon a specific voice command issued by the user; this is not specifically limited.
  • another embodiment of the present specification provides a human-computer interaction method 1000, which is applied to a receiver and includes the following steps:
  • The action instruction in this embodiment may be the action instruction mentioned in the embodiments shown in FIG. 1 to FIG. 7 above; that is, this embodiment is applied at the receiver, and the operations performed by the sender may be those of the embodiments shown in FIG. 1 to FIG. 7.
  • The action instruction in this embodiment may also be another action instruction, that is, independent of the embodiments shown in FIG. 1 to FIG. 7.
  • Screen vibration and inversion means that the entire terminal device screen shakes and flips;
  • the above animations include gif images.
  • The above video may specifically be a video file in an encoding format such as H.264 or H.265, and the receiver plays it automatically after receiving it; the above animation may specifically be an animation that enhances a character's expression, artistic voice-over text, background animation effects, and the like, which the receiver plays automatically after receiving.
  • Optionally, the sender's display interface can also show that the state of the receiver's three-dimensional model has changed, specifically with three-dimensional display effects such as augmented reality, virtual reality, or mixed reality, for example the receiver being hit or snowflakes falling on the receiver.
  • Optionally, the processing effect on the receiver's avatar can also be displayed on the sender's display interface; for example, the receiver's avatar turns into a turtle, or other augmented reality, virtual reality, or mixed reality three-dimensional display styles of the receiver's avatar are shown, adding fun and enhancing the user experience.
  • Optionally, the sender's display interface can display the generation and dissipation of both parties' actions, as well as the receiver's final state, such as its status and avatar; the generation and dissipation of both parties' actions can likewise be displayed on the receiver's display interface.
  • this embodiment can also receive a drag instruction, and move the displayed object on the display interface.
  • The human-computer interaction method provided in the embodiments of this specification receives an action instruction from a sender and displays the effect corresponding to the action instruction in response to it, thereby realizing human-computer interaction based on the action instruction.
  • the effects corresponding to the action instructions may be displayed in a three-dimensional state, and specifically may be a three-dimensional augmented reality, virtual reality, or mixed reality display.
  • Optionally, the following effects can also be produced on the sender's display interface: A (the sender) sends a snowball and B (the receiver) sends a fireball; after the fireball and snowball collide, the fireball weakens, flies toward A, and A's image then catches fire. As another example, A and B send fireballs or water balls at the same time; after colliding they scatter into sparks or snowflakes, forming a dreamlike artistic effect, adding fun and enhancing the user experience.
  • this specification also provides a human-machine interaction device 1200.
  • the device 1200 includes:
  • the image acquisition module 1202 may be configured to acquire an image used to instruct the terminal device to perform an action
  • the action instruction determining module 1204 may be configured to determine a matching action instruction based on the image characteristics of the image;
  • the execution module 1206 may be configured to perform an operation matching the action instruction in response to the action instruction.
  • the interaction device determines an action instruction based on the image characteristics of the acquired image and executes an operation matching the action instruction in response to the action instruction, thereby realizing human-computer interaction based on the acquired image.
  • the image acquisition module 1202 may be configured to acquire a selected image in response to a user's selection operation of the preset image displayed.
  • the image acquisition module 1202 may be configured to acquire an image of a user through a camera acquisition device.
  • the image for instructing the terminal device to perform an action includes a gesture image, a face image, or a human body image.
  • the action instruction determining module 1204 may be configured to determine a matching action instruction based on the gesture feature and the acquired additional dynamic feature.
  • the action instruction determination module 1204 may be configured to determine a matched action instruction based on an image feature of the image and the additional dynamic feature in a preset scene.
  • the action instruction determining module 1204 may be configured to determine a matching action instruction based on an image feature of the image and an acquired scene feature.
  • the apparatus 1200 further includes a saving module, which may be used to save the image feature and the scene feature.
  • the execution module 1206 may be configured to generate a rendering instruction based on the action instruction to render a target object related to the action instruction.
  • the apparatus 1200 further includes a sending module, which may be configured to send the action instruction to a receiver.
  • a sending module which may be configured to send the action instruction to a receiver.
  • It should be noted that the human-computer interaction device 1200 described above may refer to the flows of the human-computer interaction methods shown in FIG. 1 to FIG. 9 in the corresponding embodiments described earlier, and the operations and/or functions of each unit/module in the human-computer interaction device 1200 are intended to realize the corresponding processes in those human-computer interaction methods; for brevity, they are not repeated here.
  • this specification also provides a human-computer interaction device 1300.
  • the device 1300 includes:
  • the receiving module 1302 may be used to receive an action instruction from a sender
  • The effect display module 1304 may be configured to display an effect corresponding to the action instruction in response to the action instruction, where the effect corresponding to the action instruction includes at least one of the following: a processing effect on the sender's avatar on the terminal device and/or a processing effect on the receiver's avatar; a processing effect on the color of the message border used to communicate with the sender; screen vibration or inversion; or playback of a video or animation.
  • The above video can be a video file in an encoding format such as H.264 or H.265, or a three-dimensional model rendered in real time; the receiver can play the video file automatically after receiving it. The above animation can be an animation that enhances a character's expression, artistic voice-over text, background animation effects, and the like; the receiver can play it automatically after receiving it.
  • Optionally, the sender's display interface can also show that the state of the receiver's three-dimensional model has changed; specifically, it can show augmented reality, virtual reality, or mixed reality three-dimensional display effects such as the receiver being hit or snowflakes falling on the receiver.
  • Optionally, the processing effect on the receiver's avatar can also be displayed on the sender's display interface; for example, the receiver's avatar turns into a turtle, or other augmented reality, virtual reality, or mixed reality three-dimensional display styles of the receiver's avatar are shown, adding fun and enhancing the user experience.
  • Optionally, the sender's display interface can display the generation and dissipation of both parties' actions, as well as the receiver's final state, such as its status and avatar; the generation and dissipation of both parties' actions can likewise be displayed on the receiver's display interface.
  • The human-computer interaction device provided in the embodiments of this specification receives an action instruction from a sender and displays the effect corresponding to the action instruction in response to it, thereby realizing human-computer interaction based on the received action instruction.
  • It should be noted that the human-computer interaction device 1300 described above may refer to the flows of the human-computer interaction methods shown in FIG. 10 and FIG. 11 in the corresponding embodiments described earlier, and the other operations and/or functions of each unit/module in the human-computer interaction device 1300 are intended to realize the corresponding processes in those human-computer interaction methods; for brevity, they are not repeated here.
  • The effects that can be achieved by the foregoing embodiments of this specification can be seen in FIG. 14.
  • On the user input side, in addition to ordinary text input, voice input, picture input, and short-video input, face recognition, action recognition, scene recognition, and the like are supported, and different effects can be sent based on the recognized faces, actions, and scenes.
  • On the user receiving side, in addition to ordinary text display, voice playback, animated picture playback, and short-video playback, effects such as state changes, animation and sound playback, and screen vibration feedback are supported. The above state changes include, for example, the sender's figure exploding, the sender's avatar turning into a turtle, the background changing dynamically, and so on.
  • the electronic device includes a processor, and optionally, includes an internal bus, a network interface, and a memory.
  • The memory may include an internal memory, such as a high-speed random-access memory (RAM), and may also include a non-volatile memory, such as at least one disk storage, and the like.
  • the electronic device may also include hardware required to implement other services.
  • The processor, network interface, and memory can be connected to each other through an internal bus, which can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only a two-way arrow is used in FIG. 15, but it does not mean that there is only one bus or one type of bus.
  • the program may include program code, where the program code includes a computer operation instruction.
  • the memory may include memory and non-volatile memory, and provide instructions and data to the processor.
  • the processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it to form a device for forwarding chat information on a logical level.
  • the processor executes a program stored in the memory, and is specifically configured to perform operations of the method embodiment described earlier in this specification.
  • The methods disclosed in the embodiments shown in FIG. 1 to FIG. 11, and the methods executed by the devices, may be applied to a processor or implemented by a processor.
  • the processor may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in a processor or an instruction in a form of software.
  • The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in combination with the embodiments of the present specification may be directly embodied as being executed by a hardware decoding processor, or may be executed and completed by using a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a mature storage medium such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable memory, a register, and the like.
  • the storage medium is located in a memory, and the processor reads the information in the memory and completes the steps of the foregoing method in combination with its hardware.
  • The electronic device shown in FIG. 15 can also execute the methods of FIG. 1 to FIG. 11 and implement the functions of the human-computer interaction method embodiments shown in FIG. 1 to FIG. 11, which are not described again in the embodiments of this specification.
  • the electronic device in this specification does not exclude other implementations, such as logic devices or a combination of software and hardware, etc.
  • the execution body of the following processing flow is not limited to each logical unit. It can also be a hardware or logic device.
  • The embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the processes of the method embodiments shown in FIG. 1 to FIG. 11 are implemented and the same technical effects can be achieved; to avoid repetition, they are not described again here.
  • the computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • a computing device includes one or more processors (CPUs), input / output interfaces, network interfaces, and memory.
  • Memory may include non-persistent memory, random access memory (RAM), and / or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information can be stored by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission media, which may be used to store information that can be accessed by computing devices.
  • As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Electrotherapy Devices (AREA)

Abstract

The embodiments of this specification disclose a human-computer interaction method and device. The method includes: acquiring an image used to instruct a terminal device to perform an action; determining a matching action instruction based on image features of the image; and, in response to the action instruction, performing an operation that matches the action instruction. The embodiments of this specification also disclose another human-computer interaction method and device.

Description

Human-computer interaction method and device
Technical Field
This specification relates to the field of computer technology, and in particular to a human-computer interaction method and device.
Background
Augmented reality (AR) technology enhances the user's perception of the real world with information provided by a computer system. It applies virtual information to the real world and superimposes computer-generated virtual objects, scenes, or system prompt information onto the real scene, thereby augmenting reality and achieving a sensory experience beyond it.
Virtual reality (VR) uses simulation and computation to produce a three-dimensional virtual world identical or similar to a real scene. Users can play games, carry out activities, or perform specific operations in this virtual world as if they were in the real one, receiving a full range of simulated visual, auditory, and tactile experiences.
Mixed reality (MR) technology covers both augmented reality and augmented virtuality, and refers to the new visual environment created by merging the real and virtual worlds. In this new environment, physical and virtual (i.e., digital) objects coexist and interact in real time.
At present, AR, VR, and MR technologies are still under development, and the human-computer interaction technologies associated with them are not yet mature; it is therefore necessary to provide a human-computer interaction solution.
Summary of the Invention
The embodiments of this specification provide a human-computer interaction method and device for implementing human-computer interaction.
The embodiments of this specification adopt the following technical solutions:
In a first aspect, a human-computer interaction method is provided, including: acquiring an image used to instruct a terminal device to perform an action; determining a matching action instruction based on image features of the image; and, in response to the action instruction, performing an operation that matches the action instruction.
In a second aspect, a human-computer interaction method applied at a receiver is provided, including: receiving an action instruction from a sender; and, in response to the action instruction, displaying an effect corresponding to the action instruction, where the effect corresponding to the action instruction includes at least one of the following: a processing effect on the sender's avatar on the terminal device and/or a processing effect on the receiver's avatar on the terminal device; a processing effect on the color of the message border used to communicate with the sender; screen vibration or inversion; or playback of a video or animation.
In a third aspect, a human-computer interaction device is provided, including: an image acquisition module that acquires an image used to instruct a terminal device to perform an action; an action instruction determination module that determines a matching action instruction based on image features of the image; and an execution module that, in response to the action instruction, performs an operation matching the action instruction.
In a fourth aspect, a human-computer interaction device is provided, including: a receiving module that receives an action instruction from a sender; and an effect display module that, in response to the action instruction, displays an effect corresponding to the action instruction, where the effect corresponding to the action instruction includes at least one of the following: a processing effect on the sender's avatar on the terminal device and/or a processing effect on the receiver's avatar on the terminal device; a processing effect on the color of the message border used to communicate with the sender; screen vibration or inversion; or playback of a video or animation.
In a fifth aspect, an electronic device is provided, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the following operations: acquiring an image used to instruct a terminal device to perform an action; determining a matching action instruction based on image features of the image; and, in response to the action instruction, performing an operation that matches the action instruction.
In a sixth aspect, an electronic device is provided, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the following operations: receiving an action instruction from a sender; and, in response to the action instruction, displaying an effect corresponding to the action instruction, where the effect corresponding to the action instruction includes at least one of the following: a processing effect on the sender's avatar on the terminal device and/or a processing effect on the receiver's avatar on the terminal device; a processing effect on the color of the message border used to communicate with the sender; screen vibration or inversion; or playback of a video or animation.
In a seventh aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the following operations: acquiring an image used to instruct a terminal device to perform an action; determining a matching action instruction based on image features of the image; and, in response to the action instruction, performing an operation that matches the action instruction.
In an eighth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the following operations: receiving an action instruction from a sender; and, in response to the action instruction, displaying an effect corresponding to the action instruction, where the effect corresponding to the action instruction includes at least one of the following: a processing effect on the sender's avatar on the terminal device and/or a processing effect on the receiver's avatar on the terminal device; a processing effect on the color of the message border used to communicate with the sender; screen vibration or inversion; or playback of a video or animation.
The at least one technical solution adopted in the embodiments of this specification can achieve the following beneficial effects: a matching action instruction is determined based on the image features of the acquired image, and an operation matching the action instruction is performed in response to the action instruction, thereby achieving human-computer interaction based on the acquired image.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of this specification and constitute a part of it; the illustrative embodiments of this specification and their description are used to explain this specification and do not constitute an improper limitation of it. In the drawings:
FIG. 1 is a schematic flowchart of a human-computer interaction method according to an embodiment of this specification;
FIG. 2 is a schematic flowchart of a human-computer interaction method according to another embodiment of this specification;
FIG. 3 is a schematic diagram of the display interface in the embodiment shown in FIG. 2;
FIG. 4 is a schematic flowchart of a human-computer interaction method according to yet another embodiment of this specification;
FIG. 5 is a schematic diagram of the display interface in the embodiment shown in FIG. 4;
FIG. 6 is a schematic flowchart of a human-computer interaction method according to still another embodiment of this specification;
FIG. 7 is a schematic diagram of the display interface in the embodiment shown in FIG. 6;
FIG. 8 is a schematic diagram of an initial interface of a human-computer interaction method according to an embodiment of this specification;
FIG. 9 is another schematic diagram of an initial interface of a human-computer interaction method according to an embodiment of this specification;
FIG. 10 is a schematic flowchart of a human-computer interaction method according to a further embodiment of this specification;
FIG. 11 is a schematic diagram of the display interface in the embodiment shown in FIG. 10;
FIG. 12 is a schematic structural diagram of a human-computer interaction device according to an embodiment of this specification;
FIG. 13 is a schematic structural diagram of a human-computer interaction device according to another embodiment of this specification;
FIG. 14 is a schematic diagram of the effects that can be achieved by the various embodiments of this specification;
FIG. 15 is a schematic diagram of the hardware structure of an electronic device implementing the various embodiments of this specification.
Detailed Description
To make the objectives, technical solutions, and advantages of this specification clearer, the technical solutions of this specification are described clearly and completely below with reference to specific embodiments of this specification and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of this specification, not all of them. Based on the embodiments in this specification, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of this specification.
As shown in FIG. 1, an embodiment of this specification provides a human-computer interaction method 100, including the following steps:
S102: Acquire an image used to instruct the terminal device to perform an action.
The image acquired in the embodiments of this specification for instructing the terminal device to perform an action may be a gesture image, a face image, a full-body image of the user, a partial image of the user's body, or the like; this specification does not specifically limit it.
The image acquired in the embodiments of this specification may be a single image or multiple frames of a captured video stream.
In addition, the image acquired in this step may be an image of a single user or images of multiple users.
In this step, the image may be acquired from multiple images stored in advance, or it may be captured in real time. If the images are stored in advance, step S102 can obtain an image from them, for example the image selected by the user. If the image is instead captured in real time, step S102 may capture the image in real time through an image sensor or the like of the terminal device.
S104: Determine a matching action instruction based on the image features of the image.
The image features in this step correspond to the acquired image and may specifically be extracted from it. For example, if a gesture image is acquired, the image features here may be gesture features; if the acquired image is a face image, the image features here may be face features; if the acquired image is a human body image, the image features here may be posture or action features of the human body, and so on.
Before this embodiment is executed, a mapping table between image features and action instructions may be established in advance, so that step S104 can determine the matching action instruction directly by looking up the table.
Optionally, in different application scenarios the same image feature may correspond to different action instructions. Therefore, before this embodiment is executed, separate mapping tables between image features and action instructions may also be established for different scenarios, and this embodiment can then be executed in a determined scenario; for example, in a scenario selected by the user, in a scenario obtained from an AR scan, in a preset VR environment, or in a preset MR environment, and so on.
S106: In response to the action instruction, perform an operation matching the action instruction.
In this step, performing an operation matching the action instruction in response to the action instruction may, for example in a single-device augmented reality human-computer interaction scenario, specifically mean generating a rendering instruction based on the action instruction and then rendering the target object related to the action instruction.
In addition, in a chat scenario between a sender and a receiver, while rendering the target object related to the action instruction, the action instruction may also be sent to the receiver, so that the receiver generates a rendering instruction based on the action instruction and renders the target object related to the action instruction. At the same time, the target object displayed with augmented reality is also displayed on the sender side. The target objects mentioned above may specifically be augmented reality scenes, virtual reality scenes, mixed reality scenes, and so on; in addition, the display effects and related display technologies mentioned in the embodiments of this specification may be implemented based on the OpenCV vision library.
Sending the action instruction to the receiver may specifically mean sending the action instruction to a server, which then forwards it to the receiver; alternatively, in a client-to-client scenario without a server, the sender may send the action instruction directly to the receiver.
The human-computer interaction method provided in the embodiments of this specification determines a matching action instruction based on the image features of the acquired image and performs an operation matching the action instruction in response to it, thereby realizing human-computer interaction based on the acquired image.
Optionally, the embodiments of this specification can also be applied in scenarios such as AR, VR, and MR.
为详细说明本说明书实施例提供的人机交互方法,如图2和图3所示,本说明书的另一个实施例提供一种人机交互方法200,包括如下步骤:
S202:响应于用户对展示的预设图像的选择操作,获取被选择的手势图像、人脸图像或人体图像。
如图3的应用界面示意图所示,该实施例可以预先在显示界面显示多个手势图像,具体见图3中右侧的文字“手势选择”下方的方框,当用户点击选择其中的一个手势图像时,该步骤即可获取到了上述手势图像。
可选地,该实施例还可以预先展示多个人脸表情图像、人体动作姿势图像等,当用户选取时,该步骤即可获取上述人脸表情图像或人体动作图像。
可选地,上述预先显示的手势图像可以包括左手的手势图像;右手的手势图像;还可以包括单手握拳或手指合拢的手势图像;单手放开或手指伸开的手势图像;以及中指和无名指合拢其他手指伸开的爱的手势图像等等。
上述预先展示的人脸表情图像可以是欢笑的表情图像、悲伤的表情图像、大哭的表 情图像等。
上述预先展示的人体动作姿势图像可以是弯腰90度的人体姿势图像、站军姿的人体动作姿势图像等等。
S204:在预设场景下基于选取的图像的图像特征确定动作指令。
该实施例执行之前可以预先存储上述图像和图像特征的对应关系,这样,基于用户选择的图像即可直接确定图像特征,例如,用户选取的手势图像是单手握拳的图像,则手势特征可以是表示单手握拳的特征。
该实施例执行之前,可以预先建立图像特征和动作指令的映射关系表,这样,步骤S204则可以直接通过查表的方式确定匹配的动作指令。
可选地,在不同的应用场景下,同一个图像特征还可以对应与不同的动作指令,因此,该实施例执行之前,还可以在不同的场景下,分别建立图像特征和动作指令的映射关系表,该实施例则可以是在已确定的场景下执行,例如,该实施例可以是在用户选择的场景下执行,又例如,该实施例还可以是在基于AR扫描获取到的场景下执行,或者是在预设的VR场景下执行,又或者是在预设的MR场景下执行,等等,这样,该实施例执行之前还可以预先获取场景图像,在获取到的场景下执行该实施例。
该步骤基于所述图像特征确定动作指令时,可以先确定当前的应用场景,然后确定在当前应用场景下获取到的图像特征对应的动作指令,例如,在单机格斗游戏的场景下,基于单手握拳的手势特征可以确定出拳的动作指令。
S206:响应于所述动作指令,执行与所述动作指令相匹配的操作。
该步骤中的响应于所述动作指令,执行与所述动作指令相匹配的操作具体可以是基于所述动作指令生成渲染指令,对所述动作指令相关的目标对象进行渲染,例如,在图3中预先显示的手势图像左侧的方框内展示强现实、虚拟现实或混合现实的目标对象,展示的目标对象可以是增强现实、虚拟现实或混合现实场景图像。
该步骤中提到的响应于所述动作指令,执行与所述动作指令相匹配的操作之后,还可以向接收方发送所述动作指令,以便接收方基于上述动作指令生成渲染指令,以对所述动作指令相关的目标对象进行渲染。
上述提到的向接收方发送所述动作指令,具体可以是将所述动作指令发送至服务端,再由服务端向接收方发送所述动作指令;或者是,在不存在服务端而直接是客户端对客户端的场景下,发送方可以直接将所述动作指令发送至接收方。
本说明书实施例提供的交互方法,基于获取到图像的图像特征确定匹配的动作指令,并响应于所述动作指令执行与所述动作指令相匹配的操作,实现了基于获取的图像的人机交互。
另外,本说明书实施例预先保存有多个手势图像、人脸图像或人体图像。从而方便用户快速选取,提高用户体验。
可选地,在图3所示的显示界面中预先展示的手势图像的顺序,或者是其他实施例中的人脸图像或人体图像的显示顺序,可以基于用户历史使用频率进行排序,例如,用户选择单手握拳的手势图像的频率最高,则将单手握拳的手势图像排在第一位进行展示,进一步方便用户选取,提高用户体验。
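作为按用户历史使用频率排序的一个示意,下面给出一个Python草图;其中历史记录history的数据形式为假设,仅用于说明排序思路。

```python
from collections import Counter

def sort_by_history(preset_images, history):
    """preset_images 为预设图像标识列表,history 为用户历史选择记录列表;
    使用频率高的图像排在前面,频率相同时保持原有展示顺序。"""
    freq = Counter(history)
    return sorted(preset_images, key=lambda img: -freq[img])

if __name__ == "__main__":
    presets = ["open_palm", "fist", "love_gesture"]
    history = ["fist", "fist", "love_gesture", "fist"]
    print(sort_by_history(presets, history))  # ['fist', 'love_gesture', 'open_palm']
```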
需要说明的是,上述实施例还可以同时应用在多个设备多个用户交互的场景下。具体例如,通过步骤S202获取甲、乙、丙等用户从多个展示的手势图像中选取的手势图像;通过步骤S204和步骤S206,在预设的甲、乙、丙等互相交互的场景下,基于各自选取的手势图像的图像特征向甲、乙、丙等用户发送上述图像特征。同时,每个终端设备可以实时采集每个用户的手势图像,如果与预先选取的图像特征的匹配达到一定契合度,则执行后续逻辑操作,例如甲、乙、丙等终端设备选择的场景是一个古代庙宇,前面有道石门,当多设备识别到手往前推的动作,石门就会缓缓打开等。
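作为“契合度达到阈值后再执行后续逻辑”的一个示意,下面给出一个Python草图;此处以余弦相似度作为契合度的一种假设性度量,阈值0.9亦为示例取值,并非本说明书限定的判断方式。

```python
import math

def cosine_similarity(a, b):
    """计算两个特征向量的余弦相似度,作为契合度的一种度量。"""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def all_devices_match(realtime_feats, selected_feat, threshold=0.9):
    """所有终端实时采集的手势特征与预选特征的契合度均达到阈值时返回 True。"""
    return all(cosine_similarity(f, selected_feat) >= threshold
               for f in realtime_feats)

if __name__ == "__main__":
    selected = [1.0, 0.0, 0.5]                      # 预先选取的"手往前推"特征(假设)
    collected = [[0.9, 0.1, 0.5], [1.0, 0.0, 0.45]]  # 各终端实时采集的特征(假设)
    if all_devices_match(collected, selected):
        print("石门缓缓打开")                        # 执行后续逻辑操作
```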
在图2和图3所示的实施例中预先展示有手势图像、人脸图像或人体图像等,考虑到展示的图像的数量有限,并且预先展示的图像的内容不够丰富,为了进一步提高图像的数量和丰富程度,增强用户互动,增加用户交互乐趣,如图4和图5所示,本说明书的另一个实施例提供一种人机交互方法400,包括如下步骤:
S402:获取图像特征,所述图像特征包括下述至少一种:手势图像特征、人脸图像特征、人体图像特征以及动作特征。
该实施例可以应用在终端设备上,该终端设备包括有可用于采集图像的部件,以运行增强现实应用的终端设备为例,终端设备上用于采集图像的部件可以包括红外摄像头等,在获取到图像后基于获取的图像获取图像特征。
上述动作特征,例如包括:出拳的动作特征、挥手的动作特征、出掌的动作特征、跑步的动作特征、直立静止的动作特征、摇头的动作特征、点头的动作特征等。
可选地,该实施例执行之前还可以预先识别应用场景,例如,上述应用场景具体可以包括发送方和接收方相互聊天的场景;网络格斗游戏的应用场景;多个终端设备互相聊天交互的场景等。
该步骤在获取图像特征时,例如获取手势特征时,可使用手势特征分类模型获取手势特征。该手势特征分类模型的输入参数可以是采集到的手势图像(或者预处理后的手势图像,下一段进行介绍),输出参数可以是手势特征。该手势特征分类模型可基于支持向量机(Support Vector Machine,SVM)、卷积神经网络(Convolutional Neural Network,简称CNN)或深度学习(Deep Learning,DL)等算法,通过机器学习的方式生成得到。
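作为手势特征分类模型训练与预测流程的一个示意,下面给出一个基于scikit-learn中SVM接口的Python草图;其中训练数据为随机生成的占位数据,图像尺寸、类别数等均为假设,仅用于说明输入输出接口,并非真实可用的手势识别模型。

```python
import numpy as np
from sklearn.svm import SVC

# 占位训练数据:每行模拟一张展平后的手势图像,标签为手势类别(均为假设)
rng = np.random.default_rng(0)
X_train = rng.random((40, 64 * 64))       # 40 张 64x64 的"手势图像"
y_train = rng.integers(0, 3, size=40)     # 3 类手势:0 握拳 / 1 张开 / 2 比心

clf = SVC(kernel="rbf")                   # 支持向量机分类器
clf.fit(X_train, y_train)                 # 通过机器学习的方式生成分类模型

X_new = rng.random((1, 64 * 64))          # 模拟一张新采集的手势图像
print("预测的手势类别:", clf.predict(X_new)[0])
```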
为了提高手势特征的识别精度,可选地,该步骤还可以对采集到的手势图像进行预处理,以便去除噪声。具体地,对手势图像的预处理操作可包括但不限于:对采集到的手势图像进行图像增强;图像二值化;图像灰度化以及去噪声处理等。
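作为上述预处理操作的一个示意,下面给出一个基于OpenCV的Python草图,依次进行灰度化、高斯滤波去噪、直方图均衡增强和OTSU二值化;具体采用哪些预处理步骤及其参数均为假设性选择,可按精度与性能要求调整,示例中的输入图像以随机数组代替真实采集图像。

```python
import cv2
import numpy as np

def preprocess(gesture_bgr):
    """对手势图像做灰度化、去噪、直方图均衡(增强)和二值化,返回二值图。"""
    gray = cv2.cvtColor(gesture_bgr, cv2.COLOR_BGR2GRAY)    # 图像灰度化
    denoised = cv2.GaussianBlur(gray, (5, 5), 0)            # 高斯滤波去噪
    enhanced = cv2.equalizeHist(denoised)                    # 直方图均衡增强
    _, binary = cv2.threshold(enhanced, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # 图像二值化
    return binary

if __name__ == "__main__":
    fake_frame = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
    print(preprocess(fake_frame).shape)  # (120, 160)
```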
对于人脸图像特征、人体图像特征以及动作特征的获取方式与上述手势特征的获取方式类似,在此不再赘述。
该实施例执行之前可以预先采集手势图像、人脸图像、人体图像以及动作图像等,然后基于采集的图像提取手势图像特征、人脸图像特征、人体图像特征以及动作特征。
可选地,该实施例还可以根据图像特征精度要求以及性能要求(比如响应速度要求)等来确定是否进行图像预处理,或者确定所采用的图像预处理方法。具体例如,在响应速度要求比较高的网络格斗游戏的应用场景下,可以不对手势图像进行预处理;在对手势精度要求比较高的场景下,可以对采集到的图像进行预处理。
S404:在预设场景下基于所述图像特征以及用户选取的附加动态特征确定匹配的动作指令。
该实施例执行之前还可以预先获取场景图像,在获取到的场景下执行该实施例。
该步骤具体基于所述图像特征以及用户选取的附加动态特征确定匹配的动作指令时,可以先确定当前的应用场景,然后确定在当前应用场景下图像特征以及用户选取的附加动态特征对应的动作指令,例如,在单机格斗游戏的场景下,基于单手握拳的手势特征以及用户选择的附加火球的动态特征,可以确定出拳+火球的动作指令。如图5的应用界面示意图所示,该实施例可以预先在显示界面显示多个附加动态效果,具体见图5中右侧的文字“附加动态效果”下方的圆形,当用户点击选择其中的一个附加动态效果时,该步骤即可基于所述手势特征和所述附加动态效果特征确定动作指令。
该实施例中,选取的附加动态特征和获取的图像相对应。在其他的实施例中,如果获取到的是人脸特征,则还可以预先在显示界面显示多个与人脸相关的附加动态效果供用户选取,当用户选取时生成附加动态特征,以对人脸显示效果等进行增强显示。
在其他的实施例中,如果获取到的是人体图像特征或动作特征,则还可以预先在显示界面显示多个与人体或动作相关的附加动态效果供用户选取,当用户选取时生成附加动态特征。
具体例如,步骤S402中获取到的是表示单手握拳的手势特征,如果不选择上述附加动态效果(或称特征),则该步骤确定的动作指令仅仅表示出拳的动作指令;如果选择附加“雪球”的附加动态效果,则该步骤确定的动作指令可以是包括出拳加发射雪球的具有炫酷效果的动作指令。
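作为“手势特征加附加动态效果”组合成动作指令的一个示意,下面给出一个Python草图;动作指令采用字典表示仅为假设,实际可采用任意约定的消息格式。

```python
def build_action(gesture_feature, extra_effect=None):
    """根据手势特征与可选的附加动态效果组合出动作指令(字典形式为假设性表示)。"""
    base = {"fist": "punch", "open_palm": "push"}.get(gesture_feature)
    if base is None:
        return None                       # 未识别的手势特征,不生成动作指令
    action = {"action": base}
    if extra_effect:                      # 例如 "snowball"(雪球)、"fireball"(火球)
        action["effect"] = extra_effect
    return action

if __name__ == "__main__":
    print(build_action("fist"))               # {'action': 'punch'},仅表示出拳
    print(build_action("fist", "snowball"))   # 出拳 + 发射雪球的动作指令
```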
S406:响应于所述动作指令,执行与所述动作指令相匹配的操作。
该步骤中的响应于所述动作指令,执行与所述动作指令相匹配的操作,具体可以是基于所述动作指令生成渲染指令,对所述动作指令相关的目标对象进行渲染,例如,在图5中左侧的方框内展示增强现实、虚拟现实或混合现实的目标对象,展示的目标对象可以是增强现实、虚拟现实或混合现实场景图像。
该实施例还可以向接收方发送所述动作指令,以便接收方基于上述动作指令生成渲染指令,以对所述动作指令相关的目标对象进行渲染,当然在发送方也可以同样展示增强现实的目标对象。
本说明书实施例提供的交互方法,获取图像特征,并基于所述图像特征以及用户选取的附加动态特征确定动作指令并响应于所述动作指令,实现基于获取的图像特征的人机交互。
另外,该实施例基于实时采集的图像获取手势图像特征、人脸图像特征、人体图像特征以及动作特征等,相对于获取数量有限的、预先存储的图像而言,能够获取到的图像特征更加丰富、多样。
同时,通过实时采集用户图像并获取图像特征的方式,增加用户的互动,特别是在一些游戏场景下,提高用户的融入感和互动性,提高用户体验。
另外,本说明书实施例预先保存有附加动态效果供用户选择,从而方便用户快速选取,以便与生成更加炫酷的特技效果,提高用户体验。
可选地,在图5所示的显示界面中预先展示的附加动态效果的顺序,或者是其他实施例中的对人脸特征的附加动态效果、或人体特征的附加动态效果等显示顺序,可以基于用户历史使用频率进行排序,例如,用户选择“火球”的频率最高,参见图5,则将“火球”的附加动态效果排在第一位进行展示,进一步方便用户选取,提高用户体验。
需要说明的是,上述实施例不仅可以应用在单个终端设备的场景下,还可以同时应用在多个设备交互的场景下。
如图6和图7所示,本说明书的另一个实施例提供一种人机交互方法600,包括如下步骤:
S602:获取用户选取的场景特征。
该实施例中的场景特征,具体如图7的应用界面示意图所示,该实施例可以预先在显示界面显示多个预设场景,例如图7所示的“阿凡达(avatar)”场景,后续的多个场景以“***”进行示意显示,当用户点击选择其中的一个场景时,该步骤即相当于获取到了场景特征。
另外,在图7的应用界面还包括有“more”按钮,当用户点击时可以展现更多的预设场景。
S604:基于所述场景特征以及获取的图像特征确定动作指令,所述图像特征包括下述至少一种:手势图像特征、人脸图像特征、人体图像特征以及动作特征。
该实施例可以应用在终端设备上,该终端设备包括有可用于采集图像的部件,以运行增强现实应用的终端设备为例,终端设备上用于采集图像的部件可以包括红外摄像头等,并基于获取的图像获取图像特征,具体的获取过程参见图4所示的实施例,以下以获取人脸特征为例进行介绍。
在获取人脸特征时,可使用人脸特征分类模型获取人脸特征。该人脸特征分类模型的输入参数可以是采集到的人脸图像(或者预处理后的人脸图像,下一段进行介绍),输出参数可以是人脸特征。该人脸特征分类模型可基于支持向量机(Support Vector Machine,SVM)、卷积神经网络(Convolutional Neural Network,简称CNN)或深度学习(Deep Learning,DL)等算法,通过机器学习的方式生成得到。
为了提高人脸特征的识别精度,可选地,该步骤还可以对采集到的人脸图像进行预处理,以便去除噪声。具体地,对人脸图像的预处理操作可包括但不限于:对采集到的人脸图像进行图像增强;图像二值化;图像灰度化以及去噪声处理等。
该步骤基于所述图像特征和所述场景特征确定匹配的动作指令时,例如,在具有发送方和接收方的网络聊天的应用场景下,可以将图像特征和场景特征融合,如将人脸特征和场景特征融合,生成人脸特征和场景特征融合的动作指令,具体例如,在用户选择的场景中预留有人脸区域,将用户的人脸特征融合展示在上述预留的人脸区域,从而实现用户人脸与选择的场景的无缝对接,生成用户真实处于上述场景中的效果,具体如,用户人在画中游、上述场景中的角色的脸部变成了用户的人脸等。
该实施例尤其适用于合影、艺术大头贴、艺术造型、cosplay等应用场景下。
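作为上述“把人脸融合到场景预留区域”过程的一个示意,下面给出一个基于OpenCV的Python草图:用Haar级联检测人脸,并用cv2.seamlessClone将检测到的人脸融合到场景图像中以target_center为中心的预留区域;其中预留区域中心的取值、级联文件的选择均为假设性示例,并非本说明书限定的实现方式,实际实现也可以改用其他人脸检测或图像融合方法。

```python
import cv2
import numpy as np

def fuse_face_into_scene(frame_bgr, scene_bgr, target_center):
    """从frame中检测人脸,将其无缝融合到scene中以target_center为中心的预留区域。"""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return scene_bgr                              # 未检测到人脸则原样返回场景
    x, y, w, h = faces[0]
    face = frame_bgr[y:y + h, x:x + w]
    mask = 255 * np.ones(face.shape, dtype=np.uint8)  # 人脸区域的融合掩膜
    return cv2.seamlessClone(face, scene_bgr, mask, target_center,
                             cv2.NORMAL_CLONE)
```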
S606:响应于所述动作指令,执行与所述动作指令相匹配的操作。
该步骤中的响应于所述动作指令,执行与所述动作指令相匹配的操作,具体可以是基于所述动作指令生成渲染指令,以对所述动作指令相关的目标对象进行渲染;还可以是向接收方发送所述动作指令,以便接收方基于上述动作指令生成渲染指令,对所述动作指令相关的目标对象进行渲染,最终展示增强现实、虚拟现实或混合现实的目标对象。
在上述合影的应用场景下,通过步骤S606的操作之后,还可以将携带有人脸特征和所述场景特征的消息发送至接收方,再由接收方获取接收方的人脸特征,从而实现发送方的人脸特征、接收方的人脸特征以及发送方选择的场景的融合,便于提高用户体验。
本说明书实施例提供的交互方法,获取图像特征以及场景特征,基于所述图像特征和所述场景特征确定动作指令并响应于所述动作指令,实现了图像特征和各种预设场景的融合,便于提升用户体验。
需要说明的是,上述实施例不仅可以应用在单个终端设备的场景下,还可以同时应用在多个设备交互的场景下。
另外,该实施例预先存储有不同的预设场景供用户选择,实现了获取的图像在不同的场景下变幻出不同的造型,增加趣味性,提高用户体验。
可选地,该实施例还可以保存上述展示的增强现实、虚拟现实或混合现实的目标对象,方便用户后续使用。在一个实施例中,可以请求第三方摄像器材从外界拍摄记录当前终端设备屏幕上所显示的增强现实、虚拟现实或混合现实视图,从而间接实现增强现实、虚拟现实或混合现实视图存储,能够灵活的获取用户所需要存储的增强现实、虚拟现实或混合现实视图。
在另一个实施例中,还可以通过截图的方式截取并保存用户在显示屏幕上所看到的增强现实、虚拟现实或混合现实视图。该实现方式不仅截取并存储屏幕上显示的所有增强现实、虚拟现实或混合现实内容,还可以根据用户需要有选择的存储增强现实、虚拟现实或混合现实视图。
本说明书前文图1至图7所示的实施例在具体应用时,其初始显示界面可以参见图8至图9,用户点击最右侧的添加按钮则会出现**Card选项,并且将**Card功能保存在聊天界面中,如图8所示,该处的**Card可以是AR Card、MR Card或者是VR Card等等。
后续用户使用时,首先可以点击如图8所示的**Card按钮,然后即可以执行图1至图7所示的各个实施例的操作步骤;或者,检测到用户目前的场景能够执行前文图1至图7所示的实施例的方法步骤时,可以在消息界面弹出**Card选项以供用户选择使用,提高用户体验。
需要说明的是,图8和图9只是示意性地展示了一种触发执行方式,实际上,前文几个实施例介绍的方法还可以是由其他方式触发执行,例如摇一摇终端设备自动执行、通过识别用户发出的特定语音执行等等,本说明书实施例不作具体限定。
如图10和图11所示,本说明书的另一个实施例提供一种人机交互方法1000,应用在接收方,包括如下步骤:
S1002:接收来自于发送方的动作指令。
该实施例中的动作指令,可以是前文中的图1至图7所示的实施例中所提到的动作指令,也即,该实施例应用在接收方,其发送方执行的操作可以是如图1至图7所示的各个实施例的操作。
当然,该实施例中的动作指令也可以是其他的动作指令,即与图1至图7所示的各个实施例相互独立。
S1004:响应于所述动作指令,显示与所述动作指令对应的效果;
其中,所述与所述动作指令对应的效果包括下述至少一种:
对终端设备的发送方头像的处理效果和/或对终端设备的接收方头像的处理效果;
对与发送方进行通讯的消息边框颜色的处理效果,对于该处提到的消息边框,可以参见图11,在显示界面中,网名为***的朋友发送了三条消息,每一条消息都包括有消息边框。
屏幕振动反转,即整个终端设备屏幕振动并发生反转;或
自动播放视频、动画以及语音等,上述动画包括gif图像。
上述视频具体可以是H264、H265等编码格式的视频文件,接收方接收到上述视频文件后即可自动播放;上述动画具体可以是强化表现人物表情的动画、画外音的艺术文字以及一些背景动画效果等,接收方接收到上述动画后自动播放。
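作为接收方按效果类型分发处理的一个示意,下面给出一个Python草图;其中各处理函数仅以打印模拟真实的头像处理、边框着色、屏幕振动或播放操作,effect_type等字段名为假设性约定,并非本说明书限定的消息格式。

```python
def handle_action(action):
    """接收方根据动作指令中的效果类型执行对应的显示处理(此处以打印模拟)。"""
    handlers = {
        "avatar": lambda a: print("处理发送方/接收方头像:", a.get("style")),
        "border": lambda a: print("修改消息边框颜色为:", a.get("color")),
        "shake":  lambda a: print("屏幕振动并反转"),
        "play":   lambda a: print("自动播放:", a.get("media")),
    }
    handler = handlers.get(action.get("effect_type"))
    if handler:
        handler(action)

if __name__ == "__main__":
    handle_action({"effect_type": "border", "color": "red"})
    handle_action({"effect_type": "play", "media": "snowball.mp4"})
```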
另外,该实施例在发送方的显示界面还可以显示接收方三维模型状态发生变化,具体可以展示接收方身上中弹、接收方身上有雪花等增强现实、虚拟现实或混合现实等三维显示效果。
此外,该实施例在发送方的显示界面还可以显示头像的处理效果,例如,具体可以是接收方头像变成乌龟或其他的增强现实、虚拟现实或混合现实等接收方头像的三维显示变化样式,提高趣味性,增强用户体验。
上述显示效果中,在发送方的显示界面中可以显示出双方动作的产生到消亡,以及接收方的状态、头像等最后的状态;在接收方的显示界面中可以显示出双方动作的产生到消亡,通常不会显示上述接收方的状态、头像等最后的状态,提高趣味性,增强用户体验。
另外,该实施例还可以接收拖动指令,在显示界面移动展示的对象等。
本说明书实施例提供的人机交互方法,接收来自于发送方的动作指令,并响应于所述动作指令显示与所述动作指令对应的效果,实现了基于动作指令的人机交互。
本说明书实施例提供的人机交互方法,与所述动作指令对应的效果均可以是在三维状态下展示,具体可以是三维增强现实、虚拟现实或混合现实展示。
在一个具体的实施例中,在发送方的显示界面中还可以生成如下效果:甲(发送方)发送一个雪球,乙(接收方)发送一个火球,火球和雪球相撞后火球会削弱并飞向甲方,然后甲方图像着火等;又例如,甲方和乙方同时发送火球或同时发送水球,碰撞后会散落成火花或雪花溅落,形成奇幻的艺术效果,提高趣味性,增强用户体验。
以上说明书部分详细介绍了人机交互方法实施例,本说明书还提供了一种人机交互装置1200,如图12所示,装置1200包括:
图像获取模块1202,可以用于获取用于指示终端设备执行动作的图像;
动作指令确定模块1204,可以用于基于所述图像的图像特征确定匹配的动作指令;
执行模块1206,可以用于响应于所述动作指令,执行与所述动作指令相匹配的操作。
本说明书实施例提供的交互装置,基于获取到图像的图像特征确定动作指令并响应于所述动作指令,执行与所述动作指令相匹配的操作,实现了基于获取的图像的人机交互。
可选地,作为一个实施例,所述图像获取模块1202,可以用于响应于用户对展示的预设图像的选择操作,获取被选择的图像。
可选地,作为一个实施例,所述图像获取模块1202,可以用于通过摄像采集设备采集用户的图像。
可选地,作为一个实施例,所述用于指示终端设备执行动作的图像包括手势图像、人脸图像或人体图像。
可选地,作为一个实施例,所述动作指令确定模块1204,可以用于基于所述图像的图像特征和获取的附加动态特征确定匹配的动作指令。
可选地,作为一个实施例,所述动作指令确定模块1204,可以用于在预设场景下,基于所述图像的图像特征和所述附加动态特征确定匹配的动作指令。
可选地,作为一个实施例,所述动作指令确定模块1204,可以用于基于所述图像的图像特征和获取的场景特征确定匹配的动作指令。
可选地,作为一个实施例,所述装置1200还包括保存模块,可以用于保存所述图像特征和所述场景特征。
可选地,作为一个实施例,所述执行模块1206,可以用于基于所述动作指令生成渲染指令,以对所述动作指令相关的目标对象进行渲染。
可选地,作为一个实施例,所述装置1200还包括发送模块,可以用于向接收方发送所述动作指令。
根据本说明书实施例的上述人机交互装置1200,可以参照前文本说明书实施例中图1至图9所示的人机交互方法的流程,并且,该人机交互装置1200中的各个单元/模块和上述其他操作和/或功能分别为了实现人机交互方法中的相应流程,为了简洁,在此不再赘述。
如图13所示,本说明书还提供了一种人机交互装置1300,如图13所示,该装置1300包括:
接收模块1302,可以用于接收来自于发送方的动作指令;
效果显示模块1304,可以用于响应于所述动作指令,显示与所述动作指令对应的效果,所述与所述动作指令对应的效果包括下述至少一种:
对终端设备的发送方头像的处理效果和/或对终端设备的接收方头像的处理效果;
对与发送方进行通讯的消息边框颜色的处理效果;
屏幕振动反转;或
视频或动画播放。
上述视频具体可以是H264、H265等编码格式的视频文件,或是三维模型实时演算动画,即接收方接收到上述视频文件后即可自动播放;上述动画具体可以是强化表现人物表情的动画、画外音的艺术文字以及一些背景动画效果等,接收方接收到上述动画后即可自动播放。
另外,该实施例在发送方的显示界面还可以显示接收方三维模型状态发生变化,具体可以是展示接收方身上中弹、接收方身上有雪花等增强现实、虚拟现实或混合现实等三维显示效果。
此外,该实施例在发送方的显示界面还可以显示接收方头像的处理效果,例如,具体可以是接收方头像变成乌龟或其他的增强现实、虚拟现实或混合现实等接收方头像的三维显示变化样式,提高趣味性,增强用户体验。
上述显示效果中,在发送方的显示界面中可以显示出双方动作的产生到消亡,以及接收方的状态、头像等最后的状态;在接收方的显示界面中可以显示出双方动作的产生到消亡,通常不会显示上述接收方的状态、头像等最后的状态,提高趣味性,增强用户体验。
本说明书实施例提供的人机交互装置,接收来自于发送方的动作指令,并响应于所述动作指令显示与所述动作指令对应的效果,实现了基于接收的动作指令的人机交互。
根据本说明书实施例的上述人机交互装置1300,可以参照前文本说明书实施例中图10至图11所示的人机交互方法的流程,并且,该人机交互装置1300中的各个单元/模块和上述其他操作和/或功能分别为了实现人机交互方法中的相应流程,为了简洁,在此不再赘述。
本说明书上述各个实施例能够实现的效果具体可以参见图14,在用户输入时,不仅实现了文本输入、语音输入、图片输入和短视频输入,还可以实现人脸识别、动作识别、场景识别等,并根据识别的人脸、动作和场景等变幻出不同的效果发送。用户接收时,不仅实现了普通的文本展示、语音播放、图片动态播放、短视频播放等,还实现了状态发生变化、动画声音播放、屏幕振动反馈等效果,上述状态发生变化,例如包括发送方身上中弹、发送方头像变成乌龟、动态更换背景等。
下面将结合图15详细描述根据本说明书实施例的电子设备。参考图15,在硬件层面,电子设备包括处理器,可选地,包括内部总线、网络接口、存储器。其中,如图15所示,存储器可能包含内存,例如高速随机存取存储器(Random-Access Memory,RAM),也可能还包括非易失性存储器(non-volatile memory),例如至少1个磁盘存储器等。当然,该电子设备还可能包括实现其他业务所需要的硬件。
处理器、网络接口和存储器可以通过内部总线相互连接,该内部总线可以是工业标准体系结构(Industry Standard Architecture,ISA)总线、外设部件互连标准(Peripheral Component Interconnect,PCI)总线或扩展工业标准结构(Extended Industry Standard Architecture,EISA)总线等。所述总线可以分为地址总线、数据总线、控制总线等。为便于表示,图15中仅用一个双向箭头表示,但并不表示仅有一根总线或一种类型的总线。
存储器,用于存放程序。具体地,程序可以包括程序代码,所述程序代码包括计算机操作指令。存储器可以包括内存和非易失性存储器,并向处理器提供指令和数据。
处理器从非易失性存储器中读取对应的计算机程序到内存中然后运行,在逻辑层面上形成人机交互装置。处理器,执行存储器所存放的程序,并具体用于执行本说明书前文所述的方法实施例的操作。
上述图1至图11所示实施例揭示的方法、装置执行的方法可以应用于处理器中,或者由处理器实现。处理器可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器可以是通用处理器,包括中央处理器(Central Processing Unit,CPU)、 网络处理器(Network Processor,NP)等;还可以是数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本说明书实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本说明书实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法的步骤。
图15所示的电子设备还可执行图1至图11的方法,并实现人机交互方法在图1至图11所示实施例的功能,本说明书实施例在此不再赘述。
当然,除了软件实现方式之外,本说明书的电子设备并不排除其他实现方式,比如逻辑器件抑或软硬件结合的方式等等,也就是说以下处理流程的执行主体并不限定于各个逻辑单元,也可以是硬件或逻辑器件。
本说明书实施例还提供一种计算机可读存储介质,计算机可读存储介质上存储有计算机程序,该计算机程序被处理器执行时实现上述图1至图11所示的各个方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。其中,所述的计算机可读存储介质,如只读存储器(Read-Only Memory,简称ROM)、随机存取存储器(Random Access Memory,简称RAM)、磁碟或者光盘等。
本领域内的技术人员应明白,本说明书的实施例可提供为方法、***、或计算机程序产品。因此,本说明书可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本说明书可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本说明书是参照根据本说明书实施例的方法、设备(***)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理 器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
在一个典型的配置中,计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。
内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体,可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带、磁带磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括要素的过程、方法、商品或者设备中还存在另外的相同要素。
以上仅为本说明书的实施例而已,并不用于限制本说明书。对于本领域技术人员来说,本说明书可以有各种更改和变化。凡在本说明书的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本说明书的权利要求范围之内。

Claims (17)

  1. 一种人机交互方法,包括:
    获取用于指示终端设备执行动作的图像;
    基于所述图像的图像特征确定匹配的动作指令;
    响应于所述动作指令,执行与所述动作指令相匹配的操作。
  2. 根据权利要求1所述的方法,所述获取用于指示终端设备执行动作的图像包括:
    响应于用户对展示的预设图像的选择操作,获取被选择的图像。
  3. 根据权利要求1所述的方法,所述获取用于指示终端设备执行动作的图像包括:
    通过摄像采集设备采集用户的图像。
  4. 根据权利要求1至3任一项所述的方法,所述用于指示终端设备执行动作的图像包括手势图像、人脸图像或人体图像。
  5. 根据权利要求4所述的方法,所述基于所述图像的图像特征确定匹配的动作指令之前,所述方法还包括:
    获取与所述图像相关的附加动态特征;
    其中,所述基于所述图像的图像特征确定匹配的动作指令包括:基于所述图像的图像特征和所述附加动态特征确定匹配的动作指令。
  6. 根据权利要求5所述的方法,
    所述基于所述图像的图像特征和所述附加动态特征确定匹配的动作指令包括:在预设场景下,基于所述图像的图像特征和所述附加动态特征确定匹配的动作指令。
  7. 根据权利要求1所述的方法,
    所述方法还包括:获取所述图像所应用的场景特征;
    其中,所述基于所述图像的图像特征确定匹配的动作指令包括:基于所述图像的图像特征和所述场景特征确定匹配的动作指令。
  8. 根据权利要求7所述的方法,
    所述方法还包括:保存所述图像特征和所述场景特征。
  9. 根据权利要求1所述的方法,
    所述响应于所述动作指令,执行与所述动作指令相匹配的操作包括:
    基于所述动作指令生成渲染指令,以对所述动作指令相关的目标对象进行渲染。
  10. 根据权利要求9所述的方法,
    所述方法还包括:向接收方发送所述动作指令。
  11. 一种人机交互方法,应用在接收方,包括:
    接收来自于发送方的动作指令;
    响应于所述动作指令,显示与所述动作指令对应的效果;
    其中,所述与所述动作指令对应的效果包括下述至少一种:
    对终端设备的发送方头像的处理效果和/或对终端设备的接收方头像的处理效果;
    对与发送方进行通讯的消息边框颜色的处理效果;
    屏幕振动反转;或
    视频或动画播放。
  12. 一种人机交互装置,包括:
    图像获取模块,获取用于指示终端设备执行动作的图像;
    动作指令确定模块,基于所述图像的图像特征确定匹配的动作指令;
    执行模块,响应于所述动作指令,执行与所述动作指令相匹配的操作。
  13. 一种人机交互装置,包括:
    接收模块,接收来自于发送方的动作指令;
    效果显示模块,响应于所述动作指令,显示与所述动作指令对应的效果;
    其中,所述与所述动作指令对应的效果包括下述至少一种:
    对终端设备的发送方头像的处理效果和/或对终端设备的接收方头像的处理效果;
    对与发送方进行通讯的消息边框颜色的处理效果;
    屏幕振动反转;或
    视频或动画播放。
  14. 一种电子设备,包括:存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机程序,所述计算机程序被所述处理器执行时实现如下操作:
    获取用于指示终端设备执行动作的图像;
    基于所述图像的图像特征确定匹配的动作指令;
    响应于所述动作指令,执行与所述动作指令相匹配的操作。
  15. 一种电子设备,包括:存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机程序,所述计算机程序被所述处理器执行时实现如下操作:
    接收来自于发送方的动作指令;
    响应于所述动作指令,显示与所述动作指令对应的效果;
    其中,所述与所述动作指令对应的效果包括下述至少一种:
    对终端设备的发送方头像的处理效果和/或对终端设备的接收方头像的处理效果;
    对与发送方进行通讯的消息边框颜色的处理效果;
    屏幕振动反转;或
    视频或动画播放。
  16. 一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现如下操作:
    获取用于指示终端设备执行动作的图像;
    基于所述图像的图像特征确定匹配的动作指令;
    响应于所述动作指令,执行与所述动作指令相匹配的操作。
  17. 一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现如下操作:
    接收来自于发送方的动作指令;
    响应于所述动作指令,显示与所述动作指令对应的效果;
    其中,所述与所述动作指令对应的效果包括下述至少一种:
    对终端设备的发送方头像的处理效果和/或对终端设备的接收方头像的处理效果;
    对与发送方进行通讯的消息边框颜色的处理效果;
    屏幕振动反转;或
    视频或动画播放。
PCT/CN2019/089209 2018-08-02 2019-05-30 一种人机交互方法和装置 WO2020024692A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810871070.2A CN109254650B (zh) 2018-08-02 2018-08-02 一种人机交互方法和装置
CN201810871070.2 2018-08-02

Publications (1)

Publication Number Publication Date
WO2020024692A1 true WO2020024692A1 (zh) 2020-02-06

Family

ID=65049153

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089209 WO2020024692A1 (zh) 2018-08-02 2019-05-30 一种人机交互方法和装置

Country Status (3)

Country Link
CN (2) CN109254650B (zh)
TW (1) TWI782211B (zh)
WO (1) WO2020024692A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022017184A1 (zh) * 2020-07-23 2022-01-27 北京字节跳动网络技术有限公司 交互方法、装置、电子设备及计算机可读存储介质

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109254650B (zh) * 2018-08-02 2021-02-09 创新先进技术有限公司 一种人机交互方法和装置
CN110083238A (zh) * 2019-04-18 2019-08-02 深圳市博乐信息技术有限公司 基于增强现实技术的人机互动方法与***
CN110609921B (zh) * 2019-08-30 2022-08-19 联想(北京)有限公司 一种信息处理方法和电子设备
CN110807395A (zh) * 2019-10-28 2020-02-18 支付宝(杭州)信息技术有限公司 一种基于用户行为的信息交互方法、装置及设备
CN111338808B (zh) * 2020-05-22 2020-08-14 支付宝(杭州)信息技术有限公司 一种协同计算方法及***
CN111627097B (zh) * 2020-06-01 2023-12-01 上海商汤智能科技有限公司 一种虚拟景物的展示方法及装置
CN114035684A (zh) * 2021-11-08 2022-02-11 百度在线网络技术(北京)有限公司 用于输出信息的方法和装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045398A (zh) * 2015-09-07 2015-11-11 哈尔滨市一舍科技有限公司 一种基于手势识别的虚拟现实交互设备
CN105487673A (zh) * 2016-01-04 2016-04-13 京东方科技集团股份有限公司 一种人机交互***、方法及装置
CN106095068A (zh) * 2016-04-26 2016-11-09 乐视控股(北京)有限公司 虚拟图像的控制方法及装置
US20180088663A1 (en) * 2016-09-29 2018-03-29 Alibaba Group Holding Limited Method and system for gesture-based interactions
CN109254650A (zh) * 2018-08-02 2019-01-22 阿里巴巴集团控股有限公司 一种人机交互方法和装置

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7159008B1 (en) * 2000-06-30 2007-01-02 Immersion Corporation Chat interface with haptic feedback functionality
US9041775B2 (en) * 2011-03-23 2015-05-26 Mgestyk Technologies Inc. Apparatus and system for interfacing with computers and other electronic devices through gestures by using depth sensing and methods of use
CN103916621A (zh) * 2013-01-06 2014-07-09 腾讯科技(深圳)有限公司 视频通信方法及装置
JP5503782B1 (ja) * 2013-06-20 2014-05-28 株式会社 ディー・エヌ・エー 電子ゲーム機、電子ゲーム処理方法及び電子ゲームプログラム
CN105468142A (zh) * 2015-11-16 2016-04-06 上海璟世数字科技有限公司 基于增强现实技术的互动方法、***和终端
CN105988583A (zh) * 2015-11-18 2016-10-05 乐视致新电子科技(天津)有限公司 手势控制方法及虚拟现实显示输出设备
CN106125903B (zh) * 2016-04-24 2021-11-16 林云帆 多人交互***及方法
CN106155311A (zh) * 2016-06-28 2016-11-23 努比亚技术有限公司 Ar头戴设备、ar交互***及ar场景的交互方法
US10471353B2 (en) * 2016-06-30 2019-11-12 Sony Interactive Entertainment America Llc Using HMD camera touch button to render images of a user captured during game play
CN106293461B (zh) * 2016-08-04 2018-02-27 腾讯科技(深圳)有限公司 一种交互式应用中的按键处理方法和终端以及服务器
CN107885317A (zh) * 2016-09-29 2018-04-06 阿里巴巴集团控股有限公司 一种基于手势的交互方法及装置
US20180126268A1 (en) * 2016-11-09 2018-05-10 Zynga Inc. Interactions between one or more mobile devices and a vr/ar headset
US10168788B2 (en) * 2016-12-20 2019-01-01 Getgo, Inc. Augmented reality user interface
CN106657060A (zh) * 2016-12-21 2017-05-10 惠州Tcl移动通信有限公司 一种基于现实场景的vr通讯方法及***
CN107705278B (zh) * 2017-09-11 2021-03-02 Oppo广东移动通信有限公司 动态效果的添加方法和终端设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045398A (zh) * 2015-09-07 2015-11-11 哈尔滨市一舍科技有限公司 一种基于手势识别的虚拟现实交互设备
CN105487673A (zh) * 2016-01-04 2016-04-13 京东方科技集团股份有限公司 一种人机交互***、方法及装置
CN106095068A (zh) * 2016-04-26 2016-11-09 乐视控股(北京)有限公司 虚拟图像的控制方法及装置
US20180088663A1 (en) * 2016-09-29 2018-03-29 Alibaba Group Holding Limited Method and system for gesture-based interactions
CN109254650A (zh) * 2018-08-02 2019-01-22 阿里巴巴集团控股有限公司 一种人机交互方法和装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022017184A1 (zh) * 2020-07-23 2022-01-27 北京字节跳动网络技术有限公司 交互方法、装置、电子设备及计算机可读存储介质
US11842425B2 (en) 2020-07-23 2023-12-12 Beijing Bytedance Network Technology Co., Ltd. Interaction method and apparatus, and electronic device and computer-readable storage medium

Also Published As

Publication number Publication date
CN109254650A (zh) 2019-01-22
TW202008143A (zh) 2020-02-16
CN112925418A (zh) 2021-06-08
TWI782211B (zh) 2022-11-01
CN109254650B (zh) 2021-02-09

Similar Documents

Publication Publication Date Title
WO2020024692A1 (zh) 一种人机交互方法和装置
US11182615B2 (en) Method and apparatus, and storage medium for image data processing on real object and virtual object
US11595617B2 (en) Communication using interactive avatars
US10699461B2 (en) Telepresence of multiple users in interactive virtual space
JP7268071B2 (ja) バーチャルアバターの生成方法及び生成装置
US20180088663A1 (en) Method and system for gesture-based interactions
WO2018033137A1 (zh) 在视频图像中展示业务对象的方法、装置和电子设备
WO2019173108A1 (en) Electronic messaging utilizing animatable 3d models
CN113228625A (zh) 支持复合视频流的视频会议
WO2022252866A1 (zh) 一种互动处理方法、装置、终端及介质
CN110555507B (zh) 虚拟机器人的交互方法、装置、电子设备及存储介质
WO2023070021A1 (en) Mirror-based augmented reality experience
CN108876878B (zh) 头像生成方法及装置
CN111880664B (zh) Ar互动方法、电子设备及可读存储介质
WO2023039390A1 (en) Controlling ar games on fashion items
WO2020042442A1 (zh) 表情包生成方法及装置
CN113411537A (zh) 视频通话方法、装置、终端及存储介质
KR20160010810A (ko) 실음성 표출 가능한 실사형 캐릭터 생성 방법 및 생성 시스템
CN114779948B (zh) 基于面部识别的动画人物即时交互控制方法、装置及设备
US11960653B2 (en) Controlling augmented reality effects through multi-modal human interaction
CN113176827B (zh) 基于表情的ar交互方法、***、电子设备及存储介质
US20240193838A1 (en) Computer-implemented method for controlling a virtual avatar
US20230154126A1 (en) Creating a virtual object response to a user input
CN113908553A (zh) 游戏角色表情生成方法、装置、电子设备及存储介质
TWI583198B (zh) 使用互動化身的通訊技術

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19844048

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19844048

Country of ref document: EP

Kind code of ref document: A1