CN112925418A - Man-machine interaction method and device - Google Patents

Man-machine interaction method and device

Info

Publication number
CN112925418A
CN112925418A (application CN202110302002.6A)
Authority
CN
China
Prior art keywords
image
action instruction
scene
user
action
Prior art date
Legal status
Pending
Application number
CN202110302002.6A
Other languages
Chinese (zh)
Inventor
荣涛
Current Assignee
Alibaba Group Holding Ltd
Advanced New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd filed Critical Advanced New Technologies Co Ltd
Priority to CN202110302002.6A
Publication of CN112925418A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 19/006: Mixed reality
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2203/00: Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F 2203/01: Indexing scheme relating to G06F3/01
    • G06F 2203/012: Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • User Interface Of Digital Computer (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Electrotherapy Devices (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The embodiments of this specification disclose a man-machine interaction method and device. The method includes: acquiring a scene feature selected by a user; in response to the user's selection of a preset image, acquiring the selected image, which is used to instruct the terminal device to perform an action; determining a matching action instruction based on the image feature of the image and the scene feature to which the image is applied, according to mapping relations, preset for different scene features, between image features and action instructions; and performing an operation matched with the action instruction.

Description

Man-machine interaction method and device
This document is a divisional application of "Man-machine interaction method and device"; the application number of the parent application is 201810871070.2 and its filing date is 2018-08-02.
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a human-computer interaction method and apparatus.
Background
Augmented Reality (AR) technology enhances a user's perception of the real world with information supplied by a computer system: virtual objects, scenes or system prompts generated by the computer are superimposed on the real scene, thereby augmenting reality and producing a sensory experience beyond reality.
Virtual Reality (VR) technology generates, through simulation, a three-dimensional virtual world that is identical or similar to a real scene. A user can play games, carry out activities or perform certain operations in this virtual world much as in the real world, and thus receives comprehensive simulated visual, auditory, tactile and other sensory experiences.
Mixed Reality (MR) technology covers both augmented reality and augmented virtuality and refers to a new visualization environment created by merging the real and virtual worlds, in which physical and virtual (i.e., digital) objects coexist and interact in real time.
At present, AR, VR and MR technologies are still at the development stage and the man-machine interaction techniques related to them are not yet mature, so a man-machine interaction scheme is needed.
Disclosure of Invention
The embodiment of the specification provides a human-computer interaction method and device, which are used for realizing human-computer interaction.
The embodiment of the specification adopts the following technical scheme:
In a first aspect, a human-computer interaction method is provided, including: acquiring a scene feature selected by a user; in response to the user's selection of a preset image, acquiring the selected image, the image being used to instruct a terminal device to perform an action; determining a matching action instruction based on the image feature of the image and the scene feature to which the image is applied, according to mapping relations, preset for different scene features, between image features and action instructions; and performing an operation matched with the action instruction.
In a second aspect, a human-computer interaction method is provided, applied to the receiver in a communication scenario in which multiple users interact, including: receiving an action instruction from a sender, the action instruction being determined, as matching the image features and the scene features, from the image features of the images respectively selected in a communication application interface by the mutually interacting sender user and receiver user and from the scene features each selected for those images; and in response to the action instruction, displaying an effect corresponding to the action instruction on the communication application interface.
In a third aspect, a human-computer interaction device is provided, including: an image acquisition module, which acquires a scene feature selected by a user and, in response to the user's selection of a preset image, acquires the selected image, the image being used to instruct a terminal device to perform an action; an action instruction determining module, which determines a matching action instruction based on the image feature of the image and the scene feature to which the image is applied, according to mapping relations, preset for different scene features, between image features and action instructions; and an execution module, which performs an operation matched with the action instruction.
In a fourth aspect, a human-computer interaction device is provided, including: a receiving module, which receives an action instruction from a sender, the action instruction being determined, as matching the image features and the scene features, from the image features of the images respectively selected in a communication application interface by the mutually interacting sender and receiver users and from the scene features each selected for those images; and an effect display module, which, in response to the action instruction, displays an effect corresponding to the action instruction on the communication application interface.
In a fifth aspect, an electronic device is provided, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, performing the following operations: acquiring a scene feature selected by a user; in response to the user's selection of a preset image, acquiring the selected image, the image being used to instruct the terminal device to perform an action; determining a matching action instruction based on the image feature of the image and the scene feature to which the image is applied, according to mapping relations, preset for different scene features, between image features and action instructions; and performing an operation matched with the action instruction.
In a sixth aspect, an electronic device is provided, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, performing the following operations: receiving an action instruction from a sender, the action instruction being determined, as matching the image features and the scene features, from the image features of the images respectively selected in a communication application interface by the mutually interacting sender and receiver users and from the scene features each selected for those images; and in response to the action instruction, displaying an effect corresponding to the action instruction on the communication application interface.
In a seventh aspect, a computer-readable storage medium is provided, on which a computer program is stored, the computer program, when executed by a processor, performing the following operations: acquiring a scene feature selected by a user; in response to the user's selection of a preset image, acquiring the selected image, the image being used to instruct the terminal device to perform an action; determining a matching action instruction based on the image feature of the image and the scene feature to which the image is applied, according to mapping relations, preset for different scene features, between image features and action instructions; and performing an operation matched with the action instruction.
In an eighth aspect, a computer-readable storage medium is provided, on which a computer program is stored, the computer program, when executed by a processor, performing the following operations: receiving an action instruction from a sender, the action instruction being determined, as matching the image features and the scene features, from the image features of the images respectively selected in a communication application interface by the mutually interacting sender and receiver users and from the scene features each selected for those images; and in response to the action instruction, displaying an effect corresponding to the action instruction on the communication application interface.
At least one of the technical solutions adopted in the embodiments of this specification can achieve the following beneficial effect: a matching action instruction is determined based on the image feature of the acquired image, and the operation matched with the action instruction is performed in response to it, thereby realizing human-computer interaction based on the acquired image.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification and are incorporated in and constitute a part of this specification, illustrate embodiments of the specification and, together with the description, serve to explain the specification without limiting it. In the drawings:
fig. 1 is a schematic flowchart of a human-computer interaction method provided in an embodiment of the present specification;
FIG. 2 is a flowchart illustrating a human-computer interaction method according to another embodiment of the present disclosure;
FIG. 3 is a schematic view of a display interface in the embodiment shown in FIG. 2;
FIG. 4 is a flowchart illustrating a human-computer interaction method according to yet another embodiment of the present disclosure;
FIG. 5 is a schematic view of a display interface in the embodiment shown in FIG. 4;
FIG. 6 is a flowchart illustrating a human-computer interaction method according to yet another embodiment of the present disclosure;
FIG. 7 is a schematic view of a display interface in the embodiment shown in FIG. 6;
FIG. 8 is a schematic diagram of an initial interface of a human-computer interaction method provided by an embodiment of the present specification;
FIG. 9 is another diagram illustrating an initial interface of a human-computer interaction method according to an embodiment of the present disclosure;
FIG. 10 is a flowchart illustrating a human-computer interaction method according to a further embodiment of the present disclosure;
FIG. 11 is a schematic view of a display interface in the embodiment shown in FIG. 10;
FIG. 12 is a schematic structural diagram of a human-computer interaction device according to an embodiment of the present disclosure;
FIG. 13 is a schematic structural diagram of a human-computer interaction device according to another embodiment of the present disclosure;
fig. 14 is a schematic diagram of effects that can be achieved by various embodiments of the present specification.
Fig. 15 is a schematic diagram of a hardware structure of an electronic device for implementing various embodiments of the present specification.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more clear, the technical solutions of the present disclosure will be clearly and completely described below with reference to the specific embodiments of the present disclosure and the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present specification without any creative effort belong to the protection scope of the present specification.
As shown in FIG. 1, one embodiment of the present specification provides a human-computer interaction method 100, comprising the steps of:
s102: an image for instructing the terminal device to perform an action is acquired.
In the embodiments of this specification, the image acquired for instructing the terminal device to perform an action may be a gesture image, a face image, a whole-body image of the user, or an image of part of the user's body; this specification imposes no particular limitation.
The image acquired in the embodiment of the present specification may be a single image, or may be a plurality of frames of images in a captured video stream.
In addition, the image acquired in this step may be an image of a single user or an image of a plurality of users.
This step may acquire the image from a plurality of pre-stored images, or acquire it in real time. If images are pre-stored, step S102 may acquire one image from the stored images, for example the image selected by the user. If the image is acquired in real time, step S102 may capture it with an image sensor of the terminal device or the like.
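For the real-time branch, a minimal sketch (not part of the patent text; the camera index and error handling are assumptions made for illustration) of capturing a single frame with OpenCV could look like this:

```python
import cv2  # OpenCV, which later embodiments also reference for display techniques


def capture_frame(camera_index: int = 0):
    """Real-time branch of step S102: grab one frame from the device camera."""
    cap = cv2.VideoCapture(camera_index)  # open the default camera
    try:
        ok, frame = cap.read()            # read a single BGR frame
        if not ok:
            raise RuntimeError("camera frame could not be read")
        return frame
    finally:
        cap.release()                     # always release the camera


if __name__ == "__main__":
    image = capture_frame()
    print("acquired image with shape", image.shape)
```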
S104: determining a matching action instruction based on image features of the image.
The image feature in this step corresponds to the acquired image and may be extracted from it. For example, if the acquired image is a gesture image, the image feature may be a gesture feature; if it is a face image, the image feature may be a face feature; and if it is a human body image, the image feature may be a posture or motion feature of the human body, or the like.
Before the embodiment is executed, a mapping relation table of the image features and the action instructions may be established in advance, so that step S104 may determine the matched action instructions directly by table lookup.
Optionally, the same image feature may correspond to different action instructions in different application scenarios. Therefore, before this embodiment is executed, mapping relation tables of image features to action instructions may also be established separately for different scenarios, and the embodiment is then executed in a determined scenario, for example a scenario selected by the user, a scenario acquired by AR scanning, a preset VR environment, a preset MR environment, and so on.
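Purely as an illustrative sketch (the scene names, feature names and instruction names below are invented for the example and are not defined by this specification), such per-scene mapping tables and the table lookup of step S104 could be organized as follows:

```python
# Per-scene mapping tables: scene feature -> (image feature -> action instruction).
# All keys and instruction names are example values, not part of the patent.
ACTION_MAP = {
    "combat_game": {
        "fist_one_hand": "PUNCH",
        "open_palm": "BLOCK",
    },
    "chat": {
        "fist_one_hand": "SEND_FIST_BUMP_EFFECT",
        "open_palm": "SEND_HIGH_FIVE_EFFECT",
    },
}


def match_action_instruction(scene_feature, image_feature):
    """Step S104: look up the action instruction matching the image feature
    in the mapping table established for the current scene."""
    return ACTION_MAP.get(scene_feature, {}).get(image_feature)


# Example: a one-hand fist in a stand-alone combat game maps to a punch instruction.
print(match_action_instruction("combat_game", "fist_one_hand"))  # PUNCH
```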
S106: and responding to the action instruction, and executing the operation matched with the action instruction.
In this step, in response to the action instruction, an operation matched with the action instruction is executed, for example, in an augmented reality scene of stand-alone human-computer interaction, a rendering instruction may specifically be generated based on the action instruction; and then the target object related to the action instruction is rendered.
In addition, in a chat scene between a sender and a receiver, while the target object related to the action instruction is rendered, the action instruction may also be sent to the receiver so that the receiver generates a rendering instruction based on it and renders the target object related to the action instruction; at the same time, the augmented-reality target object is displayed at the sender. The target object mentioned above may specifically be an augmented reality scene, a virtual reality scene, a mixed reality scene, or the like. In addition, the display effects and related display techniques mentioned in the various embodiments of this specification may be implemented based on the OpenCV vision library.
Sending the action instruction to the receiver, as mentioned above, may specifically mean sending the action instruction to a server, which then forwards it to the receiver; alternatively, in a scenario with no server where clients connect to each other directly, the sender may send the action instruction to the receiver directly.
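A minimal sketch of these two delivery paths follows; the transport, message format and endpoint addresses are assumptions made only for illustration, not details given in the patent:

```python
import json
import socket


def send_action_instruction(instruction, peer_addr, relay_addr=None):
    """Deliver an action instruction either via a relay server or directly
    to the receiving client. Addresses are (host, port) tuples."""
    payload = json.dumps(instruction).encode("utf-8")
    # Server path when a relay is given, direct client-to-client path otherwise.
    target = relay_addr if relay_addr is not None else peer_addr
    with socket.create_connection(target, timeout=5) as conn:
        conn.sendall(payload)


# Hypothetical endpoints: relayed through a server, or sent peer-to-peer.
# send_action_instruction({"action": "PUNCH"}, ("receiver.local", 9000), ("relay.example", 8000))
# send_action_instruction({"action": "PUNCH"}, ("receiver.local", 9000))
```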
According to the man-machine interaction method provided by this embodiment of the specification, a matching action instruction is determined based on the image feature of the acquired image, and the operation matched with that instruction is executed in response to it, thereby realizing man-machine interaction based on the acquired image.
Optionally, the embodiments of the present disclosure may also be applied in AR, VR, MR, and other scenarios.
To explain the man-machine interaction method provided in the embodiments of the present specification in detail, as shown in fig. 2 and fig. 3, another embodiment of the present specification provides a man-machine interaction method 200, including the following steps:
s202: and responding to the selection operation of the user on the displayed preset image, and acquiring the selected gesture image, the human face image or the human body image.
As shown in the schematic application interface of Fig. 3, a plurality of gesture images may be displayed on the display interface in advance in this embodiment (see the boxes below the text "gesture selection" on the right side of Fig. 3); when the user clicks and selects one of the gesture images, that gesture image can be acquired in this step.
Optionally, a plurality of facial expression images, human body motion/posture images and the like may also be displayed in advance in this embodiment; when the user selects one of them, this step can be performed to acquire the selected facial expression image or human body motion image.
Optionally, the pre-displayed gesture images may include a left-hand gesture image; a right-hand gesture image; a gesture image of a single hand making a fist or with the fingers closed; a gesture image of a single hand opened or with the fingers extended; a "love" gesture image in which the middle finger and ring finger are folded while the other fingers are extended; and the like.
The pre-displayed facial expression image can be a laughing expression image, a sad expression image, a crying expression image and the like.
The pre-displayed human body motion/posture images may include, for example, an image of a body bowing at 90 degrees, an image of a body standing at attention, or the like.
S204: and determining the action instruction based on the image characteristics of the selected image in the preset scene.
Before the embodiment is executed, the corresponding relationship between the image and the image feature may be stored in advance, so that the image feature may be directly determined based on the image selected by the user, for example, if the gesture image selected by the user is an image of a single hand making a fist, the gesture feature may be a feature indicating that the single hand makes a fist.
Before the embodiment is executed, a mapping relation table between the image feature and the action instruction may be established in advance, so that step S204 may determine the matched action instruction directly by table lookup.
Optionally, in different application scenarios, the same image feature may correspond to different action instructions, and therefore, before the embodiment is executed, a mapping relationship table of the image feature and the action instruction may also be established in different scenarios, and the embodiment may be executed in a determined scenario, for example, the embodiment may be executed in a scenario selected by a user, or for example, the embodiment may also be executed in a scenario acquired based on AR scanning, or in a preset VR scenario, or in a preset MR scenario, and so on.
When determining the action instruction based on the image feature, the current application scene may be determined first, and then the action instruction corresponding to the image feature acquired in the current application scene is determined, for example, in a scenario of a standalone combat game, the action instruction for punching a fist may be determined based on a gesture feature of a single hand gripping a fist.
S206: and responding to the action instruction, and executing the operation matched with the action instruction.
In this step, in response to the action instruction, the operation matched with the action instruction is executed; specifically, a rendering instruction may be generated based on the action instruction and the target object related to the action instruction rendered, for example by displaying an augmented reality, virtual reality or mixed reality target object in the box to the left of the pre-displayed gesture images in Fig. 3, where the displayed target object may be an augmented reality, virtual reality or mixed reality scene image.
After the action instruction is responded and the operation matched with the action instruction is executed, the action instruction can be sent to the receiving party, so that the receiving party generates a rendering instruction based on the action instruction, and the target object related to the action instruction is rendered.
Sending the action instruction to the receiver, as mentioned above, may specifically mean sending the action instruction to a server, which then forwards it to the receiver; alternatively, in a scenario with no server where clients connect to each other directly, the sender may send the action instruction to the receiver directly.
The interaction method provided by the embodiment of the specification determines the matched action instruction based on the image characteristics of the acquired image, and responds to the action instruction to execute the operation matched with the action instruction, so that the man-machine interaction based on the acquired image is realized.
In addition, in the embodiments of the present specification, a plurality of gesture images, face images, or body images are stored in advance. Therefore, the user can select the content quickly and conveniently, and the user experience is improved.
Optionally, the order of the gesture images pre-displayed in the display interface of Fig. 3, or the display order of the face images or human body images in other embodiments, may be sorted by the user's historical frequency of use; for example, if the gesture image of a single hand making a fist is the one the user selects most often, it is placed first, which further eases selection and improves the user experience.
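A small sketch of this ordering rule follows; the usage history and image identifiers are illustrative assumptions only:

```python
from collections import Counter

# Hypothetical usage history: identifiers of the preset images the user picked before.
usage_history = ["fist_one_hand", "open_palm", "fist_one_hand", "love_gesture"]


def order_presets_by_frequency(preset_ids, history):
    """Sort preset gesture/face/body images so the most frequently used come first."""
    counts = Counter(history)
    return sorted(preset_ids, key=lambda pid: counts[pid], reverse=True)


presets = ["open_palm", "love_gesture", "fist_one_hand"]
print(order_presets_by_frequency(presets, usage_history))
# ['fist_one_hand', 'open_palm', 'love_gesture']
```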
It should be noted that the above embodiments can also be applied in a scenario where multiple devices and multiple users interact at the same time. Specifically, for example, the gesture images that users a, b, c and so on select from the plurality of displayed gesture images are acquired through step S202; through steps S204 and S206, in a preset scene in which users a, b, c and so on interact with one another, the image features of the gesture images each user selected are sent to users a, b, c and so on. Meanwhile, each terminal device can capture its user's gesture image in real time, and if it matches the pre-selected image features to a sufficient degree, the subsequent logic is executed; for example, if the scene selected on the first, second, third and other terminal devices is an ancient temple with a stone door in front, then when the several devices all recognize the action of pushing the hands forward, the stone door can slowly open, and so on.
In the embodiments shown in Fig. 2 and Fig. 3, gesture images, face images, human body images or the like are displayed in advance, so the number of available images is limited and the content of the pre-displayed images is not especially rich. To further increase the number and richness of the images, enhance user interaction and make the interaction more enjoyable, another embodiment of this specification provides a human-computer interaction method 400, shown in Fig. 4 and Fig. 5, which includes the following steps:
s402: acquiring image features, wherein the image features comprise at least one of the following: gesture image features, face image features, human body image features, and motion features.
This embodiment can be applied to a terminal device that includes a component capable of capturing images. Taking a terminal device running an augmented reality application as an example, the image-capturing component on the device may include an infrared camera or the like; after the image is captured, the image features are obtained from it.
The above action features include, for example: a punch action characteristic, a hand waving action characteristic, a palm action characteristic, a running action characteristic, an upright standing action characteristic, a head shaking action characteristic, a head nodding action characteristic, and the like.
Optionally, before this embodiment is executed, the application scenario may also be identified in advance; for example, the application scenario may specifically include a scenario in which a sender and a receiver chat with each other, an online combat game, a scenario in which multiple terminal devices chat and interact with one another, and the like.
When acquiring the image features, for example the gesture features, a gesture feature classification model may be used. The input parameter of the gesture feature classification model may be the captured gesture image (or a preprocessed gesture image, introduced in the next paragraph), and the output parameter may be the gesture feature. The gesture feature classification model can be generated by machine learning based on algorithms such as a Support Vector Machine (SVM), a Convolutional Neural Network (CNN) or deep learning (DL).
In order to improve the recognition accuracy of the gesture features, optionally, the step may also perform preprocessing on the acquired gesture image so as to remove noise. In particular, the pre-processing operations on the gesture image may include, but are not limited to: carrying out image enhancement on the acquired gesture image; carrying out image binarization; graying an image, denoising, and the like.
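A minimal sketch of such a preprocessing pipeline and an SVM-based gesture classifier, using OpenCV and scikit-learn, is shown below; the feature encoding (a flattened, resized, binarized image) and the example labels are simplifying assumptions, not the model prescribed by this specification:

```python
import cv2
import numpy as np
from sklearn.svm import SVC


def preprocess_gesture(image_bgr):
    """Graying, enhancement, denoising and binarization of the gesture image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)      # image graying
    gray = cv2.equalizeHist(gray)                           # simple image enhancement
    gray = cv2.GaussianBlur(gray, (5, 5), 0)                # denoising
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarization
    return cv2.resize(binary, (64, 64))


def to_feature_vector(image_bgr):
    """Flatten the preprocessed image into a fixed-length input vector."""
    return preprocess_gesture(image_bgr).reshape(-1).astype(np.float32) / 255.0


def train_gesture_classifier(images_bgr, labels):
    """Train on pre-collected, labelled gesture images (labels such as
    'fist_one_hand' or 'open_palm' are illustrative only)."""
    clf = SVC(kernel="rbf")
    clf.fit([to_feature_vector(img) for img in images_bgr], labels)
    return clf


def classify_gesture(clf, image_bgr):
    """Output of the model: the gesture feature / class for a new image."""
    return clf.predict([to_feature_vector(image_bgr)])[0]
```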
The acquisition mode of the face image features, the human body image features and the motion features is similar to the acquisition mode of the gesture features, and is not described herein again.
Before the embodiment is executed, a gesture image, a face image, a human body image, a motion image and the like can be collected in advance, and then a gesture image feature, a face image feature, a human body image feature and a motion feature are extracted based on the collected images.
Optionally, this embodiment may also decide whether to preprocess the image, or which preprocessing method to use, according to the accuracy requirement on the image features and the performance requirement (such as the required response speed). For example, in an online combat game with a high response-speed requirement, the gesture image may be left unpreprocessed; in a scenario with a higher requirement on gesture precision, the captured image can be preprocessed.
S404: and determining a matched action instruction based on the image characteristics and the additional dynamic characteristics selected by the user in a preset scene.
Before the embodiment is executed, a scene image may be acquired in advance, and the embodiment is executed in the acquired scene.
Specifically, when determining the matching action instruction based on the image feature and the additional dynamic feature selected by the user, the current application scene may be determined first, and then the action instruction corresponding to that image feature and additional dynamic feature in the current scene is determined; for example, in a stand-alone combat game, an action instruction of punch plus fireball may be determined from the gesture feature of a single hand making a fist and the additional "fireball" dynamic feature selected by the user. As shown in the application interface of Fig. 5, a plurality of additional dynamic effects may be displayed in advance on the display interface in this embodiment (see the circles below the text "additional dynamic effect" on the right side of Fig. 5); when the user clicks and selects one of them, this step can determine the action instruction based on the gesture feature and the additional dynamic effect feature.
In this embodiment, the selected additional dynamic feature corresponds to the acquired image. In other embodiments, if the acquired face features are obtained, a plurality of additional face-related dynamic effects may be displayed in advance on the display interface for selection by the user, and when the user selects the additional face features, the additional face features are generated to enhance the display of the face display effect and the like.
In other embodiments, if the acquired image features or motion features of the human body are acquired, a plurality of additional human body or motion-related dynamic effects may be displayed in advance on the display interface for the user to select, and when the user selects the additional dynamic features, the additional dynamic features may be generated.
Specifically, suppose the gesture feature representing a single hand making a fist is acquired in step S402. If no additional dynamic effect (also called an additional dynamic feature) is selected, the action instruction determined in this step represents only a punch; if the additional "snowball" dynamic effect is selected, the action instruction determined in this step may be an instruction with a striking effect that combines punching and launching a snowball.
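A toy sketch of this combination of gesture feature and optional additional dynamic effect follows; all feature, effect and instruction names are illustrative assumptions:

```python
def determine_action_instruction(gesture_feature, extra_effect=None):
    """Step S404 sketch: combine the recognized gesture feature with the
    user-selected additional dynamic effect, if any, into one instruction."""
    base = {"fist_one_hand": "PUNCH", "open_palm": "PUSH"}.get(gesture_feature)
    if base is None:
        return None
    if extra_effect is None:
        return {"action": base}                      # e.g. a plain punch
    return {"action": base, "effect": extra_effect}  # e.g. punch plus "snowball"


print(determine_action_instruction("fist_one_hand"))
print(determine_action_instruction("fist_one_hand", "snowball"))
```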
S406: and responding to the action instruction, and executing the operation matched with the action instruction.
In this step, in response to the action instruction, an operation matched with the action instruction is performed, specifically, a rendering instruction is generated based on the action instruction, and a target object related to the action instruction is rendered, for example, an augmented reality, a virtual reality, or a mixed reality target object is displayed in a left box in fig. 5, and the displayed target object may be an augmented reality, a virtual reality, or a mixed reality scene image.
The embodiment may also send the action instruction to the receiving party, so that the receiving party generates a rendering instruction based on the action instruction to render the target object related to the action instruction, and of course, the sending party may also display the augmented reality target object.
The interaction method provided by the embodiment of the specification acquires image characteristics, determines an action instruction based on the image characteristics and additional dynamic characteristics selected by a user, and responds to the action instruction to realize human-computer interaction based on the acquired image characteristics.
In addition, this embodiment obtains gesture image features, face image features, human body image features, motion features and the like from images captured in real time; compared with a limited number of pre-stored images, the image features that can be obtained are much richer and more diverse.
Meanwhile, the interaction of the user is increased by acquiring the user image in real time and acquiring the image characteristics, and particularly, the blending feeling and the interaction of the user are improved and the user experience is improved in some game scenes.
In addition, the embodiment of the specification stores the additional dynamic effect in advance for the user to select, so that the user can conveniently and quickly select the additional dynamic effect, a cool special effect is generated, and the user experience is improved.
Optionally, the order of the additional dynamic effects shown in advance in the display interface shown in fig. 5, or the display order of the additional dynamic effects on the human face features or the additional dynamic effects on the human body features in other embodiments, may be sorted based on the historical use frequency of the user, for example, the frequency of the user selecting "fireball" is the highest, referring to fig. 5, the additional dynamic effects of "fireball" are arranged in the first place for display, which is further convenient for the user to select, and improves the user experience.
It should be noted that the above embodiment may be applied not only in a single terminal device scenario, but also in a scenario where multiple devices interact with each other.
As shown in fig. 6 and 7, another embodiment of the present specification provides a human-computer interaction method 600, including the following steps:
s602: and acquiring the scene characteristics selected by the user.
Specifically, as shown in the application interface of Fig. 7, a plurality of preset scenes may be displayed in advance on the display interface in this embodiment, for example the "avatar" scene shown in Fig. 7, with further scenes represented schematically by "×"; when the user clicks and selects one of the scenes, this step acquires the corresponding scene feature.
In addition, the application interface in fig. 7 further includes a "more" button, so that when the user clicks, more preset scenes can be displayed.
S604: determining action instructions based on the scene features and the acquired image features, wherein the image features comprise at least one of the following: gesture image features, face image features, human body image features, and motion features.
This embodiment can be applied to a terminal device that includes a component capable of capturing images. Taking a terminal device running an augmented reality application as an example, the image-capturing component on the device may include an infrared camera or the like, and the image features are obtained from the captured image; with reference to the embodiment shown in Fig. 4, the specific acquisition process is described below, taking the acquisition of face features as an example.
When acquiring the face features, a face feature classification model may be used. The input parameters of the face feature classification model may be the captured face image (or a preprocessed face image, introduced in the next paragraph), and the output parameters may be the face features. The face feature classification model can be generated by machine learning based on algorithms such as a Support Vector Machine (SVM), a Convolutional Neural Network (CNN) or deep learning (DL).
In order to improve the recognition accuracy of the face features, optionally, the step may also perform preprocessing on the acquired face image so as to remove noise. In particular, the pre-processing operations on the face image may include, but are not limited to: carrying out image enhancement on the collected face image; carrying out image binarization; graying an image, denoising, and the like.
When determining the matching action instruction based on the image feature and the scene feature, for example in an application scenario of a network chat between a sender and a receiver, the image feature and the scene feature may be fused. For example, the face feature is fused with the scene feature to generate an action instruction carrying the fused face and scene features: a face region is reserved in the scene selected by the user, and the user's face feature is fused and displayed in that reserved region, so that the user's face joins the selected scene seamlessly and the effect is created that the user is actually inside the scene; for instance, the face of a character in the scene (for example, the leading role) becomes the user's face, and so on.
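As a rough illustration only (a real implementation would need alignment, color correction and blending; the cascade model and the reserved-region coordinates are assumptions), the "reserved face region" idea could be sketched with OpenCV as follows:

```python
import cv2

# Haar cascade shipped with opencv-python, used only to locate the user's face.
FACE_CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def fuse_face_into_scene(user_image_bgr, scene_image_bgr, reserved_rect):
    """Paste the detected user face into the scene's reserved face region.
    reserved_rect is the (x, y, w, h) region reserved in the selected scene."""
    gray = cv2.cvtColor(user_image_bgr, cv2.COLOR_BGR2GRAY)
    faces = FACE_CASCADE.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(faces) == 0:
        return scene_image_bgr                        # no face found; scene unchanged
    fx, fy, fw, fh = faces[0]                         # take the first detected face
    face = user_image_bgr[fy:fy + fh, fx:fx + fw]
    x, y, w, h = reserved_rect
    out = scene_image_bgr.copy()
    out[y:y + h, x:x + w] = cv2.resize(face, (w, h))  # naive paste, no blending
    return out
```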
The embodiment is particularly suitable for application scenes such as group photo, art photo stickers, art modeling, cosplay and the like.
S606: and responding to the action instruction, and executing the operation matched with the action instruction.
In this step, in response to the action instruction, an operation matched with the action instruction is executed, specifically, a rendering instruction is generated based on the action instruction, so as to render a target object related to the action instruction; the action instruction can also be sent to the receiving party, so that the receiving party generates a rendering instruction based on the action instruction, renders the target object related to the action instruction, and finally displays the target object of augmented reality, virtual reality or mixed reality.
In the application scenario of the group photo, after the operation of step S606, a message carrying the face features and the scene features may be sent to the receiving party, and the face features of the receiving party are obtained at the receiving party, so that the face features of the sending party, the face features of the receiving party, and the scene selected by the sending party are integrated, and the user experience is improved.
The interaction method provided by the embodiment of the specification acquires image characteristics and scene characteristics, determines action instructions based on the image characteristics and the scene characteristics, responds to the action instructions, achieves fusion of the image characteristics and various preset scenes, and facilitates improvement of user experience.
It should be noted that the above embodiment may be applied not only in a single terminal device scenario, but also in a scenario where multiple devices interact with each other.
In addition, different preset scenes are stored in advance for the user to select, so that the obtained image can be changed into different shapes in different scenes, the interestingness is increased, and the user experience is improved.
Optionally, the embodiment may further store the displayed target object of augmented reality, virtual reality, or mixed reality, so as to facilitate subsequent use by the user. In one embodiment, the third-party camera equipment can be requested to shoot and record the current augmented reality, virtual reality or mixed reality view displayed on the screen of the terminal equipment from the outside, so that the augmented reality, virtual reality or mixed reality view storage is indirectly realized, and the augmented reality, virtual reality or mixed reality view required to be stored by a user can be flexibly acquired.
In another embodiment, the augmented reality, virtual reality or mixed reality view seen by the user on the display screen can also be intercepted and saved by way of a screenshot. The implementation mode not only intercepts and stores all augmented reality, virtual reality or mixed reality contents displayed on the screen, but also can selectively store augmented reality, virtual reality or mixed reality views according to the needs of a user.
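On a desktop terminal, the screenshot-based saving described above could be sketched with Pillow as follows; this is a simplifying assumption made for illustration, since a mobile terminal would use its own platform's screen-capture API, and the region argument lets the user keep only part of the view:

```python
from PIL import ImageGrab  # Pillow; desktop-only illustration


def save_view_screenshot(path, region=None):
    """Capture the screen, or a (left, top, right, bottom) sub-region of it,
    and save it so the displayed AR/VR/MR view can be reused later."""
    shot = ImageGrab.grab(bbox=region)  # full screen when region is None
    shot.save(path)


# save_view_screenshot("ar_view.png")                    # the whole screen
# save_view_screenshot("ar_view.png", (0, 0, 800, 600))  # a selected part of the view
```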
For specific applications of the embodiments shown in Figs. 1 to 7 above, the initial display interface may be as shown in Figs. 8 and 9: the user clicks the rightmost add button to bring up the "× Card" option and save the "× Card" function into the chat interface, as shown in Fig. 8, where the "× Card" may be an AR card, an MR card, a VR card, or the like.
In subsequent use, the user can click the "× Card" button shown in Fig. 8 and then carry out the operation steps of the embodiments shown in Figs. 1 to 7; alternatively, when it is detected that the user's current scene supports the method steps of the embodiments shown in Figs. 1 to 7, the "× Card" option can be popped up in the message interface for the user to select and use, improving the user experience.
It should be noted that fig. 8 and fig. 9 only schematically illustrate one triggering execution manner, and actually, the methods described in the foregoing embodiments may also be triggered and executed by other manners, for example, automatically executing by shaking the terminal device, executing by recognizing a specific voice uttered by the user, and the like, and the embodiments of this specification are not limited in particular.
As shown in fig. 10 and 11, another embodiment of the present specification provides a human-computer interaction method 1000, which is applied to a receiving party, and includes the following steps:
s1002: an action instruction is received from a sender.
The action instruction in this embodiment may be the action instruction mentioned in the foregoing embodiments shown in fig. 1 to 7, that is, the embodiment is applied to a receiving side, and an operation performed by a sending side of the embodiment may be the operation of each embodiment shown in fig. 1 to 7.
Of course, the motion command in this embodiment may be another motion command, i.e., independent from the embodiments shown in fig. 1 to 7.
S1004: responding to the action instruction, and displaying an effect corresponding to the action instruction;
wherein the effect corresponding to the action instruction comprises at least one of:
a processing effect on the sender's avatar and/or on the receiver's avatar on the terminal device;
a processing effect on the color of the message frames exchanged with the sender, as can be seen in Fig. 11, where, in the display interface, a friend with the screen name x has sent three messages and each message has a message frame;
screen vibration and inversion, that is, the whole screen of the terminal device vibrates and is flipped over; or
automatic playing of videos, animations, voices and the like, where the animations include GIF images.
The video may specifically be a video file in an encoding format such as H.264 or H.265, and the video file can be played automatically once the receiver receives it; the animation may specifically be an animation that intensifies a character's expression, stylized voice-over text, some background animation effects and the like, which the receiver plays automatically after receiving them.
In addition, this embodiment can also display state changes of the receiver's three-dimensional model on the sender's display interface, specifically three-dimensional augmented reality, virtual reality or mixed reality display effects such as the receiver being hit by a bullet, snow falling on the receiver, and the like.
In addition, this embodiment may also display the processing effect on the avatar on the sender's display interface; for example, the receiver's avatar may be changed into a tortoise or some other augmented reality, virtual reality or mixed reality three-dimensional style, which adds interest and enhances the user experience.
Among these display effects, the sender's display interface can show both parties' actions from their creation to their disappearance as well as the final state of the receiver, of the avatar and so on; the receiver's display interface can show both parties' actions appearing and disappearing, but generally does not show the receiver's final state, the avatar's final state and the like, which adds interest and enhances the user experience.
In addition, the embodiment can also receive a dragging instruction, move a displayed object on the display interface, and the like.
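A compact sketch of how a receiver might dispatch the effect types listed above is given below; the instruction fields and the `ui` handler object are assumptions made for illustration, and real rendering, vibration and playback would go through the platform's UI APIs:

```python
def handle_action_instruction(instruction, ui):
    """Receiver side (step S1004) sketch: map a received action instruction
    to the matching display effect. `ui` is a hypothetical interface object."""
    effect = instruction.get("effect")
    if effect == "avatar":
        ui.update_avatar(instruction["target"], instruction["style"])  # avatar processing
    elif effect == "frame_color":
        ui.set_message_frame_color(instruction["color"])               # message frame color
    elif effect == "shake":
        ui.vibrate_and_flip_screen()                                   # screen vibration / inversion
    elif effect in ("video", "animation", "voice"):
        ui.autoplay(instruction["media_url"])                          # automatic playback
    elif effect == "drag":
        ui.move_object(instruction["object_id"], instruction["to"])    # move a displayed object
    else:
        ui.show_default(instruction)
```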
The man-machine interaction method provided by the embodiment of the specification receives the action instruction from the sender, responds to the action instruction, and displays the effect corresponding to the action instruction, so that man-machine interaction based on the action instruction is realized.
In the human-computer interaction method provided in the embodiment of the present specification, the effect corresponding to the action instruction may be displayed in a three-dimensional state, and specifically may be displayed in a three-dimensional augmented reality, a virtual reality, or a mixed reality.
In a specific embodiment, the following effects can also be produced on the sender's display interface: party A (the sender) launches a snowball and party B (the receiver) launches a fireball; after the fireball and the snowball collide, the weakened fireball flies on towards party A and then sets party A's figure alight. Or, for example, party A and party B launch fireballs or water balls at the same time, which scatter into sparks or splashes when they collide, forming a dreamlike artistic effect, adding interest and enhancing the user experience.
The preceding sections have described embodiments of the human-computer interaction method in detail. As shown in Fig. 12, this specification further provides a human-computer interaction apparatus 1200, where the apparatus 1200 includes:
an image acquisition module 1202, which may be configured to acquire an image for instructing a terminal device to perform an action;
a motion instruction determination module 1204, operable to determine a matching motion instruction based on image features of the image;
the executing module 1206 may be configured to, in response to the action instruction, execute an operation matched with the action instruction.
The interaction device provided by the embodiment of the specification determines the action instruction based on the image characteristics of the acquired image and responds to the action instruction to execute the operation matched with the action instruction, so that the man-machine interaction based on the acquired image is realized.
Optionally, as an embodiment, the image obtaining module 1202 may be configured to obtain the selected image in response to a user selecting a displayed preset image.
Optionally, as an embodiment, the image obtaining module 1202 may be configured to capture an image of the user through a camera or similar image-capturing device.
Optionally, as an embodiment, the image for instructing the terminal device to perform the action includes a gesture image, a human face image, or a human body image.
Optionally, as an embodiment, the action instruction determining module 1204 may be configured to determine a matching action instruction based on the gesture feature and the acquired additional dynamic feature.
Optionally, as an embodiment, the action instruction determining module 1204 may be configured to determine, in a preset scene, a matching action instruction based on the image feature of the image and the additional dynamic feature.
Optionally, as an embodiment, the action instruction determining module 1204 may be configured to determine a matching action instruction based on the image feature of the image and the acquired scene feature.
Optionally, as an embodiment, the apparatus 1200 further includes a saving module, which may be configured to save the image feature and the scene feature.
Optionally, as an embodiment, the executing module 1206 may be configured to generate a rendering instruction based on the action instruction, so as to render a target object related to the action instruction.
Optionally, as an embodiment, the apparatus 1200 further includes a sending module, which may be configured to send the action instruction to a receiving party.
The human-computer interaction device 1200 according to the embodiment of the present specification may refer to the flows of the human-computer interaction method shown in fig. 1 to fig. 9 corresponding to the previous text specification embodiments, and each unit/module and the other operations and/or functions in the human-computer interaction device 1200 are respectively for implementing the corresponding flows in the human-computer interaction method, and are not described herein again for brevity.
As shown in Fig. 13, this specification further provides a human-computer interaction device 1300, where the device 1300 includes:
a receiving module 1302, which may be configured to receive an action instruction from a sender;
an effect display module 1304, configured to display, in response to the action instruction, an effect corresponding to the action instruction, where the effect corresponding to the action instruction includes at least one of:
processing effect on the head portrait of the sender of the terminal equipment and/or processing effect on the head portrait of the receiver of the terminal equipment;
processing effect of message frame color communicated with a sender;
screen vibration inversion; or
Video or animation play.
The video may specifically be a video file in an encoding format such as H.264 or H.265, or a real-time computed animation of a three-dimensional model, and it can be played automatically once the receiver receives it; the animation may specifically be an animation that intensifies a character's expression, stylized voice-over text, some background animation effects and the like, which the receiver can play automatically after receiving them.
In addition, the sender's display interface can also show state changes of the receiver's three-dimensional model, specifically three-dimensional augmented reality, virtual reality or mixed reality display effects, for example the receiver being hit by a bullet or snow falling on the receiver in augmented reality.
In addition, this embodiment may also display the processing effect on the receiver's avatar on the sender's display interface; for example, the receiver's avatar may be changed into a tortoise or some other augmented reality, virtual reality or mixed reality three-dimensional style, which adds interest and enhances the user experience.
Among these display effects, the sender's display interface can show both parties' actions from their creation to their disappearance as well as the final state of the receiver, of the avatar and so on; the receiver's display interface can show both parties' actions appearing and disappearing, but generally does not show the receiver's final state, the avatar's final state and the like, which adds interest and enhances the user experience.
The man-machine interaction device provided by the embodiment of the specification receives the action instruction from the sender, responds to the action instruction, and displays the effect corresponding to the action instruction, so that man-machine interaction based on the received action instruction is realized.
The human-computer interaction device 1300 according to the embodiment of the present specification may refer to the flows of the human-computer interaction method shown in fig. 10 to fig. 11 corresponding to the embodiments of the previous text specification, and each unit/module and the other operations and/or functions in the human-computer interaction device 1300 are respectively for implementing the corresponding flows in the human-computer interaction method, and are not described herein again for brevity.
The effects achievable by the above embodiments of this specification are shown schematically in Fig. 14. On the input side, not only text, voice, picture and short-video input are supported, but also face recognition, motion recognition, scene recognition and the like, with different effects sent according to the recognized face, motion, scene and so on. On the receiving side, besides ordinary text display, voice playback, dynamic picture display and short-video playback, effects such as state changes, animation and sound playback, and screen vibration feedback are also realized, where the state changes include, for example, the sender's figure being hit by a bullet, the sender turning into a tortoise, the background changing dynamically, and the like.
An electronic device according to an embodiment of the present specification is described in detail below with reference to fig. 15. Referring to fig. 15, at the hardware level the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. As shown in fig. 15, the memory may include volatile memory, such as random-access memory (RAM), and may also include non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include the hardware required by other services.
The processor, the network interface, and the memory may be interconnected by the internal bus, which may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in fig. 15, but this does not mean that there is only one bus or only one type of bus.
The memory is used for storing a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory may include both volatile memory and non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the volatile memory and then runs it, forming, at the logical level, the human-computer interaction apparatus. The processor executes the program stored in the memory and is specifically configured to perform the operations of the method embodiments described in this specification.
The methods disclosed in the embodiments shown in fig. 1 to fig. 11 may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above methods may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic blocks disclosed in the embodiments of this specification can be implemented or executed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in connection with the embodiments of this specification may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
The electronic device shown in fig. 15 may also execute the methods shown in fig. 1 to fig. 11 and implement the functions of the human-computer interaction method in the embodiments shown in fig. 1 to fig. 11, which are not described here again.
Of course, in addition to a software implementation, the electronic device of this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flow is not limited to individual logic units and may also be hardware or a logic device.
An embodiment of the present specification further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the method embodiments shown in fig. 1 to fig. 11 and achieves the same technical effects; to avoid repetition, the details are not repeated here. The computer-readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disc.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (14)

1. A human-computer interaction method, comprising:
acquiring scene characteristics selected by a user;
in response to a selection operation by the user on a preset image, acquiring the selected image, wherein the image is used for instructing a terminal device to execute an action;
determining a matched action instruction based on image characteristics of the image and the scene characteristics to which the image is applied, according to mapping relationships, preset for different scene characteristics, between image characteristics and action instructions; and
executing an operation matched with the action instruction.
2. The method according to claim 1, wherein
executing the operation matched with the action instruction comprises:
generating a rendering instruction based on the action instruction, and rendering a target object related to the action instruction in a communication application interface.
3. The method according to claim 1, wherein
a display order of the preset images is determined by sorting based on the user's historical frequency of use.
4. The method according to claim 1, wherein
the scene characteristics come from a communication scene in which a plurality of users interact, and the determined action instruction achieves a predetermined engagement degree among the mutually interacting users in the communication scene.
5. The method of claim 1, the preset image comprising a gesture image, a face image, or a body image.
6. The method according to claim 5, wherein before the determining, according to the mapping relationships between image characteristics and action instructions preset for different scene characteristics, of an action instruction which is matched based on the image characteristics of the image and the scene characteristics to which the image is applied and which achieves a predetermined engagement degree among mutually interacting users, the method further comprises:
acquiring additional dynamic features related to the image;
the determining, according to the mapping relationships between image characteristics and action instructions preset for different scene characteristics, of an action instruction which is matched based on the image characteristics of the image and the scene characteristics to which the image is applied and which achieves the predetermined engagement degree among mutually interacting users comprises: determining, according to the mapping relationships between image characteristics and action instructions preset for different scene characteristics, an action instruction which is matched based on the image characteristics of the image, the additional dynamic features, and the scene characteristics to which the image is applied and which achieves the predetermined engagement degree among the mutually interacting users.
7. A human-computer interaction method, applied to a receiver in a communication scene in which a plurality of users interact, comprising:
receiving an action instruction from a sender, wherein the action instruction comprises an action instruction that is determined, according to the image characteristics of the images respectively selected in a communication application interface by the mutually interacting sender user and receiver user and the scene characteristics respectively selected and applied to those images, to match those image characteristics and scene characteristics; and
in response to the action instruction, displaying, in the communication application interface, an effect corresponding to the action instruction.
8. The method of claim 7, wherein the effect on the communication application interface corresponding to the action instruction comprises at least one of:
a processing effect on the sender's avatar on the terminal device and/or a processing effect on the receiver's avatar on the terminal device;
a processing effect on the color of message frames exchanged with the sender;
screen vibration feedback; or
playing a video or an animation.
9. A human-computer interaction device, applied to a communication scene in which a plurality of users interact, comprising:
an image acquisition module, configured to acquire scene characteristics selected by a user and, in response to a selection operation by the user on a preset image, acquire the selected image, wherein the image is used for instructing a terminal device to execute an action;
an action instruction determining module, configured to determine a matched action instruction based on image characteristics of the image and the scene characteristics to which the image is applied, according to mapping relationships, preset for different scene characteristics, between image characteristics and action instructions; and
an execution module, configured to execute an operation matched with the action instruction.
10. A human-computer interaction device, applied to a receiver in a communication scene in which a plurality of users interact, comprising:
a receiving module, configured to receive an action instruction from a sender, wherein the action instruction comprises an action instruction that is determined, according to the image characteristics of the images respectively selected in a communication application interface by the mutually interacting sender user and receiver user and the scene characteristics respectively selected and applied to those images, to match those image characteristics and scene characteristics; and
an effect display module, configured to display, in the communication application interface and in response to the action instruction, an effect corresponding to the action instruction.
11. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program when executed by the processor performing the operations of:
acquiring scene characteristics selected by a user;
in response to a selection operation by the user on a preset image, acquiring the selected image, wherein the image is used for instructing a terminal device to execute an action;
determining a matched action instruction based on image characteristics of the image and the scene characteristics to which the image is applied, according to mapping relationships, preset for different scene characteristics, between image characteristics and action instructions; and
executing an operation matched with the action instruction.
12. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program when executed by the processor performing the operations of:
receiving an action instruction from a sender, wherein the action instruction comprises an action instruction that is determined, according to the image characteristics of the images respectively selected in a communication application interface by the mutually interacting sender user and receiver user and the scene characteristics respectively selected and applied to those images, to match those image characteristics and scene characteristics; and
in response to the action instruction, displaying, in the communication application interface, an effect corresponding to the action instruction.
13. A computer-readable storage medium having a computer program stored thereon, which when executed by a processor, performs operations comprising:
acquiring scene characteristics selected by a user;
in response to a selection operation by the user on a preset image, acquiring the selected image, wherein the image is used for instructing a terminal device to execute an action;
determining a matched action instruction based on image characteristics of the image and the scene characteristics to which the image is applied, according to mapping relationships, preset for different scene characteristics, between image characteristics and action instructions; and
executing an operation matched with the action instruction.
14. A computer-readable storage medium having a computer program stored thereon, which when executed by a processor, performs operations comprising:
receiving an action instruction from a sender, wherein the action instruction comprises an action instruction that is determined, according to the image characteristics of the images respectively selected in a communication application interface by the mutually interacting sender user and receiver user and the scene characteristics respectively selected and applied to those images, to match those image characteristics and scene characteristics; and
in response to the action instruction, displaying, in the communication application interface, an effect corresponding to the action instruction.
CN202110302002.6A 2018-08-02 2018-08-02 Man-machine interaction method and device Pending CN112925418A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110302002.6A CN112925418A (en) 2018-08-02 2018-08-02 Man-machine interaction method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110302002.6A CN112925418A (en) 2018-08-02 2018-08-02 Man-machine interaction method and device
CN201810871070.2A CN109254650B (en) 2018-08-02 2018-08-02 Man-machine interaction method and device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201810871070.2A Division CN109254650B (en) 2018-08-02 2018-08-02 Man-machine interaction method and device

Publications (1)

Publication Number Publication Date
CN112925418A true CN112925418A (en) 2021-06-08

Family

ID=65049153

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201810871070.2A Active CN109254650B (en) 2018-08-02 2018-08-02 Man-machine interaction method and device
CN202110302002.6A Pending CN112925418A (en) 2018-08-02 2018-08-02 Man-machine interaction method and device

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201810871070.2A Active CN109254650B (en) 2018-08-02 2018-08-02 Man-machine interaction method and device

Country Status (3)

Country Link
CN (2) CN109254650B (en)
TW (1) TWI782211B (en)
WO (1) WO2020024692A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114035684A (en) * 2021-11-08 2022-02-11 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109254650B (en) * 2018-08-02 2021-02-09 创新先进技术有限公司 Man-machine interaction method and device
CN110083238A (en) * 2019-04-18 2019-08-02 深圳市博乐信息技术有限公司 Man-machine interaction method and system based on augmented reality
CN110609921B (en) * 2019-08-30 2022-08-19 联想(北京)有限公司 Information processing method and electronic equipment
CN110807395A (en) * 2019-10-28 2020-02-18 支付宝(杭州)信息技术有限公司 Information interaction method, device and equipment based on user behaviors
CN111338808B (en) * 2020-05-22 2020-08-14 支付宝(杭州)信息技术有限公司 Collaborative computing method and system
CN111627097B (en) * 2020-06-01 2023-12-01 上海商汤智能科技有限公司 Virtual scene display method and device
CN111899192B (en) * 2020-07-23 2022-02-01 北京字节跳动网络技术有限公司 Interaction method, interaction device, electronic equipment and computer-readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120242800A1 (en) * 2011-03-23 2012-09-27 Ionescu Dan Apparatus and system for interfacing with computers and other electronic devices through gestures by using depth sensing and methods of use
CN103916621A (en) * 2013-01-06 2014-07-09 腾讯科技(深圳)有限公司 Method and device for video communication
CN106125903A (en) * 2016-04-24 2016-11-16 林云帆 Many people interactive system and method
CN106657060A (en) * 2016-12-21 2017-05-10 惠州Tcl移动通信有限公司 VR communication method and system based on reality scene
US20180001198A1 (en) * 2016-06-30 2018-01-04 Sony Interactive Entertainment America Llc Using HMD Camera Touch Button to Render Images of a User Captured During Game Play
CN107885317A (en) * 2016-09-29 2018-04-06 阿里巴巴集团控股有限公司 A kind of exchange method and device based on gesture

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7159008B1 (en) * 2000-06-30 2007-01-02 Immersion Corporation Chat interface with haptic feedback functionality
JP5503782B1 (en) * 2013-06-20 2014-05-28 株式会社 ディー・エヌ・エー Electronic game machine, electronic game processing method, and electronic game program
CN105045398B (en) * 2015-09-07 2018-04-03 哈尔滨市一舍科技有限公司 A kind of virtual reality interactive device based on gesture identification
CN105468142A (en) * 2015-11-16 2016-04-06 上海璟世数字科技有限公司 Interaction method and system based on augmented reality technique, and terminal
CN105988583A (en) * 2015-11-18 2016-10-05 乐视致新电子科技(天津)有限公司 Gesture control method and virtual reality display output device
CN105487673B (en) * 2016-01-04 2018-01-09 京东方科技集团股份有限公司 A kind of man-machine interactive system, method and device
CN106095068A (en) * 2016-04-26 2016-11-09 乐视控股(北京)有限公司 The control method of virtual image and device
CN106155311A (en) * 2016-06-28 2016-11-23 努比亚技术有限公司 AR helmet, AR interactive system and the exchange method of AR scene
CN106293461B (en) * 2016-08-04 2018-02-27 腾讯科技(深圳)有限公司 Button processing method and terminal and server in a kind of interactive application
CN107885316A (en) * 2016-09-29 2018-04-06 阿里巴巴集团控股有限公司 A kind of exchange method and device based on gesture
US20180126268A1 (en) * 2016-11-09 2018-05-10 Zynga Inc. Interactions between one or more mobile devices and a vr/ar headset
US10168788B2 (en) * 2016-12-20 2019-01-01 Getgo, Inc. Augmented reality user interface
CN107705278B (en) * 2017-09-11 2021-03-02 Oppo广东移动通信有限公司 Dynamic effect adding method and terminal equipment
CN109254650B (en) * 2018-08-02 2021-02-09 创新先进技术有限公司 Man-machine interaction method and device


Also Published As

Publication number Publication date
CN109254650A (en) 2019-01-22
CN109254650B (en) 2021-02-09
WO2020024692A1 (en) 2020-02-06
TWI782211B (en) 2022-11-01
TW202008143A (en) 2020-02-16

Similar Documents

Publication Publication Date Title
CN109254650B (en) Man-machine interaction method and device
US11182615B2 (en) Method and apparatus, and storage medium for image data processing on real object and virtual object
CN107680157B (en) Live broadcast-based interaction method, live broadcast system and electronic equipment
US9747495B2 (en) Systems and methods for creating and distributing modifiable animated video messages
JP7268071B2 (en) Virtual avatar generation method and generation device
CN108322832B (en) Comment method and device and electronic equipment
CN111080759B (en) Method and device for realizing split mirror effect and related product
CN108874114B (en) Method and device for realizing emotion expression of virtual object, computer equipment and storage medium
CN111880664B (en) AR interaction method, electronic equipment and readable storage medium
US20230247178A1 (en) Interaction processing method and apparatus, terminal and medium
CN110555507A (en) Interaction method and device for virtual robot, electronic equipment and storage medium
CN114095744A (en) Video live broadcast method and device, electronic equipment and readable storage medium
US20230177755A1 (en) Predicting facial expressions using character motion states
CN113949914A (en) Live broadcast interaction method and device, electronic equipment and computer readable storage medium
CN111530086A (en) Method and device for generating expression of game role
CN114222076B (en) Face changing video generation method, device, equipment and storage medium
CN110176044B (en) Information processing method, information processing device, storage medium and computer equipment
WO2020042442A1 (en) Expression package generating method and device
CN113411537A (en) Video call method, device, terminal and storage medium
CN112637692A (en) Interaction method, device and equipment
CN116017082A (en) Information processing method and electronic equipment
CN113176827B (en) AR interaction method and system based on expressions, electronic device and storage medium
CN114779948A (en) Method, device and equipment for controlling instant interaction of animation characters based on facial recognition
CN113569167A (en) Resource processing method and device, terminal equipment and storage medium
US20240193838A1 (en) Computer-implemented method for controlling a virtual avatar

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination