CN114418903A - Man-machine interaction method and man-machine interaction device based on privacy protection - Google Patents

Man-machine interaction method and man-machine interaction device based on privacy protection

Info

Publication number
CN114418903A
Authority
CN
China
Prior art keywords
user
human
coordinate system
point cloud
spatial position
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210070782.0A
Other languages
Chinese (zh)
Inventor
陈志远
马晨光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202210070782.0A priority Critical patent/CN114418903A/en
Publication of CN114418903A publication Critical patent/CN114418903A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration using local operators
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/70Game security or game management aspects
    • A63F13/79Game security or game management aspects involving player-related data, e.g. identities, accounts, preferences or play histories
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20024Filtering details
    • G06T2207/20028Bilateral filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of this specification provide a human-computer interaction method and a human-computer interaction apparatus based on privacy protection. In the human-computer interaction method, when a user is within the field of view of a 3D camera of a human body perception system, a depth image of the user is acquired with the 3D camera; each pixel point in the depth image, expressed in a pixel coordinate system, is converted into a 3D point cloud in a spatial coordinate system; the spatial position and/or human body posture of the user is obtained from the converted 3D point cloud; and an interactive operation is performed for the user based on the spatial position and/or human body posture.

Description

Man-machine interaction method and man-machine interaction device based on privacy protection
Technical Field
The embodiment of the specification relates to the technical field of computers, in particular to a human-computer interaction method and a human-computer interaction device based on privacy protection.
Background
Human-computer interaction is the process of exchanging information between a person and a computer, in a certain interaction mode, in order to complete a given task; it is a product of the development of computer technology. Human-computer interaction is widely used in various fields, such as wearable devices, immersive games, fingerprint recognition, and the like.
At present, in a human-computer interaction scenario, a 2D image of the user is mainly acquired through a 2D camera; the 2D image is then analyzed, and the human-computer interaction device interacts with the user according to the analysis result. For example, in an immersive game, the human-computer interaction device collects images of the player's actions through the 2D camera in real time, analyzes the player's game actions from those images, and causes the game character to perform corresponding operations, such as a magazine-change (reload) action, according to the player's actions. In this way, human-computer interaction is realized, and the user can experience the realism of the immersive game.
Disclosure of Invention
In view of the above, embodiments of the present specification provide a human-computer interaction method and a human-computer interaction device based on privacy protection. In the technical solutions of the embodiments of the present specification, human-computer interaction is realized through the depth image collected by a 3D camera, thereby protecting the privacy security of the user.
According to an aspect of embodiments of the present specification, there is provided a human-computer interaction method based on privacy protection, performed by a human perception system, the human-computer interaction method including: when a user is in a visual field range of a 3D camera of the human body perception system, acquiring a depth image of the user by using the 3D camera; converting each pixel point based on a pixel coordinate system in the depth image into a 3D point cloud based on a space coordinate system; obtaining the spatial position and/or the human posture of the user according to the converted 3D point cloud; and performing an interactive operation for the user based on the spatial position and/or the human body gesture.
According to another aspect of the embodiments of the present specification, there is also provided a human-computer interaction device based on privacy protection, applied to a human perception system, the human-computer interaction device including: a depth image acquisition unit which acquires a depth image of a user by using a 3D camera when the user is in a visual field range of the 3D camera of the human body perception system; the coordinate system conversion unit is used for converting each pixel point based on a pixel coordinate system in the depth image into a 3D point cloud based on a space coordinate system; the 3D point cloud computing unit is used for obtaining the spatial position and/or the human body posture of the user according to the converted 3D point cloud; and an interaction execution unit which executes an interactive operation for the user based on the spatial position and/or the human body posture.
According to another aspect of embodiments herein, there is also provided an electronic device, including: at least one processor, a memory coupled to the at least one processor, and a computer program stored on the memory, the at least one processor executing the computer program to implement a human-computer interaction method as described in any of the above.
According to another aspect of embodiments of the present specification, there is also provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the human-computer interaction method as described above.
According to another aspect of embodiments of the present specification, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements the human-computer interaction method as described in any one of the above.
Drawings
A further understanding of the nature and advantages of the contents of the embodiments of the present specification may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals.
Fig. 1 is a flowchart illustrating an example of a human-computer interaction method based on privacy protection according to an embodiment of the present specification.
Fig. 2 shows a schematic diagram of one example of a positional relationship between a camera coordinate system and an image coordinate system.
FIG. 3 shows a flowchart of one example of performing an interactive operation for a user according to an embodiment of the present specification.
Fig. 4 is a block diagram illustrating an example of a human-computer interaction device based on privacy protection according to an embodiment of the present specification.
Fig. 5 is a block diagram of an electronic device for implementing a human-computer interaction method according to an embodiment of the present specification.
Detailed Description
The subject matter described herein will be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the embodiments of the disclosure. Various examples may omit, substitute, or add various procedures or components as needed. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "include" and its variants mean open-ended terms, in the sense of "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same object. Other definitions, whether explicit or implicit, may be included below. The definition of a term is consistent throughout the specification unless the context clearly dictates otherwise.
Human-computer interaction is the process of exchanging information between a person and a computer, in a certain interaction mode, in order to complete a given task; it is a product of the development of computer technology. Human-computer interaction is widely used in various fields, such as wearable devices, immersive games, fingerprint recognition, and the like.
At present, in a human-computer interaction scenario, a 2D image of the user is mainly acquired through a 2D camera; the 2D image is then analyzed, and the human-computer interaction device interacts with the user according to the analysis result. For example, in an immersive game, the human-computer interaction device collects images of the player's actions through the 2D camera in real time, analyzes the player's game actions from those images, and causes the game character to perform corresponding operations, such as a magazine-change (reload) action, according to the player's actions. In this way, human-computer interaction is realized, and the user can experience the realism of the immersive game.
However, the 2D image of the user collected by the 2D camera contains a large amount of private user information, and that information is presented directly in the 2D image and is recognizable with the naked eye, which can lead to the invasion and leakage of user privacy.
In view of the above, embodiments of the present specification provide a human-computer interaction method and a human-computer interaction device based on privacy protection. In the human-computer interaction method, when a user is within the field of view of a 3D camera of a human body perception system, a depth image of the user is acquired with the 3D camera; each pixel point in the depth image, expressed in a pixel coordinate system, is converted into a 3D point cloud in a spatial coordinate system; the spatial position and/or human body posture of the user is obtained from the converted 3D point cloud; and an interactive operation is performed for the user based on the spatial position and/or human body posture. According to these technical solutions, human-computer interaction is realized through the depth image collected by the 3D camera, thereby protecting the privacy security of the user.
The following describes a human-computer interaction method and a human-computer interaction device based on privacy protection, which are provided in an embodiment of the present specification, with reference to the accompanying drawings.
Fig. 1 shows a flowchart of an example 100 of a human-computer interaction method based on privacy protection according to an embodiment of the present description.
The human-computer interaction method provided by the embodiments of the present specification can be executed by a human body perception system. The human body perception system can be used to perceive information such as the spatial position and human body posture of a human body, so that human-computer interaction operations can be carried out conveniently. The human body perception system may include multiple sensors, such as an infrared sensor, a temperature sensor, a camera, and the like. The human body perception system can be applied to different human-computer interaction scenarios and can run on different devices in different scenarios. For example, it can be applied to Dragonfly face-scanning machines, face-scanning vending machines, self-service face-scanning ordering machines, campus group-meal face-scanning payment, face-scanning access control, and other smart machines.
As shown in fig. 1, at 110, when a user is in the field of view of a 3D camera of a human perception system, a depth image of the user is acquired by the 3D camera.
In the embodiment of the present specification, the human perception system may be configured with at least one 3D camera, and the configured 3D camera may include at least one of a Structured light (Structured light) -based 3D camera, a TOF (Time of flight) based 3D camera, a binocular vision-based 3D camera, and the like. The following description will be given taking a 3D camera based on structured light as an example. Structured light based 3D cameras are 3D cameras that can make depth measurements using the principles of structured light and triangulation.
The image collected by the 3D camera is a depth image, in which the distance from the 3D camera to each point on the photographed user is taken as the pixel value; the depth recorded at each pixel point therefore represents the distance, relative to the 3D camera, of the corresponding point in real space. On this basis, the depth image directly reflects the geometry of the visible surface of the photographed user.
Compared with the image collected by a 2D camera, the depth image collected by the 3D camera minimizes the amount of user information collected, and the user's private information cannot be recognized from the depth image with the naked eye, so that the user's privacy is protected and privacy security is achieved.
In the embodiment of the present specification, when the 3D camera is in an operating state, in one example, whether a user is present in the field of view of the 3D camera may be monitored in real time, and when a user is detected, the depth image of the user may be acquired with the 3D camera. In this example, the monitoring may be performed by other sensors, such as infrared sensors or temperature sensors, which are communicatively connected with the 3D camera.
For example, when the user is monitored through temperature sensing and it is detected that the temperature of an object appearing in the field of view of the 3D camera is close to human body temperature, it may be determined that the object is the body of a user, and thus that a user is present in the field of view of the 3D camera. When the sensor detects that a user is present, the 3D camera can be triggered to collect the depth image of the user.
In another example, the 3D camera may acquire depth images of the field of view in real time and then detect whether a user is present in each depth image; the detection may use a deep-learning-based detection model, key point detection, and the like. When a user is detected in a depth image, that depth image containing the user is acquired.
At 120, individual pixel points in the depth image based on the pixel coordinate system are converted into a 3D point cloud based on the spatial coordinate system.
In this specification embodiment, information for each point in three-dimensional space may be represented by a 3D point cloud. The pixel coordinate system is a two-dimensional coordinate system based on the depth image. In one example, the spatial coordinate system may be a world coordinate system. In another example, the spatial coordinate system may be a camera coordinate system referenced to the 3D camera, e.g., the camera coordinate system has the 3D camera as the origin of coordinates. The world coordinate system and the camera coordinate system are both three-dimensional coordinate systems.
When the spatial coordinate system is a camera coordinate system, the pixel coordinate system may first be converted, by translation, into an image coordinate system with the center of the image as the origin. Based on the relationship between the 3D camera and the imaging plane, the plane of the image coordinate system is parallel to the imaging plane, and the plane formed by two of the axes of the spatial coordinate system is also parallel to the imaging plane; therefore, the plane of the image coordinate system is parallel to the plane formed by those two axes of the spatial coordinate system. Furthermore, the remaining axis of the spatial coordinate system passes through the origin of the image coordinate system.
Fig. 2 shows a schematic diagram of one example of a positional relationship between a camera coordinate system and an image coordinate system. As shown in FIG. 2, the camera coordinate system takes O_C as its origin, with the X_C axis, Y_C axis, and Z_C axis as its three axes; the image coordinate system takes O as its origin, with the X axis and Y axis as the two axes of the two-dimensional image coordinate system. The plane of the image coordinate system is parallel to the plane formed by the X_C and Y_C axes, and the Z_C axis passes through the origin of the image coordinate system.
Based on the positional relationship between the camera coordinate system and the image coordinate system, the image coordinate system may be converted into the camera coordinate system using the relationship of the similar triangles. In one example, a straight line between any point in the camera coordinate system and the origin passes through a plane in which the image coordinate system is located, so that the origin in the camera coordinate system, the origin in the image coordinate system, and the point passing through the plane in which the image coordinate system is located may constitute one triangle (hereinafter, referred to as a first triangle), and further, the origin in the camera coordinate system, the origin in the image coordinate system, and the point in the camera coordinate system may constitute another triangle (hereinafter, referred to as a second triangle). The first triangle and the second triangle are similar triangles, so that each pixel point of the image coordinate system can be converted into a 3D point cloud of the camera coordinate system based on the relationship between the two similar triangles.
Taking FIG. 2 as an example, the straight line between a point P in the camera coordinate system and the origin O_C crosses the plane of the image coordinate system at a point P'. The first triangle O_C-O-P' and the second triangle O_C-O-P are similar triangles, so based on this similarity, the point P' in the image coordinate system can be converted into the 3D point P in the camera coordinate system.
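For a pinhole camera, this similar-triangle relationship amounts to back-projecting each pixel with the camera intrinsics. The following is a minimal sketch of that conversion, assuming the 3D camera's intrinsic parameters (focal lengths fx, fy and principal point cx, cy) are known from calibration; the function and variable names are illustrative and not taken from this specification.

```python
import numpy as np

def depth_to_camera_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (pixel coordinate system) into a 3D point
    cloud in the camera coordinate system, using the similar-triangle
    (pinhole) relationship described above.

    depth : HxW array of distances along the optical axis.
    fx, fy, cx, cy : camera intrinsics (assumed known from calibration).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.astype(np.float32)
    x = (u - cx) * z / fx   # similar triangles: X_C / Z_C = (u - cx) / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop pixels with no depth reading
```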
When the spatial coordinate system is a world coordinate system, it is also necessary to convert the camera coordinate system into the world coordinate system. The relationship between the camera coordinate system and the world coordinate system can be expressed as:
$$\begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} = R\begin{bmatrix} X_C \\ Y_C \\ Z_C \end{bmatrix} + t$$

where a point in the camera coordinate system is denoted by X_C, Y_C, and Z_C, a point in the world coordinate system is denoted by X_W, Y_W, and Z_W, R denotes a rotation matrix, and t denotes a translation vector. That is, each point in the camera coordinate system can be converted into the corresponding point in the world coordinate system by one rotation and one translation.
Through the conversion mode, each point in the camera coordinate system can be converted into a point based on the world coordinate system, and therefore 3D point cloud based on the world coordinate system can be obtained.
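A sketch of that extrinsic conversion, assuming the rotation matrix R and translation vector t relating the two coordinate systems are already known (for example from extrinsic calibration); the helper name is illustrative.

```python
import numpy as np

def camera_to_world(points_cam, R, t):
    """Apply one rotation and one translation to convert camera-coordinate
    points of shape (N, 3) into world-coordinate points, per the formula above."""
    return points_cam @ R.T + t  # equivalent to R @ p + t for each point p
```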
After the conversion operation, the 3D point cloud corresponding to each pixel point in the depth image can be obtained.
In one example of the embodiment of the present specification, before the coordinate system conversion is performed, the depth image may also be denoised using bilateral filtering. In bilateral filtering, for each pixel point in the depth image, the distance values of the surrounding pixel points are weighted and averaged, and the weighted-average distance value is then used as the distance value of that pixel point.
Denoising the depth image with bilateral filtering preserves edges in the depth image, so that a clearer outline of the person can be obtained, which facilitates the subsequent calculation of the spatial position and human body posture.
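As one illustration of this denoising step, OpenCV's bilateral filter can be applied directly to the depth map; the filter parameters below are illustrative assumptions, not values specified in this specification.

```python
import cv2

def denoise_depth(depth):
    """Bilateral filtering: each pixel's distance value is replaced by a
    weighted average of neighbouring distance values, with weights that fall
    off both with spatial distance and with difference in depth, so edges
    (e.g. the outline of the person) are preserved."""
    depth32 = depth.astype("float32")
    # d: neighbourhood diameter; sigmaColor / sigmaSpace: illustrative values
    return cv2.bilateralFilter(depth32, d=5, sigmaColor=30.0, sigmaSpace=5.0)
```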
At 130, a spatial position and/or a human pose of the user may be obtained from the converted 3D point cloud.
In this specification embodiment, the spatial position of the user may be a position in real space based on a world coordinate system, and may also be a relative position with respect to the 3D camera.
In one example of an embodiment of the present specification, a spatial position and/or a human posture of a user may be obtained from a 3D point cloud obtained by conversion using a neural network model. The neural network model for outputting the spatial position may be a target detection model obtained based on deep learning, and the neural network model for outputting the human body posture may be a key point detection model obtained based on deep learning.
In this example, the target detection model may be used for target detection for a human body to determine a spatial location where the human body is located, the target detected by the target detection model being the human body. The key point detection model may be used to detect each key point of the human body, and each detected key point may be specified. The key points of the human body may include points on joints, points on organs, points on human body parts, and the like, for example, the key points of the human body may include a head, a neck, a shoulder, an elbow, a wrist, a hip, a knee, an ankle, and the like.
In one example, the converted 3D point cloud may be input to a target detection model, and target detection targeting a human body is performed on the input 3D point cloud using the target detection model to detect a spatial position of the user, and the spatial position may be output. In one example, the converted 3D point cloud may be input to a key point detection model, and the key point detection model may be used to perform key point detection on the input 3D point cloud to detect each key point on the user's body in the depth image, and output each detected key point.
In one example of key point detection by the key point detection model, key points may be detected separately for each human body part, and the key points of each part can be identified. For example, the key points of the head may include key points corresponding to at least one of the facial features. When a certain part does not appear in the depth image, key point detection for that part may be skipped and no key points for that part are output.
In another example, the converted 3D point cloud may be input to the target detection model, which performs target detection on the input 3D point cloud to detect the spatial position of the user and outputs that spatial position. The spatial position of the user and the converted 3D point cloud are then input to the key point detection model, which performs key point detection on them to determine each key point on the user's body in the depth image and outputs the determined key points.
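The two-stage use of the models described above can be sketched as follows. Here `object_detector` and `keypoint_detector` stand in for the target detection model and the key point detection model; their interfaces (a `detect` method) are assumptions made for illustration and are not defined in this specification.

```python
def perceive_user(points, object_detector, keypoint_detector):
    """Run target detection and key point detection on the converted 3D point
    cloud to obtain the user's spatial position and body posture.

    object_detector / keypoint_detector are assumed to be pre-trained neural
    network models with hypothetical interfaces.
    """
    # Target detection: locate the human body in the point cloud.
    spatial_position = object_detector.detect(points)   # e.g. a 3D box or centre
    if spatial_position is None:
        return None, None
    # Key point detection: conditioned on the detected position, find the body
    # key points (head, neck, shoulders, elbows, wrists, hips, knees, ankles, ...).
    keypoints = keypoint_detector.detect(points, spatial_position)
    return spatial_position, keypoints
```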
At 140, an interactive operation is performed for the user based on the spatial position and/or the human gesture.
In this embodiment of the specification, the interactive operation for the user may include an operation of interacting with the user directly, an interactive operation with another object concerning the user, and the like. In an interactive operation with another object concerning the user, the content of the operation may relate to the user, such as the user's behavior. For example, in a security scenario, when a user photographed by the 3D camera is engaged in an illegal action, the interactive operation performed may include an alarm operation, sending alarm information to a responsible person (e.g., a boss), and the like.
The interaction operation for the user may include multiple types of interaction operations, and the interaction operations in different application scenarios may be different, for example, the interaction operation may include pushing a message, issuing an alarm, and the like. The interactive objects of the interactive operation in different application scenes can also be different. For example, in some application scenarios, the object of the interactive operation may be a user captured by a 3D camera, and in other application scenarios, the object of the interactive operation may be an object other than the user captured by the 3D camera.
In one example of an embodiment of the present specification, an interactive operation for a user may be performed based on a spatial location. In this example, the corresponding interactive operation may be performed only according to the spatial position, the spatial position of the user is different, and the interactive operation performed correspondingly may be different.
In one application scenario, the distance between the user and the 3D camera, or the distance between the user and the interactive device running the human perception system, may be determined according to the spatial position of the user. When the determined distance is less than the specified distance threshold, an interaction with the user may be performed, or an interaction with another object for the user may be performed.
In one example, the interaction device may be a device that interacts with the user and provides corresponding functions for the user, such as a kiosk. When the distance between the user and the interaction device is less than the distance threshold, the user may be considered to have an intention to interact with the interaction device, so the interaction device may perform an interactive operation with the user, which may include greeting the user, displaying welcome information, pushing information, sending a coupon or a red packet, and the like.
In another example, the interactive device may be a device used for monitoring, such as a security device. In this example, a warning area may be preset for the interactive device; when the distance between the user and the interactive device is less than the distance threshold, the user may be considered to have entered the warning area, and the interactive device may then perform an interactive operation with another object, such as raising an alarm or sending warning information.
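A minimal sketch of the distance-based decision in the two examples above; the distance threshold and the callback names are illustrative assumptions rather than values or interfaces defined in this specification.

```python
import numpy as np

DISTANCE_THRESHOLD_M = 1.5   # illustrative threshold, not specified here

def interact_on_distance(spatial_position, device_position, mode, actions):
    """Perform an interactive operation based only on the user's spatial position.

    spatial_position / device_position: 3D coordinates in the same spatial
    coordinate system; mode: 'service' (kiosk-style device) or 'security'
    (monitoring device); actions: object exposing hypothetical callbacks.
    """
    distance = float(np.linalg.norm(
        np.asarray(spatial_position) - np.asarray(device_position)))
    if distance >= DISTANCE_THRESHOLD_M:
        return
    if mode == 'service':
        actions.greet_user()      # e.g. display welcome info, push a coupon
    elif mode == 'security':
        actions.raise_alarm()     # e.g. alarm, send warning information
```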
In another example of an embodiment of the present specification, an interactive operation for a user may be performed based on a human body gesture. In this example, the corresponding interactive operation may be performed only according to the human body posture, the human body posture of the user is different, and the interactive operation performed correspondingly may be different.
In one example, an interaction operation type can be set for each human body posture, so that when the user's posture is determined, the interactive operation corresponding to that posture can be executed. For example, when the user's posture is facing the interaction device, the interactive operations performed by the interaction device include pushing information, sending a coupon or a red packet, and the like. The posture of facing the interaction device may include the user's whole body facing the interaction device, the user's upper body facing the interaction device, the user's head facing the interaction device, and the like. When the user's face is directly toward the 3D camera, the interactive operations performed by the interaction device include a settlement operation, displaying the settlement amount, displaying the order content, and the like.
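How "facing the interaction device" is decided from the detected posture is not fixed by this specification; one simple illustrative heuristic is to require that both shoulder key points are detected and lie at roughly the same depth in the camera coordinate system, as sketched below.

```python
def is_facing_camera(keypoints, depth_tolerance_m=0.10):
    """Illustrative heuristic (an assumption, not from this specification):
    treat the user as facing the device when both shoulder key points are
    detected and lie at roughly the same depth (Z) in camera coordinates."""
    left = keypoints.get('left_shoulder')
    right = keypoints.get('right_shoulder')
    if left is None or right is None:
        return False
    return abs(left[2] - right[2]) < depth_tolerance_m
```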
In another example of an embodiment of the present specification, an interactive operation for a user may be performed based on a user spatial position and a human body posture. In this example, the type of the interactive operation to be performed may be determined according to the spatial position and the human posture, at least one of the spatial position and the human posture of the user may be different, and the interactive operation to be performed may also be different.
In one example, the interaction device may be a device that interacts with the user and provides corresponding functionality for the user, and when the distance between the user and the interaction device is less than a distance threshold and the user's body posture is that the user is facing the interaction device, the user may be considered to have an intention to interact with the interaction device, so that the interaction device may perform an interaction operation with the user.
In another example, the interactive device is a device used for monitoring; when the distance between the user and the interactive device is less than the distance threshold and the user's posture is with the body facing the interactive device, the user can be considered to have entered the warning area, and the interactive device can perform operations such as raising an alarm and sending warning information.
FIG. 3 shows a flow diagram of one example 300 of performing an interactive operation for a user in accordance with an embodiment of the present description.
As shown in FIG. 3, at 142, the user's behavior may be determined from the spatial position and/or the human pose.
In this example, the determined user behavior may be different for different application scenarios, different spatial locations and/or different body gestures.
In one example, the behavior of the user may be determined solely from the spatial location of the user. For example, in a security application scenario, when a distance between a user and an interactive device is smaller than a distance threshold, the behavior of the user may be considered as an illegal behavior. For another example, when the spatial position of the user is in a preset prohibited area, the behavior of the user may be considered as an illegal behavior.
In another example, the behavior of the user may be determined based on the user's spatial position together with other factors such as time. At different times, the control level set for the warning area may differ. For example, the control level is high at night and low during the day; or the control level is low during business hours and high during non-business hours. When the spatial position of the user is within the warning area during a period with a high control level, the user's behavior can be considered an illegal behavior. When the spatial position of the user is within the warning area during a period with a low control level, the user's behavior may not be considered an illegal behavior.
In another example, the user's behavior may be determined only from the user's body posture. The behaviors may be classified as regular behaviors, such as walking, running, jumping, and speaking, or may be classified as illegal and non-illegal behaviors. For example, if one leg in the user's body posture is in front of the other, the determined behavior is walking or running. For example, for an interaction device that interacts with the user and provides corresponding functions, when the user's posture is facing the interaction device with one leg in front of the other, the user's behavior may be considered to be approaching the interaction device by walking or running. When the user's posture is facing the interaction device with the mouth open, the user's behavior may be considered to be speaking.
In another example, the user's behavior may be determined from both the spatial position and the body posture of the user. For example, when the distance between the user and the interaction device is less than the preset distance threshold, and the user's posture is facing the interaction device with one leg in front of the other, the user's behavior may be considered to be approaching the interaction device by walking or running. For example, when the user is located in the warning area and the user's posture is abnormal rather than a normal posture such as walking or standing, the user's behavior can be considered an illegal behavior.
After the user's behavior is determined, at 144, an interactive operation for that behavior may be performed.
In one example, different user behaviors may correspond to different types of interactive operations. The correspondence between user behaviors and interactive operation types may be preset; for instance, a correspondence table between user behaviors and interactive operation types may be pre-constructed, and when the user's behavior is determined, the corresponding interactive operation type is looked up in the table and the interactive operation of that type is executed, as sketched below.
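A minimal sketch of such a correspondence table; the behavior labels, interaction types, and handler names are illustrative assumptions.

```python
# Hypothetical correspondence table between user behavior and interaction type.
BEHAVIOR_TO_INTERACTION = {
    'approaching_device': 'push_information',   # e.g. marketing content, coupon
    'facing_device':      'greet_user',
    'in_warning_area':    'trigger_alarm',
    'speaking':           'start_voice_dialog',
}

def interact_for_behavior(behavior, handlers):
    """Look up the interaction type for the determined behavior and execute
    the corresponding interactive operation (handlers maps types to callables)."""
    interaction_type = BEHAVIOR_TO_INTERACTION.get(behavior)
    if interaction_type is not None:
        handlers[interaction_type]()
```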
The executed interactive operation can be determined according to the behavior of the user and the application scene, and in different application scenes, the same behavior of the user can also correspondingly execute different interactive operations.
In one example, the application scenarios may include a security scenario and an information push scenario. When the user's behavior is approaching the interactive device running the human body perception system, in a security scenario the behavior can be considered an illegal behavior, and is especially likely to be determined as illegal when the user is in the warning area, so an alarm operation is performed. In an information push scenario, the same behavior may be considered to indicate an interaction intention, so information may be pushed to the user; the pushed information may include marketing content, coupons, red packets, and the like.
According to the human-computer interaction method provided by the embodiments of the present specification, human-computer interaction is realized through the depth image acquired by the 3D camera, the collection of user information is minimized, and user privacy is protected to the greatest extent, thereby achieving privacy security for the user.
Fig. 4 is a block diagram illustrating an example of a human-computer interaction device 400 based on privacy protection according to an embodiment of the present disclosure.
The human-computer interaction device 400 can be applied to a human perception system, and as shown in fig. 4, the human-computer interaction device 400 includes a depth image obtaining unit 410, a coordinate system conversion unit 420, a 3D point cloud computing unit 430, and an interaction execution unit 440.
And a depth image acquiring unit 410 configured to acquire a depth image of the user by using the 3D camera when the user is in a visual field range of the 3D camera of the human body perception system.
A coordinate system conversion unit 420 configured to convert each pixel point in the depth image based on the pixel coordinate system into a 3D point cloud based on the spatial coordinate system.
In one example, the human-computer interaction device 400 may further include an image denoising unit, and the image denoising unit may be configured to denoise the depth image by using bilateral filtering.
And a 3D point cloud computing unit 430 configured to obtain the spatial position and/or the human body posture of the user according to the converted 3D point cloud.
In one example, the 3D point cloud computing unit 430 may be further configured to: and obtaining the spatial position and/or the human body posture of the user by utilizing the neural network model according to the converted 3D point cloud.
In one example, the 3D point cloud computing unit 430 may be further configured to: performing target detection on the obtained 3D point cloud by using a target detection model to determine the spatial position of the user; and detecting key points of the human body by using the key point detection model on the obtained 3D point cloud and the space position of the user so as to determine the human body posture of the user.
An interaction performing unit 440 configured to perform an interactive operation for the user based on the spatial position and/or the human body posture.
In one example, the spatial locations are different and/or the body gestures are different, corresponding to different types of interactions being performed.
In one example, the interaction execution unit 440 may further include a behavior determination module and an interaction execution module. The behavior determination module is configured to: and determining the user behavior according to the space position and/or the human body posture. The interaction execution module is configured to: and performing interaction operation aiming at the behaviors according to the behaviors.
In one example, the behaviors are different and the types of interaction operations that are performed correspondingly are different.
In one example, the interaction execution module may be further configured to: under the application scene of security protection, when the behavior is determined as an illegal behavior, triggering an alarm; or in an application scenario of information push, when the behavior is determined as that the user is facing an interactive device running a human perception system, pushing information to the user.
Embodiments of a human-computer interaction method and a human-computer interaction device based on privacy protection according to the embodiments of the present specification are described above with reference to fig. 1 to 4.
The human-computer interaction device based on privacy protection in the embodiments of the present specification may be implemented by hardware, by software, or by a combination of hardware and software. Taking a software implementation as an example, the device, as a logical means, is formed by the processor of the device in which it is located reading the corresponding computer program instructions from storage into memory and running them. In the embodiments of the present specification, the privacy-protection-based human-computer interaction device may be implemented, for example, with an electronic device.
Fig. 5 is a block diagram of an electronic device 500 for implementing a human-computer interaction method according to an embodiment of the present disclosure.
As shown in fig. 5, the electronic device 500 may include at least one processor 510, a storage (e.g., non-volatile storage) 520, a memory 530, and a communication interface 540, and the at least one processor 510, the storage 520, the memory 530, and the communication interface 540 are connected together via a bus 550. The at least one processor 510 executes at least one computer-readable instruction (i.e., the elements described above as being implemented in software) stored or encoded in memory.
In one embodiment, computer-executable instructions are stored in the memory that, when executed, cause the at least one processor 510 to: when a user is in the visual field range of a 3D camera of a human body perception system, acquiring a depth image of the user by using the 3D camera; converting each pixel point based on a pixel coordinate system in the depth image into a 3D point cloud based on a space coordinate system; obtaining the spatial position and/or the human posture of the user according to the converted 3D point cloud; and performing an interactive operation for the user based on the spatial position and/or the human body posture.
It should be appreciated that the computer-executable instructions stored in the memory, when executed, cause the at least one processor 510 to perform the various operations and functions described above in connection with fig. 1-4 in the various embodiments of the present description.
According to one embodiment, a program product, such as a machine-readable medium, is provided. A machine-readable medium may have instructions (i.e., elements described above as being implemented in software) that, when executed by a machine, cause the machine to perform various operations and functions described above in connection with fig. 1-4 in the various embodiments of the present specification.
Specifically, a system or apparatus may be provided which is provided with a readable storage medium on which software program code implementing the functions of any of the above embodiments is stored, and causes a computer or processor of the system or apparatus to read out and execute instructions stored in the readable storage medium.
In this case, the program code itself read from the readable medium can realize the functions of any of the above-described embodiments, and thus the machine-readable code and the readable storage medium storing the machine-readable code form part of the present invention.
Computer program code required for the operation of various portions of the present specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python, conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, as a stand-alone software package on the user's computer, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or the code may run in a cloud computing environment or be provided as a service, such as software as a service (SaaS).
Examples of the readable storage medium include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or from the cloud via a communications network.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Not all steps and elements in the above flows and system structure diagrams are necessary, and some steps or elements may be omitted according to actual needs. The execution order of the steps is not fixed, and can be determined as required. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by a plurality of physical entities, or some units may be implemented by some components in a plurality of independent devices.
The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
Although the embodiments of the present disclosure have been described in detail with reference to the accompanying drawings, the embodiments of the present disclosure are not limited to the specific details of the embodiments, and various simple modifications may be made to the technical solutions of the embodiments of the present disclosure within the technical spirit of the embodiments of the present disclosure, and all of them fall within the scope of the embodiments of the present disclosure.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the description is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A human-computer interaction method based on privacy protection, performed by a human perception system, the human-computer interaction method comprising:
when a user is in a visual field range of a 3D camera of the human body perception system, acquiring a depth image of the user by using the 3D camera;
converting each pixel point based on a pixel coordinate system in the depth image into a 3D point cloud based on a space coordinate system;
obtaining the spatial position and/or the human posture of the user according to the converted 3D point cloud; and
performing an interactive operation for the user based on the spatial position and/or the human gesture.
2. The human-computer interaction method of claim 1, wherein the spatial positions are different and/or the human postures are different, and the types of interaction operations correspondingly performed are different.
3. The human-computer interaction method of claim 1, wherein performing an interactive operation for the user based on the spatial position and/or the human body gesture comprises:
determining the user's behavior according to the spatial position and/or the human body posture; and
and executing the interactive operation aiming at the behavior according to the behavior.
4. A human-computer interaction method according to claim 3, wherein said behaviors differ and the type of interaction operation performed correspondingly differs.
5. The human-computer interaction method of claim 3, wherein performing the interaction operation for the behavior in accordance with the behavior comprises:
under the application scene of security protection, when the behavior is determined as an illegal behavior, triggering an alarm; or
In an application scenario of information pushing, when the behavior is determined that the user is facing an interactive device running the human perception system, pushing information to the user.
6. The human-computer interaction method of claim 1, wherein obtaining the spatial position and/or the human posture of the user from the converted 3D point cloud comprises:
and obtaining the spatial position and/or the human body posture of the user according to the converted 3D point cloud by utilizing a neural network model.
7. The human-computer interaction method of claim 6, wherein the obtaining the spatial position and/or the human posture of the user from the converted 3D point cloud by using a neural network model comprises:
performing target detection on the obtained 3D point cloud by using a target detection model to determine the spatial position of the user; and
and detecting key points of the human body by using the key point detection model on the obtained 3D point cloud and the space position of the user so as to determine the human body posture of the user.
8. The human-computer interaction method of claim 1, wherein prior to converting individual pixel points in the depth image based on a pixel coordinate system to a spatial coordinate system based 3D point cloud, the method further comprises:
and denoising the depth image by using a bilateral filtering mode.
9. A human-computer interaction device based on privacy protection is applied to a human perception system, and comprises:
a depth image acquisition unit which acquires a depth image of a user by using a 3D camera when the user is in a visual field range of the 3D camera of the human body perception system;
the coordinate system conversion unit is used for converting each pixel point based on a pixel coordinate system in the depth image into a 3D point cloud based on a space coordinate system;
the 3D point cloud computing unit is used for obtaining the spatial position and/or the human body posture of the user according to the converted 3D point cloud; and
an interaction execution unit that executes an interactive operation for the user based on the spatial position and/or the human body posture.
10. An electronic device, comprising: at least one processor, a memory coupled with the at least one processor, and a computer program stored on the memory, the at least one processor executing the computer program to implement the method of any of claims 1-8.
11. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-8.
CN202210070782.0A 2022-01-21 2022-01-21 Man-machine interaction method and man-machine interaction device based on privacy protection Pending CN114418903A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210070782.0A CN114418903A (en) 2022-01-21 2022-01-21 Man-machine interaction method and man-machine interaction device based on privacy protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210070782.0A CN114418903A (en) 2022-01-21 2022-01-21 Man-machine interaction method and man-machine interaction device based on privacy protection

Publications (1)

Publication Number Publication Date
CN114418903A true CN114418903A (en) 2022-04-29

Family

ID=81275710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210070782.0A Pending CN114418903A (en) 2022-01-21 2022-01-21 Man-machine interaction method and man-machine interaction device based on privacy protection

Country Status (1)

Country Link
CN (1) CN114418903A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635783A (en) * 2019-01-02 2019-04-16 上海数迹智能科技有限公司 Video monitoring method, device, terminal and medium
CN111223053A (en) * 2019-11-18 2020-06-02 北京邮电大学 Data enhancement method based on depth image
CN110955879A (en) * 2019-11-29 2020-04-03 腾讯科技(深圳)有限公司 Device control method, device, computer device and storage medium
CN111723633A (en) * 2019-12-09 2020-09-29 深圳市鸿逸达科技有限公司 Personnel behavior pattern analysis method and system based on depth data
CN111597974A (en) * 2020-05-14 2020-08-28 哈工大机器人(合肥)国际创新研究院 Monitoring method and system based on TOF camera for personnel activities in carriage
CN112785742A (en) * 2021-01-22 2021-05-11 广东交通职业技术学院 Attendance system and control method thereof
CN113435236A (en) * 2021-02-20 2021-09-24 哈尔滨工业大学(威海) Home old man posture detection method, system, storage medium, equipment and application
CN113111743A (en) * 2021-03-29 2021-07-13 北京工业大学 Personnel distance detection method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zeng Xing; Luo Wusheng; Sun Bei; Lu Qin; Liu Taocheng: "Implementation of an embedded human sitting-posture detection *** based on depth images", Computer Measurement & Control, no. 09, 25 September 2017 (2017-09-25) *

Similar Documents

Publication Publication Date Title
JP6695503B2 (en) Method and system for monitoring the condition of a vehicle driver
Padeleris et al. Head pose estimation on depth data based on particle swarm optimization
US10783351B2 (en) System and method for sensing facial gesture
CN110728196B (en) Face recognition method and device and terminal equipment
US20120026335A1 (en) Attribute-Based Person Tracking Across Multiple Cameras
JP5877135B2 (en) Image recognition apparatus and elevator apparatus
US11715241B2 (en) Privacy protection in vision systems
JP6587435B2 (en) Image processing apparatus, information processing method, and program
KR102476016B1 (en) Apparatus and method for determining position of eyes
KR20160029629A (en) Method and apparatus for face recognition
JP2009245338A (en) Face image collating apparatus
CN113449696B (en) Attitude estimation method and device, computer equipment and storage medium
Dubey et al. A comprehensive survey on human pose estimation approaches
CN109766755A (en) Face identification method and Related product
US11823394B2 (en) Information processing apparatus and method for aligning captured image and object
KR20140128560A (en) An Interactive Mirror System based on Personal Purchase Information and A Method using thereof
WO2019037257A1 (en) Password input control device and method, and computer readable storage medium
Planinc et al. Computer vision for active and assisted living
CN117593792A (en) Abnormal gesture detection method and device based on video frame
Cohen et al. 3D body reconstruction for immersive interaction
CN114418903A (en) Man-machine interaction method and man-machine interaction device based on privacy protection
Jian-Nan et al. Key techniques of eye gaze tracking based on pupil corneal reflection
CN113569794A (en) Face recognition method, face recognition device, face recognition medium and mobile equipment
Shah et al. Gesture recognition technique: a review
Padeleris et al. Multicamera tracking of multiple humans based on colored visual hulls

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination