CN111259755A - Data association method, device, equipment and storage medium - Google Patents

Data association method, device, equipment and storage medium

Info

Publication number
CN111259755A
CN111259755A (application CN202010027553.1A; granted publication CN111259755B)
Authority
CN
China
Prior art keywords
target
dimensional scene
user
preset
determining
Prior art date
Legal status
Granted
Application number
CN202010027553.1A
Other languages
Chinese (zh)
Other versions
CN111259755B (en)
Inventor
罗宇轩
亢乐
包英泽
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010027553.1A priority Critical patent/CN111259755B/en
Publication of CN111259755A publication Critical patent/CN111259755A/en
Application granted granted Critical
Publication of CN111259755B publication Critical patent/CN111259755B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The application discloses a data association method, apparatus, device and storage medium, and relates to the field of computer vision. The method is applied to an electronic device that communicates with a plurality of depth vision sensors arranged in a preset three-dimensional scene; the preset three-dimensional scene also contains a target item and at least one user. The method includes: if it is monitored that a user takes the target item, acquiring the two-dimensional images captured by the corresponding target depth vision sensors at the moment the target item is taken; determining corresponding target three-dimensional scene point cloud data from each two-dimensional image and the mapping parameters; determining the position of at least one user key part in the preset three-dimensional scene from the target three-dimensional scene point cloud data; determining the target user who takes the target item from the position of each user key part in the preset three-dimensional scene; and associating the target item with the target user. In this way, person-goods data are associated more accurately.

Description

Data association method, device, equipment and storage medium
Technical Field
The application relates to the technical field of data processing, and in particular to computer vision technology.
Background
As computer vision technology matures, unmanned retail based on computer vision has also developed rapidly. In an unmanned retail scene, a change to the goods on a shelf needs to be associated with the user who takes the goods; this is known as person-goods data association.
In the prior art, when person-goods data are associated, a single depth camera is used to collect the position information and depth information of the key parts of the user who takes an item in the image. If that user is occluded, the collected position and depth information of the key part in the image is not accurate enough, so the determined position of the key part in the three-dimensional unmanned retail scene is not accurate enough. As a result, person-goods data cannot be accurately associated, and the person who takes each item cannot be accurately determined in the unmanned retail scene.
Disclosure of Invention
The embodiments of the application provide a data association method, apparatus, device and storage medium, to solve the technical problem in the prior art that person-goods data cannot be accurately associated, so that the person who takes each item cannot be accurately determined in an unmanned retail scene.
A first aspect of an embodiment of the present application provides a data association method, where the method is applied to an electronic device, the electronic device communicates with a plurality of depth vision sensors, the depth vision sensors are disposed in a preset three-dimensional scene, the preset three-dimensional scene further includes a target item and at least one user, and the method includes:
if it is monitored that the user takes the target object, acquiring two-dimensional images acquired by a plurality of corresponding target depth vision sensors when the target object is taken; determining corresponding target three-dimensional scene point cloud data according to each two-dimensional image and the mapping parameters; determining the position of at least one user key part in a preset three-dimensional scene according to the point cloud data of the target three-dimensional scene; determining a target user for taking the target object according to the position of each user key part in a preset three-dimensional scene; and associating the target item with the target user.
In the embodiment of the application, when the user takes the target item, two-dimensional images are collected by a plurality of target depth vision sensors and the three-dimensional scene point cloud data is generated from these two-dimensional images, so the occlusion areas in the three-dimensional scene point cloud data are effectively reduced, the determined positions of the user key parts in the preset three-dimensional scene are more accurate, the person-goods data are associated more accurately, and the person who takes each item is accurately determined.
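As an illustration only, the overall flow summarized above can be sketched as follows in Python; every name here (the callables for point cloud fusion, key part detection and user identification, and the track ids) is a hypothetical placeholder introduced for this sketch, not part of the disclosed implementation.

```python
import numpy as np

def associate_on_pickup(frames, fuse_to_cloud, detect_key_parts,
                        item_position, resolve_user):
    """One pass of the claimed flow; the callables stand in for the point cloud
    fusion, key part detection and user identification steps described above."""
    # Step 2: fuse the per-sensor two-dimensional images (color + depth) into
    # the target three-dimensional scene point cloud data.
    cloud = fuse_to_cloud(frames)                      # (N, 3) scene coordinates
    # Step 3: position of each user key part in the preset three-dimensional scene.
    key_parts = detect_key_parts(cloud)                # {track_id: (x, y, z)}
    # Step 4: the user whose key part is closest to the target item is the target user.
    item = np.asarray(item_position, dtype=float)
    target_track = min(
        key_parts,
        key=lambda t: float(np.linalg.norm(np.asarray(key_parts[t]) - item)))
    # Step 5: associate the target item with the target user's identification info.
    return resolve_user(target_track)
```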
Further, the method for determining corresponding target three-dimensional scene point cloud data according to each two-dimensional image and mapping parameters includes:
acquiring internal and external parameters corresponding to each target depth vision sensor and positions of the target depth vision sensors in a preset three-dimensional scene; and mapping each two-dimensional image to a preset three-dimensional scene coordinate system according to the corresponding internal and external parameters and the position in the preset three-dimensional scene to obtain corresponding target three-dimensional scene point cloud data.
In the embodiment of the application, each two-dimensional image is acquired by a corresponding target depth vision sensor, so each two-dimensional image is mapped into the preset three-dimensional scene coordinate system according to the internal and external parameters of that sensor and its position in the preset three-dimensional scene, and the fusion and stitching of the scene point cloud data are completed as the two-dimensional images are mapped into the preset three-dimensional scene coordinate system. The target three-dimensional scene point cloud data can therefore be determined accurately and quickly. Moreover, because the target depth vision sensors differ in position and shooting angle, even though a single two-dimensional image contains occlusion areas, these occlusion areas are effectively eliminated from the target three-dimensional scene point cloud data obtained after the images are projected into the preset three-dimensional scene coordinate system and stitched together.
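A minimal sketch of this stitching step, assuming the per-sensor back-projection is available as a callable; the function and parameter names are the author's illustration, not the applicant's code.

```python
import numpy as np

def fuse_scene_cloud(frames, backproject_to_world):
    """Stitch per-sensor clouds into the target three-dimensional scene point cloud.

    frames: list of (color, depth) image pairs, one per target depth vision sensor.
    backproject_to_world: callable that maps sensor index i and its frame, via that
        sensor's internal/external parameters and placement, to an (N_i, 3) array
        in the preset three-dimensional scene coordinate system.
    """
    per_sensor = [backproject_to_world(i, frame) for i, frame in enumerate(frames)]
    # Because the sensors view the scene from different positions and angles,
    # points occluded in one view are usually visible in another, so the
    # concatenated cloud has far smaller occlusion areas than any single frame.
    return np.concatenate(per_sensor, axis=0)
```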
Further, the method for determining the position of at least one user key part in a preset three-dimensional scene according to the point cloud data of the target three-dimensional scene comprises the following steps:
inputting the target three-dimensional scene point cloud data into a first trained to converged position detection model, and detecting the position of each user key part in a preset three-dimensional scene through the first trained to converged position detection model; and outputting the position of each user key part in a preset three-dimensional scene through the first position detection model trained to be convergent.
In the embodiment of the application, the first trained to converged position detection model is adopted to detect the position of the user key part in the target three-dimensional scene point cloud data in the preset three-dimensional scene, and the first position detection model is obtained after training to convergence, so that the position of the user key part in the preset three-dimensional scene can be accurately detected.
Further, the method as described above, before inputting the target three-dimensional scene point cloud data into the first trained to converged position detection model, further comprising:
training the first initial position detection model by adopting a first training sample; the first training sample is first historical three-dimensional scene point cloud data marking the position of at least one user key part in a preset three-dimensional scene; and if the first training convergence condition is determined to be met, determining a first initial position detection model meeting the first training convergence condition as the first trained to converged position detection model.
In the embodiment of the application, the first trained-to-convergent position detection model is obtained by training the first initial position detection model through the first historical three-dimensional scene point cloud data marking the position of at least one user key part in the preset three-dimensional scene, so that the first trained-to-convergent position detection model is more suitable for detecting the position of the user key part in the target three-dimensional scene point cloud data in the preset three-dimensional scene, and the accuracy of detecting the position of the user key part in the preset three-dimensional scene is further improved.
Further, the method for determining a target user who takes the target item according to the position of each user key part in a preset three-dimensional scene includes:
acquiring the position of a target object; determining the distance between each user key part and the target object according to the position of each user key part in a preset three-dimensional scene and the position of the target object; and determining the user with the minimum distance as the target user.
In the embodiment of the application, the distance between the key part of the user for taking the target object and the target object is minimum, so that the user with the minimum distance is determined as the target user, and the determined target user for taking the target object is more accurate.
Further, the method as described above, the associating the target item with the target user, comprising:
acquiring the identification information of the target user and the identification information of the target object; and associating the identification information of the target user with the identification information of the target object.
In the embodiment of the application, the identification information of the target user is associated with the identification information of the target object, and the identification information is information which can uniquely represent the target, so that the target object and the target user are associated more accurately.
Further, the method, after determining a target user who takes the target item according to the position of each user key part in a preset three-dimensional scene, further includes:
determining the position of the head of a target user in a preset three-dimensional scene according to the point cloud data of the target three-dimensional scene; determining the human body position of the target user matched with the head position of the target user in a preset three-dimensional scene; and determining the identification information of the target user according to the mapping relation between the human body position of the target user and the identification information of the target user.
In the embodiment of the application, the position of the head of each user in the preset three-dimensional scene is consistent with the position of the human body of the user in the preset three-dimensional scene, and after each user enters the preset three-dimensional scene, the identification information of each user can be acquired, and each user is tracked in real time, so that the human body position and the identification information of each user can be accurately determined, and after the target user is determined, the identification information of the target user can be accurately determined in a mode that the position of the head of the target user in the preset three-dimensional scene is matched with the human body position of the user.
Further, the method for determining the position of the head of the target user in the preset three-dimensional scene according to the point cloud data of the target three-dimensional scene comprises the following steps:
inputting the target three-dimensional scene point cloud data into a second position detection model trained to be convergent, and detecting the position of the head of a target user in a preset three-dimensional scene through the second position detection model trained to be convergent; and outputting the position of the head of the target user in the preset three-dimensional scene through the second position detection model trained to be converged.
In the embodiment of the application, the second trained to converged position detection model is adopted to detect the position of the head of the target user in the target three-dimensional scene point cloud data in the preset three-dimensional scene, and the second position detection model is obtained after training to convergence, so that the position of the head of the user in the preset three-dimensional scene can be accurately detected.
Further, the method as described above, before inputting the target three-dimensional scene point cloud data into the second trained to converged position detection model, further comprising:
training the second initial position detection model by adopting a second training sample; the second training sample is second historical three-dimensional scene point cloud data marking the position of the head of the user for taking the article in a preset three-dimensional scene; and if the second training convergence condition is determined to be met, determining a second initial position detection model meeting the second training convergence condition as the second trained to converged position detection model.
In the embodiment of the application, the second trained to converged position detection model is obtained by training the second initial position detection model through the second historical three-dimensional scene point cloud data marking the position of the head of the user taking the article in the preset three-dimensional scene, so that the second trained to converged position detection model is more suitable for detecting the position of the head of the user in the preset three-dimensional scene in the target three-dimensional scene point cloud data, and the accuracy of detecting the position of the head of the user in the preset three-dimensional scene is further improved.
Further, the method as described above, after associating the target item with the target user, further comprising:
if it is monitored that the list generation condition is met, acquiring list information corresponding to the target item; and sending the list information to the terminal device of the target user.
In the embodiment of the application, after the target item is associated with the target user, if it is monitored that the list generation condition is met, the list information is sent to the terminal device of the target user, so a list can be generated and paid for automatically in the unmanned retail scene.
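For illustration, one possible shape of this list generation and delivery step, under the assumption that item prices are kept in a catalog and delivery to the terminal device is handled by a separate callable; none of these names come from the patent.

```python
def build_and_send_list(user_id, associated_item_ids, catalog, send_to_terminal):
    """Assemble the user's list once the list generation condition is met
    (e.g. the target user walks out through the exit gate) and push it to the
    user's terminal device. catalog maps item_id -> (name, unit_price)."""
    lines = [{"item_id": i, "name": catalog[i][0], "price": catalog[i][1]}
             for i in associated_item_ids]
    bill = {
        "user_id": user_id,
        "items": lines,
        "total": round(sum(line["price"] for line in lines), 2),
    }
    send_to_terminal(user_id, bill)      # e.g. via the retailer's app backend
    return bill
```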
A second aspect of the embodiments of the present application provides a data association apparatus, where the apparatus is located in an electronic device, the electronic device communicates with a plurality of depth vision sensors, the depth vision sensors are disposed in a preset three-dimensional scene, the preset three-dimensional scene further includes a target object and at least one user, and the apparatus includes:
the image acquisition module is used for acquiring two-dimensional images acquired by a plurality of corresponding target depth vision sensors when the target object is taken if it is monitored that the target object is taken by a user; the scene point cloud determining module is used for determining corresponding target three-dimensional scene point cloud data according to the two-dimensional images and the mapping parameters; the key part position determining module is used for determining the position of at least one user key part in a preset three-dimensional scene according to the point cloud data of the target three-dimensional scene; the target user determining module is used for determining a target user for taking the target object according to the position of each user key part in a preset three-dimensional scene; and the data association module is used for associating the target object with the target user.
Further, in the apparatus as described above, the scene point cloud determining module is specifically configured to:
acquiring internal and external parameters corresponding to each target depth vision sensor and positions of the target depth vision sensors in a preset three-dimensional scene; and mapping each two-dimensional image to a preset three-dimensional scene coordinate system according to the corresponding internal and external parameters and the position in the preset three-dimensional scene to obtain corresponding target three-dimensional scene point cloud data.
Further, in the apparatus as described above, the key location determining module is specifically configured to:
inputting the target three-dimensional scene point cloud data into a first trained to converged position detection model, and detecting the position of each user key part in a preset three-dimensional scene through the first trained to converged position detection model; and outputting the position of each user key part in a preset three-dimensional scene through the first position detection model trained to be convergent.
Further, the apparatus as described above, further comprising:
the first model training module is used for training the first initial position detection model by adopting a first training sample; the first training sample is first historical three-dimensional scene point cloud data marking the position of at least one user key part in a preset three-dimensional scene; and if the first training convergence condition is determined to be met, determining a first initial position detection model meeting the first training convergence condition as the first trained to converged position detection model.
Further, in the apparatus as described above, the target user determination module is specifically configured to:
acquiring the position of a target object; determining the distance between each user key part and the target object according to the position of each user key part in a preset three-dimensional scene and the position of the target object; and determining the user with the minimum distance as the target user.
Further, in the apparatus as described above, the data association module is specifically configured to:
acquiring the identification information of the target user and the identification information of the target object; and associating the identification information of the target user with the identification information of the target object.
Further, the apparatus as described above, further comprising:
the user identification determining module is used for determining the position of the head of a target user in a preset three-dimensional scene according to the point cloud data of the target three-dimensional scene; determining the human body position of the target user matched with the head position of the target user in a preset three-dimensional scene; and determining the identification information of the target user according to the mapping relation between the human body position of the target user and the identification information of the target user.
Further, in the apparatus as described above, the user identifier determining module, when determining the position of the head of the target user in the preset three-dimensional scene according to the point cloud data of the target three-dimensional scene, is specifically configured to:
inputting the target three-dimensional scene point cloud data into a second position detection model trained to be convergent, and detecting the position of the head of a target user in a preset three-dimensional scene through the second position detection model trained to be convergent; and outputting the position of the head of the target user in the preset three-dimensional scene through the second position detection model trained to be converged.
Further, the apparatus as described above, further comprising:
the second model training module is used for training the second initial position detection model by adopting a second training sample; the second training sample is second historical three-dimensional scene point cloud data marking the position of the head of the user for taking the article in a preset three-dimensional scene; and if the second training convergence condition is determined to be met, determining a second initial position detection model meeting the second training convergence condition as the second trained to converged position detection model.
Further, the apparatus as described above, further comprising:
the list processing module is used for acquiring the list information corresponding to the target object if the list generating condition is met; and sending the list information to the terminal equipment of the target user.
A third aspect of the embodiments of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first aspects.
A fourth aspect of embodiments of the present application provides a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of the first aspects.
A fifth aspect of embodiments of the present application provides a computer program comprising program code for performing the method according to the first aspect when the computer program is run by a computer.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a scene diagram of a data association method that can implement an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of a data association method according to a first embodiment of the present application;
FIG. 3 is a schematic flow chart diagram of a data association method according to a second embodiment of the present application;
FIG. 4 is a schematic flow chart illustrating step 202 of a data association method according to a second embodiment of the present application;
FIG. 5 is a flowchart illustrating step 203 of a data association method according to a second embodiment of the present application;
FIG. 6 is a schematic flow chart illustrating step 204 of a data association method according to a second embodiment of the present application;
FIG. 7 is a flowchart illustrating step 205 of a data association method according to a second embodiment of the present application;
FIG. 8 is a flowchart illustrating step 208 of a data association method according to a second embodiment of the present application;
fig. 9 is a signaling flow diagram of a data association method according to a third embodiment of the present application;
fig. 10 is a schematic structural diagram of a data association apparatus according to a fourth embodiment of the present application;
fig. 11 is a schematic structural diagram of a data association apparatus according to a fifth embodiment of the present application;
fig. 12 is a block diagram of an electronic device for implementing a data association method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An application scenario of the data association method provided in the embodiments of the present application is described below. As shown in fig. 1, the application scenario corresponding to the data association method includes an electronic device and a plurality of depth vision sensors. The electronic device is communicatively coupled to the plurality of depth vision sensors. The depth vision sensors are arranged in a preset three-dimensional scene, for example uniformly arranged at the top of the preset three-dimensional scene. The preset three-dimensional scene also contains a target item and at least one user. The preset three-dimensional scene may be an unmanned supermarket, an unmanned container, or the like; the items in the preset three-dimensional scene may be commodities, and the users may be customers. In the preset three-dimensional scene, the items are placed on a shelf, and a gravity sensor is arranged at each position on the shelf where an item is placed, so that each gravity sensor is associated with the corresponding item in order to detect whether a customer takes that item. Each gravity sensor communicates with the electronic device and sends a gravity change signal to it when a customer takes the corresponding item, so that the electronic device monitors that the customer has taken that item. Each item may also be associated with a plurality of depth vision sensors, which acquire two-dimensional images including the item at a given sampling frequency. If a customer takes a target item, the plurality of target depth vision sensors corresponding to the target item can therefore acquire two-dimensional images including the target item and the customer. As shown in fig. 1, a specific application scenario of the data association method provided by the present solution is described by taking an unmanned supermarket as the preset three-dimensional scene. Specifically, before entering the unmanned supermarket, a user registers a user account in the client corresponding to the unmanned supermarket through a terminal device. When the user enters the unmanned supermarket, the entrance gate or identity recognition equipment of the unmanned supermarket determines from the account information that the user is a registered user of the unmanned supermarket and allows the user to enter. At the same time, images are acquired by the plurality of depth cameras and sent to the electronic device; the electronic device detects and tracks the users through these images, determines the position of each user in the unmanned supermarket in real time, and associates each user's position in the unmanned supermarket with the user's account information. If the user wants to purchase a certain target item, then when the user takes the target item, the gravity sensor detects a gravity change signal and sends it to the electronic device; the electronic device determines that the user has taken the target item and obtains the two-dimensional images collected, at the moment the target item was taken, by the plurality of target depth vision sensors corresponding to the target item. Three target depth vision sensors are illustrated in fig. 1.
The two-dimensional images may include the target item and the user who takes it, and if there are other users near that user at the moment the target item is taken, the two-dimensional images may also include those other users. Because the arrangement positions and angles of the target depth vision sensors differ, the acquired two-dimensional images also differ, and some two-dimensional images inevitably include occlusion areas, for example where a key part of the user who is picking up the target item is blocked. After each two-dimensional image is acquired, since each two-dimensional image includes a color image and a depth image, and the parameters of each target depth vision sensor and its position in the preset three-dimensional scene are known, that is, the mapping parameters are known, the corresponding target three-dimensional scene point cloud data can be determined from each two-dimensional image and the mapping parameters. Because the target three-dimensional scene point cloud data fuses the effective information of every two-dimensional image, the occlusion areas are greatly reduced compared with any single two-dimensional image. Since the target three-dimensional scene point cloud data includes at least one user, the position of at least one user key part in the preset three-dimensional scene can be determined from the target three-dimensional scene point cloud data. The target user who takes the target item is then determined from the position of each user key part in the preset three-dimensional scene, and the target item is associated with the target user. When the target user finishes shopping and walks out through the exit gate, it is determined that the list generation condition is met, and the list information corresponding to the target item is obtained; after the list information is sent to the terminal device of the target user, payment can be deducted automatically, completing the user's automatic shopping in the unmanned supermarket. Because the two-dimensional images are collected by a plurality of target depth vision sensors when a user takes a target item, and the three-dimensional scene point cloud data is generated from these two-dimensional images, the occlusion areas in the three-dimensional scene point cloud data are effectively reduced, the determined positions of the user key parts in the preset three-dimensional scene are more accurate, the person-goods data are associated more accurately, and the person who takes each item is accurately determined.
Embodiments of the present application will be described below in detail with reference to the accompanying drawings.
Embodiment One
Fig. 2 is a schematic flowchart of a data association method according to a first embodiment of the present application, and as shown in fig. 2, an execution subject of the embodiment of the present application is a data association apparatus, and the data association apparatus may be integrated in an electronic device. The data association method provided by the present embodiment includes the following steps.
Step 101, if it is monitored that a user takes a target object, acquiring two-dimensional images acquired by a plurality of target depth vision sensors corresponding to the target object when the target object is taken.
The object taken by the user is a target object, and the depth vision sensors corresponding to the target object are target depth vision sensors. The depth vision sensor may be a depth camera.
In this embodiment, the articles are arranged on a shelf, and gravity sensors are respectively arranged at the positions on the shelf where the articles are placed. When a user takes a target article, the corresponding target gravity sensor detects a gravity change signal and sends it to the electronic device, and the electronic device determines from the gravity change signal that the user has taken the target article.
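A sketch of how such a gravity change signal might trigger the acquisition step, with an assumed signal format and threshold (neither is specified in the patent):

```python
GRAVITY_DELTA_THRESHOLD = 5.0   # grams; an assumed threshold, not from the patent

def on_gravity_signal(signal, sensor_to_item, acquire_frames, handle_pickup):
    """Handle a gravity change signal pushed by a shelf gravity sensor.

    signal: assumed here to be {"sensor_id": str, "delta_g": float}.
    sensor_to_item: mapping from gravity sensor id to the item placed on it.
    acquire_frames: callable returning the two-dimensional images captured by the
        target depth vision sensors associated with that item at pickup time.
    handle_pickup: the downstream association flow (point cloud, key parts, user).
    """
    if signal["delta_g"] <= -GRAVITY_DELTA_THRESHOLD:    # weight dropped -> item taken
        item_id = sensor_to_item[signal["sensor_id"]]
        frames = acquire_frames(item_id)
        handle_pickup(item_id, frames)
```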
The communication mode between the electronic device and each gravity sensor is not limited; it may be Global System for Mobile Communications (GSM), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time Division-Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), or 5G. It is understood that the wireless communication mode may also be ZigBee, Bluetooth Low Energy (BLE), or Wi-Fi via a mobile hotspot.
In this embodiment, each article has a corresponding plurality of target depth vision sensors. A plurality of target depth vision sensors acquire a two-dimensional image including a target item using a sampling frequency. If the electronic equipment monitors that the user takes the target object, the electronic equipment can communicate with the plurality of target depth vision sensors, and the corresponding two-dimensional image at the moment is obtained from each target depth vision sensor. Since the user is taking the target item at this time, the user who takes the target item may also be included in the two-dimensional image. If there are other users beside the user who takes the target item, the other users may be included in the two-dimensional image. I.e. including at least one user in the two-dimensional image.
The communication mode between the electronic device and the target depth vision sensors is not limited; it may be Global System for Mobile Communications (GSM), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time Division-Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), or 5G. It is understood that the wireless communication mode may also be ZigBee, Bluetooth Low Energy (BLE), or Wi-Fi via a mobile hotspot.
It should be noted that each two-dimensional image includes a color image and a depth image. The color image represents the color information of each pixel in the image and may be, for example, an RGB image. The depth image represents the depth information of each pixel in the preset three-dimensional scene.
And 102, determining corresponding target three-dimensional scene point cloud data according to the two-dimensional images and the mapping parameters.
In this embodiment, the mapping parameters are the parameters used to map the two-dimensional images into the target three-dimensional scene. They may include the parameters of each target depth vision sensor and its position in the preset three-dimensional scene, where the parameters of each target depth vision sensor may include internal parameters and external parameters.
In this embodiment, each target depth visual sensor may be calibrated in advance, and the internal reference, the external reference, and the position of each calibrated target depth visual sensor in the preset three-dimensional scene may be determined. Because each two-dimensional image can acquire the color information and the depth information of each pixel point, each two-dimensional image can be mapped into a preset three-dimensional scene coordinate system according to corresponding internal reference and external reference and the position in a preset three-dimensional scene, and the pixel points of each two-dimensional image are mapped into target three-dimensional scene point cloud data.
It can be understood that, if the plurality of target depth vision sensors are depth vision sensors around the target object, the determined target three-dimensional scene point cloud data is local three-dimensional point cloud data around the target object in the preset three-dimensional scene.
It should be noted that because the arrangement positions and angles of the target depth vision sensors differ, the acquired two-dimensional images also differ, and some two-dimensional images inevitably include occlusion areas. After each two-dimensional image is acquired, since each two-dimensional image includes a color image and a depth image and the mapping parameters are known, the corresponding target three-dimensional scene point cloud data can be determined from each two-dimensional image and the mapping parameters. Because the target three-dimensional scene point cloud data fuses the effective information of every two-dimensional image, the occlusion areas are greatly reduced compared with any single two-dimensional image.
Step 103, determining the position of at least one user key part in a preset three-dimensional scene according to the target three-dimensional scene point cloud data.
In this embodiment, the target three-dimensional scene point cloud data includes at least one user key part, and a position detection model may be used to detect a position of the user key part in a preset three-dimensional scene.
The position detection model may be a deep learning model, a machine learning model, or the like, which is not limited in this embodiment.
In this embodiment, the key part of the user is a part related to the target object to be taken, such as a hand, a wrist or a forearm.
And 104, determining a target user for taking the target object according to the position of each user key part in the preset three-dimensional scene.
In this embodiment, when the articles are placed in the preset three-dimensional scene, the position of each article in the preset three-dimensional scene is stored. The pre-stored position of the target item in the preset three-dimensional scene is obtained. As an optional implementation, the position of each user key part in the preset three-dimensional scene is compared with the position of the target item in the preset three-dimensional scene, and the user whose key part is closest to the target item is determined as the target user.
It can be understood that the manner of determining the target user who takes the target item from the position of each user key part in the preset three-dimensional scene is not limited in this embodiment.
Wherein the target user is a user who takes the target item.
Step 105, associating the target item with the target user.
In this embodiment, when the target item is associated with the target user, the identification information of the target item may be associated with the identification information of the target user.
The identification information of the target item may be a unique barcode or two-dimensional code of the target item. The identification information of the target user may be information that uniquely represents the target user, such as the target user's mobile phone number, email address, or account.
It will be appreciated that the association data is stored after the target item has been associated with the target user.
In the data association method provided by this embodiment, if it is monitored that a user takes a target item, the two-dimensional images acquired by the plurality of target depth vision sensors corresponding to the target item are obtained; the corresponding target three-dimensional scene point cloud data is determined from each two-dimensional image and the mapping parameters; the position of at least one user key part in the preset three-dimensional scene is determined from the target three-dimensional scene point cloud data; the target user who takes the target item is determined from the position of each user key part in the preset three-dimensional scene; and the target item is associated with the target user. Because the two-dimensional images are collected by a plurality of target depth vision sensors when a user takes a target item, and the three-dimensional scene point cloud data is generated from these two-dimensional images, the occlusion areas in the three-dimensional scene point cloud data are effectively reduced, the determined positions of the user key parts in the preset three-dimensional scene are more accurate, the person-goods data are associated more accurately, and the person who takes each item is accurately determined.
Embodiment Two
Fig. 3 is a schematic flowchart of a data association method according to a second embodiment of the present application. As shown in fig. 3, the data association method provided in this embodiment further refines steps 102 to 105 of the method provided in the first embodiment of the present application, and additionally includes: if it is monitored that the list generation condition is met, acquiring the list information corresponding to the target item; and sending the list information to the terminal device of the target user. The data association method provided by this embodiment includes the following steps.
Step 201, if it is monitored that the user takes the target object, acquiring two-dimensional images acquired by a plurality of target depth vision sensors corresponding to the target object when the user takes the target object.
In this embodiment, the implementation manner of step 201 is similar to that of step 101 in the first embodiment of the present application, and is not described herein again.
Step 202, determining corresponding target three-dimensional scene point cloud data according to the two-dimensional images and the mapping parameters.
As an alternative embodiment, as shown in fig. 4, step 202 includes the following steps:
step 2021, obtaining internal and external parameters corresponding to each target depth vision sensor and a position in a preset three-dimensional scene.
In this embodiment, the mapping parameters include: and (3) internal reference and external reference of each target depth vision sensor and the position of each target depth vision sensor in the preset three-dimensional scene. The depth vision sensors corresponding to each object can be calibrated in advance, internal parameters, external parameters and positions of the calibrated depth vision sensors in the preset three-dimensional scene are determined, and the internal parameters, the external parameters and the positions of the depth vision sensors in the preset three-dimensional scene are stored in an associated mode in advance. And acquiring internal and external parameters corresponding to each target depth vision sensor and the position of each target depth vision sensor in a preset three-dimensional scene.
Wherein, the corresponding internal reference of each target depth vision sensor may include: a focal length. The external parameters may include: a rotation matrix and a translation matrix.
Step 2022, mapping each two-dimensional image to a preset three-dimensional scene coordinate system according to the corresponding internal and external parameters and the position in the preset three-dimensional scene to obtain corresponding target three-dimensional scene point cloud data.
In this embodiment, since each two-dimensional image can obtain color information and depth information of each pixel point, each two-dimensional image can be mapped into a preset three-dimensional scene coordinate system according to corresponding internal reference and external reference and a position in a preset three-dimensional scene, and the pixel points of each two-dimensional image are mapped into target three-dimensional scene point cloud data.
In this embodiment, each two-dimensional image is acquired by a corresponding target depth vision sensor, so each two-dimensional image is mapped into the preset three-dimensional scene coordinate system according to the internal and external parameters of that sensor and its position in the preset three-dimensional scene, and the fusion and stitching of the scene point cloud data are completed as the two-dimensional images are mapped into the preset three-dimensional scene coordinate system. The target three-dimensional scene point cloud data can therefore be determined accurately and quickly. Moreover, because the target depth vision sensors differ in position and shooting angle, even though a single two-dimensional image contains occlusion areas, these occlusion areas are effectively eliminated from the target three-dimensional scene point cloud data obtained after the images are projected into the preset three-dimensional scene coordinate system and stitched together.
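A minimal sketch of this per-pixel mapping under a standard pinhole camera model, with the internal parameters given as focal lengths and principal point and the external parameters as a camera-to-scene rotation and translation; the concrete function signature is the author's assumption, not the applicant's code.

```python
import numpy as np

def backproject_to_world(depth, fx, fy, cx, cy, R, t):
    """Map one target depth vision sensor's depth image into the preset
    three-dimensional scene coordinate system.

    depth : (H, W) depth image in meters.
    fx, fy, cx, cy : internal parameters of the sensor (pinhole model:
        focal lengths and principal point).
    R, t : external parameters - a (3, 3) rotation matrix and a (3,) translation
        vector from the camera frame to the scene frame, encoding the sensor's
        placement in the preset three-dimensional scene.
    Returns an (N, 3) array of scene-coordinate points.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))      # pixel grids, shape (H, W)
    z = depth.reshape(-1)
    valid = z > 0                                        # keep pixels with valid depth
    u, v, z = u.reshape(-1)[valid], v.reshape(-1)[valid], z[valid]
    # Pixel coordinates -> camera coordinates (pinhole back-projection).
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    cam = np.stack([x, y, z], axis=1)                    # (N, 3) in the camera frame
    # Camera coordinates -> preset three-dimensional scene coordinates.
    return cam @ R.T + t
```

The per-sensor clouds obtained this way can then be concatenated into the target three-dimensional scene point cloud data, which is what removes the occlusion areas present in any single view.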
Step 203, determining the position of at least one user key part in a preset three-dimensional scene according to the target three-dimensional scene point cloud data.
As an alternative embodiment, as shown in fig. 5, step 203 comprises the following steps:
step 2031, inputting the target three-dimensional scene point cloud data into the first trained to converged position detection model, and detecting the position of each user key part in the preset three-dimensional scene through the first trained to converged position detection model.
In this embodiment, the first trained to convergent position detection model is a model for detecting the position of each user key part in the target three-dimensional scene point cloud data in a preset three-dimensional scene. The first trained to converged position detection model is obtained by training the first initial position detection model to converge.
The first position detection model trained to converge may be a deep learning model, such as a PointNet model or an Action4d model, which is suitable for processing three-dimensional point cloud data.
Step 2032, outputting the positions of the key parts of the users in the preset three-dimensional scene through the first trained to converged position detection model.
Further, in this embodiment, the target three-dimensional scene point cloud data is input into the first trained to converged position detection model, and the first trained to converged position detection model detects each user key part in the target three-dimensional scene point cloud data, determines the position of each user key part in the preset three-dimensional scene, and outputs the position of each user key part in the preset three-dimensional scene.
It can be understood that the position of each user key part in the preset three-dimensional scene is the position of the central point of each user key part in the preset three-dimensional scene.
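Purely as an illustration of the kind of model that could be trained to convergence for this step, a much-simplified PointNet-style regressor is sketched below in PyTorch; the patent does not disclose an architecture, and the fixed maximum number of key-part slots is an assumption made here.

```python
import torch
import torch.nn as nn

class KeyPartDetector(nn.Module):
    """Toy PointNet-style regressor: a shared per-point MLP, global max pooling,
    and a head that regresses up to max_parts key-part centers (x, y, z) plus a
    confidence per slot. Far simpler than a production detection model."""

    def __init__(self, max_parts=8):
        super().__init__()
        self.max_parts = max_parts
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, max_parts * 4))               # (x, y, z, confidence) per slot

    def forward(self, cloud):                            # cloud: (B, N, 3) scene points
        feats = self.point_mlp(cloud.transpose(1, 2))    # (B, 1024, N)
        global_feat = torch.max(feats, dim=2).values     # (B, 1024) global descriptor
        out = self.head(global_feat).view(-1, self.max_parts, 4)
        centers, conf = out[..., :3], torch.sigmoid(out[..., 3])
        return centers, conf                             # scene-frame centers + scores
```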
It should be noted that, if the first initial position detection model is not trained to obtain the first position detection model trained to converge, step 2030 is further included before step 2031.
Step 2030, training the first initial position detection model by using the first training sample; the first training sample is first historical three-dimensional scene point cloud data marking the position of at least one user key part in a preset three-dimensional scene; and if the first training convergence condition is determined to be met, determining the first initial position detection model meeting the first training convergence condition as a first position detection model trained to be converged.
Further, in this embodiment, the first training sample is first historical three-dimensional scene point cloud data that is used for marking a position of at least one user key part in a preset three-dimensional scene, and since there are a small number of occlusion areas in the first historical three-dimensional scene point cloud data in each first training sample, when the first initial position detection model is trained by using the first training sample under supervision, the first initial position detection model trained to be convergent can be more suitable for detecting a position of each user key part in the target three-dimensional scene point cloud data in the preset three-dimensional scene. The positions of the detected key parts of the users in the preset three-dimensional scene are more accurate.
The first training convergence condition is a convergence condition when the first initial position detection model is trained. The convergence condition may be that the loss function is minimized, or the iteration number reaches a preset iteration number, and the like, which is not limited in this embodiment.
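A sketch of such a training loop, assuming the toy detector sketched above and labels padded to its fixed number of slots; the convergence condition is either the average loss falling below a tolerance or the epoch budget being exhausted, and the optimizer, loss function and thresholds are illustrative choices, not taken from the patent.

```python
import torch

def train_to_convergence(model, loader, max_epochs=100, loss_tol=1e-3, lr=1e-3):
    """Train the (first) initial position detection model until a convergence
    condition holds: loss below a tolerance or the iteration budget is reached."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.SmoothL1Loss()
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for cloud, gt_centers in loader:        # labeled historical scene clouds;
            pred_centers, _ = model(cloud)      # gt_centers padded to max_parts slots
            loss = criterion(pred_centers, gt_centers)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        epoch_loss /= max(len(loader), 1)
        if epoch_loss < loss_tol:               # first training convergence condition
            break
    return model                                # the model "trained to convergence"
```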
And 204, determining a target user for taking the target object according to the position of each user key part in the preset three-dimensional scene.
As an alternative implementation, in this embodiment, as shown in fig. 6, step 204 includes the following steps:
step 2041, the position of the target item is obtained.
In this embodiment, when each article is placed in a preset three-dimensional scene, the position of each article is stored in advance. The target item location is obtained from the stored item locations.
Wherein, the target object position can be represented by the central point position of the target object.
Step 2042, determining the distance between each user key part and the target object according to the position of each user key part in the preset three-dimensional scene and the position of the target object.
Specifically, in this embodiment, the distance between each user key part and the target object is calculated according to the position of each user key part in the preset three-dimensional scene and the position of the target object by using an euclidean distance formula between two points.
Step 2043, determine the user with the smallest distance as the target user.
Specifically, in this embodiment, since the user who takes the target item must touch it with a hand while nearby users do not touch it, the distance between the key part of the user who takes the target item and the target item is smaller than the distances between the key parts of other nearby users and the target item. Therefore, the distances between the users' key parts and the target item are sorted in ascending order, and the user whose key part is at the smallest distance is determined as the target user.
In this embodiment, since the distance between the key part of the user who takes the target object and the target object is the minimum, the user with the minimum distance is determined as the target user, so that the determined target user who takes the target object is more accurate.
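A minimal sketch of this nearest-key-part rule; the names and example coordinates are hypothetical.

```python
import numpy as np

def pick_target_user(key_parts, item_position):
    """key_parts: {track_id: (x, y, z)} positions of each user's key part in the
    preset three-dimensional scene; item_position: (x, y, z) of the target item.
    Returns the track id of the user whose key part is nearest the item."""
    item = np.asarray(item_position, dtype=float)
    distances = {tid: float(np.linalg.norm(np.asarray(p, dtype=float) - item))
                 for tid, p in key_parts.items()}
    return min(distances, key=distances.get)

# Example: the hand actually reaching for the item is closest, so "u1" is returned.
# pick_target_user({"u1": (1.2, 0.4, 1.0), "u2": (2.5, 1.1, 1.0)}, (1.25, 0.42, 1.05))
```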
And step 205, determining the position of the head of the target user in the preset three-dimensional scene according to the point cloud data of the target three-dimensional scene.
Optionally, in this embodiment, when the target user is associated with the target item, the identification information of the target user needs to be determined. The target three-dimensional scene point cloud data itself does not contain the identification information of the target user, so it must be determined separately.
In this embodiment, steps 205 to 207 are methods for determining identification information of a target user.
As an alternative implementation, in this embodiment, as shown in fig. 7, step 205 includes the following steps:
step 2051, inputting the point cloud data of the target three-dimensional scene into the second trained to converged position detection model, so as to detect the position of the head of the target user in the preset three-dimensional scene through the second trained to converged position detection model.
In this embodiment, the second position detection model trained to converge is a model for detecting a position of a head of a target user in a preset three-dimensional scene in the point cloud data of the target three-dimensional scene. The second trained to converge position detection model is obtained by training the second initial position detection model to converge.
The second position detection model trained to converge may be a deep learning model, such as a PointNet model or an Action4d model, which is suitable for processing three-dimensional point cloud data.
It will be appreciated that the network architecture of the second trained to converged position detection model may be the same as that of the first trained to converged position detection model, but the values of the parameters in the two models are different.
Step 2052, outputting the position of the target user's head in the preset three-dimensional scene through the second trained to converged position detection model.
Further, in this embodiment, the target three-dimensional scene point cloud data is input into the second trained to converged position detection model, and the second trained to converged position detection model detects the head of the target user in the target three-dimensional scene point cloud data, determines the position of the head of the target user in the preset three-dimensional scene, and outputs the position of the head of the target user in the preset three-dimensional scene.
It is understood that the position of the head of the target user in the preset three-dimensional scene is the position of the center point of the head of the target user in the preset three-dimensional scene.
It is to be noted that, if the second initial position detection model is not trained to obtain the second position detection model trained to converge, step 2050 is further included before step 2051.
Step 2050, training the second initial position detection model by using a second training sample; the second training sample is second historical three-dimensional scene point cloud data marking the position of the head of the article taking user in the preset three-dimensional scene; and if the second training convergence condition is satisfied, determining the second initial position detection model satisfying the second training convergence condition as a second position detection model trained to be converged.
Further, in this embodiment, the second training sample is second historical three-dimensional scene point cloud data that marks the position of the head of the user who takes the article in the preset three-dimensional scene. It is to be understood that the second historical three-dimensional scene point cloud data may differ from the first historical three-dimensional scene point cloud data only by the location of the marked part.
Because only a small number of occlusion areas exist in the second historical three-dimensional scene point cloud data of each second training sample, when the second initial position detection model is trained with the second training samples under supervision, the second position detection model trained to convergence is better suited to detecting the position of the target user's head in the target three-dimensional scene point cloud data in the preset three-dimensional scene. The detected position of the head of the target user in the preset three-dimensional scene is therefore more accurate.
It is understood that the network architecture of the second initial position detection model and the first initial position detection model may be the same, but the values of the parameters in the models are different.
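A minimal training-loop sketch for step 2050, assuming the hypothetical HeadPositionNet from the previous sketch and an in-memory list of labelled samples; the mean-squared-error loss and the loss-plateau test standing in for the second training convergence condition are illustrative assumptions.

```python
import torch

def train_second_model(model, samples, lr=1e-3, tol=1e-4, max_epochs=100):
    """samples: list of (point_cloud (N, 3) tensor, head_xyz (3,) tensor) pairs,
    i.e. second historical scene point clouds labelled with the taker's head position."""
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    previous_loss = float("inf")
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for points, head_xyz in samples:
            optimiser.zero_grad()
            prediction = model(points.unsqueeze(0))[0]
            loss = torch.nn.functional.mse_loss(prediction, head_xyz)
            loss.backward()
            optimiser.step()
            epoch_loss += loss.item()
        epoch_loss /= len(samples)
        if abs(previous_loss - epoch_loss) < tol:   # stand-in for the convergence condition
            break
        previous_loss = epoch_loss
    return model   # the second position detection model trained to convergence
```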
And step 206, determining the human body position of the target user matched with the head position of the target user in the preset three-dimensional scene.
And step 207, determining the identification information of the target user according to the mapping relation between the human body position of the target user and the identification information of the target user.
Further, in this embodiment, when a user enters the preset three-dimensional scene, the plurality of depth vision sensors simultaneously acquire images and send them to the electronic device. The electronic device detects and tracks each user through these images, determines each user's position in the preset three-dimensional scene in real time, and can determine each user's identification information at the time of entry. Therefore, while tracking each user, both the position of each user in the preset three-dimensional scene and the corresponding identification information can be determined.
In this embodiment, the position of the target user's head in the preset three-dimensional scene is matched against the human body position of each tracked user. If the head position matches the body position of a certain user, that user is determined to be the target user, and the identification information of the target user is determined through the mapping relationship between the matched body position and the corresponding identification information.
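A sketch of how steps 206-207 might be realized, under assumed data structures: tracked_users stands in for the tracked body positions keyed by identification information, and the nearest-body-position rule with a distance threshold is an illustrative matching criterion, not the one claimed.

```python
import math

def match_target_user(head_xyz, tracked_users, max_distance=0.5):
    """head_xyz: (x, y, z) of the detected head; tracked_users: {user_id: (x, y, z) body position}."""
    best_id, best_distance = None, max_distance
    for user_id, body_xyz in tracked_users.items():
        distance = math.dist(head_xyz[:2], body_xyz[:2])   # compare positions on the ground plane
        if distance < best_distance:
            best_id, best_distance = user_id, distance
    return best_id   # identification information of the target user, or None if nothing matches

tracked = {"user_001": (1.2, 3.4, 0.0), "user_002": (4.8, 0.9, 0.0)}
print(match_target_user((1.25, 3.35, 1.7), tracked))   # -> user_001
```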
Step 208, associating the target item with the target user.
As an alternative implementation, in this embodiment, as shown in fig. 8, step 208 includes the following steps:
Step 2081, obtaining the identification information of the target user and the identification information of the target item.
In this embodiment, since the identification information of the target user is determined in steps 205 to 207, the identification information of the target user may be acquired.
In this embodiment, the identification information of the gravity sensor and the identification information of the corresponding object may be stored in the electronic device in an associated manner, and after the gravity sensor sends the gravity change signal to the electronic device, the electronic device determines the identification information of the corresponding object according to the identification information of the gravity sensor carried in the gravity change signal. The identification information of the target item may be a unique barcode or a two-dimensional code of the target item.
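For illustration, a sketch of the gravity-sensor handling described above; the GravityChangeSignal layout, the sensor-to-item lookup table, and the sign convention for the weight change are assumptions of the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GravityChangeSignal:
    sensor_id: str             # identification of the target gravity sensor
    weight_delta_grams: float  # change in weight measured on the shelf

sensor_to_item = {"shelf_sensor_07": "6901234567890"}   # stored sensor-to-item mapping

def on_gravity_change(signal: GravityChangeSignal) -> Optional[str]:
    """Return the identification of the taken target item when the shelf weight drops."""
    if signal.weight_delta_grams < 0:     # weight decreased: an item was picked up
        return sensor_to_item.get(signal.sensor_id)
    return None                           # weight increased or unchanged: nothing taken

print(on_gravity_change(GravityChangeSignal("shelf_sensor_07", -350.0)))   # -> 6901234567890
```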
Step 2082, associating the identification information of the target user with the identification information of the target item.
It can be understood that, after the identification information of the target user is associated with the identification information of the target item, the associated identification information of the target user and the associated identification information of the target item are stored.
In this embodiment, the identification information of the target user is associated with the identification information of the target item, and since the identification information is information that can uniquely represent the target, the target item and the target user are associated more accurately.
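A minimal sketch of storing the association of steps 2081-2082; the in-memory record and timestamp field are assumptions (the application only requires that the two pieces of identification information be stored in association).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class Association:
    user_id: str   # identification information of the target user
    item_id: str   # identification information of the target item (e.g. its barcode)
    taken_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

associations: List[Association] = []

def associate(user_id: str, item_id: str) -> None:
    """Store the (target user, target item) identification pair in association."""
    associations.append(Association(user_id=user_id, item_id=item_id))

associate("user_001", "6901234567890")   # hypothetical user id and item barcode
```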
Step 209, if it is monitored that the list generation condition is met, acquiring the list information corresponding to the target item.
Further, in this embodiment, as an optional implementation, the list generation condition may be whether the target user has left the preset three-dimensional scene: if the target user leaves the preset three-dimensional scene, it is determined that the list generation condition is met; otherwise, it is determined that the condition is not met.
Monitoring whether the target user has left the preset three-dimensional scene may be implemented as follows: a gate provided in the preset three-dimensional scene communicates with the electronic device, and if the target user opens the exit gate with the identification information, the exit gate sends a departure message to the electronic device, the departure message carrying the identification information of the target user. After receiving the departure message, the electronic device determines that the list generation condition is met.
The list information of the target item may include information related to the target item, such as the name, place of origin, and price of the target item.
Step 210, sending the list information to the terminal device of the target user.
Further, in this embodiment, a mapping relationship between each user's identification information and the identification information of the corresponding terminal device may be stored in the electronic device in advance. The identification information of the corresponding terminal device is determined from the target user's identification information, and the list information is sent to the terminal device of the target user.
It is understood that, in an unmanned retail scenario, an amount corresponding to the price of the target item may be deducted from the target user's account based on the list information.
In this embodiment, after the target item and the target user are associated, the list information is sent to the terminal device of the target user once it is monitored that the list generation condition is met, so that list generation and payment can be completed automatically in an unmanned retail scene.
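A self-contained sketch of steps 209-210 under assumed helpers: the associations list, the item_catalogue and user_to_terminal mappings, and the send_to_terminal stub stand in for the electronic device's stored mappings and its push channel; they are illustrative, not part of the disclosed implementation.

```python
# associations as recorded at step 208: (target user id, target item id) pairs
associations = [("user_001", "6901234567890")]
item_catalogue = {"6901234567890": {"name": "sparkling water", "origin": "example", "price": 3.5}}
user_to_terminal = {"user_001": "terminal_abc"}   # user id -> terminal device id mapping

def send_to_terminal(terminal_id: str, payload: dict) -> None:
    print(f"push to {terminal_id}: {payload}")    # placeholder for the real push channel

def on_user_left(user_id: str) -> None:
    """Called when the exit gate reports that user_id has left: the list generation condition is met."""
    items = [item_catalogue[item_id] for uid, item_id in associations if uid == user_id]
    list_info = {"user": user_id, "items": items, "total": sum(i["price"] for i in items)}
    send_to_terminal(user_to_terminal[user_id], list_info)   # step 210; payment may then be deducted

on_user_left("user_001")
```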
EXAMPLE III
Fig. 9 is a signaling flowchart of a data association method according to a third embodiment of the present application, and as shown in fig. 9, the data association method provided in this embodiment includes the following steps:
Step 301, if the target gravity sensor detects that the gravity on it changes, a gravity change signal is generated.
Step 302, sending the gravity change signal to the electronic device.
Wherein the target gravity sensor is arranged on the shelf below the target object.
Wherein the gravity change signal comprises: the identification information of the target item.
Step 303, the electronic device determines, according to the gravity change signal, that a user has taken the target item.
Step 304, the plurality of target depth vision sensors acquire, at a sampling frequency, two-dimensional images including the target item.
Step 305, the electronic device acquires, from the plurality of target depth vision sensors, the two-dimensional images acquired when the target item is taken.
Step 306, determining corresponding target three-dimensional scene point cloud data according to each two-dimensional image and the mapping parameters.
Step 307, determining the position of at least one user key part in the preset three-dimensional scene according to the target three-dimensional scene point cloud data.
Step 308, determining the target user who takes the target item according to the position of each user key part in the preset three-dimensional scene.
Step 309, determining the identification information of the target user.
Step 310, associating the target item with the target user.
Step 311, if it is monitored that the list generation condition is met, list information corresponding to the target object is obtained.
Step 312, the list information is sent to the terminal device of the target user.
In this embodiment, the implementation manner and the technical effect of steps 303 to 312 are similar to those of steps 201 to 210 in the second embodiment of the present application, and are not described in detail herein.
EXAMPLE IV
Fig. 10 is a schematic structural diagram of a data association apparatus according to a fourth embodiment of the present application, and as shown in fig. 10, a data association apparatus 1000 according to this embodiment is located in an electronic device, the electronic device is in communication with a plurality of depth vision sensors, the depth vision sensors are disposed in a preset three-dimensional scene, and the preset three-dimensional scene further includes a target item and at least one user. The data association apparatus 1000 includes: the system comprises an image acquisition module 1001, a scene point cloud determination module 1002, a key part position determination module 1003, a target user determination module 1004 and a data association module 1005.
The image obtaining module 1001 is configured to, if it is monitored that the user takes the target item, obtain two-dimensional images acquired by the plurality of target depth vision sensors corresponding to the taking of the target item. And a scene point cloud determining module 1002, configured to determine corresponding target three-dimensional scene point cloud data according to each two-dimensional image and the mapping parameter. A key part position determining module 1003, configured to determine, according to the target three-dimensional scene point cloud data, a position of at least one user key part in a preset three-dimensional scene. And the target user determining module 1004 is configured to determine a target user who takes the target item according to the position of each user key part in the preset three-dimensional scene. A data association module 1005 for associating the target item with the target user.
The data association apparatus provided in this embodiment may execute the technical solution of the method embodiment shown in fig. 2, and the implementation principle and technical effect of the data association apparatus are similar to those of the method embodiment shown in fig. 2, which are not described in detail herein.
EXAMPLE V
Fig. 11 is a schematic structural diagram of a data association apparatus according to a fifth embodiment of the present application, and as shown in fig. 11, a data association apparatus 1100 provided in this embodiment further includes, on the basis of the data association apparatus provided in the fourth embodiment: a first model training module 1101, a user identification determining module 1102, a second model training module 1103, and a list processing module 1104.
Further, the scene point cloud determining module 1002 is specifically configured to:
acquiring internal and external parameters corresponding to each target depth vision sensor and positions of the target depth vision sensors in a preset three-dimensional scene; and mapping each two-dimensional image to a preset three-dimensional scene coordinate system according to the corresponding internal and external parameters and the position in the preset three-dimensional scene to obtain corresponding target three-dimensional scene point cloud data.
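A geometry sketch of this mapping, assuming the two-dimensional images carry per-pixel depth; the function signature, the 4x4 extrinsic convention, and the zero-depth validity test are assumptions used for illustration, not the application's implementation.

```python
import numpy as np

def depth_image_to_scene_points(depth, fx, fy, cx, cy, camera_to_scene):
    """depth: (H, W) array in metres; fx, fy, cx, cy: intrinsic ('internal') parameters;
    camera_to_scene: 4x4 pose of the sensor in the preset scene ('external' parameters)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx                          # pinhole back-projection
    y = (v - cy) * depth / fy
    points_cam = np.stack([x, y, depth, np.ones_like(depth)], axis=-1).reshape(-1, 4)
    points_scene = points_cam @ camera_to_scene.T      # into the preset scene coordinate system
    return points_scene[:, :3][depth.reshape(-1) > 0]  # drop invalid (zero-depth) pixels

# The per-sensor point clouds would then be concatenated (e.g. with np.vstack) to form
# the target three-dimensional scene point cloud.
```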
Further, the key location determining module 1003 is specifically configured to:
inputting the target three-dimensional scene point cloud data into a first trained to converged position detection model, and detecting the position of each user key part in a preset three-dimensional scene through the first trained to converged position detection model; and outputting the position of each user key part in the preset three-dimensional scene through the first position detection model trained to be convergent.
Further, a first model training module 1101, configured to train a first initial position detection model with a first training sample; the first training sample is first historical three-dimensional scene point cloud data marking the position of at least one user key part in a preset three-dimensional scene; and if the first training convergence condition is determined to be met, determining the first initial position detection model meeting the first training convergence condition as a first position detection model trained to be converged.
Further, the target user determining module 1004 is specifically configured to:
acquiring the position of a target object; determining the distance between each user key part and a target object according to the position of each user key part in a preset three-dimensional scene and the position of the target object; and determining the user with the minimum distance as the target user.
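An illustrative sketch of the minimum-distance rule used by the target user determining module; the data layout (a dictionary of key part positions per user) is an assumption of the example.

```python
import math

def pick_target_user(key_part_positions, item_xyz):
    """key_part_positions: {user_id: [(x, y, z), ...]} key part positions per user in the scene."""
    best_user, best_distance = None, float("inf")
    for user_id, parts in key_part_positions.items():
        for part_xyz in parts:
            distance = math.dist(part_xyz, item_xyz)   # Euclidean distance to the target item
            if distance < best_distance:
                best_user, best_distance = user_id, distance
    return best_user                                   # the user with the minimum distance

parts = {"user_001": [(1.0, 2.0, 1.1)], "user_002": [(3.0, 2.5, 1.0)]}
print(pick_target_user(parts, (1.1, 2.0, 1.0)))        # -> user_001
```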
Further, the data association module 1005 is specifically configured to:
acquiring identification information of a target user and identification information of a target object; and associating the identification information of the target user with the identification information of the target object.
Further, the user identifier determining module 1102 is configured to determine, according to the point cloud data of the target three-dimensional scene, a position of a head of the target user in a preset three-dimensional scene; determining the human body position of the target user matched with the head position of the target user in a preset three-dimensional scene; and determining the identification information of the target user according to the mapping relation between the human body position of the target user and the identification information of the target user.
Further, when determining the position of the head of the target user in the preset three-dimensional scene according to the point cloud data of the target three-dimensional scene, the user identifier determining module 1102 is specifically configured to:
inputting the target three-dimensional scene point cloud data into a second position detection model trained to be convergent, and detecting the position of the head of a target user in a preset three-dimensional scene through the second position detection model trained to be convergent; and outputting the position of the head of the target user in the preset three-dimensional scene through the second position detection model trained to be converged.
Further, a second model training module 1103 is configured to train a second initial position detection model by using a second training sample; the second training sample is second historical three-dimensional scene point cloud data marking the position of the head of the article taking user in the preset three-dimensional scene; and if the second training convergence condition is satisfied, determining the second initial position detection model satisfying the second training convergence condition as a second position detection model trained to be converged.
Further, the list processing module 1104 is configured to, if it is monitored that the list generation condition is met, obtain list information corresponding to the target item; and sending the list information to the terminal equipment of the target user.
The data association apparatus provided in this embodiment may execute the technical solutions of the method embodiments shown in fig. 2 to 9, and the implementation principles and technical effects thereof are similar to those of the method embodiments shown in fig. 2 to 9, and are not described in detail herein.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 12 is a block diagram of an electronic device for the data association method according to an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 12, the electronic device includes: one or more processors 1201, a memory 1202, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 12 illustrates an example with one processor 1201.
Memory 1202 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the data association methods provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the data association method provided herein.
The memory 1202 is a non-transitory computer-readable storage medium, and can be used for storing non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the data association method in the embodiment of the present application (for example, the image acquisition module 1001, the scene point cloud determination module 1002, the key location determination module 1003, the target user determination module 1004, and the data association module 1005 shown in fig. 10). The processor 1201 executes various functional applications of the server and data processing by executing non-transitory software programs, instructions, and modules stored in the memory 1202, that is, implements the data association method in the above-described method embodiment.
The memory 1202 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the electronic device of fig. 12, and the like. Further, the memory 1202 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1202 may optionally include memory located remotely from the processor 1201, which may be connected to the electronic device of fig. 12 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of fig. 12 may further include: an input device 1203 and an output device 1204. The processor 1201, the memory 1202, the input device 1203, and the output device 1204 may be connected by a bus or other means, and the bus connection is exemplified in fig. 12.
The input device 1203 may receive input voice, numeric, or character information and generate key signal inputs related to user settings and function controls of the electronic apparatus of fig. 12, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output devices 1204 may include a voice playing device, a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, when a user takes the target item, two-dimensional images are collected by a plurality of target depth vision sensors and the three-dimensional scene point cloud data is generated from these two-dimensional images, so that the occluded area in the three-dimensional scene point cloud data can be effectively reduced, the determined positions of the user key parts in the preset three-dimensional scene are more accurate, the person and goods data are associated more accurately, and the person who takes each item is accurately determined.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (22)

1. A data association method is applied to an electronic device, the electronic device is communicated with a plurality of depth vision sensors, the depth vision sensors are arranged in a preset three-dimensional scene, and the preset three-dimensional scene further comprises a target object and at least one user, and the method comprises the following steps:
if it is monitored that the user takes the target object, acquiring two-dimensional images acquired by a plurality of corresponding target depth vision sensors when the target object is taken;
determining corresponding target three-dimensional scene point cloud data according to each two-dimensional image and the mapping parameters;
determining the position of at least one user key part in a preset three-dimensional scene according to the point cloud data of the target three-dimensional scene;
determining a target user for taking the target object according to the position of each user key part in a preset three-dimensional scene;
and associating the target item with the target user.
2. The method of claim 1, wherein determining corresponding target three-dimensional scene point cloud data from the two-dimensional images and the mapping parameters comprises:
acquiring internal and external parameters corresponding to each target depth vision sensor and positions of the target depth vision sensors in a preset three-dimensional scene;
and mapping each two-dimensional image to a preset three-dimensional scene coordinate system according to the corresponding internal and external parameters and the position in the preset three-dimensional scene to obtain corresponding target three-dimensional scene point cloud data.
3. The method of claim 1, wherein determining the location of at least one user key in a preset three-dimensional scene from the target three-dimensional scene point cloud data comprises:
inputting the target three-dimensional scene point cloud data into a first trained to converged position detection model, and detecting the position of each user key part in a preset three-dimensional scene through the first trained to converged position detection model;
and outputting the position of each user key part in a preset three-dimensional scene through the first position detection model trained to be convergent.
4. The method of claim 3, wherein prior to inputting the target three-dimensional scene point cloud data into the first trained to converged location detection model, further comprising:
training the first initial position detection model by adopting a first training sample; the first training sample is first historical three-dimensional scene point cloud data marking the position of at least one user key part in a preset three-dimensional scene;
and if the first training convergence condition is determined to be met, determining a first initial position detection model meeting the first training convergence condition as the first trained to converged position detection model.
5. The method according to claim 1, wherein the determining the target user to take the target item according to the position of each user key part in a preset three-dimensional scene comprises:
acquiring the position of a target object;
determining the distance between each user key part and the target object according to the position of each user key part in a preset three-dimensional scene and the position of the target object;
and determining the user with the minimum distance as the target user.
6. The method of claim 1, wherein associating the target item with the target user comprises:
acquiring the identification information of the target user and the identification information of the target object;
and associating the identification information of the target user with the identification information of the target object.
7. The method of claim 6, wherein after determining the target user to pick up the target item according to the position of each user key part in the preset three-dimensional scene, further comprising:
determining the position of the head of a target user in a preset three-dimensional scene according to the point cloud data of the target three-dimensional scene;
determining the human body position of the target user matched with the head position of the target user in a preset three-dimensional scene;
and determining the identification information of the target user according to the mapping relation between the human body position of the target user and the identification information of the target user.
8. The method of claim 7, wherein determining the position of the target user's head in the pre-set three-dimensional scene from the target three-dimensional scene point cloud data comprises:
inputting the target three-dimensional scene point cloud data into a second position detection model trained to be convergent, and detecting the position of the head of a target user in a preset three-dimensional scene through the second position detection model trained to be convergent;
and outputting the position of the head of the target user in the preset three-dimensional scene through the second position detection model trained to be converged.
9. The method of claim 8, wherein prior to inputting the target three-dimensional scene point cloud data into the second trained to converged location detection model, further comprising:
training the second initial position detection model by adopting a second training sample; the second training sample is second historical three-dimensional scene point cloud data marking the position of the head of the user for taking the article in a preset three-dimensional scene;
and if the second training convergence condition is determined to be met, determining a second initial position detection model meeting the second training convergence condition as the second trained to converged position detection model.
10. The method of claim 1, wherein after associating the target item with the target user, further comprising:
if the condition that the list generation condition is met is monitored, list information corresponding to the target object is obtained;
and sending the list information to the terminal equipment of the target user.
11. A data association apparatus, wherein the apparatus is located in an electronic device, the electronic device is in communication with a plurality of depth vision sensors, the depth vision sensors are disposed in a preset three-dimensional scene, the preset three-dimensional scene further includes a target object and at least one user, the apparatus includes:
the image acquisition module is used for acquiring two-dimensional images acquired by a plurality of corresponding target depth vision sensors when the target object is taken if it is monitored that the target object is taken by a user;
the scene point cloud determining module is used for determining corresponding target three-dimensional scene point cloud data according to the two-dimensional images and the mapping parameters;
the key part position determining module is used for determining the position of at least one user key part in a preset three-dimensional scene according to the point cloud data of the target three-dimensional scene;
the target user determining module is used for determining a target user for taking the target object according to the position of each user key part in a preset three-dimensional scene;
and the data association module is used for associating the target object with the target user.
12. The apparatus of claim 11, wherein the scene point cloud determination module is specifically configured to:
acquiring internal and external parameters corresponding to each target depth vision sensor and positions of the target depth vision sensors in a preset three-dimensional scene; and mapping each two-dimensional image to a preset three-dimensional scene coordinate system according to the corresponding internal and external parameters and the position in the preset three-dimensional scene to obtain corresponding target three-dimensional scene point cloud data.
13. The apparatus of claim 11, wherein the key location determining module is specifically configured to:
inputting the target three-dimensional scene point cloud data into a first trained to converged position detection model, and detecting the position of each user key part in a preset three-dimensional scene through the first trained to converged position detection model; and outputting the position of each user key part in a preset three-dimensional scene through the first position detection model trained to be convergent.
14. The apparatus of claim 13, further comprising:
the first model training module is used for training the first initial position detection model by adopting a first training sample; the first training sample is first historical three-dimensional scene point cloud data marking the position of at least one user key part in a preset three-dimensional scene; and if the first training convergence condition is determined to be met, determining a first initial position detection model meeting the first training convergence condition as the first trained to converged position detection model.
15. The apparatus of claim 11, wherein the target user determination module is specifically configured to:
acquiring the position of a target object; determining the distance between each user key part and the target object according to the position of each user key part in a preset three-dimensional scene and the position of the target object; and determining the user with the minimum distance as the target user.
16. The apparatus according to claim 11, wherein the data association module is specifically configured to:
acquiring the identification information of the target user and the identification information of the target object; and associating the identification information of the target user with the identification information of the target object.
17. The apparatus of claim 16, further comprising:
the user identification determining module is used for determining the position of the head of a target user in a preset three-dimensional scene according to the point cloud data of the target three-dimensional scene; determining the human body position of the target user matched with the head position of the target user in a preset three-dimensional scene; and determining the identification information of the target user according to the mapping relation between the human body position of the target user and the identification information of the target user.
18. The apparatus of claim 17, wherein the user identifier determining module, when determining the position of the head of the target user in the preset three-dimensional scene according to the point cloud data of the target three-dimensional scene, is specifically configured to:
inputting the target three-dimensional scene point cloud data into a second position detection model trained to be convergent, and detecting the position of the head of a target user in a preset three-dimensional scene through the second position detection model trained to be convergent; and outputting the position of the head of the target user in the preset three-dimensional scene through the second position detection model trained to be converged.
19. The apparatus of claim 18, further comprising:
the second model training module is used for training the second initial position detection model by adopting a second training sample; the second training sample is second historical three-dimensional scene point cloud data marking the position of the head of the user for taking the article in a preset three-dimensional scene; and if the second training convergence condition is determined to be met, determining a second initial position detection model meeting the second training convergence condition as the second trained to converged position detection model.
20. The apparatus of claim 11, further comprising:
the list processing module is used for acquiring the list information corresponding to the target object if the list generating condition is met; and sending the list information to the terminal equipment of the target user.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-10.
CN202010027553.1A 2020-01-10 2020-01-10 Data association method, device, equipment and storage medium Active CN111259755B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010027553.1A CN111259755B (en) 2020-01-10 2020-01-10 Data association method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111259755A true CN111259755A (en) 2020-06-09
CN111259755B CN111259755B (en) 2023-07-28

Family

ID=70952813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010027553.1A Active CN111259755B (en) 2020-01-10 2020-01-10 Data association method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111259755B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107507048A (en) * 2017-05-20 2017-12-22 吕怀民 " with networking " method, networking used article, network license use device and purposes
US20190206012A1 (en) * 2017-12-31 2019-07-04 William Granich Online marketing lottery scratch-off system
CN109598301A (en) * 2018-11-30 2019-04-09 腾讯科技(深圳)有限公司 Detection zone minimizing technology, device, terminal and storage medium
CN110189343A (en) * 2019-04-16 2019-08-30 阿里巴巴集团控股有限公司 Image labeling method, apparatus and system
CN110276805A (en) * 2019-06-28 2019-09-24 联想(北京)有限公司 A kind of data processing method and electronic equipment
CN110378087A (en) * 2019-07-24 2019-10-25 四川爱创科技有限公司 Self-service terminal management method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GEOFFERY POON et al.: "Enabling 3D online shopping with affordable depth scanned models" *
蔡健 (CAI Jian): "Design and implementation of a recommendation *** for unmanned supermarkets based on user characteristics" *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111815698A (en) * 2020-07-20 2020-10-23 广西安良科技有限公司 Artificial intelligence monocular 3D point cloud generation method, device, terminal and storage medium
CN112037280A (en) * 2020-08-17 2020-12-04 北京声智科技有限公司 Object distance measuring method and device
CN114078331B (en) * 2020-08-19 2023-02-17 北京万集科技股份有限公司 Overspeed detection method, overspeed detection device, visual sensor and storage medium
CN112270769A (en) * 2020-11-11 2021-01-26 北京百度网讯科技有限公司 Tour guide method and device, electronic equipment and storage medium
CN112270769B (en) * 2020-11-11 2023-11-10 北京百度网讯科技有限公司 Tour guide method and device, electronic equipment and storage medium
US11823335B2 (en) 2020-11-11 2023-11-21 Beijing Baidu Netcom Science Technology Co., Ltd. Tour guiding method, electronic device and storage medium
CN113489902A (en) * 2021-07-02 2021-10-08 深圳课后帮科技有限公司 Video shooting method and system

Also Published As

Publication number Publication date
CN111259755B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
CN111259755B (en) Data association method, device, equipment and storage medium
US10228763B2 (en) Gaze direction mapping
JP6869345B2 (en) Order information determination method and equipment
US11614803B2 (en) Individually interactive multi-view display system for non-stationary viewing locations and methods therefor
US10380814B1 (en) System for determining entry of user to an automated facility
US9842255B2 (en) Calculation device and calculation method
CN109918975A (en) A kind of processing method of augmented reality, the method for Object identifying and terminal
CN104364733A (en) Position-of-interest detection device, position-of-interest detection method, and position-of-interest detection program
US10515337B1 (en) User identification system
CN107004279A (en) Natural user interface camera calibrated
CN105229582A (en) Based on the gestures detection of Proximity Sensor and imageing sensor
CN106415445A (en) Technologies for viewer attention area estimation
CN108921098B (en) Human motion analysis method, device, equipment and storage medium
CN111259751A (en) Video-based human behavior recognition method, device, equipment and storage medium
JPWO2017085771A1 (en) Checkout support system, checkout support program, and checkout support method
CN109726759A (en) Self-service method, apparatus, system, electronic equipment and computer-readable medium
US20150370336A1 (en) Device Interaction with Spatially Aware Gestures
US20220126190A1 (en) Electronic device for providing feedback for specific movement using machine learning model and operating method thereof
JP7449408B2 (en) Electronic devices for automatic identification of users
CN109670546A (en) A kind of goods matching and quantity recurrence recognizer based on default template
CN112241716B (en) Training sample generation method and device
CN115278014A (en) Target tracking method, system, computer equipment and readable medium
CN110246280B (en) Human-cargo binding method and device, computer equipment and readable medium
US20220138466A1 (en) Dynamic vision sensors for fast motion understanding
CN115393962A (en) Motion recognition method, head-mounted display device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant