CN111259755B - Data association method, device, equipment and storage medium - Google Patents

Data association method, device, equipment and storage medium

Info

Publication number
CN111259755B
CN111259755B (application number CN202010027553.1A)
Authority
CN
China
Prior art keywords
target
user
dimensional scene
preset
determining
Prior art date
Legal status
Active
Application number
CN202010027553.1A
Other languages
Chinese (zh)
Other versions
CN111259755A (en)
Inventor
罗宇轩
亢乐
包英泽
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010027553.1A
Publication of CN111259755A
Application granted
Publication of CN111259755B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/64: Three-dimensional objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a data association method, apparatus, device and storage medium, and relates to the field of computer vision. The specific implementation scheme is as follows: the method is applied to an electronic device that communicates with a plurality of depth vision sensors arranged in a preset three-dimensional scene, the preset three-dimensional scene also containing a target item and at least one user, and the method includes: if it is monitored that a user takes the target item, acquiring the two-dimensional images collected by the corresponding target depth vision sensors when the user takes the target item; determining the corresponding target three-dimensional scene point cloud data according to each two-dimensional image and the mapping parameters; determining the position of at least one user key part in the preset three-dimensional scene according to the target three-dimensional scene point cloud data; determining the target user who takes the target item according to the position of each user key part in the preset three-dimensional scene; and associating the target item with the target user. In this way, person and goods data are associated more accurately.

Description

Data association method, device, equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to computer vision technologies.
Background
As computer vision technology matures, the field of computer vision-based unmanned retail has also evolved rapidly. There is a need in the unmanned retail setting to associate an item change on a shelf with a user taking the item, a technique known as association of people and goods data.
In the prior art, person-goods data association is performed with a single depth camera that collects the position information and depth information of the key parts of the user taking an item in the image. If the user taking the item is occluded, the position information and depth information of those key parts in the collected image are not accurate enough, so the determined position of the key parts in the three-dimensional unmanned-retail scene is also not accurate enough. As a result, person and goods data cannot be associated accurately, and ultimately the user who takes each item cannot be determined accurately in the unmanned retail scene.
Disclosure of Invention
The embodiments of the present application provide a data association method, apparatus, device and storage medium, which solve the technical problem in the prior art that person and goods data cannot be associated accurately, so that ultimately the user who takes each item cannot be determined accurately in an unmanned retail scene.
An embodiment of the present application provides a data association method, where the method is applied to an electronic device, and the electronic device communicates with a plurality of depth vision sensors, where the depth vision sensors are set in a preset three-dimensional scene, and the preset three-dimensional scene further includes a target object and at least one user, and the method includes:
if the fact that the user takes the target object is monitored, acquiring two-dimensional images acquired by a plurality of corresponding target depth vision sensors when the user takes the target object; determining corresponding target three-dimensional scene point cloud data according to each two-dimensional image and the mapping parameters; determining the position of at least one user key part in a preset three-dimensional scene according to the target three-dimensional scene point cloud data; determining a target user taking the target object according to the positions of the key parts of each user in a preset three-dimensional scene; and associating the target object with the target user.
In the embodiment of the application, when a user takes a target object, a plurality of target depth vision sensors acquire two-dimensional images and generate three-dimensional scene point cloud data from each two-dimensional image, so that the shielding area in the three-dimensional scene point cloud data can be effectively reduced, the determined positions of the key parts of the user in a preset three-dimensional scene are more accurate, the human-cargo data are more accurately associated, and the taker of each object is accurately determined.
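To make the overall flow concrete, the following Python sketch outlines the five steps above as a single handler. It is only an illustration under assumed interfaces: the injected callables (point cloud construction, key part detection, user selection, association) and the attributes on the target item are hypothetical placeholders, not the implementation disclosed here.

```python
# Illustrative outline of the claimed flow; every injected callable and the
# target_item attributes are hypothetical placeholders, not the disclosed
# implementation.
def on_item_taken(target_item, target_sensors, mapping_params,
                  build_scene_cloud, detect_key_parts, pick_closest_user, associate):
    # 1. Acquire the two-dimensional (color + depth) images from the target sensors.
    images = [sensor.capture() for sensor in target_sensors]
    # 2. Determine target 3D scene point cloud data from the images and mapping parameters.
    scene_cloud = build_scene_cloud(images, mapping_params)
    # 3. Determine the position of each user key part in the preset 3D scene.
    key_part_positions = detect_key_parts(scene_cloud)
    # 4. Determine the target user who took the item (closest key part wins).
    target_user = pick_closest_user(key_part_positions, target_item.position)
    # 5. Associate the target item with the target user.
    associate(target_item.id, target_user)
```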
Further, as described above, the method for determining the corresponding target three-dimensional scene point cloud data according to each two-dimensional image and the mapping parameter includes:
obtaining the corresponding internal and external parameters of each target depth vision sensor and the position of each target depth vision sensor in a preset three-dimensional scene; mapping the two-dimensional images into a preset three-dimensional scene coordinate system according to the corresponding internal and external parameters and the position in the preset three-dimensional scene so as to obtain corresponding target three-dimensional scene point cloud data.
In the embodiment of the application, each two-dimensional image is acquired by a corresponding target depth vision sensor, so each image is mapped into the preset three-dimensional scene coordinate system according to that sensor's internal and external parameters and its position in the preset three-dimensional scene; fusion and stitching of the scene point cloud data are completed during this mapping, so the target three-dimensional scene point cloud data can be determined accurately and quickly. Moreover, because each target depth vision sensor has a different position and shooting angle, even though a single two-dimensional image may contain an occluded region, the occlusion is effectively eliminated in the target three-dimensional scene point cloud data obtained after the images are projected into the preset three-dimensional scene coordinate system and stitched together.
Further, the method as described above, determining the position of at least one user key part in the preset three-dimensional scene according to the target three-dimensional scene point cloud data, includes:
inputting the target three-dimensional scene point cloud data into a first trained to-be-converged position detection model, so as to detect the positions of key parts of each user in a preset three-dimensional scene through the first trained to-be-converged position detection model; outputting the positions of the key parts of each user in a preset three-dimensional scene through the first trained to-be-converged position detection model.
In the embodiment of the application, the first trained-to-convergence position detection model is adopted to detect the position of the user key part in the target three-dimensional scene point cloud data in the preset three-dimensional scene, and the first position detection model is obtained after being trained to be converged, so that the position of the user key part in the preset three-dimensional scene can be accurately detected.
Further, the method as described above, before the inputting the target three-dimensional scene point cloud data into the first trained to converged position detection model, further includes:
training the first initial position detection model by adopting a first training sample; the first training sample is first historical three-dimensional scene point cloud data for marking the position of at least one user key part in a preset three-dimensional scene; and if the first training convergence condition is determined to be met, determining a first initial position detection model meeting the first training convergence condition as the first position detection model trained to be converged.
In the embodiment of the application, the first trained-to-converged position detection model is obtained by marking the first historical three-dimensional scene point cloud data of the position of the at least one user key part in the preset three-dimensional scene after training the first initial position detection model, so that the first trained-to-converged position detection model is more suitable for detecting the position of the user key part in the preset three-dimensional scene in the target three-dimensional scene point cloud data, and the accuracy of detecting the position of the user key part in the preset three-dimensional scene is further improved.
Further, according to the method described above, the determining the target user who takes the target object according to the position of the key part of each user in the preset three-dimensional scene includes:
acquiring the position of a target object; determining the distance between each user key part and the target object according to the position of each user key part in a preset three-dimensional scene and the position of the target object; and determining the user with the smallest distance as the target user.
In the embodiment of the application, the distance between the key part of the user for taking the target object and the target object is the smallest, so that the user with the smallest distance is determined as the target user, and the determined target user for taking the target object is more accurate.
Further, the method as described above, the associating the target item with the target user includes:
acquiring the identification information of the target user and the identification information of the target object; and associating the identification information of the target user with the identification information of the target object.
In the embodiment of the application, the identification information of the target user and the identification information of the target object are associated, and the identification information is the information capable of uniquely representing the target, so that the target object is more accurately associated with the target user.
Further, according to the method as described above, after determining the target user who takes the target object according to the positions of the key parts of each user in the preset three-dimensional scene, the method further includes:
determining the position of the head of the target user in a preset three-dimensional scene according to the target three-dimensional scene point cloud data; determining the human body position of a target user, which is matched with the position of the head of the target user in a preset three-dimensional scene; and determining the identification information of the target user according to the mapping relation between the human body position of the target user and the identification information of the target user.
In the embodiment of the application, the position of each user head in the preset three-dimensional scene is consistent with the position of the human body of the user in the preset three-dimensional scene, and after each user enters the preset three-dimensional scene, the identification information of each user can be obtained and tracked in real time, so that the human body position and the identification information of each user can be accurately determined, and further after the target user is determined, the identification information of the target user can be accurately determined in a mode that the position of the head of the target user in the preset three-dimensional scene is matched with the human body position of the user.
Further, the method as described above, wherein the determining the position of the target user head in the preset three-dimensional scene according to the target three-dimensional scene point cloud data includes:
inputting the target three-dimensional scene point cloud data into a second trained to-be-converged position detection model, so as to detect the position of the head of the target user in a preset three-dimensional scene through the second trained to-be-converged position detection model; and outputting the position of the head of the target user in the preset three-dimensional scene through the second trained-to-converged position detection model.
In the embodiment of the application, the position of the head of the target user in the target three-dimensional scene point cloud data in the preset three-dimensional scene is detected by adopting the second trained-to-convergence position detection model, and the position of the head of the user in the preset three-dimensional scene can be accurately detected because the second position detection model is obtained after training to convergence.
Further, the method as described above, before the inputting the target three-dimensional scene point cloud data into the second trained to converge position detection model, further includes:
training the second initial position detection model by adopting a second training sample; the second training sample is second historical three-dimensional scene point cloud data for marking the position of the head of the user taking the article in the preset three-dimensional scene; and if the second training convergence condition is determined to be met, determining a second initial position detection model meeting the second training convergence condition as the second position detection model trained to be converged.
In the embodiment of the application, the second trained-to-converged position detection model is obtained by marking the second historical three-dimensional scene point cloud data of the position of the head of the user taking the object in the preset three-dimensional scene after training the second initial position detection model, so that the second trained-to-converged position detection model is more suitable for detecting the position of the head of the user in the preset three-dimensional scene in the target three-dimensional scene point cloud data, and the accuracy of detecting the position of the head of the user in the preset three-dimensional scene is further improved.
Further, the method as described above, after associating the target object with the target user, further includes:
if the fact that the list generation condition is met is monitored, acquiring list information corresponding to the target object; and sending the list information to the terminal equipment of the target user.
In the embodiment of the application, after the target object and the target user are associated, if the monitoring meets the list generation condition, the list information is sent to the terminal equipment of the target user, so that the generation and payment of the list can be automatically completed in an unmanned retail scene.
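As a rough illustration of this list-generation step, the sketch below assumes the earlier associations are kept as (user, item) pairs in memory, that a price table is available, and that a send_to_terminal callable pushes the list to the user's terminal device; all of these are assumptions, not the disclosed implementation.

```python
# Minimal sketch of the list (receipt) generation step; the association
# records, price table and send_to_terminal callable are all assumptions.
def generate_and_send_list(associations, prices, user_id, send_to_terminal):
    """associations: iterable of (user_id, item_id) pairs recorded earlier;
    prices: dict item_id -> unit price. Called when the exit gate reports
    that the list generation condition is met for user_id."""
    items = [item for uid, item in associations if uid == user_id]
    lines = [(item, prices[item]) for item in items]
    list_info = {"user": user_id,
                 "items": lines,
                 "total": sum(price for _, price in lines)}
    send_to_terminal(user_id, list_info)   # push to the target user's terminal device
    return list_info
```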
A second aspect of the present application provides a data association apparatus, the apparatus being located in an electronic device, the electronic device being in communication with a plurality of depth vision sensors, the depth vision sensors being disposed in a preset three-dimensional scene, the preset three-dimensional scene further including a target object and at least one user, the apparatus including:
The image acquisition module is used for acquiring two-dimensional images acquired by a plurality of corresponding target depth vision sensors when the user is monitored to take the target object; the scene point cloud determining module is used for determining corresponding target three-dimensional scene point cloud data according to each two-dimensional image and the mapping parameters; the key part position determining module is used for determining the position of at least one user key part in a preset three-dimensional scene according to the target three-dimensional scene point cloud data; the target user determining module is used for determining a target user taking the target object according to the positions of the key parts of each user in a preset three-dimensional scene; and the data association module is used for associating the target object with the target user.
Further, in the apparatus as described above, the scene point cloud determining module is specifically configured to:
obtaining the corresponding internal and external parameters of each target depth vision sensor and the position of each target depth vision sensor in a preset three-dimensional scene; mapping the two-dimensional images into a preset three-dimensional scene coordinate system according to the corresponding internal and external parameters and the position in the preset three-dimensional scene so as to obtain corresponding target three-dimensional scene point cloud data.
Further, in the apparatus as described above, the key location determining module is specifically configured to:
inputting the target three-dimensional scene point cloud data into a first trained to-be-converged position detection model, so as to detect the positions of key parts of each user in a preset three-dimensional scene through the first trained to-be-converged position detection model; outputting the positions of the key parts of each user in a preset three-dimensional scene through the first trained to-be-converged position detection model.
Further, the apparatus as described above, further comprising:
the first model training module is used for training the first initial position detection model by adopting a first training sample; the first training sample is first historical three-dimensional scene point cloud data for marking the position of at least one user key part in a preset three-dimensional scene; and if the first training convergence condition is determined to be met, determining a first initial position detection model meeting the first training convergence condition as the first position detection model trained to be converged.
Further, the device as described above, the target user determining module is specifically configured to:
acquiring the position of a target object; determining the distance between each user key part and the target object according to the position of each user key part in a preset three-dimensional scene and the position of the target object; and determining the user with the smallest distance as the target user.
Further, in the apparatus as described above, the data association module is specifically configured to:
acquiring the identification information of the target user and the identification information of the target object; and associating the identification information of the target user with the identification information of the target object.
Further, the apparatus as described above, further comprising:
the user identification determining module is used for determining the position of the head of the target user in a preset three-dimensional scene according to the target three-dimensional scene point cloud data; determining the human body position of a target user, which is matched with the position of the head of the target user in a preset three-dimensional scene; and determining the identification information of the target user according to the mapping relation between the human body position of the target user and the identification information of the target user.
Further, in the apparatus as described above, the user identification determining module is specifically configured to, when determining, according to the target three-dimensional scene point cloud data, a position of a target user head in a preset three-dimensional scene:
inputting the target three-dimensional scene point cloud data into a second trained to-be-converged position detection model, so as to detect the position of the head of the target user in a preset three-dimensional scene through the second trained to-be-converged position detection model; and outputting the position of the head of the target user in the preset three-dimensional scene through the second trained-to-converged position detection model.
Further, the apparatus as described above, further comprising:
the second model training module is used for training the second initial position detection model by adopting a second training sample; the second training sample is second historical three-dimensional scene point cloud data for marking the position of the head of the user taking the article in the preset three-dimensional scene; and if the second training convergence condition is determined to be met, determining a second initial position detection model meeting the second training convergence condition as the second position detection model trained to be converged.
Further, the apparatus as described above, further comprising:
the list processing module is used for acquiring list information corresponding to the target object if the fact that the list generation condition is met is monitored; and sending the list information to the terminal equipment of the target user.
A third aspect of the embodiments of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first aspects.
A fourth aspect of the embodiments provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any of the first aspects.
A fifth aspect of the embodiments of the present application provides a computer program comprising program code for performing the method according to the first aspect when the computer program runs on a computer.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a scene diagram of a data association method in which embodiments of the present application may be implemented;
fig. 2 is a flow chart of a data association method according to a first embodiment of the present application;
FIG. 3 is a flow chart of a data association method provided according to a second embodiment of the present application;
fig. 4 is a schematic flow chart of step 202 in the data association method according to the second embodiment of the present application;
fig. 5 is a schematic flow chart of step 203 in the data association method according to the second embodiment of the present application;
fig. 6 is a schematic flow chart of step 204 in the data association method according to the second embodiment of the present application;
fig. 7 is a schematic flowchart of step 205 in the data association method according to the second embodiment of the present application;
Fig. 8 is a schematic flow chart of step 208 in the data association method according to the second embodiment of the present application;
fig. 9 is a signaling flow chart of a data association method according to a third embodiment of the present application;
fig. 10 is a schematic structural diagram of a data association device according to a fourth embodiment of the present application;
fig. 11 is a schematic structural diagram of a data association device according to a fifth embodiment of the present application;
fig. 12 is a block diagram of an electronic device for implementing a data association method of an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The application scenario of the data association method provided in the embodiments of the present application is described below. As shown in fig. 1, the application scenario corresponding to the data association method includes an electronic device and a plurality of depth vision sensors, with the electronic device communicatively coupled to the sensors. The depth vision sensors are arranged in a preset three-dimensional scene and may, for example, be uniformly mounted at the top of the scene. The preset three-dimensional scene also contains a target item and at least one user; it may be an unmanned supermarket, an unmanned container, and the like, in which case the items in the scene are merchandise and the users are customers. In the preset three-dimensional scene the items are placed on shelves, and a gravity sensor is arranged at each shelf position where an item is placed, so that each gravity sensor is associated with the corresponding item to detect whether a customer takes it. The gravity sensor communicates with the electronic device and sends a gravity change signal when a customer picks up the corresponding item, so that the electronic device monitors that the item has been taken. Each item may also be associated with a plurality of depth vision sensors, which acquire two-dimensional images including the item at a given sampling frequency; if a customer is picking up a target item, the target depth vision sensors corresponding to that item can acquire two-dimensional images that include both the target item and the customer. As shown in fig. 1, a specific application scenario of the data association method is described taking an unmanned supermarket as an example of the preset three-dimensional scene. Specifically, before entering the unmanned supermarket a user registers a user account in the client corresponding to the supermarket through a terminal device; when the user enters, the entrance gate or an identity recognition device determines from the account information that the user is a registered user and allows entry. Meanwhile, a plurality of depth cameras acquire images and send them to the electronic device, which detects and tracks the users through the images, determines the position of each user in the unmanned supermarket in real time, and associates that position with the user's account information. If a user wants to purchase a target item, then when the user takes it the gravity sensor detects a gravity change signal and sends it to the electronic device; the electronic device determines that the user has taken the target item and acquires the two-dimensional images collected at that moment by the corresponding target depth vision sensors. Three target depth vision sensors are illustrated in fig. 1.
Each two-dimensional image includes the target item and may also include the user who is taking it; if other users are near that user at the moment of taking, they may appear in the two-dimensional image as well. Because the mounting position and angle of each target depth vision sensor differ, the acquired two-dimensional images differ, and some of them may unavoidably contain an occlusion region, for example when a key part of the user taking the target item is blocked. After each two-dimensional image is acquired, since it comprises a color image and a depth image and the parameters of each target depth vision sensor and its position in the preset three-dimensional scene are known, that is, the mapping parameters are known, the corresponding target three-dimensional scene point cloud data can be determined from each two-dimensional image and the mapping parameters. Because the target three-dimensional scene point cloud data fuses the effective information of every two-dimensional image, the occlusion region is greatly reduced compared with any single two-dimensional image. If the target three-dimensional scene point cloud data includes at least one user, the position of the key part of the at least one user in the preset three-dimensional scene can be determined from it; the target user who takes the target item is then determined according to the position of each user key part in the preset three-dimensional scene, and the target item is associated with the target user. When the target user finishes shopping and passes through the exit gate, it is determined that the list generation condition is met, and the list information corresponding to the target items is acquired and sent to the terminal device of the target user. The list information may include the price of each target item, and after it is sent payment can be deducted automatically, completing the user's automatic shopping in the unmanned supermarket. Because a plurality of target depth vision sensors acquire two-dimensional images when a user takes a target item, and three-dimensional scene point cloud data is generated from those images, the occlusion region in the point cloud data is effectively reduced, the determined positions of the user key parts in the preset three-dimensional scene are more accurate, person and goods data are associated more accurately, and the taker of each item is determined accurately.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Example 1
Fig. 2 is a flow chart of a data association method according to a first embodiment of the present application, as shown in fig. 2, an execution subject of the embodiment of the present application is a data association device, and the data association device may be integrated in an electronic apparatus. The data association method provided in this embodiment includes the following steps.
And step 101, if the user is monitored to take the target object, acquiring two-dimensional images acquired by a plurality of corresponding target depth vision sensors when the user takes the target object.
Wherein, the article taken by the user is a target article, and the plurality of depth vision sensors corresponding to the target article are target depth vision sensors. The depth vision sensor may be a depth camera.
In this embodiment, each article is disposed on a shelf, and gravity sensors are respectively disposed at positions on the shelf where the articles are placed, and when a user takes a target article, the corresponding target gravity sensor can detect a gravity change signal and send the gravity change signal to an electronic device, and the electronic device detects that the user takes the target article according to the gravity change signal.
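A possible shape of this gravity-event handling is sketched below; the signal format and the sensor-to-item table are purely assumed for illustration and are not specified in the patent.

```python
# Hypothetical sketch of how the electronic device could react to a gravity
# change signal from a shelf sensor; the signal layout is an assumption.
SHELF_SENSOR_TO_ITEM = {"sensor_17": "item_cola_330ml"}  # assumed association table

def on_gravity_change(signal, handle_take_event):
    """signal: dict with the reporting sensor's id and the measured weight delta."""
    if signal["delta_grams"] < 0:                      # weight decreased -> item taken
        item_id = SHELF_SENSOR_TO_ITEM[signal["sensor_id"]]
        handle_take_event(item_id)                     # triggers image acquisition, etc.
```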
The communication mode between the electronic device and each gravity sensor is not limited, and may be, for example, Global System for Mobile communication (GSM), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time Division-Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), 5G, and the like. It can be understood that the wireless communication mode may also be a ZigBee communication mode, a Bluetooth BLE communication mode, a Wi-Fi communication mode, or the like.
In this embodiment, each item has a corresponding plurality of target depth vision sensors. A plurality of target depth vision sensors acquire two-dimensional images including a target object using a sampling frequency. If the electronic equipment monitors that the user takes the target object, the electronic equipment can communicate with a plurality of target depth vision sensors, and the corresponding two-dimensional image is acquired from each target depth vision sensor. Since the user is taking the target object at this time, the user who takes the target object may also be included in the two-dimensional image. If there are other users beside the user who takes the target object, other users may also be included in the two-dimensional image. I.e. including at least one user in the two-dimensional image.
The communication mode between the electronic device and the plurality of target depth vision sensors is not limited, and may be, for example, Global System for Mobile communication (GSM), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time Division-Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), 5G, and the like. It can be understood that the wireless communication mode may also be a ZigBee communication mode, a Bluetooth BLE communication mode, a Wi-Fi communication mode, or the like.
It should be noted that each two-dimensional image includes a color image and a depth image. The color image represents the color information of each pixel in the image; for example, it may be an RGB image. The depth image represents the depth information of each pixel in the preset three-dimensional scene.
And 102, determining corresponding target three-dimensional scene point cloud data according to each two-dimensional image and the mapping parameters.
In this embodiment, the mapping parameter is a parameter mapped from the two-dimensional image to the target three-dimensional scene. Parameters of each target depth vision sensor and positions in the preset three-dimensional scene may be included. Wherein the parameters of each target depth vision sensor may include internal parameters and external parameters.
In this embodiment, the target depth visual sensors may be calibrated in advance, and the internal parameters, external parameters and positions of the calibrated target depth visual sensors in the preset three-dimensional scene may be determined. Because each two-dimensional image can acquire the color information and the depth information of each pixel point, each two-dimensional image can be mapped into a preset three-dimensional scene coordinate system according to the corresponding internal parameters, external parameters and the positions in the preset three-dimensional scene, and the pixel points of each two-dimensional image are mapped into target three-dimensional scene point cloud data.
It can be appreciated that if the plurality of target depth vision sensors are depth vision sensors around the target object, the determined target three-dimensional scene point cloud data is local three-dimensional point cloud data around the target object in the preset three-dimensional scene.
It should be noted that, because each target depth vision sensor has a different mounting position and angle, the acquired two-dimensional images differ, and some of them may unavoidably include an occlusion region. After each two-dimensional image is acquired, since it includes a color image and a depth image and the mapping parameters are known, the corresponding target three-dimensional scene point cloud data can be determined from each two-dimensional image and the mapping parameters. Because the target three-dimensional scene point cloud data fuses the effective information of every two-dimensional image, the occlusion region is greatly reduced compared with any single two-dimensional image.
And step 103, determining the position of at least one user key part in the preset three-dimensional scene according to the target three-dimensional scene point cloud data.
In this embodiment, the target three-dimensional scene point cloud data includes at least one key part of the user, and a position detection model may be used to detect the position of the key part of the user in the preset three-dimensional scene.
The position detection model may be a deep learning model, a machine learning model, or the like, and is not limited in this embodiment.
In this embodiment, the key parts of the user are the parts related to the object to be taken, such as the hand, the wrist or the forearm.
And 104, determining a target user for taking the target object according to the positions of the key parts of each user in the preset three-dimensional scene.
In this embodiment, when each article is placed in a preset three-dimensional scene, the position of each article in the preset three-dimensional scene is stored. The position of the pre-stored target object in the preset three-dimensional scene is obtained, as an optional implementation manner, the position of each user key part in the preset three-dimensional scene is compared with the position of the target object in the preset three-dimensional scene, and the user of the key part closest to the target object is determined as the target user.
It can be appreciated that the manner of determining the target user for taking the target object according to the positions of the key parts of each user in the preset three-dimensional scene may be other manners, which is not limited in this embodiment.
Wherein the target user is a user who takes the target object.
Step 105, associating the target item with the target user.
In this embodiment, when the target object is associated with the target user, the identification information of the target object may be associated with the identification information of the target user.
The identification information of the target object may be a unique bar code or two-dimensional code of the target object. The identification information of the target user may be information that uniquely identifies the target user, such as the target user's mobile phone number, email address, or account number.
It will be appreciated that the association data is stored after the target item is associated with the target user.
According to the data association method provided by the embodiment, if the user is monitored to take the target object, two-dimensional images acquired by a plurality of corresponding target depth vision sensors when the user takes the target object are acquired; determining corresponding target three-dimensional scene point cloud data according to each two-dimensional image and the mapping parameters; determining the position of at least one user key part in a preset three-dimensional scene according to the target three-dimensional scene point cloud data; determining a target user taking a target object according to the positions of the key parts of each user in a preset three-dimensional scene; the target item is associated with a target user. When a user takes a target object, a plurality of target depth vision sensors acquire two-dimensional images, and three-dimensional scene point cloud data are generated by the two-dimensional images, so that the shielding area in the three-dimensional scene point cloud data can be effectively reduced, the determined positions of key parts of the user in a preset three-dimensional scene are more accurate, the human-cargo data are more accurately associated, and the taker of each object is accurately determined.
Example 2
Fig. 3 is a flowchart of a data association method according to a second embodiment of the present application. As shown in fig. 3, the data association method of this embodiment further refines steps 102-105 of the method of the first embodiment, and adds the following steps: if it is monitored that the list generation condition is met, acquiring the list information corresponding to the target item and sending the list information to the terminal device of the target user. The data association method provided in this embodiment includes the following steps.
Step 201, if it is monitored that the user takes the target object, acquiring two-dimensional images acquired by the corresponding multiple target depth vision sensors when the user takes the target object.
In this embodiment, the implementation manner of step 201 is similar to that of step 101 in the first embodiment of the present application, and will not be described in detail here.
Step 202, corresponding target three-dimensional scene point cloud data are determined according to each two-dimensional image and the mapping parameters.
As an alternative embodiment, as shown in fig. 4, step 202 includes the steps of:
in step 2021, the internal and external parameters corresponding to each target depth vision sensor and the position in the preset three-dimensional scene are acquired.
In this embodiment, the mapping parameters include: the depth vision sensor of each target is internally and externally related to the position in the preset three-dimensional scene. The depth vision sensors corresponding to each object can be calibrated in advance, the internal parameters, the external parameters and the positions of the calibrated depth vision sensors in the preset three-dimensional scene are determined, and the internal parameters, the external parameters and the positions of the depth vision sensors in the preset three-dimensional scene are stored in a correlated mode in advance. And acquiring the internal and external parameters corresponding to each target depth vision sensor and the position of each target depth vision sensor in a preset three-dimensional scene.
Wherein, the internal references corresponding to the target depth vision sensors can comprise: focal length. The external parameters may include: the rotation matrix and the translation matrix.
Step 2022, mapping each two-dimensional image into a preset three-dimensional scene coordinate system according to the corresponding internal and external parameters and the position in the preset three-dimensional scene, so as to obtain corresponding target three-dimensional scene point cloud data.
In this embodiment, since each two-dimensional image can acquire color information and depth information of each pixel point, each two-dimensional image can be mapped into a preset three-dimensional scene coordinate system according to the corresponding internal parameter, external parameter and position in the preset three-dimensional scene, and the pixel points of each two-dimensional image are mapped into target three-dimensional scene point cloud data.
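Under a standard pinhole camera model, this mapping can be written as follows (a conventional formulation given for reference, not quoted from the patent); here (u, v) is a pixel, Z its depth value, (f_x, f_y, c_x, c_y) are the internal parameters, and (R, t) the external parameters that place the sensor in the preset three-dimensional scene:

```latex
X_c = \frac{(u - c_x)\,Z}{f_x}, \quad
Y_c = \frac{(v - c_y)\,Z}{f_y}, \quad
Z_c = Z, \qquad
\begin{pmatrix} X_w \\ Y_w \\ Z_w \end{pmatrix}
  = R \begin{pmatrix} X_c \\ Y_c \\ Z_c \end{pmatrix} + t
```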
In this embodiment, each two-dimensional image is acquired by a corresponding target depth vision sensor, so each two-dimensional image is mapped into the preset three-dimensional scene coordinate system according to that sensor's internal and external parameters and its position in the preset three-dimensional scene; fusion and stitching of the scene point cloud data are completed during this mapping, so the target three-dimensional scene point cloud data can be determined accurately and quickly. Moreover, because each target depth vision sensor has a different position and shooting angle, even though a single two-dimensional image may contain an occluded region, the occlusion is effectively eliminated in the target three-dimensional scene point cloud data obtained after the images are projected into the preset three-dimensional scene coordinate system and stitched together.
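A minimal numpy sketch of this back-projection and stitching is given below, assuming metric depth images and known per-sensor intrinsics (fx, fy, cx, cy) and pose (R, t); the function and variable names are illustrative, not from the patent.

```python
# Sketch of step 2022: back-project each depth image with its sensor's
# internal/external parameters and stitch the per-sensor clouds into one
# target 3D scene point cloud. Names and calibration layout are assumptions.
import numpy as np

def depth_to_scene_cloud(depth, fx, fy, cx, cy, R, t):
    """depth: HxW depth image in metres; (R, t): sensor pose in the scene frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx                          # back-project to camera coordinates
    y = (v - cy) * z / fy
    cam_points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    cam_points = cam_points[cam_points[:, 2] > 0]  # drop pixels with no valid depth
    return cam_points @ R.T + t                    # transform into the scene frame

def fuse_clouds(depth_images, calibrations):
    """calibrations: list of (fx, fy, cx, cy, R, t) tuples, one per target sensor."""
    clouds = [depth_to_scene_cloud(d, *calib)
              for d, calib in zip(depth_images, calibrations)]
    return np.concatenate(clouds, axis=0)          # stitched target scene point cloud
```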
And 203, determining the position of at least one user key part in the preset three-dimensional scene according to the target three-dimensional scene point cloud data.
As an alternative embodiment, as shown in fig. 5, step 203 includes the steps of:
step 2031, inputting the target three-dimensional scene point cloud data into the first trained to-be-converged position detection model, so as to detect the positions of the key parts of each user in the preset three-dimensional scene through the first trained to-be-converged position detection model.
In this embodiment, the first trained-to-converge position detection model is a model for detecting positions of key parts of each user in the target three-dimensional scene point cloud data in a preset three-dimensional scene. The first trained-to-converged position detection model is obtained by training the first initial position detection model to be converged.
The first position detection model trained to be converged can be a deep learning model, such as a PointNet model or an Action4d model, which is suitable for processing three-dimensional point cloud data.
Step 2032, outputting the positions of the key parts of each user in the preset three-dimensional scene through the first trained to converged position detection model.
Further, in this embodiment, the target three-dimensional scene point cloud data is input into the first trained to converged position detection model, the first trained to converged position detection model detects each user key part in the target three-dimensional scene point cloud data, determines the position of each user key part in the preset three-dimensional scene, and outputs the position of each user key part in the preset three-dimensional scene.
It can be understood that the position of each user key part in the preset three-dimensional scene is the position of the center point of each user key part in the preset three-dimensional scene.
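For illustration, inference with such a model could look like the following sketch; the PointNet-style model object, the fixed-size point sampling, and the output format are assumptions rather than details disclosed above.

```python
# Illustrative inference step for the first position detection model
# (e.g. a PointNet-style network); the model interface is an assumption.
import numpy as np
import torch

def detect_key_part_positions(model, scene_cloud, num_points=4096):
    """scene_cloud: (N, 3) target 3D scene point cloud; returns (K, 3) key-part
    centre positions in the preset three-dimensional scene coordinate system."""
    idx = np.random.choice(len(scene_cloud), num_points,
                           replace=len(scene_cloud) < num_points)
    sample = torch.from_numpy(scene_cloud[idx]).float().unsqueeze(0)  # (1, P, 3)
    with torch.no_grad():
        positions = model(sample)        # assumed to regress key-part centre points
    return positions.squeeze(0).numpy()
```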
It should be noted that if the first initial position detection model has not yet been trained to obtain the first position detection model trained to convergence, step 2030 is further included before step 2031.
Step 2030, training a first initial position detection model by using a first training sample; the first training sample is first historical three-dimensional scene point cloud data for marking the position of at least one user key part in a preset three-dimensional scene; and if the first training convergence condition is determined to be met, determining a first initial position detection model meeting the first training convergence condition as a first trained-to-converged position detection model.
Further, in this embodiment, the first training samples are first historical three-dimensional scene point cloud data in which the position of at least one user key part in the preset three-dimensional scene is marked. Since only a small number of occlusion regions exist in the first historical three-dimensional scene point cloud data of each first training sample, training the first initial position detection model in a supervised manner with these samples makes the converged model better suited to detecting the position of each user key part in the preset three-dimensional scene from the target three-dimensional scene point cloud data, so the detected positions of each user's key parts in the preset three-dimensional scene are more accurate.
The first training convergence condition is a convergence condition when training the first initial position detection model. The convergence condition may be that the loss function is minimized, or the number of iterations is up to a preset number of iterations, which is not limited in this embodiment.
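The training procedure of step 2030 could be sketched as below, showing both convergence conditions mentioned above (the loss no longer decreasing, or a preset number of iterations reached); the optimizer, loss function, and thresholds are assumptions, not part of the disclosure.

```python
# Minimal supervised training loop for the first initial position detection
# model; hyperparameters and the choice of loss are illustrative assumptions.
import torch

def train_to_convergence(model, loader, max_iters=10000, tol=1e-4, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()              # regress the marked key-part positions
    prev_loss, it = float("inf"), 0
    while True:
        for cloud, marked_positions in loader:    # first training samples
            pred = model(cloud)
            loss = loss_fn(pred, marked_positions)
            opt.zero_grad()
            loss.backward()
            opt.step()
            it += 1
            # First training convergence condition: loss change below tol,
            # or the preset number of iterations reached.
            if it >= max_iters or abs(prev_loss - loss.item()) < tol:
                return model
            prev_loss = loss.item()
```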
And 204, determining a target user for taking the target object according to the positions of the key parts of each user in the preset three-dimensional scene.
As an alternative implementation, in this embodiment, as shown in fig. 6, step 204 includes the following steps:
step 2041, a target item location is acquired.
In this embodiment, when each article is placed in a preset three-dimensional scene, the position of each article is stored in advance. The target item location is retrieved from the stored item locations.
Wherein the target object position may be represented by a center point position of the target object.
Step 2042, determining the distance between each user key part and the target object according to the position of each user key part in the preset three-dimensional scene and the position of the target object.
Specifically, in this embodiment, according to the position of each user key part in the preset three-dimensional scene and the target object position, the distance between each user key part and the target object is calculated according to the euclidean distance formula between the two points.
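For reference, the Euclidean distance between a user key part at position (x_k, y_k, z_k) and the target item centre at (x_t, y_t, z_t) in the preset three-dimensional scene is:

```latex
d = \sqrt{(x_k - x_t)^2 + (y_k - y_t)^2 + (z_k - z_t)^2}
```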
In step 2043, the user with the smallest distance is determined as the target user.
Specifically, in this embodiment, since the user who takes the target object needs to have his hand in contact with the target object, and the user who is near the target object is not in contact with the target object, the distance between the key part of the user who takes the target object and the target object is smaller than the distance between other user key parts near the target object and the target object. The distances between the key parts of the users and the target object are sorted from small to large, and the user with the smallest distance is determined as the target user.
In this embodiment, since the distance between the key part of the user who takes the target object and the target object is the smallest, the user with the smallest distance is determined as the target user, so that the determined target user who takes the target object is more accurate.
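Steps 2041 to 2043 amount to a nearest-neighbour selection; a minimal sketch, assuming the key part positions are held in a dictionary keyed by user, is:

```python
# Sketch of steps 2041-2043: compute the Euclidean distance from each user's
# key part to the target item position and pick the closest user. The data
# structures are illustrative assumptions.
import numpy as np

def pick_target_user(key_part_positions, item_position):
    """key_part_positions: dict user_id -> (x, y, z) of that user's key part;
    item_position: (x, y, z) centre of the target item in the scene."""
    item = np.asarray(item_position, dtype=float)
    distances = {uid: float(np.linalg.norm(np.asarray(pos) - item))
                 for uid, pos in key_part_positions.items()}
    return min(distances, key=distances.get)   # user with the smallest distance
```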
And step 205, determining the position of the head of the target user in the preset three-dimensional scene according to the target three-dimensional scene point cloud data.
Alternatively, in this embodiment, when the target user is associated with the target article, the identification information of the target user needs to be determined. The identification information of the target user is not included in the target three-dimensional scene point cloud data, so that the identification information of the target user needs to be determined.
In this embodiment, steps 205-207 are methods for determining identification information of a target user.
As an alternative implementation, in this embodiment, as shown in fig. 7, step 205 includes the following steps:
step 2051, inputting the target three-dimensional scene point cloud data into the second trained to-be-converged position detection model, so as to detect the position of the head of the target user in the preset three-dimensional scene through the second trained to-be-converged position detection model.
In this embodiment, the second trained-to-converge position detection model is a model for detecting a position of a target user head in the target three-dimensional scene point cloud data in a preset three-dimensional scene. The second trained-to-converged position detection model is obtained by training the second initial position detection model to converge.
The second location detection model trained to be converged can be a deep learning model, such as a PointNet model or an Action4d model, which is suitable for processing three-dimensional point cloud data.
It is understood that the network architecture of the second trained to converged position detection model may be the same as the network architecture of the first trained to converged position detection model, but the values of the parameters in the corresponding models may be different.
Step 2052, outputting the position of the head of the target user in the preset three-dimensional scene through the second trained to-be-converged position detection model.
Further, in this embodiment, the target three-dimensional scene point cloud data is input into the second trained to converged position detection model, the second trained to converged position detection model detects the target user head in the target three-dimensional scene point cloud data, determines the position of the target user head in the preset three-dimensional scene, and outputs the position of the target user head in the preset three-dimensional scene.
It can be understood that the position of the target user's head in the preset three-dimensional scene is the position of the center point of the target user's head in the preset three-dimensional scene.
It should be noted that if the second initial position detection model has not yet been trained to obtain the second position detection model trained to convergence, step 2050 is further included before step 2051.
Step 2050, training a second initial position detection model by using a second training sample; the second training sample is second historical three-dimensional scene point cloud data for marking the position of the head of the user taking the article in the preset three-dimensional scene; and if the second training convergence condition is determined to be met, determining a second initial position detection model meeting the second training convergence condition as a second trained-to-converged position detection model.
Further, in this embodiment, the second training sample is second historical three-dimensional scene point cloud data that marks a position of a head of the user who takes the article in the preset three-dimensional scene. It is understood that the second historical three-dimensional scene point cloud data may differ from the first historical three-dimensional scene point cloud data only by the location of the marked part.
Because the second historical three-dimensional scene point cloud data in each second training sample contains only a small number of shielded areas, training the second initial position detection model on the second training samples in a supervised manner makes the second trained-to-converged position detection model better suited to detecting, in the target three-dimensional scene point cloud data, the position of the target user's head in the preset three-dimensional scene, so the detected position of the head is more accurate.
It is understood that the network architecture of the second initial position detection model and the first initial position detection model may be the same, but the values of the parameters in the models are different.
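For illustration only, a supervised training loop for step 2050 might look like the sketch below; the L1 loss, the batch size, and the epoch-to-epoch loss-change test standing in for the second training convergence condition are all assumptions, since the patent does not fix them.

```python
import torch
from torch.utils.data import DataLoader

def train_second_model(model, dataset, epochs=100, lr=1e-3, tol=1e-3):
    """Train the second initial position detection model on second training
    samples, i.e. historical scene point clouds annotated with the head
    position of the user who took an article (assumed data layout:
    each sample is a (point_cloud, head_xyz) tensor pair)."""
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.L1Loss()
    previous_loss = float("inf")
    for _ in range(epochs):
        epoch_loss = 0.0
        for clouds, head_xyz in loader:
            optimizer.zero_grad()
            loss = criterion(model(clouds), head_xyz)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if abs(previous_loss - epoch_loss) < tol:   # assumed convergence test
            break
        previous_loss = epoch_loss
    return model   # treated as the second trained-to-converged model
```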
Step 206, determining the human body position of the target user, which is matched with the position of the head of the target user in the preset three-dimensional scene.
Step 207, determining the identification information of the target user according to the mapping relation between the human body position of the target user and the identification information of the target user.
Further, in this embodiment, when a user enters the preset three-dimensional scene, the plurality of depth vision sensors collect images at the same time and send them to the electronic device. The electronic device detects and tracks each user through these images, determines each user's position in the preset three-dimensional scene in real time, and can determine each user's identification information at the moment the user enters the scene. Therefore, once a user is tracked, both the user's position in the preset three-dimensional scene and the corresponding identification information are known.
In this embodiment, the position of the target user's head in the preset three-dimensional scene is matched against the human body position of each tracked user. If it matches the human body position of a certain user, that user is determined to be the target user, and the identification information of the target user is then obtained through the mapping relation between the matched human body position and the corresponding identification information.
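A minimal sketch of this matching step, under the assumption that the tracking pipeline exposes each user as a record with an identifier and a body position, and with an arbitrary 0.5 m matching threshold, could be:

```python
import numpy as np

def identify_target_user(head_xyz, tracked_users, max_distance=0.5):
    """tracked_users: iterable of dicts such as
       {"user_id": "u-001", "body_xyz": np.array([x, y, z])}
       maintained by the real-time tracking described above.
       Returns the identification information of the closest user, or None
       if no tracked body position lies within the (assumed) threshold."""
    best_id, best_dist = None, float("inf")
    for user in tracked_users:
        dist = float(np.linalg.norm(head_xyz - user["body_xyz"]))
        if dist < best_dist:
            best_id, best_dist = user["user_id"], dist
    return best_id if best_dist <= max_distance else None
```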
Step 208, associating the target item with the target user.
As an alternative implementation, in this embodiment, as shown in fig. 8, step 208 includes the following steps:
step 2081, obtaining the identification information of the target user and the identification information of the target object.
In this embodiment, since the identification information of the target user is determined in steps 205 to 207, the identification information of the target user can be acquired.
In this embodiment, the identification information of the gravity sensor and the identification information of the corresponding object may be stored in association in the electronic device, and after the gravity sensor sends the gravity change signal to the electronic device, the electronic device determines the identification information of the corresponding target object according to the identification information of the gravity sensor carried by the gravity change signal. The identification information of the target object may be a unique bar code or two-dimensional code of the target object.
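By way of example only, the stored association and the lookup triggered by a gravity change signal could be as simple as the following; the sensor identifier, the bar code, and the signal layout are hypothetical.

```python
# Hypothetical mapping kept on the electronic device: gravity sensor
# identification information -> identification information of the item
# placed on that shelf position (e.g. its bar code).
SENSOR_TO_ITEM = {"shelf-03-sensor-07": "6901234567892"}

def on_gravity_change(signal: dict) -> str:
    """signal is assumed to look like
    {"sensor_id": "shelf-03-sensor-07", "delta_grams": -455}."""
    return SENSOR_TO_ITEM[signal["sensor_id"]]   # identification of the target item
```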
Step 2082, associating the identification information of the target user with the identification information of the target object.
It can be understood that after the identification information of the target user and the identification information of the target object are associated, the associated identification information of the target user and the associated identification information of the target object are stored.
In this embodiment, the identification information of the target user and the identification information of the target object are associated; because identification information uniquely identifies the user or the object, the association between the target object and the target user is more accurate.
Step 209, if it is monitored that the list generation condition is satisfied, acquiring the list information corresponding to the target object.
Further, in this embodiment, as an optional implementation, monitoring whether the list generation condition is met may be monitoring whether the target user leaves the preset three-dimensional scene: if the target user leaves the preset three-dimensional scene, it is determined that the list generation condition is met; otherwise, it is determined that the condition is not met.
Monitoring that the target user leaves the preset three-dimensional scene may work as follows: a gate provided in the preset three-dimensional scene communicates with the electronic device, and if the target user opens the exit gate with his or her identification information, the exit gate sends a departure message carrying the target user's identification information to the electronic device. After receiving the departure message, the electronic device determines that the list generation condition is met.
The list information of the target item may include information related to the target item such as its name, place of origin, and price.
Step 210, the list information is sent to the terminal device of the target user.
Further, in this embodiment, the mapping relationship between each user's identification information and the corresponding terminal device's identification information may be stored in the electronic device in advance. The electronic device determines the corresponding terminal device identification information from the target user's identification information and sends the list information to that terminal device.
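Steps 209 and 210 can be pictured with the following sketch; the dictionary shapes, the `send` callable, and the price field are assumptions introduced for illustration.

```python
def on_exit_gate_message(user_id, associations, item_catalog, terminal_of_user, send):
    """Assumed data shapes:
    associations     -- {user_id: [item_id, ...]} built when items were associated
    item_catalog     -- {item_id: {"name": ..., "origin": ..., "price": ...}}
    terminal_of_user -- {user_id: terminal_device_id} stored in advance
    send             -- callable delivering the list information to a terminal
    """
    # The exit-gate departure message satisfies the list generation condition,
    # so build the list information for everything associated with this user.
    list_info = [item_catalog[item_id] for item_id in associations.get(user_id, [])]
    total = sum(entry["price"] for entry in list_info)
    send(terminal_of_user[user_id], {"items": list_info, "total": total})
```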
It will be appreciated that, in an unmanned retail scenario, an amount corresponding to the price of the target item may also be deducted from the target user's account based on the list information.
In this embodiment, after the target object and the target user are associated, if it is monitored that the list generation condition is met, the list information is sent to the terminal device of the target user, so that list generation and payment can be completed automatically in an unmanned retail scene.
Example III
Fig. 9 is a signaling flow chart of a data association method according to a third embodiment of the present application, and as shown in fig. 9, the data association method provided in the present embodiment includes the following steps:
Step 301, if the target gravity sensor detects that the gravity thereon changes, a gravity change signal is generated.
Step 302, a gravity change signal is sent to the electronic device.
Wherein the target gravity sensor is arranged on a goods shelf below the target object.
Wherein the gravity change signal comprises: the identification information of the target article.
Step 303, the electronic device determines, according to the gravity change signal, that the user is monitored to take the target object.
Step 304, the plurality of target depth vision sensors acquire two-dimensional images including the target object at their sampling frequency.
Step 305, the electronic device obtains the two-dimensional images corresponding to the target object from the plurality of target depth vision sensors.
And step 306, determining corresponding target three-dimensional scene point cloud data according to each two-dimensional image and the mapping parameters.
Step 307, determining the position of at least one user key part in the preset three-dimensional scene according to the target three-dimensional scene point cloud data.
And 308, determining a target user for taking the target object according to the positions of the key parts of each user in the preset three-dimensional scene.
Step 309, determining identification information of the target user.
Step 310, associating the target item with the target user.
Step 311, if it is monitored that the list generation condition is satisfied, acquiring the list information corresponding to the target object.
Step 312, the list information is sent to the terminal device of the target user.
In this embodiment, the implementation and technical effects of steps 303 to 312 are similar to those of steps 201 to 210 in the second embodiment of the present application, and are not described in detail herein.
Example IV
Fig. 10 is a schematic structural diagram of a data association device according to a fourth embodiment of the present application, and as shown in fig. 10, the data association device 1000 provided in this embodiment is located in an electronic device, where the electronic device communicates with a plurality of depth vision sensors, and the depth vision sensors are disposed in a preset three-dimensional scene, and the preset three-dimensional scene further includes a target object and at least one user. The data association apparatus 1000 includes: an image acquisition module 1001, a scene point cloud determination module 1002, a key part position determination module 1003, a target user determination module 1004 and a data association module 1005.
The image acquiring module 1001 is configured to acquire two-dimensional images acquired by a plurality of corresponding target depth vision sensors when the user is monitored to take the target object. The scene point cloud determining module 1002 is configured to determine corresponding target three-dimensional scene point cloud data according to each two-dimensional image and the mapping parameter. The key part position determining module 1003 is configured to determine a position of at least one key part of the user in a preset three-dimensional scene according to the target three-dimensional scene point cloud data. The target user determining module 1004 is configured to determine a target user who takes a target object according to the positions of the key parts of each user in the preset three-dimensional scene. A data association module 1005 for associating the target item with the target user.
The data association device provided in this embodiment may execute the technical scheme of the method embodiment shown in fig. 2, and its implementation principle and technical effects are similar to those of the method embodiment shown in fig. 2, and are not described in detail herein.
Example five
Fig. 11 is a schematic structural diagram of a data association device according to a fifth embodiment of the present application. As shown in fig. 11, the data association device 1100 provided in this embodiment further includes, on the basis of the data association device provided in the fourth embodiment: a first model training module 1101, a user identification determination module 1102, a second model training module 1103, and a list processing module 1104.
Further, the scene point cloud determining module 1002 is specifically configured to:
obtaining the corresponding internal and external parameters of each target depth vision sensor and the position of each target depth vision sensor in a preset three-dimensional scene; mapping each two-dimensional image into a preset three-dimensional scene coordinate system according to the corresponding internal and external parameters and the position in the preset three-dimensional scene so as to obtain corresponding target three-dimensional scene point cloud data.
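As an illustrative sketch of this mapping (not the patent's own code), the function below back-projects one depth image into camera coordinates using the sensor's internal (intrinsic) parameters and then transforms the points into the preset three-dimensional scene coordinate system using its external (extrinsic) parameters; the pinhole camera model, the 4 x 4 transform layout, and all variable names are assumptions.

```python
import numpy as np

def depth_image_to_scene_points(depth, fx, fy, cx, cy, cam_to_scene):
    """Back-project one depth image (H x W, in metres) into the preset
    three-dimensional scene coordinate system.

    fx, fy, cx, cy -- internal (intrinsic) parameters of the depth vision sensor
    cam_to_scene   -- 4 x 4 external (extrinsic) transform from camera to scene
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]                    # pixel row/column grid
    z = depth
    x = (u - cx) * z / fx                        # pinhole back-projection
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    pts = pts[pts[:, 2] > 0]                     # drop pixels with no depth
    return (cam_to_scene @ pts.T).T[:, :3]       # points in the scene frame

# The clouds from all target depth vision sensors would then be merged into
# the target three-dimensional scene point cloud, e.g.:
# scene_cloud = np.concatenate(
#     [depth_image_to_scene_points(d, *sensor_params[i])
#      for i, d in enumerate(depth_images)], axis=0)
```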
Further, the key part position determining module 1003 is specifically configured to:
inputting target three-dimensional scene point cloud data into a first trained to-be-converged position detection model, so as to detect the positions of key parts of each user in a preset three-dimensional scene through the first trained to-be-converged position detection model; outputting the positions of the key parts of each user in a preset three-dimensional scene through the first trained to converged position detection model.
Further, a first model training module 1101 is configured to train the first initial position detection model by using a first training sample; the first training sample is first historical three-dimensional scene point cloud data for marking the position of at least one user key part in a preset three-dimensional scene; and if the first training convergence condition is determined to be met, determining a first initial position detection model meeting the first training convergence condition as a first trained-to-converged position detection model.
Further, the target user determining module 1004 is specifically configured to:
acquiring the position of a target object; determining the distance between each user key part and the target object according to the position of each user key part in the preset three-dimensional scene and the position of the target object; and determining the user with the smallest distance as a target user.
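A minimal sketch of this selection, assuming each user's key part position and the target item's position are available as 3-D vectors, is:

```python
import numpy as np

def determine_target_user(key_part_positions, item_xyz):
    """key_part_positions: {user_id: np.ndarray of shape (3,)} -- each user's
    key part (e.g. a hand) in the preset three-dimensional scene.
    item_xyz: position of the target item in the same coordinate system.
    Returns the user whose key part is closest to the target item."""
    return min(key_part_positions.items(),
               key=lambda kv: float(np.linalg.norm(kv[1] - item_xyz)))[0]
```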
Further, the data association module 1005 is specifically configured to:
acquiring identification information of a target user and identification information of a target object; and associating the identification information of the target user with the identification information of the target object.
Further, a user identification determining module 1102 is configured to determine a position of a head of the target user in a preset three-dimensional scene according to the target three-dimensional scene point cloud data; determining the human body position of a target user, which is matched with the position of the head of the target user in a preset three-dimensional scene; and determining the identification information of the target user according to the mapping relation between the human body position of the target user and the identification information of the target user.
Further, the user identification determining module 1102 is specifically configured to, when determining, according to the target three-dimensional scene point cloud data, a position of a target user head in a preset three-dimensional scene:
inputting the target three-dimensional scene point cloud data into a second trained to-be-converged position detection model, so as to detect the position of the head of the target user in a preset three-dimensional scene through the second trained to-be-converged position detection model; and outputting the position of the head of the target user in the preset three-dimensional scene through the second trained to converged position detection model.
Further, a second model training module 1103 is configured to train the second initial position detection model using a second training sample; the second training sample is second historical three-dimensional scene point cloud data for marking the position of the head of the user taking the article in the preset three-dimensional scene; and if the second training convergence condition is determined to be met, determining a second initial position detection model meeting the second training convergence condition as a second trained-to-converged position detection model.
Further, the list processing module 1104 is configured to acquire the list information corresponding to the target item if it is monitored that the list generation condition is satisfied, and to send the list information to the terminal device of the target user.
The data association device provided in this embodiment may execute the technical scheme of the method embodiment shown in fig. 2 to 9, and its implementation principle and technical effects are similar to those of the method embodiment shown in fig. 2 to 9, and are not described in detail herein.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
Fig. 12 is a block diagram of an electronic device for the data association method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 12, the electronic device includes: one or more processors 1201, a memory 1202, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 1201 is taken as an example in fig. 12.
Memory 1202 is a non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the data association methods provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the data association method provided by the present application.
The memory 1202 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the data association method in the embodiments of the present application (e.g., the image acquisition module 1001, the scene point cloud determination module 1002, the key part position determination module 1003, the target user determination module 1004, and the data association module 1005 shown in fig. 10). The processor 1201 performs various functional applications of the server and data processing, i.e., implements the data association method in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 1202.
Memory 1202 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the electronic device of fig. 12, or the like. In addition, memory 1202 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 1202 optionally includes memory remotely located relative to processor 1201, which may be connected to the electronic device of fig. 12 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of fig. 12 may further include: an input device 1203 and an output device 1204. The processor 1201, the memory 1202, the input device 1203, and the output device 1204 may be connected by a bus or in other ways; in fig. 12, connection by a bus is taken as an example.
The input device 1203 may receive input voice, numeric, or character information and generate key signal inputs related to user settings and function control of the electronic device of fig. 12; examples include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output device 1204 may include a voice playing device, a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiments of the present application, when a user takes a target object, two-dimensional images are acquired by the plurality of target depth vision sensors and the three-dimensional scene point cloud data are generated from these two-dimensional images, so the shielded area in the three-dimensional scene point cloud data is effectively reduced, the determined positions of the users' key parts in the preset three-dimensional scene are more accurate, the association between users and goods is more accurate, and the taker of each object is determined accurately.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (22)

1. A data association method, wherein the method is applied to an electronic device, the electronic device is in communication with a plurality of depth vision sensors, the depth vision sensors are arranged in a preset three-dimensional scene, the preset three-dimensional scene further comprises a target object and at least one user, and the method comprises:
if the fact that the user takes the target object is monitored, acquiring two-dimensional images acquired by a plurality of corresponding target depth vision sensors when the user takes the target object;
Determining corresponding target three-dimensional scene point cloud data according to each two-dimensional image and the mapping parameters;
determining the position of at least one user key part in a preset three-dimensional scene according to target three-dimensional scene point cloud data, wherein the position of the at least one user key part in the preset three-dimensional scene is determined according to the target three-dimensional scene point cloud data and a first trained to converged position detection model;
determining a target user taking the target object according to the position of each user key part in a preset three-dimensional scene, wherein the target user is a user with the minimum distance between the position of each user key part in the preset three-dimensional scene and the position of the target object;
and associating the target object with the target user.
2. The method of claim 1, wherein determining corresponding target three-dimensional scene point cloud data from each two-dimensional image and mapping parameters comprises:
obtaining the corresponding internal and external parameters of each target depth vision sensor and the position of each target depth vision sensor in a preset three-dimensional scene;
mapping the two-dimensional images into a preset three-dimensional scene coordinate system according to the corresponding internal and external parameters and the position in the preset three-dimensional scene so as to obtain corresponding target three-dimensional scene point cloud data.
3. The method of claim 1, wherein determining the location of the at least one user key location in the preset three-dimensional scene from the target three-dimensional scene point cloud data comprises:
inputting the target three-dimensional scene point cloud data into a first trained to-be-converged position detection model, so as to detect the positions of key parts of each user in a preset three-dimensional scene through the first trained to-be-converged position detection model;
outputting the positions of the key parts of each user in a preset three-dimensional scene through the first trained to-be-converged position detection model.
4. The method of claim 3, wherein before inputting the target three-dimensional scene point cloud data into the first trained-to-converge position detection model, further comprising:
training a first initial position detection model by adopting a first training sample; the first training sample is first historical three-dimensional scene point cloud data for marking the position of at least one user key part in a preset three-dimensional scene;
and if the first training convergence condition is determined to be met, determining a first initial position detection model meeting the first training convergence condition as the first position detection model trained to be converged.
5. The method of claim 1, wherein the determining the target user to take the target object according to the position of each user key part in the preset three-dimensional scene comprises:
acquiring the position of a target object;
determining the distance between each user key part and the target object according to the position of each user key part in a preset three-dimensional scene and the position of the target object;
and determining the user with the smallest distance as the target user.
6. The method of claim 1, wherein the associating the target item with the target user comprises:
acquiring the identification information of the target user and the identification information of the target object;
and associating the identification information of the target user with the identification information of the target object.
7. The method of claim 6, wherein after determining the target user who takes the target object according to the position of each user key part in the preset three-dimensional scene, further comprising:
determining the position of the head of the target user in a preset three-dimensional scene according to the target three-dimensional scene point cloud data;
determining the human body position of a target user, which is matched with the position of the head of the target user in a preset three-dimensional scene;
And determining the identification information of the target user according to the mapping relation between the human body position of the target user and the identification information of the target user.
8. The method of claim 7, wherein determining the position of the target user's head in the preset three-dimensional scene from the target three-dimensional scene point cloud data comprises:
inputting the target three-dimensional scene point cloud data into a second trained to-be-converged position detection model, so as to detect the position of the head of the target user in a preset three-dimensional scene through the second trained to-be-converged position detection model;
and outputting the position of the head of the target user in the preset three-dimensional scene through the second trained-to-converged position detection model.
9. The method of claim 8, wherein before inputting the target three-dimensional scene point cloud data into a second trained-to-converge position detection model, further comprising:
training a second initial position detection model by adopting a second training sample; the second training sample is second historical three-dimensional scene point cloud data for marking the position of the head of the user taking the article in the preset three-dimensional scene;
And if the second training convergence condition is determined to be met, determining a second initial position detection model meeting the second training convergence condition as the second position detection model trained to be converged.
10. The method of claim 1, wherein after associating the target item with the target user, further comprising:
if the fact that the list generation condition is met is monitored, acquiring list information corresponding to the target object;
and sending the list information to the terminal equipment of the target user.
11. A data association apparatus, the apparatus being located in an electronic device, the electronic device in communication with a plurality of depth vision sensors, the depth vision sensors being disposed within a preset three-dimensional scene, the preset three-dimensional scene further including a target object and at least one user, the apparatus comprising:
the image acquisition module is used for acquiring two-dimensional images acquired by a plurality of corresponding target depth vision sensors when the user is monitored to take the target object;
the scene point cloud determining module is used for determining corresponding target three-dimensional scene point cloud data according to each two-dimensional image and the mapping parameters;
The key part position determining module is used for determining the position of at least one user key part in a preset three-dimensional scene according to the target three-dimensional scene point cloud data, wherein the position of the at least one user key part in the preset three-dimensional scene is determined according to the target three-dimensional scene point cloud data and a first trained-to-converged position detection model;
the target user determining module is used for determining a target user taking the target object according to the position of each user key part in a preset three-dimensional scene, wherein the target user is a user with the minimum distance between the position of each user key part in the preset three-dimensional scene and the position of the target object;
and the data association module is used for associating the target object with the target user.
12. The apparatus of claim 11, wherein the scene point cloud determination module is specifically configured to:
obtaining the corresponding internal and external parameters of each target depth vision sensor and the position of each target depth vision sensor in a preset three-dimensional scene; mapping the two-dimensional images into a preset three-dimensional scene coordinate system according to the corresponding internal and external parameters and the position in the preset three-dimensional scene so as to obtain corresponding target three-dimensional scene point cloud data.
13. The apparatus of claim 11, wherein the key part position determining module is specifically configured to:
inputting the target three-dimensional scene point cloud data into a first trained to-be-converged position detection model, so as to detect the positions of key parts of each user in a preset three-dimensional scene through the first trained to-be-converged position detection model; outputting the positions of the key parts of each user in a preset three-dimensional scene through the first trained to-be-converged position detection model.
14. The apparatus as recited in claim 13, further comprising:
the first model training module is used for training the first initial position detection model by adopting a first training sample; the first training sample is first historical three-dimensional scene point cloud data for marking the position of at least one user key part in a preset three-dimensional scene; and if the first training convergence condition is determined to be met, determining a first initial position detection model meeting the first training convergence condition as the first position detection model trained to be converged.
15. The apparatus according to claim 11, wherein the target user determination module is specifically configured to:
Acquiring the position of a target object; determining the distance between each user key part and the target object according to the position of each user key part in a preset three-dimensional scene and the position of the target object; and determining the user with the smallest distance as the target user.
16. The apparatus of claim 11, wherein the data association module is specifically configured to:
acquiring the identification information of the target user and the identification information of the target object; and associating the identification information of the target user with the identification information of the target object.
17. The apparatus as recited in claim 16, further comprising:
the user identification determining module is used for determining the position of the head of the target user in a preset three-dimensional scene according to the target three-dimensional scene point cloud data; determining the human body position of a target user, which is matched with the position of the head of the target user in a preset three-dimensional scene; and determining the identification information of the target user according to the mapping relation between the human body position of the target user and the identification information of the target user.
18. The apparatus according to claim 17, wherein the user identification determining module is configured, when determining the position of the target user head in the preset three-dimensional scene according to the target three-dimensional scene point cloud data, to:
Inputting the target three-dimensional scene point cloud data into a second trained to-be-converged position detection model, so as to detect the position of the head of the target user in a preset three-dimensional scene through the second trained to-be-converged position detection model; and outputting the position of the head of the target user in the preset three-dimensional scene through the second trained-to-converged position detection model.
19. The apparatus as recited in claim 18, further comprising:
the second model training module is used for training a second initial position detection model by adopting a second training sample; the second training sample is second historical three-dimensional scene point cloud data for marking the position of the head of the user taking the article in the preset three-dimensional scene; and if the second training convergence condition is determined to be met, determining a second initial position detection model meeting the second training convergence condition as the second position detection model trained to be converged.
20. The apparatus as recited in claim 11, further comprising:
the list processing module is used for acquiring list information corresponding to the target object if the fact that the list generation condition is met is monitored; and sending the list information to the terminal equipment of the target user.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-10.
CN202010027553.1A 2020-01-10 2020-01-10 Data association method, device, equipment and storage medium Active CN111259755B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010027553.1A CN111259755B (en) 2020-01-10 2020-01-10 Data association method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111259755A CN111259755A (en) 2020-06-09
CN111259755B true CN111259755B (en) 2023-07-28

Family

ID=70952813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010027553.1A Active CN111259755B (en) 2020-01-10 2020-01-10 Data association method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111259755B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111815698A (en) * 2020-07-20 2020-10-23 广西安良科技有限公司 Artificial intelligence monocular 3D point cloud generation method, device, terminal and storage medium
CN112037280A (en) * 2020-08-17 2020-12-04 北京声智科技有限公司 Object distance measuring method and device
CN114078331B (en) * 2020-08-19 2023-02-17 北京万集科技股份有限公司 Overspeed detection method, overspeed detection device, visual sensor and storage medium
CN112270769B (en) 2020-11-11 2023-11-10 北京百度网讯科技有限公司 Tour guide method and device, electronic equipment and storage medium
CN113489902B (en) * 2021-07-02 2022-05-03 深圳课后帮科技有限公司 Video shooting method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107507048A (en) * 2017-05-20 2017-12-22 吕怀民 " with networking " method, networking used article, network license use device and purposes
CN109598301A (en) * 2018-11-30 2019-04-09 腾讯科技(深圳)有限公司 Detection zone minimizing technology, device, terminal and storage medium
CN110189343A (en) * 2019-04-16 2019-08-30 阿里巴巴集团控股有限公司 Image labeling method, apparatus and system
CN110276805A (en) * 2019-06-28 2019-09-24 联想(北京)有限公司 A kind of data processing method and electronic equipment
CN110378087A (en) * 2019-07-24 2019-10-25 四川爱创科技有限公司 Self-service terminal management method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190206012A1 (en) * 2017-12-31 2019-07-04 William Granich Online marketing lottery scratch-off system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Geoffery Poon et al. Enabling 3D online shopping with affordable depth scanned models. 2014 International Conference on Smart Computing, 2015, 150-155. *
Cai Jian. Design and Implementation of a Recommendation System for Unmanned Supermarkets Based on User Characteristics. China Master's Theses Full-text Database (Information Science and Technology), 2019, No. 07, I138-1462. *

Also Published As

Publication number Publication date
CN111259755A (en) 2020-06-09

Similar Documents

Publication Publication Date Title
CN111259755B (en) Data association method, device, equipment and storage medium
US10228763B2 (en) Gaze direction mapping
JP6869345B2 (en) Order information determination method and equipment
JP7463559B2 (en) Item processing method, device, system, electronic device, storage medium, and computer program
US11614803B2 (en) Individually interactive multi-view display system for non-stationary viewing locations and methods therefor
US10853649B2 (en) Context-aware hazard detection using world-facing cameras in virtual, augmented, and mixed reality (xR) applications
TW201907350A (en) Offline shopping guide method and device
US10331209B2 (en) Gaze direction mapping
CN108921098B (en) Human motion analysis method, device, equipment and storage medium
US20130113826A1 (en) Image processing apparatus, image processing method, and program
US20200273200A1 (en) Camera localization based on skeletal tracking
CN111274945A (en) Pedestrian attribute identification method and device, electronic equipment and storage medium
US11055539B2 (en) Image processing for distinguishing individuals in groups
CN112241716B (en) Training sample generation method and device
CN113447128B (en) Multi-human-body-temperature detection method and device, electronic equipment and storage medium
JP2019117437A (en) Article identification apparatus, article identification method and program
CN111832611A (en) Training method, device and equipment of animal recognition model and storage medium
US20240212322A1 (en) Using SLAM 3D Information To Optimize Training And Use Of Deep Neural Networks For Recognition And Tracking Of 3D Object
US11269405B2 (en) Gaze direction mapping
EP3474184A1 (en) Device for detecting the interaction of users with products arranged on a stand or display rack of a store
CN111435511B (en) Order processing method, device, equipment, system and readable storage medium
Strecker et al. MR Object Identification and Interaction: Fusing Object Situation Information from Heterogeneous Sources
CN117146828B (en) Method and device for guiding picking path, storage medium and computer equipment
CN110443218B (en) Person detection method and device
WO2023058025A1 (en) Identification of objects using wearable devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant