CA3154025A1 - Interactive behavior recognizing method, device, computer equipment and storage medium - Google Patents

Interactive behavior recognizing method, device, computer equipment and storage medium Download PDF

Info

Publication number
CA3154025A1
Authority
CA
Canada
Prior art keywords
human body
obtaining
image
body posture
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3154025A
Other languages
French (fr)
Inventor
Xiyang ZHUANG
Daiwei YU
Hao Sun
Xian YANG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10353744 Canada Ltd
Original Assignee
10353744 Canada Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 10353744 Canada Ltd filed Critical 10353744 Canada Ltd
Publication of CA3154025A1 publication Critical patent/CA3154025A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30241 Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to an interaction behavior recognition method, an apparatus, a computer device, and a storage medium. Said method comprises: acquiring an image to be detected; performing human body posture detection on said image by means of a preset detection model, so as to obtain human body posture information and hand position information, the detection model being used for performing human body posture detection; tracking a human body posture according to the human body posture information, so as to obtain human body motion trajectory information; performing object tracking on a hand position according to the hand position information, and acquiring a hand area image; performing item recognition on the hand area image by means of a preset classification recognition model, so as to obtain an item recognition result, the classification recognition model being used for performing item recognition; and according to the human body motion trajectory information and the item recognition result, obtaining a first interaction behavior recognition result. The present method can improve the recognition accuracy of interaction behaviors and has good transferability.

Description

INTERACTIVE BEHAVIOR RECOGNIZING METHOD, DEVICE, COMPUTER
EQUIPMENT AND STORAGE MEDIUM
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present application relates to an interactive behavior recognizing method, and corresponding device, computer equipment and storage medium.
Description of Related Art
[0002] With the development of science and technology, unmanned retail has drawn increasing attention from large retailers. This technique makes use of multiple intelligent recognition techniques, such as sensors, image analysis, and computer vision, to achieve unattended settlement of accounts. Using image recognition to sense the relative positions between people and shelves, and the movements of commodities on the shelves, so as to recognize human-goods interactive behaviors, is an important precondition for ensuring the normal settlement of customers' purchases.
[0003] However, the currently available human-goods interactive behavior recognizing methods usually rely on templates and rule matching. Defining the templates and stipulating the rules require a great deal of manpower, and such methods are often applicable only to the recognition of conventional human body postures; the resulting recognition is low in precision, weak in transferability, and applicable merely to human-goods interactive behaviors in specific scenarios.

SUMMARY OF THE INVENTION
[0004] In view of the aforementioned technical problems, there is an urgent need to provide an interactive behavior recognizing method, and corresponding device, computer equipment and storage medium having higher recognition precision and better transferability.
[0005] There is provided an interactive behavior recognizing method that comprises:
[0006] obtaining an image to be detected;
[0007] performing human body posture detection on the image to be detected through a preset detection model, and obtaining human body posture information and hand position information, wherein the detection model is employed to perform human body posture detection;
[0008] tracing the human body posture according to the human body posture information to obtain human body motion track information, and performing a target tracing on the hand position according to the hand position information to obtain a hand area image;
[0009] performing article recognition on the hand area image through a preset classification recognition model, and obtaining an article recognition result, wherein the classification recognition model is employed to perform article recognition; and
[0010] obtaining a first interactive behavior recognition result according to the human body motion track information and the article recognition result.
[0011] In one of the embodiments, the step of performing human body posture detection on the image to be detected through a preset detection model, and obtaining human body posture information and hand position information includes:
[0012] preprocessing the image to be detected, and obtaining a human body image in the image to be detected; and
[0013] performing human body posture detection on the human body image through the preset detection model, and obtaining the human body posture information and the hand position information.

[0014] In one of the embodiments, the method further comprises:
[0015] obtaining human body position information according to the image to be detected; and
[0016] obtaining a second interactive behavior recognition result according to the human body motion track information, the article recognition result, the human body position information and a preset shelf information, wherein the second interactive behavior recognition result is a human-goods interactive behavior recognition result.
[0017] In one of the embodiments, the step of obtaining an image to be detected includes:
[0018] obtaining the image to be detected as collected by an image collection device at a preset first shooting angle; wherein
[0019] preferably, the preset first shooting angle is an overhead angle perpendicular to the ground, and the image to be detected is of RGBD data.
[0020] In one of the embodiments, the method further comprises:
[0021] obtaining sample image data;
[0022] marking key points and hand position of a human body image in the sample image data, and obtaining a first marked image data;
[0023] performing an image enhancing process on the first marked image data, and obtaining a first training dataset; and
[0024] inputting the first training dataset to an HRNet model for training, and obtaining the detection model.
[0025] In one of the embodiments, the method further comprises:
[0026] marking a hand area in the sample image data and performing article category marking on an article located in the hand area, and obtaining a second marked image data;
[0027] performing an image enhancing process on the second marked image data, and obtaining a second training dataset; and
[0028] inputting the second training dataset to a convolutional neural network for training, and obtaining the preset classification recognition model, wherein the convolutional neural network is a yolov3-tiny network or a vgg16 network.
[0029] In one of the embodiments, the step of obtaining sample image data includes:
[0030] obtaining an image data collected by the image collection device at a preset second shooting angle within a preset time frame; and
[0031] screening from the collected image data to obtain sample image data containing human-goods interactive behaviors, wherein, preferably, the preset second shooting angle is an overhead angle perpendicular to the ground, and the sample image data is of RGBD data.
[0032] There is provided an interactive behavior recognizing device that comprises:
[0033] a first obtaining module, for obtaining an image to be detected;
[0034] a first detecting module, for performing human body posture detection on the image to be detected through a preset detection model, and obtaining human body posture information and hand position information, wherein the detection model is employed to perform human body posture detection;
[0035] a tracing module, for tracing the human body posture according to the human body posture information to obtain human body motion track information, and performing a target tracing on the hand position according to the hand position information to obtain a hand area image;
[0036] a second detecting module, for performing article recognition on the hand area image through a preset classification recognition model, and obtaining an article recognition result, wherein the classification recognition model is employed to perform article recognition; and
[0037] a first interactive behavior recognizing module, for obtaining a first interactive behavior recognition result according to the human body motion track information and the article recognition result.
[0038] There is provided a computer equipment that comprises a memory, a processor and a computer program stored on the memory and operable on the processor, and the following steps are realized when the processor executes the computer program:
[0039] obtaining an image to be detected;
[0040] performing human body posture detection on the image to be detected through a preset detection model, and obtaining human body posture information and hand position information, wherein the detection model is employed to perform human body posture detection;
[0041] tracing the human body posture according to the human body posture information to obtain human body motion track information, and performing a target tracing on the hand position according to the hand position information to obtain a hand area image;
[0042] performing article recognition on the hand area image through a preset classification recognition model, and obtaining an article recognition result, wherein the classification recognition model is employed to perform article recognition; and
[0043] obtaining a first interactive behavior recognition result according to the human body motion track information and the article recognition result.
[0044] There is provided a computer-readable storage medium storing a computer program thereon, and the following steps are realized when the computer program is executed by a processor:
[0045] obtaining an image to be detected;
[0046] performing human body posture detection on the image to be detected through a preset detection model, and obtaining human body posture information and hand position information, wherein the detection model is employed to perform human body posture detection;
[0047] tracing the human body posture according to the human body posture information to obtain human body motion track information, and performing a target tracing on the hand position according to the hand position information to obtain a hand area image;
[0048] performing article recognition on the hand area image through a preset classification recognition model, and obtaining an article recognition result, wherein the classification recognition model is employed to perform article recognition; and
[0049] obtaining a first interactive behavior recognition result according to the human body motion track information and the article recognition result.
[0050] In the aforementioned interactive behavior recognizing method, and corresponding device, computer equipment and storage medium, interactive behavior recognition is performed on the image to be detected through the detection model and the classification recognition model, whereby only a small amount of data needs to be collected on the basis of the existing models to deploy them in different stores; stronger transferability is achieved, deployment cost is lowered, and the detection model is enabled to recognize interactive behaviors flexibly and precisely, enhancing recognition precision.
BRIEF DESCRIPTION OF THE DRAWINGS
[0051] Fig. 1 is a view illustrating the application environment for an interactive behavior recognizing method in an embodiment;
[0052] Fig. 2 is a flowchart schematically illustrating an interactive behavior recognizing method in an embodiment;
[0053] Fig. 3 is a flowchart schematically illustrating an interactive behavior recognizing method in another embodiment;
[0054] Fig. 4 is a flowchart schematically illustrating the detection model training steps in an embodiment;
[0055] Fig. 5 is a flowchart schematically illustrating the classification recognition model training steps in an embodiment;

[0056] Fig. 6 is a block diagram illustrating the structure of an interactive behavior recognizing device in an embodiment; and
[0057] Fig. 7 is a view illustrating the internal structure of a computer equipment in an embodiment.
DETAILED DESCRIPTION OF THE INVENTION
[0058] To make the objectives, technical solutions and advantages of the present application clearer, the present application is described in greater detail below with reference to the accompanying drawings and embodiments. As should be understood, the specific embodiments described here are merely meant to explain the present application, rather than to restrict it.
[0059] The interactive behavior recognizing method provided by the present application is applicable to the application environment shown in Fig. 1, in which terminal 102 communicates with server 104 through a network. Terminal 102 can be, but is not limited to, any of various image collection devices; moreover, terminal 102 can employ one or more depth cameras with shooting angles perpendicular to the ground, while server 104 can be embodied as an independent server or a server cluster consisting of a plurality of servers.
[0060] In one embodiment, as shown in Fig. 2, there is provided an interactive behavior recognizing method; the method is explained with the example of its application to the server in Fig. 1, and comprises the following steps.
[0061] Step 202: obtaining an image to be detected.
[0062] The image to be detected is an image of interactive behavior between a human being and an object to be detected.
[0063] In one of the embodiments, step 202 includes: the server obtains the image to be detected as collected by an image collection device at a preset first shooting angle, preferably, the preset first shooting angle is an overhead angle perpendicular to the ground or approximately perpendicular to the ground, and the image to be detected is of RGBD data.
[0064] In other words, the image to be detected is RGBD data collected by an image collection device in an overhead-angle scenario; the image collection device can be embodied as a depth camera disposed above a shelf, and the first shooting angle need not be exactly perpendicular to the ground: it can be any overhead angle close to perpendicular, insofar as the installation environment allows, so as to avoid blind spots as far as possible.
[0065] The present technical solution makes use of a depth camera positioned at an overhead angle to detect human-goods interactive behaviors. In comparison with the traditional installation mode, whereby the camera is installed at a certain included angle with respect to the ground, the present technical solution effectively evades the problem of both the human being and the shelf being occluded due to an oblique angle, as well as the problem of the hand being more difficult to trace; in actual application, image collection at an overhead angle makes it possible to better recognize the behaviors of different persons picking up goods in turn.
[0066] Step 204: performing human body posture detection on the image to be detected through a preset detection model, and obtaining human body posture information and hand position information, wherein the detection model is employed to perform human body posture detection.
[0067] The detection model is a human body posture detection model that can be used to detect key points of the human skeleton.

[0068] Specifically, the server inputs a human body image to the detection model; human body posture detection is performed on the human body image in the detection model; and the human body posture information and hand position information output by the detection model are obtained. The human body posture detection can be performed via a common skeleton-line detecting method; the human body posture information obtained is an image of human skeletal key points, and the hand position information is the specific position of a hand in that image of skeletal key points.
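As an illustration of this step, the following sketch extracts skeletal key points and wrist positions (a proxy for hand position) from a single frame. The patent trains its own HRNet-based detector; this stand-in uses torchvision's pretrained Keypoint R-CNN, whose COCO skeleton places the left and right wrists at indices 9 and 10. The input file name and confidence threshold are illustrative assumptions.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pretrained person keypoint detector (COCO 17-keypoint skeleton).
model = torchvision.models.detection.keypointrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# "frame.jpg" is a placeholder for one frame of the image to be detected.
image = to_tensor(Image.open("frame.jpg").convert("RGB"))

with torch.no_grad():
    output = model([image])[0]

for box, kpts, score in zip(output["boxes"], output["keypoints"], output["scores"]):
    if score < 0.8:                      # assumed person-confidence threshold
        continue
    # COCO keypoint indices 9 and 10 are the left and right wrists,
    # taken here as the hand position information.
    left_hand, right_hand = kpts[9][:2], kpts[10][:2]
    print("person box:", box.tolist())
    print("hand positions:", left_hand.tolist(), right_hand.tolist())
```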
[0069] Step 206: tracing the human body posture according to the human body posture information to obtain human body motion track information, and performing a target tracing on the hand position according to the hand position information to obtain a hand area image.
[0070] Specifically, a target tracing algorithm is employed, such as the Camshift algorithm, which adapts to changes in the size and shape of a moving target, to trace the motion tracks of the human body and the hand respectively, to obtain human body motion track information, and to enlarge the hand position during the tracing process to obtain a hand area image.
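A minimal sketch of this tracing, using OpenCV's CamShift implementation: a hue histogram of the initial hand window is back-projected into each new frame, the window is updated, its center is appended to the motion track, and the window is enlarged to cut out the hand area image. The video path, initial box, and padding are assumptions; in the method above, the initial box would come from the hand position information of step 204.

```python
import cv2

cap = cv2.VideoCapture("store_camera.mp4")   # placeholder video source
ok, frame = cap.read()
x, y, w, h = 300, 200, 60, 60                # assumed initial hand box from step 204
track_window = (x, y, w, h)

# Build a hue histogram of the initial hand region for back-projection.
roi_hsv = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
mask = cv2.inRange(roi_hsv, (0, 60, 32), (180, 255, 255))
hist = cv2.calcHist([roi_hsv], [0], mask, [180], [0, 180])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
track = []                                   # accumulated motion track centers

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    _, track_window = cv2.CamShift(back_proj, track_window, term_crit)
    x, y, w, h = track_window
    track.append((x + w // 2, y + h // 2))
    # Enlarge the tracked window to obtain the hand area image.
    pad = 20
    hand_area = frame[max(0, y - pad):y + h + pad, max(0, x - pad):x + w + pad]
```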
[0071] Step 208: performing article recognition on the hand area image through a preset classification recognition model, and obtaining an article recognition result, wherein the classification recognition model is employed to perform article recognition.
[0072] The classification recognition model is an article recognition model that can be trained by deep learning.
[0073] Specifically, the hand area image is input to the classification recognition model, and the hand area image is detected in the classification recognition model to judge whether an article is held in the hand area; in the case that an article is being held, the classification recognition model recognizes the article and outputs an article recognition result. Moreover, the classification recognition model can further perform skin color judgment on the hand area image, and send out a timely early warning against the behavior of intentionally shielding the hand with such articles as clothes, so as to achieve the objective of reducing goods loss.
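The skin color judgment could, for instance, be a simple chrominance test on the hand area image: if too little of the crop falls in a typical skin range, the hand may be intentionally covered and an early warning can be raised. The YCrCb bounds and the 15% ratio below are assumptions to be tuned per camera and lighting, not values from the patent.

```python
import cv2
import numpy as np

def hand_is_shielded(hand_area_bgr, min_skin_ratio=0.15):
    """Return True when suspiciously little of the hand crop is skin-coloured."""
    ycrcb = cv2.cvtColor(hand_area_bgr, cv2.COLOR_BGR2YCrCb)
    # Widely used Cr/Cb skin range; exact bounds are an assumption.
    skin_mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    skin_ratio = np.count_nonzero(skin_mask) / skin_mask.size
    return skin_ratio < min_skin_ratio
```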
[0074] Step 210: obtaining a first interactive behavior recognition result according to the human body motion track information and the article recognition result.
[0075] The first interactive behavior recognition result is a human-article interactive behavior recognition result.
[0076] Specifically, the human body motion track information can be used to judge the behavioral actions of a human being, for example hand stretching, bending, stooping and squatting, and to judge whether any article is held in the hand. When an article is held in the hand, the article is recognized to obtain an article recognition result, whereby it is possible to judge that the human body is picking up or putting down the article; that is, a human-article interactive behavior recognition result is obtained by analysis.
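A hedged sketch of how the two signals might be combined: the hand's motion track tells us a reach occurred, and the per-frame article labels before and after the reach distinguish picking up from putting down. The "empty" label and the rule itself are illustrative assumptions, not the patent's exact logic.

```python
def first_interaction_result(track, article_labels):
    """track: list of (x, y) hand centers per frame;
    article_labels: per-frame classifier outputs, 'empty' or an article name."""
    if len(track) < 2 or not article_labels:
        return "no_interaction"
    started_empty = article_labels[0] == "empty"
    ends_holding = article_labels[-1] != "empty"
    if started_empty and ends_holding:
        return "pick_up:" + article_labels[-1]   # reached out empty, returned holding
    if not started_empty and not ends_holding:
        return "put_down:" + article_labels[0]   # arrived holding, returned empty
    return "no_interaction"

# e.g. first_interaction_result([(10, 5), (40, 5)], ["empty", "cola"]) -> "pick_up:cola"
```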
[0077] In the interactive behavior recognizing method provided by the present technical solution, interactive behavior recognition is performed on the image to be detected through the detection model and the classification recognition model, and it is made possible, through model training and algorithm tuning, to automatically recognize interactive behaviors between human beings and articles, with a more precise recognition result; moreover, only a small amount of data needs to be collected on the basis of the current detection model and classification recognition model to deploy them in different scenarios, so stronger transferability is achieved at a lower deployment cost.
[0078] In one of the embodiments, as shown in Fig. 3, the method comprises the following steps.
[0079] Step 302: obtaining an image to be detected.
[0080] Step 304: preprocessing the image to be detected, and obtaining a human body image in the image to be detected.
[0081] Step 304 extracts from the image to be detected the human body image required by the subsequent steps, and the unwanted background image is masked out.
[0082] Specifically, the preprocessing can be background modeling; in other words, background modeling based on a Gaussian mixture is performed on the image to be detected, and a background model is obtained.
[0083] A human body image in the image to be detected is obtained according to the image to be detected and the background model.
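As an illustration, OpenCV's MOG2 subtractor implements exactly this kind of Gaussian-mixture background modeling; applying it per frame yields a foreground mask with which the background can be masked out, leaving the human body image. Parameter values and the video path below are illustrative.

```python
import cv2

# MOG2 maintains a per-pixel Gaussian-mixture background model.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)
cap = cv2.VideoCapture("store_camera.mp4")   # placeholder video source
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)        # update model, get foreground mask
    # Shadows are marked with value 127; keep only confident foreground.
    _, fg_mask = cv2.threshold(fg_mask, 200, 255, cv2.THRESH_BINARY)
    human_image = cv2.bitwise_and(frame, frame, mask=fg_mask)  # background masked out
```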
[0084] Step 306: performing human body posture detection on the human body image through the preset detection model, and obtaining the human body posture information and the hand position information.
[0085] Step 308: tracing the human body posture according to the human body posture information to obtain human body motion track information, and performing a target tracing on the hand position according to the hand position information to obtain a hand area image.
[0086] Step 310: performing article recognition on the hand area image through a preset classification recognition model, and obtaining an article recognition result, wherein the classification recognition model is employed to perform article recognition.
[0087] Step 312: obtaining a first interactive behavior recognition result according to the human body motion track information and the article recognition result.
[0088] In this embodiment, the unwanted background image is masked out in step 304 by preprocessing the image to be detected, and only the human body image to be used subsequently is retained, whereby the volume of data to be processed in the following steps is reduced and data processing efficiency is enhanced.
[0089] In one of the embodiments, the method further comprises:
[0090] A: obtaining human body position information according to the image to be detected, wherein
[0091] the human body position information can indicate a position in a three-dimensional world coordinate system.
[0092] Specifically, the collection position information of the image to be detected in the three-dimensional world coordinate system is obtained; three-dimensional world coordinate transformation is performed according to the position information of the human body image within the image to be detected and the collection position information; and the position information of the human body in the three-dimensional world coordinate system is obtained.
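A minimal sketch of the transformation in [0092] under the usual pinhole model: the pixel is back-projected with its depth value into camera coordinates, then mapped to the world frame with the camera's pose. The intrinsics (fx, fy, cx, cy) and the extrinsic rotation R and translation t are assumed to come from calibration; the numbers in the example call are made up.

```python
import numpy as np

def pixel_to_world(u, v, depth, fx, fy, cx, cy, R, t):
    """Back-project pixel (u, v) with depth (metres) through the pinhole model,
    then map from camera to world coordinates with pose (R, t)."""
    x_cam = (u - cx) * depth / fx
    y_cam = (v - cy) * depth / fy
    p_cam = np.array([x_cam, y_cam, depth])
    return R @ p_cam + t

# Illustrative call: identity pose, made-up intrinsics for a 1280x720 camera.
p_world = pixel_to_world(640, 360, 2.4, fx=910.0, fy=910.0, cx=640.0, cy=360.0,
                         R=np.eye(3), t=np.zeros(3))
```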
[0093] B: obtaining a second interactive behavior recognition result according to the human body motion track information, the article recognition result, the human body position information and preset shelf information, wherein the second interactive behavior recognition result is a human-goods interactive behavior recognition result.
[0094] The shelf information includes shelf position information and information of the articles on the shelf, of which the shelf position information is the three-dimensional world coordinate position where the shelf is located.
[0095] Specifically, the shelf information to which the human body position corresponds is obtained according to the human body position information and the preset shelf information; an interactive behavior between the human body and the shelf is determined by tracing the three-dimensional world coordinate positions where the human body and the shelf are located, and the occurrence of a valid human-goods interactive behavior is further determined during the tracing process by recognizing whether the hand area holds any commodity associated with the shelf. A valid human-goods interactive behavior here can be the behavior of a customer completing one round of picking up goods from the shelf.
[0096] The present technical solution obtains the position of a customer in the world coordinate system through three-dimensional world coordinate transformation, and associating that position with the shelf makes it possible to recognize whether the customer has effected a valid human-goods interactive behavior. On the other hand, on the basis of the recognition of the human-goods interactive behavior in conjunction with the article recognition result, and under the premise that the shelf stock is known, it is possible to indirectly count the existing stock by monitoring the number of valid interactions between humans and the shelf; in case of short supply, the server can remind shop assistants in good time to administer the stock, whereby manual stock-taking cost is greatly reduced.
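The association logic might look like the following sketch: the customer's world position is matched to the nearest shelf, and each valid interaction (within range, with an article recognized in the hand) decrements that shelf's stock estimate. The shelf coordinates, distance threshold, and low-stock threshold are all assumptions for illustration.

```python
import numpy as np

# Assumed shelf layout (world coordinates, metres) and opening stock counts.
SHELVES = {"shelf_A": np.array([1.0, 0.5, 0.0]), "shelf_B": np.array([3.0, 0.5, 0.0])}
stock = {"shelf_A": 20, "shelf_B": 15}

def register_interaction(person_world_pos, article_result, max_dist=1.2):
    """Associate a customer position with the nearest shelf; when an article is
    recognized in the hand within range, count one valid human-goods interaction."""
    shelf, pos = min(SHELVES.items(),
                     key=lambda kv: np.linalg.norm(kv[1] - person_world_pos))
    if np.linalg.norm(pos - person_world_pos) > max_dist or article_result == "empty":
        return None
    stock[shelf] -= 1                      # indirect stock counting
    if stock[shelf] < 5:                   # assumed low-stock threshold
        print("low stock warning:", shelf)
    return shelf
```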
[0097] In one of the embodiments, as shown in Fig. 4, the method further comprises detection module training steps, which specifically include the following steps.
[0098] Step 402: obtaining sample image data.
[0099] Specifically, image data collected by the image collection device at a preset second shooting angle within a preset time frame is obtained, i.e., interactive behavioral image data of a certain magnitude is collected; sample image data containing human-goods interactive behaviors are screened and obtained from the collected image data. The preset second shooting angle can be an overhead angle perpendicular to the ground or approximately perpendicular to the ground, and the sample image data is RGBD data.
[0100] Step 404: marking key points and hand position of a human body image in the sample image data, and obtaining a first marked image data.
[0101] Specifically, the sample image data should essentially cover the different human-goods interactive behaviors in actual scenarios; it is further possible to enhance the sample data and increase the volume of the sample image data, raising the proportion of training samples with large posture amplitudes in the interactive behavioral process, for instance postures such as bending, stooping and squatting, so as to enhance the detection precision of the detection model.
During the process of specific implementation, a part of the first marked image data can be taken to serve as a training dataset, while the remaining part serves as a verification dataset.
[0102] Step 406: performing an image enhancing process on the first marked image data, and obtaining a first training dataset; during specific implementation, the image enhancing process is performed on the training dataset within the first marked image data to obtain the first training dataset.
[0103] Specifically, the image enhancing process can include one or more of the following image transforming methods: image normalization, random cropping of images, image zooming, image flipping, image affine transformation, image contrast change, image hue change, image saturation change, adding tone interference blocks to images, etc.
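For illustration, most of the listed operations have direct counterparts in torchvision's transform library; a sketch of such an enhancement pipeline follows, with RandomErasing standing in loosely for the tone interference blocks. All parameter values are illustrative.

```python
from torchvision import transforms

# Each transform maps to one of the operations listed above.
augment = transforms.Compose([
    transforms.RandomResizedCrop(256),                         # random cropping + zooming
    transforms.RandomHorizontalFlip(),                         # image flipping
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1)), # affine transformation
    transforms.ColorJitter(brightness=0.2, contrast=0.3,       # contrast change
                           saturation=0.3, hue=0.05),          # hue / saturation change
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],           # image normalization
                         std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.3),     # rough stand-in for tone interference blocks
])
```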

[0104] Step 408: inputting the first training dataset to an HRNet model for training, and obtaining the detection model. Specifically, different network architectures of the HRNet model can be employed to train human body posture detection models; the models obtained through training with the different architectures are then verified and appraised on the verification dataset, and the model with the best effect is selected to serve as the detection model.
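A generic sketch of this train-and-select procedure is shown below; build_model, the data loaders, and the heatmap regression loss are assumed stand-ins (the patent does not specify its training objective), and the candidate names could be, for example, different HRNet widths.

```python
import torch
import torch.nn.functional as F

def validate(model, val_loader, device):
    """Mean heatmap regression loss on the verification dataset (lower is better)."""
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for images, heatmaps in val_loader:
            pred = model(images.to(device))
            total += F.mse_loss(pred, heatmaps.to(device), reduction="sum").item()
            n += images.size(0)
    return total / max(n, 1)

def train_and_select(candidates, build_model, train_loader, val_loader,
                     epochs=10, device="cuda"):
    """Train each candidate architecture; keep the one with the best validation score."""
    best = (None, float("inf"), None)           # (name, val loss, state dict)
    for name in candidates:
        model = build_model(name).to(device)    # e.g. HRNet-W32 vs. HRNet-W48
        opt = torch.optim.Adam(model.parameters(), lr=1e-4)
        for _ in range(epochs):
            model.train()
            for images, heatmaps in train_loader:
                loss = F.mse_loss(model(images.to(device)), heatmaps.to(device))
                opt.zero_grad()
                loss.backward()
                opt.step()
        score = validate(model, val_loader, device)
        if score < best[1]:
            best = (name, score, model.state_dict())
    return best
```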
[0105] In one of the embodiments, as shown in Fig. 5, the method further comprises classification recognition module training steps, which specifically include the following steps.
[0106] Step 502: obtaining sample image data.
[0107] Step 504: marking a hand area in the sample image data and performing article category marking on an article located in the hand area, and obtaining a second marked image data.
[0108] Step 506: performing an image enhancing process on the second marked image data, and obtaining a second training dataset.
[0109] Specifically, the image enhancing process can include one or more of the following image transforming methods: image normalization, random cropping of images, image zooming, image flipping, image affine transformation, image contrast change, image hue change, image saturation change, adding tone interference blocks to images, etc.
[0110] Step 508: inputting the second training dataset to a yolov3-tiny network or a vgg16 network for training, and obtaining the preset classification recognition model.
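For the vgg16 option, training could follow the usual torchvision fine-tuning pattern sketched below: the final classification layer is replaced with an article-category head and trained on the labelled hand-area crops. The class count (including an assumed "empty hand" class) and optimizer settings are illustrative; the yolov3-tiny option, being a detector, would need a YOLO training pipeline instead.

```python
import torch
import torchvision

NUM_CLASSES = 51          # assumed: 50 article categories plus an "empty hand" class

# Replace the final layer of a pretrained VGG-16 with an article-category head.
model = torchvision.models.vgg16(weights="DEFAULT")
model.classifier[6] = torch.nn.Linear(4096, NUM_CLASSES)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_epoch(loader, device="cuda"):
    """One pass over the second training dataset of labelled hand-area crops."""
    model.to(device).train()
    for crops, labels in loader:
        logits = model(crops.to(device))
        loss = criterion(logits, labels.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```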
[0111] The present technical solution collects RGBD data through a depth camera at an angle perpendicular or approximately perpendicular to the ground, then manually sorts the collected RGBD data containing human-goods interactive behaviors to serve as training samples, namely sample image data; it employs deep learning for training, and recognizes different postures of the human body with the trained models, whereby the detection model can recognize interactive behaviors more flexibly and precisely, and possesses stronger transferability.
[0112] As should be understood, although the various steps in the flowcharts of Figs. 2-5 are displayed sequentially as indicated by the arrows, these steps are not necessarily executed in the sequences indicated. Unless otherwise explicitly noted herein, the execution of these steps is not restricted to any strict sequence, as they can also be executed in other orders than those indicated in the drawings. Moreover, at least some of the steps in the flowcharts of Figs. 2-5 may include plural sub-steps or phases; these sub-steps or phases are not necessarily completed at the same time, but can be executed at different times, and they are also not necessarily performed sequentially, but can be performed in turns or alternately with other steps or with at least some of the sub-steps or phases of other steps.
[0113] There is provided an interactive behavior recognizing device, as shown in Fig. 6, the device comprises a first obtaining module 602, a first detecting module 604, a tracing module 606, a second detecting module 608 and a first interactive behavior recognizing module 610, of which:
[0114] the first obtaining module 602 is employed for obtaining an image to be detected;
[0115] the first detecting module 604 is employed for performing human body posture detection on the image to be detected through a preset detection model, and obtaining human body posture information and hand position information, wherein the detection model is employed to perform human body posture detection;
[0116] the tracing module 606 is employed for tracing the human body posture according to the human body posture information to obtain human body motion track information, and performing a target tracing on the hand position according to the hand position information to obtain a hand area image;
[0117] the second detecting module 608 is employed for performing article recognition on the hand area image through a preset classification recognition model, and obtaining an article recognition result, wherein the classification recognition model is employed to perform article recognition; and
[0118] the first interactive behavior recognizing module 610 is employed for obtaining a first interactive behavior recognition result according to the human body motion track information and the article recognition result.
[0119] In one of the embodiments, the first detecting module 604 is further employed for preprocessing the image to be detected, and obtaining a human body image in the image to be detected; and performing human body posture detection on the human body image through the preset detection model, and obtaining the human body posture information and the hand position information.
[0120] In one of the embodiments, the device further comprises:
[0121] a human body position module, for obtaining human body position information according to the image to be detected; and
[0122] a second interactive behavior recognizing module, for obtaining a second interactive behavior recognition result according to the human body motion track information, the article recognition result, the human body position information and a preset shelf information, wherein the second interactive behavior recognition result is a human-goods interactive behavior recognition result.
[0123] In one of the embodiments, the first obtaining module 602 is further employed for obtaining the image to be detected as collected by an image collection device at a preset first shooting angle; preferably, the preset first shooting angle is an overhead angle perpendicular to the ground, and the image to be detected is of RGBD data.
[0124] In one of the embodiments, the device further comprises:
[0125] a second obtaining module, for obtaining sample image data;
[0126] a first marking module, for marking key points and hand position of a human body image in the sample image data, and obtaining a first marked image data;
[0127] a first enhancing module, for performing an image enhancing process on the first marked image data, and obtaining a first training dataset; and
[0128] a first training module, for inputting the first training dataset to an HRNet model for training, and obtaining the detection model.
[0129] In one of the embodiments, the device further comprises:
[0130] a second marking module, for marking a hand area in the sample image data and performing article category marking on an article located in the hand area, and obtaining a second marked image data;
[0131] a second enhancing module, for performing an image enhancing process on the second marked image data, and obtaining a second training dataset; and
[0132] a second training module, for inputting the second training dataset to a yolov3-tiny network or a vgg16 network for training, and obtaining the preset classification recognition model.
[0133] In one of the embodiments, the second obtaining module is further employed for obtaining an image data collected by the image collection device at a preset second shooting angle within a preset time frame; and screening from the collected image data to obtain sample image data containing human-goods interactive behaviors, preferably, the preset second shooting angle is an overhead angle perpendicular to the ground, and the sample image data is of RGBD data.
[0134] Specific definitions relevant to the interactive behavior recognizing device may be inferred from the aforementioned definitions of the interactive behavior recognizing method, and will not be repeated here. The various modules in the aforementioned interactive behavior recognizing device can be wholly or partly realized via software, hardware, or a combination of software and hardware. The various modules can be embedded in the form of hardware in, or be independent of, a processor in a computer equipment, and can also be stored in the form of software in a memory in a computer equipment, so as to facilitate the processor to invoke and perform the operations corresponding to the aforementioned various modules.
[0135] In one embodiment, a computer equipment is provided; the computer equipment can be a server, and its internal structure can be as shown in Fig. 7. The computer equipment comprises a processor, a memory, a network interface, and a database connected to each other via a system bus. The processor of the computer equipment is employed to provide computing and controlling capabilities. The memory of the computer equipment includes a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the running of the operating system and the computer program in the nonvolatile storage medium. The database of the computer equipment is employed to store data. The network interface of the computer equipment is employed to connect to an external terminal via a network for communication. The computer program realizes an interactive behavior recognizing method when executed by a processor.
[0136] As understandable to persons skilled in the art, the structure illustrated in Fig. 7 is merely a block diagram of the partial structure relevant to the solution of the present application, and does not constitute any restriction on the computer equipment to which the solution of the present application is applied, as the specific computer equipment may comprise more or fewer components than illustrated in Fig. 7, may combine certain components, or may have a different layout of components.

[0137] In one embodiment, there is provided a computer equipment that comprises a memory, a processor and a computer program stored on the memory and operable on the processor, and the following steps are realized when the processor executes the computer program:
obtaining an image to be detected; performing human body posture detection on the image to be detected through a preset detection model, and obtaining human body posture information and hand position information, wherein the detection model is employed to perform human body posture detection; tracing the human body posture according to the human body posture information to obtain human body motion track information, and performing a target tracing on the hand position according to the hand position information to obtain a hand area image; performing article recognition on the hand area image through a preset classification recognition model, and obtaining an article recognition result, wherein the classification recognition model is employed to perform article recognition; and obtaining a first interactive behavior recognition result according to the human body motion track information and the article recognition result.
[0138] In one embodiment, when the processor executes the computer program, the following steps are further realized: the step of performing human body posture detection on the image to be detected through a preset detection model, and obtaining human body posture information and hand position information includes: preprocessing the image to be detected, and obtaining a human body image in the image to be detected; and performing human body posture detection on the human body image through the preset detection model, and obtaining the human body posture information and the hand position information.
[0139] In one embodiment, when the processor executes the computer program, the following steps are further realized: obtaining human body position information according to the image to be detected; and obtaining a second interactive behavior recognition result according to the human body motion track information, the article recognition result, the human body position information and a preset shelf information, wherein the second interactive behavior recognition result is a human-goods interactive behavior recognition result.
[0140] In one embodiment, when the processor executes the computer program, the following steps are further realized: the step of obtaining an image to be detected includes: obtaining the image to be detected as collected by an image collection device at a preset first shooting angle; wherein preferably, the preset first shooting angle is an overhead angle perpendicular to the ground, and the image to be detected is of RGBD data.
[0141] In one embodiment, when the processor executes the computer program, the following steps are further realized: obtaining sample image data; marking key points and hand position of a human body image in the sample image data, and obtaining a first marked image data; performing an image enhancing process on the first marked image data, and obtaining a first training dataset; and inputting the first training dataset to an HRNet model for training, and obtaining the detection model.
[0142] In one embodiment, when the processor executes the computer program, the following steps are further realized: marking a hand area in the sample image data and performing article category marking on an article located in the hand area, and obtaining a second marked image data; performing an image enhancing process on the second marked image data, and obtaining a second training dataset; and inputting the second training dataset to a convolutional neural network for training, and obtaining the preset classification recognition model.
[0143] In one embodiment, when the processor executes the computer program, the following steps are further realized: the step of obtaining sample image data includes: obtaining an image data collected by the image collection device at a preset second shooting angle within a preset time frame; and screening from the collected image data to obtain sample image data containing human-goods interactive behaviors, wherein, preferably, the preset second shooting angle is an overhead angle perpendicular to the ground, and the sample image data is of RGBD data.
[0144] In one embodiment, there is provided a computer-readable storage medium storing thereon a computer program, and the following steps are realized when the computer program is executed by a processor: obtaining an image to be detected;
performing human body posture detection on the image to be detected through a preset detection model, and obtaining human body posture information and hand position information, wherein the detection model is employed to perform human body posture detection; tracing the human body posture according to the human body posture information to obtain human body motion track information, and performing a target tracing on the hand position according to the hand position information to obtain a hand area image; performing article recognition on the hand area image through a preset classification recognition model, and obtaining an article recognition result, wherein the classification recognition model is employed to perform article recognition; and obtaining a first interactive behavior recognition result according to the human body motion track information and the article recognition result.
[0145] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: the step of performing human body posture detection on the image to be detected through a preset detection model, and obtaining human body posture information and hand position information includes: preprocessing the image to be detected, and obtaining a human body image in the image to be detected; and performing human body posture detection on the human body image through the preset detection model, and obtaining the human body posture information and the hand position information.
[0146] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: obtaining human body position information according to the image to be detected; and obtaining a second interactive behavior recognition result according to the human body motion track information, the article recognition result, the human body position information and a preset shelf information, wherein the second interactive behavior recognition result is a human-goods interactive behavior recognition result.
[0147] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: the step of obtaining an image to be detected includes: obtaining the image to be detected as collected by an image collection device at a preset first shooting angle; wherein preferably, the preset first shooting angle is an overhead angle perpendicular to the ground, and the image to be detected is of RGBD data.
[0148] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: obtaining sample image data; marking key points and hand position of a human body image in the sample image data, and obtaining a first marked image data; performing an image enhancing process on the first marked image data, and obtaining a first training dataset; and inputting the first training dataset to an HRNet model for training, and obtaining the detection model.
[0149] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: marking a hand area in the sample image data and performing article category marking on an article located in the hand area, and obtaining a second marked image data; performing an image enhancing process on the second marked image data, and obtaining a second training dataset; and inputting the second training dataset to a convolutional neural network for training, and obtaining the preset classification recognition model.
[0150] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: the step of obtaining sample image data includes: obtaining an image data collected by the image collection device at a preset second shooting angle within a preset time frame; and screening from the collected image data to obtain sample image data containing human-goods interactive behaviors, wherein, preferably, the preset second shooting angle is an overhead angle perpendicular to the ground, and the sample image data is of RGBD data.
[0151] As comprehensible to persons ordinarily skilled in the art, the entire or partial flows in the methods according to the aforementioned embodiments can be completed via a computer program instructing relevant hardware; the computer program can be stored in a nonvolatile computer-readable storage medium, and the computer program can include the flows embodied in the aforementioned various methods when executed. Any reference to the memory, storage, database or other media used in the various embodiments provided by the present application can include nonvolatile and/or volatile memories. The nonvolatile memory can include a read-only memory (ROM), a programmable ROM (PROM), an electrically programmable ROM (EPROM), an electrically erasable and programmable ROM (EEPROM) or a flash memory. The volatile memory can include a random access memory (RAM) or an external cache memory. By way of explanation rather than restriction, the RAM is obtainable in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
[0152] The technical features of the aforementioned embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the aforementioned embodiments have been exhausted; nevertheless, all such combinations should be considered to fall within the scope recorded in this description as long as they are not mutually contradictory.

[0153] The foregoing embodiments are merely directed to several modes of execution of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be understood as restricting the scope of the invention patent. As should be pointed out, persons with ordinary skill in the art may further make various modifications and improvements without departing from the conception of the present application, and all of these pertain to the protection scope of the present application. Accordingly, the patent protection scope of the present application shall be based on the attached Claims.

Claims (10)

What is claimed is:
1. An interactive behavior recognizing method, characterized in comprising:
obtaining an image to be detected;
performing human body posture detection on the image to be detected through a preset detection model, and obtaining human body posture information and hand position information, wherein the detection model is employed to perform human body posture detection;
tracing the human body posture according to the human body posture information to obtain human body motion track information, and performing a target tracing on the hand position according to the hand position information to obtain a hand area image;
performing article recognition on the hand area image through a preset classification recognition model, and obtaining an article recognition result, wherein the classification recognition model is employed to perform article recognition; and obtaining a first interactive behavior recognition result according to the human body motion track information and the article recognition result.
2. The method according to Claim 1, characterized in that the step of performing human body posture detection on the image to be detected through a preset detection model, and obtaining human body posture information and hand position information includes:
preprocessing the image to be detected, and obtaining a human body image in the image to be detected; and performing human body posture detection on the human body image through the preset detection model, and obtaining the human body posture information and the hand position information.
3. The method according to Claim 2, characterized in further comprising:
obtaining human body position information according to the image to be detected; and obtaining a second interactive behavior recognition result according to the human body motion track information, the article recognition result, the human body position information and a preset shelf information, wherein the second interactive behavior recognition result is a human-goods interactive behavior recognition result.
4. The method according to Claim 3, characterized in that the step of obtaining an image to be detected includes:
obtaining the image to be detected as collected by an image collection device at a preset first shooting angle; wherein preferably, the preset first shooting angle is an overhead angle perpendicular to the ground, and the image to be detected is of RGBD data.
5. The method according to any one of Claims 1 to 4, characterized in further comprising:
obtaining sample image data;
marking key points and hand position of a human body image in the sample image data, and obtaining a first marked image data;
performing an image enhancing process on the first marked image data, and obtaining a first training dataset; and inputting the first training dataset to an HRNet model for training, and obtaining the detection model.
6. The method according to Claim 5, characterized in further comprising:
marking a hand area in the sample image data and performing article category marking on an article located in the hand area, and obtaining a second marked image data;
performing an image enhancing process on the second marked image data, and obtaining a second training dataset; and inputting the second training dataset to a convolutional neural network for training, and obtaining the preset classification recognition model, wherein, preferably, the convolutional neural network is a yolov3-tiny network or a vgg16 network.

7. The method according to Claim 6, characterized in that the step of obtaining sample image data includes:
obtaining an image data collected by the image collection device at a preset second shooting angle within a preset time frame; and screening from the collected image data to obtain sample image data containing human-goods interactive behaviors, wherein, preferably, the preset second shooting angle is an overhead angle perpendicular to the ground, and the sample image data is of RGBD data.
8. An interactive behavior recognizing device, characterized in comprising:

a first obtaining module, for obtaining an image to be detected;
a first detecting module, for performing human body posture detection on the image to be detected through a preset detection model, and obtaining human body posture information and hand position information, wherein the detection model is employed to perform human body posture detection;
a tracing module, for tracing the human body posture according to the human body posture information to obtain human body motion track information, and performing a target tracing on the hand position according to the hand position information to obtain a hand area image;
a second detecting module, for performing article recognition on the hand area image through a preset classification recognition model, and obtaining an article recognition result, wherein the classification recognition model is employed to perform article recognition;
and a first interactive behavior recognizing module, for obtaining a first interactive behavior recognition result according to the human body motion track information and the article recognition result.
9. Computer equipment, comprising a memory, a processor and a computer program stored in the memory and operable on the processor, characterized in that the method steps according to any one of Claims 1 to 7 are realized when the processor executes the computer program.
10. A computer-readable storage medium, storing a computer program thereon, characterized in that the method steps according to any one of Claims 1 to 7 are realized when the computer program is executed by a processor.

CA3154025A 2019-09-11 2020-06-19 Interactive behavior recognizing method, device, computer equipment and storage medium Pending CA3154025A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201910857295.7A CN110674712A (en) 2019-09-11 2019-09-11 Interactive behavior recognition method and device, computer equipment and storage medium
CN201910857295.7 2019-09-11
PCT/CN2020/096994 WO2021047232A1 (en) 2019-09-11 2020-06-19 Interaction behavior recognition method, apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
CA3154025A1 true CA3154025A1 (en) 2021-03-18

Family

ID=69077877

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3154025A Pending CA3154025A1 (en) 2019-09-11 2020-06-19 Interactive behavior recognizing method, device, computer equipment and storage medium

Country Status (3)

Country Link
CN (1) CN110674712A (en)
CA (1) CA3154025A1 (en)
WO (1) WO2021047232A1 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674712A (en) * 2019-09-11 2020-01-10 苏宁云计算有限公司 Interactive behavior recognition method and device, computer equipment and storage medium
CN111259817A (en) * 2020-01-17 2020-06-09 维沃移动通信有限公司 Article list establishing method and electronic equipment
CN111208148A (en) * 2020-02-21 2020-05-29 凌云光技术集团有限责任公司 Hole-punch screen light leakage defect detecting system
CN111339903B (en) * 2020-02-21 2022-02-08 河北工业大学 Multi-person human body posture estimation method
CN111507231B (en) * 2020-04-10 2023-06-23 盛景智能科技(嘉兴)有限公司 Automatic detection method and system for correctness of process steps
CN111679737B (en) * 2020-05-27 2022-06-21 维沃移动通信有限公司 Hand segmentation method and electronic device
CN111563480B (en) * 2020-06-01 2024-01-12 北京嘀嘀无限科技发展有限公司 Conflict behavior detection method, device, computer equipment and storage medium
CN111797728B (en) * 2020-06-19 2024-06-14 浙江大华技术股份有限公司 Method and device for detecting moving object, computing equipment and storage medium
CN111882601B (en) * 2020-07-23 2023-08-25 杭州海康威视数字技术股份有限公司 Positioning method, device and equipment
CN114093019A (en) * 2020-07-29 2022-02-25 顺丰科技有限公司 Training method and device for throwing motion detection model and computer equipment
CN111931740B (en) * 2020-09-29 2021-01-26 创新奇智(南京)科技有限公司 Commodity sales amount identification method and device, electronic equipment and storage medium
CN112132868B (en) * 2020-10-14 2024-02-27 杭州海康威视***技术有限公司 Method, device and equipment for determining payment information
CN112418118A (en) * 2020-11-27 2021-02-26 招商新智科技有限公司 Method and device for detecting pedestrian intrusion under unsupervised bridge
CN112560646A (en) * 2020-12-09 2021-03-26 上海眼控科技股份有限公司 Detection method, device, equipment and storage medium of transaction behavior
CN112784760B (en) 2021-01-25 2024-04-12 北京百度网讯科技有限公司 Human behavior recognition method, device, equipment and storage medium
CN112949689A (en) * 2021-02-01 2021-06-11 Oppo广东移动通信有限公司 Image recognition method and device, electronic equipment and storage medium
CN113031464B (en) * 2021-03-22 2022-11-22 北京市商汤科技开发有限公司 Device control method, device, electronic device and storage medium
CN113687715A (en) * 2021-07-20 2021-11-23 温州大学 Human-computer interaction system and interaction method based on computer vision
CN113792700B (en) * 2021-09-24 2024-02-27 成都新潮传媒集团有限公司 Storage battery car in-box detection method and device, computer equipment and storage medium
CN114274184B (en) * 2021-12-17 2024-05-24 重庆特斯联智慧科技股份有限公司 Logistics robot man-machine interaction method and system based on projection guidance
CN114327062A (en) * 2021-12-28 2022-04-12 深圳Tcl新技术有限公司 Man-machine interaction method, device, electronic equipment, storage medium and program product

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102881100B (en) * 2012-08-24 2017-07-07 济南纳维信息技术有限公司 Entity StoreFront anti-thefting monitoring method based on video analysis
US20160203499A1 (en) * 2013-09-06 2016-07-14 Nec Corporation Customer behavior analysis system, customer behavior analysis method, non-transitory computer readable medium, and shelf system
WO2015173869A1 (en) * 2014-05-12 2015-11-19 富士通株式会社 Product-information output method, product-information output program, and control device
CN105245828A (en) * 2015-09-02 2016-01-13 北京旷视科技有限公司 Item analysis method and equipment
CN107424273A (en) * 2017-07-28 2017-12-01 杭州宇泛智能科技有限公司 A kind of management method of unmanned supermarket
CN109977896A (en) * 2019-04-03 2019-07-05 上海海事大学 A kind of supermarket's intelligence vending system
CN110674712A (en) * 2019-09-11 2020-01-10 苏宁云计算有限公司 Interactive behavior recognition method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110674712A (en) 2020-01-10
WO2021047232A1 (en) 2021-03-18

Similar Documents

Publication Publication Date Title
CA3154025A1 (en) Interactive behavior recognizing method, device, computer equipment and storage medium
CN105184778B (en) A kind of detection method and device
CN108549846B (en) Pedestrian detection and statistics method combining motion characteristics and head-shoulder structure
CN108470332A (en) A kind of multi-object tracking method and device
CN106355604B (en) Tracking image target method and system
CN106446862A (en) Face detection method and system
CN103729854B (en) A kind of method for detecting infrared puniness target based on tensor model
CN107133608A (en) Identity authorization system based on In vivo detection and face verification
CN107688345B (en) Screen state automatic detecting machine device people, method and computer readable storage medium
CN110738101A (en) Behavior recognition method and device and computer readable storage medium
CN109376631A (en) A kind of winding detection method and device neural network based
CN109086711A (en) Facial Feature Analysis method, apparatus, computer equipment and storage medium
CN107452015A (en) A kind of Target Tracking System with re-detection mechanism
CN102272774B (en) Method, apparatus and computer program product for providing face pose estimation
CN107341538A (en) A kind of statistical magnitude method of view-based access control model
Zhai et al. A generative adversarial network based framework for unsupervised visual surface inspection
CN109325408A (en) A kind of gesture judging method and storage medium
CN105893957A (en) Method for recognizing and tracking ships on lake surface on the basis of vision
CN110288006A (en) A kind of license plate number automatic Verification method and system
CN108288020A (en) Video shelter detecting system based on contextual information and method
CN110348434A (en) Camera source discrimination method, system, storage medium and calculating equipment
CN108364303A (en) A kind of video camera intelligent-tracking method with secret protection
CN109195106B (en) Train positioning method and device
CN110443328A (en) Fake method, device, equipment and medium are tested in identification based on LED antifalsification label
Mumbelli et al. An application of Generative Adversarial Networks to improve automatic inspection in automotive manufacturing

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20220916
