WO2021093329A1 - Interactive behavior recognition method, device, computer equipment and storage medium - Google Patents
Interactive behavior recognition method, device, computer equipment and storage medium
- Publication number
- WO2021093329A1 (PCT/CN2020/097002)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- pedestrian
- preset
- detected
- key points
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Definitions
- This application relates to the field of computer vision technology, and in particular to an interactive behavior recognition method, device, computer equipment, and storage medium.
- Traditional human-goods interaction behavior recognition methods generally rely on sound, light, electricity and other sensor devices to realize behavior recognition, which requires high hardware costs; their use scenarios are limited, and they cannot be applied at scale to complex environments such as supermarkets. Supermarket monitoring equipment generates a large amount of video data every day, and analyzing the surveillance video can yield much information about the interaction between people and goods, but this requires a lot of manpower and suffers from low efficiency.
- An interactive behavior recognition method, which includes:
- acquiring an image to be detected;
- inputting the image to be detected into a preset multi-task model to obtain the key points and detection frames of pedestrians in the image to be detected, where the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection;
- determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
- the preset item rack image is a preset item rack mask image
- the interaction behavior information between the pedestrian and the corresponding item rack is determined based on the key points of the pedestrian and the preset item rack image corresponding to the image to be detected, including:
- the pedestrian's hand area is obtained
- the method further includes:
- the method further includes:
- the item rack area to which the pedestrian is facing is obtained.
- obtaining the pedestrian's orientation information according to the key points of the pedestrian includes:
- selecting the shoulder key points among the key points of the pedestrian, where the shoulder key points include the left shoulder key point and the right shoulder key point;
- calculating the difference between the coordinates of the left shoulder key point and the right shoulder key point to obtain the shoulder vector;
- using the inverse cosine function to calculate the angle between the shoulder vector and the preset unit vector, where the preset unit vector is the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected, and summing the radian value of the angle with π to obtain the orientation angle of the pedestrian;
- when the orientation angle is greater than or equal to π and less than 1.5π, determining that the pedestrian is facing one side of the image to be detected;
- when the orientation angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian is facing the other side of the image to be detected.
- acquiring the image to be detected includes:
- the method further includes:
- the labeled image data is input into the neural network model for training to obtain a multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
- a recognition device for human-goods interaction behavior comprising:
- the acquisition module is used to acquire the image to be detected
- the detection module is used to input the image to be detected into the preset multi-task model to obtain the key points and detection frame of the pedestrian in the image to be detected.
- the key points are located inside the detection frame.
- the multi-task model is used for pedestrian detection and human key point detection;
- the recognition module is used to determine the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
- A computer device includes a memory, a processor, and a computer program that is stored in the memory and can run on the processor, where the processor implements the following steps when executing the computer program:
- acquiring an image to be detected;
- inputting the image to be detected into a preset multi-task model to obtain the key points and detection frames of pedestrians in the image to be detected, where the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection;
- determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
- A computer-readable storage medium has a computer program stored thereon, and when the computer program is executed by a processor, the following steps are implemented:
- acquiring an image to be detected;
- inputting the image to be detected into a preset multi-task model to obtain the key points and detection frames of pedestrians in the image to be detected, where the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection;
- determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
- With the above interactive behavior recognition method, device, computer equipment and storage medium, the image to be detected is acquired, and the key points and detection frames of pedestrians in the image to be detected are obtained.
- The multi-task model for pedestrian detection and human key point detection can obtain the pedestrian detection frame and the human key points simultaneously, which improves the efficiency of image processing. Because the key points are located inside the detection frame, wrong key points outside the detection frame can be eliminated, so that the detection frame and the key points are comprehensively utilized and the accuracy of key point labeling is improved. Determining the interaction behavior information of the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected allows the interaction behavior to be identified efficiently and improves the recognition accuracy.
- Figure 1 is an application environment diagram of an interactive behavior recognition method in an embodiment
- Figure 2 is a schematic flowchart of an interactive behavior identification method in an embodiment
- FIG. 3 is a schematic flowchart of an interactive behavior judgment step in an embodiment
- FIG. 4 is a schematic flowchart of an interactive behavior recognition method in another embodiment
- Figure 5 is a structural block diagram of an interactive behavior recognition device in an embodiment
- Fig. 6 is an internal structure diagram of a computer device in an embodiment.
- the interactive behavior identification method provided in this application can be applied to the application environment as shown in FIG. 1.
- the terminal 102 communicates with the server 104 through a network.
- the terminal 102 can be, but is not limited to, various image acquisition devices.
- the terminal 102 can be an existing monitoring device in a shopping mall, supermarket or library, and the server 104 can be implemented as an independent server or a server cluster composed of multiple servers.
- an interactive behavior recognition method is provided.
- the method is applied to the server in FIG. 1 as an example for description, including the following steps:
- Step 202 Obtain an image to be detected.
- the image to be detected is an image with pedestrians collected by an image acquisition device.
- the above-mentioned image acquisition device may be a monitoring device that has already been installed and used in a target place such as a shopping mall, a supermarket or a library, for example an existing camera in the target place, so there is no need to modify the target site and the deployment cost is low.
- the surveillance video is acquired through the camera, and pictures with pedestrians are selected from the surveillance video as the image to be detected.
- Step 204 Input the image to be detected into a preset multi-task model to obtain key points and detection frames of pedestrians in the image to be detected.
- the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection.
- The multi-task model obtains the detection frame of the pedestrian in the image to be detected through pedestrian detection and, at the same time, obtains the key points of the pedestrian through human key point detection, so that the detection frame and the key points are produced synchronously. Features are shared between the different tasks, which reduces the amount of calculation, reduces hardware resource occupation, and shortens the processing time of a single frame; images to be detected obtained from multiple cameras can therefore be processed at the same time, realizing parallel processing of multiple cameras.
- the acquired image to be detected is input into a preset multi-task model.
- the multi-task model performs pedestrian detection and human key point detection on the image to be detected.
- the multi-task model can exclude the key points outside the detection frame, so that the output key points are all located inside the detection frame.
- the multi-task model can output the key points and the detection frame of the pedestrian in the image to be detected.
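The constraint that output key points all lie inside the detection frame can be sketched as a simple post-filter. This is a minimal NumPy sketch; the `[x1, y1, x2, y2]` box format and the `(K, 2)` key-point layout are illustrative assumptions, not specified by this application:

```python
import numpy as np

def filter_keypoints(keypoints: np.ndarray, box: np.ndarray) -> np.ndarray:
    """Keep only the key points that fall inside the detection frame.

    keypoints: (K, 2) array of (x, y) coordinates for one pedestrian.
    box:       (4,) array [x1, y1, x2, y2] of the pedestrian's detection frame.
    Returns the subset of key points located inside the box.
    """
    x1, y1, x2, y2 = box
    inside = (
        (keypoints[:, 0] >= x1) & (keypoints[:, 0] <= x2)
        & (keypoints[:, 1] >= y1) & (keypoints[:, 1] <= y2)
    )
    return keypoints[inside]
```

Applying this per pedestrian yields key points that are consistent with the detection frame, which is what the application credits for the improved labeling accuracy.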
- N is the number of pedestrians in the image to be detected
- Step 206 Determine the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
- The existing cameras, the layout of the target location, and the item racks are positioned and marked in advance, and each camera is configured with a corresponding preset item rack image. Since the image to be detected is known to be obtained through one of the cameras, all images to be detected acquired by that camera correspond to it, and therefore the image to be detected also corresponds to the preset item rack image configured for that camera.
- the image to be detected is acquired, and the key points and detection frames of pedestrians in the image to be detected are obtained by inputting the image to be detected into a preset multi-task model.
- The multi-task model performs pedestrian detection and human key point detection, so the pedestrian detection frame and the human key points can be obtained simultaneously, which improves the efficiency of image processing; because the key points are located inside the detection frame, wrong key points outside the detection frame can be eliminated, comprehensively utilizing the detection frame and the key points and improving the accuracy of key point labeling.
- According to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected, the interaction behavior information of the pedestrian and the corresponding item rack can be determined, which identifies the interaction behavior efficiently and improves the recognition accuracy. Moreover, this method realizes fully automatic processing without manual intervention, which greatly reduces labor costs.
- the preset item rack image is a preset item rack mask image
- The preset item rack mask image may be obtained by extracting a frame of image from a large amount of surveillance video and then labeling the outline of the item rack in that image with a polygon. Determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected includes:
- Step 302 selecting a wrist key point among pedestrian key points
- the wrist key point data includes left wrist key point data and right wrist key point data.
- Step 304 Obtain the hand area of the pedestrian according to the key points of the wrist and the preset radius threshold;
- Taking the left wrist key point and the right wrist key point respectively as circle centers and the preset radius threshold as the radius, the hand area is divided into the left-hand area and the right-hand area, so as to obtain the image of the left-hand area and the image of the right-hand area.
- Step 306 Determine whether the intersecting area of the image of the hand area and the preset mask image of the article rack is greater than a preset area threshold
- Step 308 if yes, determine that the pedestrian interacts with the corresponding item rack
- step 310 if not, it is determined that there is no interaction between the pedestrian and the corresponding item rack.
- The hand area includes a left-hand area and a right-hand area. Specifically, when the intersecting area of the image of at least one of the left-hand area and the right-hand area with the preset item rack mask image is greater than the preset area threshold, it is determined that the pedestrian interacts with the corresponding item rack; otherwise, it is determined that the pedestrian does not interact with the corresponding item rack.
- For example, the circle with the left wrist key point as the center and R as the radius denotes the left-hand area, and the circle with the right wrist key point as the center and R as the radius denotes the right-hand area.
- the preset area threshold is, for example, 150 unit areas.
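Steps 302 to 310 can be sketched as follows. This is a minimal NumPy sketch: the radius value of 30 pixels and the boolean-mask representation of the item rack are illustrative assumptions; only the area threshold of 150 comes from the text.

```python
import numpy as np

def hand_interacts(wrist_xy, rack_mask, radius=30, area_threshold=150):
    """Decide whether one hand area interacts with the item rack.

    wrist_xy:       (x, y) wrist key point of the pedestrian.
    rack_mask:      (H, W) boolean array, True where the preset item
                    rack mask image covers the rack.
    radius:         preset radius threshold defining the hand area.
    area_threshold: preset area threshold (in unit areas / pixels).
    """
    h, w = rack_mask.shape
    ys, xs = np.ogrid[:h, :w]
    # Hand area: a circle centred on the wrist key point.
    hand_mask = (xs - wrist_xy[0]) ** 2 + (ys - wrist_xy[1]) ** 2 <= radius ** 2
    # Interaction if the intersecting area exceeds the threshold.
    return np.count_nonzero(hand_mask & rack_mask) > area_threshold

def pedestrian_interacts(left_wrist, right_wrist, rack_mask,
                         radius=30, area_threshold=150):
    """A pedestrian interacts if at least one hand area does."""
    return (hand_interacts(left_wrist, rack_mask, radius, area_threshold)
            or hand_interacts(right_wrist, rack_mask, radius, area_threshold))
```

Because the decision reduces to counting overlapping pixels, it is cheap enough to run per pedestrian per frame, which matches the real-time claim below.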
- This interactive behavior recognition method judges the interactive behavior by directly estimating the intersection area of the hand and the item rack, which is simple and easy to implement, has strong scalability and fast calculation speed, and offers good real-time performance. The method is usually used for the recognition of human-goods interaction behaviors in shopping malls and supermarkets.
- the item racks are shelves in the shopping malls and supermarkets.
- this method can also be used for the recognition of human-object interaction behaviors in other places, such as libraries, in which case the item rack is the library shelf.
- the method further includes:
- the center point of the detection frame is selected as the positioning point, which is convenient to select, and the center point can more accurately indicate the position of the pedestrian.
- The preset coordinate mapping relationship is the coordinate mapping relationship between the coordinate system of the image to be detected and the world coordinate system. Specifically, the position of the image acquisition device in the world coordinate system is pre-calibrated through the position information of the image acquisition device, so the coordinate position in the world coordinate system of the image to be detected collected by that device can be obtained, and the coordinate mapping relationship between the coordinate system of the image to be detected and the world coordinate system can be inferred from it.
- the preset time period is the time from when pedestrians enter the target place to when they leave the target place.
- the route map of the pedestrian within the preset time period is the route that the pedestrian passes from entering the target place to exiting the target place, that is, the pedestrian's moving line diagram.
- This interactive behavior recognition method can obtain the pedestrian's route map within a preset time period according to the pedestrian's detection frame and the preset coordinate mapping relationship, which makes it convenient to record the pedestrian's movement trajectory in the target place.
- When this method is applied in a shopping mall or supermarket, the customer's movement route in the store from entering to leaving can be observed intuitively, and the staff can adjust the store layout based on these data to better adapt it to customers' shopping habits.
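The application does not fix the form of the preset coordinate mapping relationship; one common realization for a floor-level anchor point is a planar homography between the image plane and the floor plan. The sketch below assumes such a 3x3 matrix `H` obtained from calibration, which is an illustrative choice rather than something the text specifies:

```python
import numpy as np

def to_world(points_px: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Map pedestrian anchor points from image coordinates (first position
    coordinates) to the world coordinate system (second position coordinates)
    with a 3x3 homography H obtained from camera calibration.

    points_px: (N, 2) pixel coordinates.
    Returns:   (N, 2) world coordinates.
    """
    pts = np.hstack([points_px, np.ones((len(points_px), 1))])  # homogeneous
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                       # dehomogenise

# Collecting the second position coordinates at each time point in the
# preset period yields the pedestrian's route map:
#   route = to_world(np.array(anchor_points_over_time), H)
```

With the identity matrix as `H` the mapping is a no-op, which makes the function easy to sanity-check before plugging in a calibrated matrix.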
- the method further includes:
- selecting the shoulder key points among the key points of the pedestrian, where the shoulder key points include the left shoulder key point and the right shoulder key point; calculating the difference between the coordinates of the left shoulder key point and the right shoulder key point to obtain the shoulder vector;
- the inverse cosine function is used to calculate the angle between the shoulder vector and the preset unit vector, where the preset unit vector is the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected; the radian value of the angle is summed with π to obtain the orientation angle of the pedestrian;
- when the orientation angle is greater than or equal to π and less than 1.5π, it is determined that the pedestrian is facing one side of the image to be detected; when the orientation angle is greater than 1.5π and less than or equal to 2π, it is determined that the pedestrian is facing the other side of the image to be detected.
- the item rack area to which the pedestrian is facing is obtained. Specifically, according to the orientation of the pedestrian in the image to be detected and the preset item rack image corresponding to the image to be detected, the area of the item rack to which the pedestrian is oriented can be obtained.
- This interactive behavior recognition method uses the shoulder key point data to calculate the orientation of the pedestrian, and the orientation result is more robust, so the shelf area the customer pays attention to can be determined, which provides a reference for product placement in shopping malls and supermarkets.
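The orientation computation described above can be sketched as follows. This is a minimal sketch; the tuple-based key-point representation and the side labels are illustrative assumptions, while the shoulder vector, the arccos against the negative-y unit vector, the +π offset, and the angle ranges follow the text:

```python
import math

def heading_angle(left_shoulder, right_shoulder):
    """Orientation angle of a pedestrian from the shoulder key points.

    The shoulder vector is the coordinate difference between the left and
    right shoulder key points; the inverse cosine gives its angle to the
    unit vector in the negative y direction of the image coordinate
    system, and pi is added, so the result lies in [pi, 2*pi].
    """
    sx = left_shoulder[0] - right_shoulder[0]
    sy = left_shoulder[1] - right_shoulder[1]
    norm = math.hypot(sx, sy)
    neg_y = (0.0, -1.0)                              # preset unit vector
    cos_angle = (sx * neg_y[0] + sy * neg_y[1]) / norm
    return math.acos(cos_angle) + math.pi

def facing_side(angle):
    """Map the orientation angle to a side of the image to be detected."""
    if math.pi <= angle < 1.5 * math.pi:
        return "one side"
    return "other side"                              # up to 2*pi
```

Note that image coordinates have y increasing downward, which is why the negative-y unit vector corresponds to "up" in the picture.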
- acquiring the image to be detected includes:
- the above-mentioned image acquisition equipment generally adopts network cameras.
- an interactive behavior recognition method is provided.
- the method directly utilizes existing monitoring equipment at the target location, such as a camera in a shopping mall or supermarket, without the need to modify the venue, has low deployment cost, and is easy to promote.
- the method further includes:
- Specifically, surveillance videos of shopping malls and supermarkets are obtained, and a large number of images with pedestrians are filtered out from the surveillance videos as sample images.
- The neural network model adopts the ResNet-101+FPN network model, which is a one-stage, bottom-up multi-task network model. Compared with similar two-stage algorithms, it saves processing time; compared with top-down algorithms, its processing time does not change with the number of people in the picture.
- an interactive behavior recognition method which processes the images to be detected by establishing and training a multi-task model.
- The training and optimization of the model are completed in the background without affecting the operation of places such as shopping malls, supermarkets or libraries; the model has strong generalization ability and can be deployed easily and quickly. Features can be shared between the different tasks of the multi-task model, which reduces the amount of calculation, reduces hardware resource occupation, shortens the processing time of a single frame, and realizes parallel processing of multiple cameras.
- the method includes the following steps:
- Step 402 Obtain surveillance video of the target location
- Step 404 Filter out an image with pedestrians from the surveillance video as an image to be detected
- Step 406 Input the image to be detected into a preset multi-task model to obtain key points and detection frames of pedestrians in the image to be detected, and the key points are all located inside the detection frame;
- Step 408 Determine the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected;
- Step 410 Obtain a route map of the pedestrian in a preset time period according to the mapping relationship between the detection frame of the pedestrian and the preset coordinate;
- Step 412 Obtain the direction information of the pedestrian according to the key points of the pedestrian.
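Steps 402 to 412 can be tied together as in the following sketch; the injected callables (`model`, `hand_check`, `to_world`, `orientation`) are hypothetical stand-ins for the components described in this application, not part of any real library:

```python
def box_center(box):
    """Center of a detection frame [x1, y1, x2, y2], used as the anchor point."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def recognize_interactions(frame, model, hand_check, to_world, orientation):
    """One pass of steps 406-412 over a single image to be detected.

    model:       frame -> [(keypoints, box), ...], one entry per pedestrian.
    hand_check:  keypoints -> bool, interaction with the item rack.
    to_world:    (x, y) image point -> world coordinates.
    orientation: keypoints -> orientation angle of the pedestrian.
    """
    results = []
    for keypoints, box in model(frame):                  # step 406
        results.append({
            "interacts": hand_check(keypoints),          # step 408
            "world_pos": to_world(box_center(box)),      # step 410
            "orientation": orientation(keypoints),       # step 412
        })
    return results
```

Running this per frame and accumulating `world_pos` over the preset time period produces the route map; accumulating `interacts` and `orientation` produces the interaction and attention statistics.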
- an interactive behavior recognition device which includes an acquisition module 502, a detection module 504, and an identification module 506, wherein:
- the obtaining module 502 is used to obtain the image to be detected
- the detection module 504 is used to input the image to be detected into a preset multi-task model to obtain key points and detection frames of pedestrians in the image to be detected.
- the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection;
- the recognition module 506 is configured to determine the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
- the preset item rack image is a preset item rack mask image
- the aforementioned recognition module 506 includes:
- the first key point selection unit is used to select the key point of the wrist among the key points of the pedestrian;
- the hand area unit is used to obtain the pedestrian's hand area according to the key points of the wrist and the preset radius threshold;
- the interaction determination unit is used to determine that the pedestrian interacts with the corresponding item rack when the intersecting area of the hand area image and the preset item rack mask image is greater than the preset area threshold, and to determine that there is no interaction between the pedestrian and the corresponding item rack when the intersecting area of the hand area image and the preset item rack mask image is less than or equal to the area threshold.
- the device further includes:
- the first position coordinate module is used to select any point in the detection frame of the pedestrian as the positioning point, and to set the position coordinates of the positioning point in the image to be detected as the first position coordinates of the pedestrian;
- the second position coordinate module is used to map the first position coordinates of the pedestrian to the world coordinate system according to the preset coordinate mapping relationship to obtain the second position coordinates of the pedestrian, and the second position coordinates are the position of the pedestrian in the world coordinate system coordinate;
- the route map module is used to collect the second position coordinates of the pedestrian at each time point in the preset time period to obtain the pedestrian's route map in the preset time period.
- the device further includes:
- the orientation information module is used to obtain the orientation information of the pedestrian according to the key points of the pedestrian;
- the orientation area module is used to obtain the item rack area where the pedestrian is oriented according to the pedestrian's orientation information and the preset item rack image.
- the above-mentioned orientation information module includes:
- the second key point selection unit is used to select the shoulder key points among the key points of pedestrians.
- the shoulder key points include the left shoulder key point and the right shoulder key point;
- the direction angle calculation unit is used to calculate the difference between the coordinates of the left shoulder key point and the right shoulder key point to obtain the shoulder vector;
- the inverse cosine function is used to calculate the angle between the shoulder vector and the preset unit vector, where the preset unit vector is the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected, and the radian value of the angle is summed with π to obtain the orientation angle of the pedestrian;
- the orientation determination unit is used to determine that the pedestrian is facing one side of the image to be detected when the orientation angle is greater than or equal to π and less than 1.5π, and that the pedestrian is facing the other side of the image to be detected when the orientation angle is greater than 1.5π and less than or equal to 2π.
- the above-mentioned obtaining module 502 includes:
- the video acquisition unit is used to acquire the surveillance video of the target location
- the image acquisition unit is used to screen out the image with pedestrians from the surveillance video as the image to be detected.
- the device further includes:
- the sample acquisition module is used to acquire sample images
- the sample data module is used to label the pedestrians in the sample image with key points and detection frames to obtain labeled image data
- the model training module is used to input the labeled image data into the neural network model for training to obtain a multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
- Each module in the above-mentioned interactive behavior recognition device can be implemented in whole or in part by software, hardware, and a combination thereof.
- the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
- a computer device is provided.
- the computer device may be a server, and its internal structure diagram may be as shown in FIG. 6.
- the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus.
- the processor of the computer device is used to provide calculation and control capabilities.
- the memory of the computer device includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium stores an operating system, a computer program, and a database.
- the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
- the database of the computer equipment is used to store data.
- the network interface of the computer device is used to communicate with an external terminal through a network connection.
- the computer program is executed by the processor to realize an interactive behavior identification method.
- FIG. 6 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
- a computer device including a memory, a processor, and a computer program stored in the memory and running on the processor.
- When the processor executes the computer program, the following steps are implemented: acquiring an image to be detected;
- inputting the image to be detected into the preset multi-task model to obtain the key points and detection frames of pedestrians in the image to be detected, where the key points are located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection;
- determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
- The processor further implements the following steps when executing the computer program: the preset item rack image is the preset item rack mask image, and the step of determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected includes: selecting the wrist key point among the key points of the pedestrian; obtaining the hand area of the pedestrian according to the wrist key point and the preset radius threshold; when the intersection area of the hand area image and the preset item rack mask image is greater than the preset area threshold, determining that the pedestrian interacts with the corresponding item rack; and when the intersection area of the hand area image and the preset item rack mask image is less than or equal to the area threshold, determining that there is no interaction between the pedestrian and the corresponding item rack.
- The processor further implements the following steps when executing the computer program: selecting any point in the detection frame of the pedestrian as an anchor point, and setting the position coordinates of the anchor point in the image to be detected as the first position coordinates of the pedestrian; mapping the first position coordinates of the pedestrian to the world coordinate system according to the preset coordinate mapping relationship to obtain the second position coordinates of the pedestrian, where the second position coordinates are the position coordinates of the pedestrian in the world coordinate system; and collecting the second position coordinates of the pedestrian at each time point in the preset time period to obtain the route map of the pedestrian in the preset time period.
- the processor further implements the following steps when executing the computer program: obtaining the pedestrian's orientation information according to the key points of the pedestrian; obtaining the item rack area where the pedestrian is oriented according to the pedestrian's orientation information and the preset item rack image.
- The processor further implements the following steps when executing the computer program: obtaining the pedestrian's orientation information according to the pedestrian's key points includes: selecting the shoulder key points among the pedestrian's key points, where the shoulder key points include the left shoulder key point and the right shoulder key point; calculating the difference between the coordinates of the left shoulder key point and the right shoulder key point to obtain the shoulder vector; using the inverse cosine function to calculate the angle between the shoulder vector and the preset unit vector, where the preset unit vector is the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected; summing the radian value of the angle with π to obtain the orientation angle of the pedestrian; when the orientation angle is greater than or equal to π and less than 1.5π, determining that the pedestrian is facing one side of the image to be detected; and when the orientation angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian is facing the other side of the image to be detected.
- the processor further implements the following steps when executing the computer program: acquiring the image to be detected includes: acquiring a surveillance video of the target location; and filtering out images with pedestrians from the surveillance video as the image to be detected.
- the processor further implements the following steps when executing the computer program: acquiring a sample image; labeling the pedestrians in the sample image with key points and detection frames to obtain labeled image data; and inputting the labeled image data into the neural network model for training to obtain a multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
- a computer-readable storage medium on which a computer program is stored.
- when the computer program is executed by a processor, the following steps are implemented: acquiring the image to be detected; inputting the image to be detected into a preset multi-task model to obtain the key points and detection frame of the pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection; and determining the interaction behavior information between the pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected.
- the preset item rack image is a preset item rack mask image, and the step of determining the interaction behavior information between the pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected includes: selecting the wrist key point among the pedestrian's key points; obtaining the pedestrian's hand region according to the wrist key point and a preset radius threshold; when the intersection area between the hand region image and the preset item rack mask image is greater than a preset area threshold, determining that the pedestrian interacts with the corresponding item rack; and when the intersection area between the hand region image and the preset item rack mask image is less than or equal to the area threshold, determining that no interaction occurs between the pedestrian and the corresponding item rack.
- the following steps are also implemented: selecting any point in the pedestrian's detection frame as an anchor point, and setting the position coordinates of the anchor point in the image to be detected as the pedestrian's first position coordinates; mapping the pedestrian's first position coordinates into the world coordinate system according to a preset coordinate mapping relationship to obtain the pedestrian's second position coordinates, the second position coordinates being the pedestrian's position coordinates in the world coordinate system; and collecting the pedestrian's second position coordinates at each time point within a preset time period to obtain the pedestrian's route map over the preset time period.
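The description leaves the "preset coordinate mapping relationship" abstract; one common concrete choice in surveillance settings is a 3x3 ground-plane homography calibrated in advance. The sketch below assumes that form, and the names `to_world` and `route_map` are illustrative, not the patent's.

```python
import numpy as np

def to_world(first_xy, H):
    """Map a pedestrian's image-plane anchor point into the world frame.

    `H` is an assumed 3x3 homography standing in for the preset
    coordinate mapping relationship; calibration is done beforehand.
    """
    p = np.array([first_xy[0], first_xy[1], 1.0])   # homogeneous coords
    q = H @ p
    return q[:2] / q[2]   # second position coordinates (world frame)

def route_map(anchor_points, H):
    """Collect second-position coordinates over a time period into a route."""
    return [tuple(to_world(p, H)) for p in anchor_points]
```

Feeding the anchor point of each frame in the time period through `route_map` yields the pedestrian's route as an ordered list of world-frame points.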
- the following steps are further implemented: obtaining the pedestrian's orientation information according to the key points of the pedestrian; and obtaining the item rack area that the pedestrian faces according to the pedestrian's orientation information and the preset item rack image.
- the following steps are further implemented: obtaining the pedestrian's orientation information according to the key points of the pedestrian includes: selecting the shoulder key points among the pedestrian's key points, the shoulder key points including a left shoulder key point and a right shoulder key point; calculating the difference between the coordinates of the left shoulder key point and the right shoulder key point to obtain a shoulder vector; calculating the angle between the shoulder vector and a preset unit vector using the inverse cosine function, the preset unit vector being the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected; summing the radian value of the included angle and π to obtain the pedestrian's heading angle; when the heading angle is greater than or equal to π and less than 1.5π, determining that the pedestrian faces one side of the image to be detected; and when the heading angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian faces the other side of the image to be detected.
- acquiring the image to be detected includes: acquiring a surveillance video of the target location; and selecting images containing pedestrians from the surveillance video as the images to be detected.
- the following steps are also implemented: acquiring a sample image; performing key point annotation and detection frame annotation on pedestrians in the sample image to obtain annotated image data; and inputting the annotated image data into a neural network model for training to obtain the multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
- Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory may include random access memory (RAM) or external cache memory.
- RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Claims (10)
- An interactive behavior recognition method, characterized in that the method comprises: acquiring an image to be detected; inputting the image to be detected into a preset multi-task model to obtain key points and a detection frame of a pedestrian in the image to be detected, wherein the key points are all located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection; and determining interaction behavior information between the pedestrian and a corresponding item rack according to the key points of the pedestrian and a preset item rack image corresponding to the image to be detected.
- The method according to claim 1, characterized in that the preset item rack image is a preset item rack mask image, and determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected comprises: selecting a wrist key point among the key points of the pedestrian; obtaining a hand region of the pedestrian according to the wrist key point and a preset radius threshold; when the intersection area between the image of the hand region and the preset item rack mask image is greater than a preset area threshold, determining that the pedestrian interacts with the corresponding item rack; and when the intersection area between the image of the hand region and the preset item rack mask image is less than or equal to the area threshold, determining that no interaction occurs between the pedestrian and the corresponding item rack.
- The method according to claim 1, characterized in that the method further comprises: selecting any point in the detection frame of the pedestrian as an anchor point, and setting the position coordinates of the anchor point in the image to be detected as first position coordinates of the pedestrian; mapping the first position coordinates of the pedestrian into a world coordinate system according to a preset coordinate mapping relationship to obtain second position coordinates of the pedestrian, the second position coordinates being the position coordinates of the pedestrian in the world coordinate system; and collecting the second position coordinates of the pedestrian at each time point within a preset time period to obtain a route map of the pedestrian within the preset time period.
- The method according to claim 1, characterized in that the method further comprises: obtaining orientation information of the pedestrian according to the key points of the pedestrian; and obtaining the item rack area that the pedestrian faces according to the orientation information of the pedestrian and the preset item rack image.
- The method according to claim 4, characterized in that obtaining the orientation information of the pedestrian according to the key points of the pedestrian comprises: selecting shoulder key points among the key points of the pedestrian, the shoulder key points including a left shoulder key point and a right shoulder key point; calculating the difference between the coordinates of the left shoulder key point and the coordinates of the right shoulder key point to obtain a shoulder vector; calculating the angle between the shoulder vector and a preset unit vector using the inverse cosine function, the preset unit vector being the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected; summing the radian value of the included angle and π to obtain a heading angle of the pedestrian; when the heading angle is greater than or equal to π and less than 1.5π, determining that the pedestrian faces one side of the image to be detected; and when the heading angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian faces the other side of the image to be detected.
- The method according to any one of claims 1 to 5, characterized in that acquiring the image to be detected comprises: acquiring a surveillance video of a target location; and selecting images containing pedestrians from the surveillance video as the image to be detected.
- The method according to any one of claims 1 to 5, characterized in that the method further comprises: acquiring a sample image; performing key point annotation and detection frame annotation on pedestrians in the sample image to obtain annotated image data; and inputting the annotated image data into a neural network model for training to obtain the multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
- An interactive behavior recognition device, characterized in that the device comprises: an acquisition module configured to acquire an image to be detected; a detection module configured to input the image to be detected into a preset multi-task model to obtain key points and a detection frame of a pedestrian in the image to be detected, wherein the key points are all located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection; and a recognition module configured to determine interaction behavior information between the pedestrian and a corresponding item rack according to the key points of the pedestrian and a preset item rack image corresponding to the image to be detected.
- A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
- A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA3160731A CA3160731A1 (en) | 2019-11-12 | 2020-06-19 | Interactive behavior recognizing method, device, computer equipment and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911100457.9 | 2019-11-12 | ||
CN201911100457.9A CN110991261A (zh) | 2019-11-12 | 2019-11-12 | Interactive behavior recognition method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021093329A1 true WO2021093329A1 (zh) | 2021-05-20 |
Family
ID=70083879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/097002 WO2021093329A1 (zh) | 2019-11-12 | 2020-06-19 | Interactive behavior recognition method, device, computer equipment and storage medium |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN110991261A (zh) |
CA (1) | CA3160731A1 (zh) |
WO (1) | WO2021093329A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113516734A (zh) * | 2021-07-05 | 2021-10-19 | Westlake University | Automatic insect key point annotation method based on a top-down deep learning architecture, and application thereof |
CN114758239A (zh) * | 2022-04-22 | 2022-07-15 | Anhui University of Technology Science Park Co., Ltd. | Machine-vision-based monitoring method and system for items flying off a predetermined travel route |
CN116862980A (zh) * | 2023-06-12 | 2023-10-10 | Shanghai Yuben Intelligent Technology Co., Ltd. | Target detection frame position optimization and correction method, system, medium and terminal for image edges |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110991261A (zh) * | 2019-11-12 | 2020-04-10 | Suning Cloud Computing Co., Ltd. | Interactive behavior recognition method, device, computer equipment and storage medium |
CN113642361B (zh) * | 2020-05-11 | 2024-01-23 | Hangzhou Ezviz Software Co., Ltd. | Fall behavior detection method and device |
CN112307871A (zh) * | 2020-05-29 | 2021-02-02 | Beijing Wodong Tianjun Information Technology Co., Ltd. | Information collection method and device, and attention detection method, device and system |
CN111611970B (zh) * | 2020-06-01 | 2023-08-22 | Chengyun Technology (China) Co., Ltd. | Littering behavior detection method based on urban management surveillance video |
CN111798341A (zh) * | 2020-06-30 | 2020-10-20 | Shenzhen Xingfu Renju Construction Technology Co., Ltd. | Green property management method, system, computer device and storage medium thereof |
CN111783724B (zh) * | 2020-07-14 | 2024-03-26 | Shanghai Yitu Network Technology Co., Ltd. | Target object recognition method and device |
CN112084984A (zh) * | 2020-09-15 | 2020-12-15 | Shandong Luneng Software Technology Co., Ltd. | Escalator action detection method based on improved Mask RCNN |
CN112016528B (zh) * | 2020-10-20 | 2021-07-20 | Chengdu Ruiyan Technology Co., Ltd. | Behavior recognition method and device, electronic equipment and readable storage medium |
CN112528850B (zh) * | 2020-12-11 | 2024-06-04 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Human body recognition method, device, equipment and storage medium |
CN113377192B (zh) * | 2021-05-20 | 2023-06-20 | Guangzhou Ziweiyun Technology Co., Ltd. | Deep learning-based somatosensory game tracking method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105245828A (zh) * | 2015-09-02 | 2016-01-13 | Beijing Megvii Technology Co., Ltd. | Item analysis method and device |
CN106709422A (zh) * | 2016-11-16 | 2017-05-24 | Nanjing Yimao Information Technology Co., Ltd. | Hand recognition method for supermarket shopping carts and recognition system thereof |
CN109934075A (zh) * | 2017-12-19 | 2019-06-25 | Hangzhou Hikvision Digital Technology Co., Ltd. | Abnormal event detection method, device, system and electronic equipment |
US20190266405A1 (en) * | 2016-10-26 | 2019-08-29 | Htc Corporation | Virtual reality interaction method, apparatus and system |
CN110991261A (zh) * | 2019-11-12 | 2020-04-10 | Suning Cloud Computing Co., Ltd. | Interactive behavior recognition method, device, computer equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109993067B (zh) * | 2019-03-07 | 2022-01-28 | Beijing Megvii Technology Co., Ltd. | Facial key point extraction method and apparatus, computer device and storage medium |
-
2019
- 2019-11-12 CN CN201911100457.9A patent/CN110991261A/zh active Pending
-
2020
- 2020-06-19 WO PCT/CN2020/097002 patent/WO2021093329A1/zh active Application Filing
- 2020-06-19 CA CA3160731A patent/CA3160731A1/en active Pending
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113516734A (zh) * | 2021-07-05 | 2021-10-19 | Westlake University | Automatic insect key point annotation method based on a top-down deep learning architecture, and application thereof |
CN114758239A (zh) * | 2022-04-22 | 2022-07-15 | Anhui University of Technology Science Park Co., Ltd. | Machine-vision-based monitoring method and system for items flying off a predetermined travel route |
CN114758239B (zh) * | 2022-04-22 | 2024-06-04 | Anhui University of Technology Science Park Co., Ltd. | Machine-vision-based monitoring method and system for items flying off a predetermined travel route |
CN116862980A (zh) * | 2023-06-12 | 2023-10-10 | Shanghai Yuben Intelligent Technology Co., Ltd. | Target detection frame position optimization and correction method, system, medium and terminal for image edges |
CN116862980B (zh) * | 2023-06-12 | 2024-01-23 | Shanghai Yuben Intelligent Technology Co., Ltd. | Target detection frame position optimization and correction method, system, medium and terminal for image edges |
Also Published As
Publication number | Publication date |
---|---|
CA3160731A1 (en) | 2021-05-20 |
CN110991261A (zh) | 2020-04-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021093329A1 (zh) | Interactive behavior recognition method, device, computer equipment and storage medium | |
KR102261061B1 (ko) | 컨볼루션 신경망을 이용하여 poi 변화를 검출하기 위한 시스템 및 방법 | |
Wei et al. | A vision and learning-based indoor localization and semantic mapping framework for facility operations and management | |
TWI773797B (zh) | 用以追蹤真實空間之區域中的多關節主體之系統,方法及電腦程式產品 | |
JP6397144B2 (ja) | 画像からの事業発見 | |
CN105518744B (zh) | 行人再识别方法及设备 | |
US9014467B2 (en) | Image processing method and image processing device | |
CN104573706B (zh) | 一种物体图像识别方法及其*** | |
CN108304757A (zh) | 身份识别方法及装置 | |
US11093886B2 (en) | Methods for real-time skill assessment of multi-step tasks performed by hand movements using a video camera | |
WO2017088804A1 (zh) | 人脸图像中检测眼镜佩戴的方法及装置 | |
US20190122027A1 (en) | Processing uncertain content in a computer graphics system | |
US11113571B2 (en) | Target object position prediction and motion tracking | |
CN109522790A (zh) | 人体属性识别方法、装置、存储介质及电子设备 | |
De Beugher et al. | Automatic analysis of in-the-wild mobile eye-tracking experiments using object, face and person detection | |
TW201246089A (en) | Method for setting dynamic environmental image borders and method for instantly determining the content of staff member activities | |
Unzueta et al. | Efficient generic face model fitting to images and videos | |
TWI420440B (zh) | 物品展示系統及方法 | |
JP6331270B2 (ja) | 情報処理システム、情報処理方法及びプログラム | |
US9924865B2 (en) | Apparatus and method for estimating gaze from un-calibrated eye measurement points | |
US20230091536A1 (en) | Camera Placement Guidance | |
JP2018081452A (ja) | 画像処理装置、画像処理方法 | |
CN113887384B (zh) | 基于多轨迹融合的行人轨迹分析方法、装置、设备及介质 | |
Yang et al. | A dense flow-based framework for real-time object registration under compound motion | |
Yang et al. | Simultaneous active camera array focus plane estimation and occluded moving object imaging |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20887643 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 3160731 Country of ref document: CA |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20887643 Country of ref document: EP Kind code of ref document: A1 |