WO2023138445A1

WO2023138445A1 - Detection methods and devices for detecting if person has fallen and pick-up or put-back behavior of person

Info

Publication number: WO2023138445A1
Application number: PCT/CN2023/071657
Authority: WO
Inventors: 萨里尼约瑟夫
Original assignee: 索尼半导体解决方案公司; 萨里尼约瑟夫
Priority date: 2022-01-24
Filing date: 2023-01-10
Publication date: 2023-07-27
Also published as: CN116524584A

Abstract

The present invention discloses a detection method and device for detecting if a person has fallen on premises and a detection method and device for detecting a pick-up or put-back behavior of a person in front of a shelf. The method for detecting if a person has fallen comprises the steps: calibrating cameras, so as to have a suitable vertical vector in a field of view of each camera; obtaining image data comprising a person on premises by means of least a portion of the cameras, and extracting from the image data key point data of the skeleton of the person; estimating a personal verticality of the person by means of using the key point data; calculating vertical angles of the person on the basis of the personal verticality and the vertical vectors in the field of view of at least a portion of the cameras; aggregating all the vertical angles of the person in the fields of view of at least a portion of the cameras at some moment to obtain a final vertical angle of the person; and determining whether the person has fallen down on the basis of a fall score calculated from the final vertical angle. The right to privacy of a customer can be protected, and detection can be accurately and quickly performed.

Description

用于检测人摔倒、人的拾取或放回行为的检测方法和设备Detection method and device for detecting human fall, human pick-up or put-back behavior

相关申请的引用References to related applications

本申请要求于2022年01月24日向中华人民共和国国家知识产权局提交的第202210078571.1号中国专利申请的权益，在此将其全部内容以援引的方式整体并入本文中。This application claims the rights and interests of the Chinese patent application No. 202210078571.1 submitted to the State Intellectual Property Office of the People's Republic of China on January 24, 2022, the entire contents of which are hereby incorporated herein by reference.

技术领域technical field

本发明涉及计算机视觉中的人类行为检测和识别。具体地，本发明涉及能够准确地检测和识别人的摔倒、人的拾取或放回行为的检测方法和设备。The present invention relates to human action detection and recognition in computer vision. In particular, the present invention relates to a detection method and device capable of accurately detecting and identifying a person's fall, and a person's pick-up or put-back behavior.

背景技术Background technique

作为一门研究如何使机器通过“看”去理解世界的学科，计算机视觉技术是目前人工智能领域最热门的研究领域之一。具体来说，通过相机与计算单元的结合，在一定场景下机器视觉***能够代替人眼对目标进行识别、跟踪和测量等工作。在计算机视觉技术的诸多应用领域之中，基于计算机视觉技术的人类行为的检测和识别是非常重要的一个方面，并且被应用于例如视频监控、群体互动、消费者行为识别等等场景。As a discipline that studies how to make machines understand the world by "seeing", computer vision technology is currently one of the hottest research fields in the field of artificial intelligence. Specifically, through the combination of cameras and computing units, machine vision systems can replace human eyes in identifying, tracking, and measuring targets in certain scenarios. Among the many application fields of computer vision technology, the detection and recognition of human behavior based on computer vision technology is a very important aspect, and it is applied to scenarios such as video surveillance, group interaction, consumer behavior recognition and so on.

尤其是近年来，随着无人超市、自助零售商店等智能零售场景模式的普及，大量的研究都关注于如何检测并识别无人零售环境中的人(即，顾客)的行为。在当前典型的无人零售环境(例如，小型无人超市)中，顾客在通过身份识别入场后，往往随意地在场所中移动并随机地在货架前拾取或放回货品，然后在无购物或经确认已自助结账的情况下离场。因此，通过计算机视觉检测并识别出顾客在购物场所中的行为，尤其是货架前对于商品的拾取和放回行为，不仅对于追踪并确认顾客的最终购买行为，更加快速地确定顾客的消费金额，提升顾客的消费体验有着非常直接的帮助，而且能够对无人零售环境中的货架布置、货品摆放等工作提供非常有价值的参考信息。另外，由于无人超市等购物场所往往为人流量较少的封闭空间，因此基于最基本的安全需求，还要求能够通过计算机视觉及时地发现在其中的顾客的例如突然倒地等具有潜在安全风险的行为。Especially in recent years, with the popularity of smart retail scenarios such as unmanned supermarkets and self-service retail stores, a large number of studies have focused on how to detect and identify the behavior of people (ie, customers) in unmanned retail environments. In the current typical unmanned retail environment (for example, a small unmanned supermarket), customers often move randomly in the venue after entering the venue through identification, randomly pick up or put back goods in front of the shelves, and then leave the venue without shopping or confirmed that they have checked out by themselves. Therefore, through computer vision detection and recognition of customer behavior in shopping places, especially the behavior of picking up and putting back goods in front of the shelf, it is not only helpful to track and confirm the final purchase behavior of customers, determine the amount of consumption of customers more quickly, and improve the consumption experience of customers, but also provide very valuable reference information for shelf layout and product placement in unmanned retail environments. In addition, since shopping places such as unmanned supermarkets are often closed spaces with less traffic, based on the most basic security requirements, it is also required to be able to detect potential safety risks of customers in them, such as sudden falls, through computer vision.

目前，用于检测并识别顾客上述行为的最佳方法是创建一个自动***，该***通常具有RGB摄像机和智能信息处理设备。该***首先通过为垂直安装的RGB摄像机检测到的每名顾客分配一个唯一的ID，跟踪他们在商店内的活动，并且检测他们与货架的互动。接着进行分析和分类交互，记录顾客在商店内的活动，指示商品是否已被顾客从货架取出或者取出后的商品是否又被放回原处。由此，***能够识别货架前顾客的行为活动。Currently, the best way to detect and recognize the above behavior of customers is to create an automatic system, usually with RGB cameras and intelligent information processing equipment. The system begins by assigning a unique ID to each customer detected by vertically mounted RGB cameras, tracking their movements within the store and detecting their interactions with the shelves. This is followed by analysis and sorting interactions, recording customer movement within the store, indicating whether an item has been removed from the shelf by the customer or has been replaced. Thus, the system can identify the behavior of customers in front of the shelf.

发明内容Contents of the invention

要解决的技术问题technical problem to be solved

然而，当前应用于无人超市等智能零售场景的现有的顾客行为检测和识别的技术方案通常都需要RGB摄像机拍摄大量的图像视频文件，并涉及进行基于这些图像视频文件的大量的文件传输和处理。因此，导致整个***的数据处理速度较为缓慢。另外，如上所述，这些方法往往需要为每名顾客分配ID，而这样的处理通常是基于面部识别完成的。在未获得顾客授权的情况下，这样的信息收集和处理会有侵犯顾客隐私权的风险。However, the existing technical solutions for customer behavior detection and recognition currently applied to smart retail scenarios such as unmanned supermarkets usually require RGB cameras to capture a large number of image and video files, and involve a large number of file transfers and processing based on these image and video files. Therefore, the data processing speed of the whole system is relatively slow. Also, as mentioned above, these methods often require assigning an ID to each customer, which is usually done based on facial recognition. Without the customer's authorization, such information collection and processing may infringe the customer's right to privacy.

鉴于上述问题，本发明期望提供一种能够在保护顾客隐私权的前提下准确、快速地检测和识别顾客的摔倒、顾客的拾取或放回行为的检测方法和设备。In view of the above problems, the present invention expects to provide a detection method and device that can accurately and quickly detect and identify customers' falls, customers' pick-up or put-back behaviors under the premise of protecting customers' privacy.

技术方案Technical solutions

本发明所要解决的技术问题是通过如下技术方案实现的。The technical problem to be solved by the present invention is achieved through the following technical solutions.

根据本发明的第一实施例，提供了一种用于检测分布有相机的场所中的人的摔倒的检测方法，包括如下步骤：对所有所述相机进行标定，以使在各所述相机的视场中均具有恰当的垂直向量；通过至少一部分所述相机获得包含所述场所中的人的图像数据，并且从所述图像数据中提取出所述人的骨架的关键点的数据；通过使用所述关键点数据估算所述人的个人垂直度；针对所述至少一部分相机中的各者，基于相应的相机的所述视场中的所述垂直向量和所述个人垂直度计算所述人的垂直角度；通过对所述人在某一时刻在所述至少一部分相机的各所述视场中的所有所述垂直角度进行聚合，获得所述人的最终垂直角度；基于由所述最终垂直角度求出的摔倒分值判定所述人是否摔倒。According to a first embodiment of the present invention, there is provided a detection method for detecting a fall of a person in a place where cameras are distributed, comprising the steps of: calibrating all the cameras so that there is an appropriate vertical vector in the field of view of each of the cameras; obtaining image data containing people in the place through at least a part of the cameras, and extracting data of key points of the person's skeleton from the image data; estimating the personal verticality of the person by using the key point data; The vertical vector in the field and the personal verticality calculate the vertical angle of the person; by aggregating all the vertical angles of the person in each of the field of view of the at least a part of the cameras at a certain moment, the final vertical angle of the person is obtained; based on the falling score obtained from the final vertical angle, it is determined whether the person has fallen.

根据本发明的第一实施例，还提供了一种用于检测场所内的人的摔倒的检测设备，包括：多个相机，所述多个相机分布在所述场所内并且具有不同的视场，所述多个相机能够获得包含所述场所中的人的图像数据；处理单元，所述处理单元对所述多个相机获得的所述图像数据进行处理，以判定所述场所内的所述人是否摔倒。所述处理单元包括：标定模块，对所有所述相机进行标定，以使在各所述相机的视场中均具有恰当的垂直向量；数据处理模块，对从所述多个相机传输来的所述图像数据进行处理，从而获得所述人在所述多个相机之中的至少一部分相机的所述视场中的个人垂直度；计算模块，基于从所述标定模块发送来的所述垂直向量和从所述数据处理模块发送来的所述个人垂直度计算所述人的最终垂直角度；以及判定模块，基于由所述最终垂直角度求出的摔倒分值判定所述人是否摔倒。According to the first embodiment of the present invention, there is also provided a detection device for detecting a fall of a person in a place, including: a plurality of cameras, the plurality of cameras are distributed in the place and have different fields of view, and the plurality of cameras can obtain image data including the person in the place; a processing unit, the processing unit processes the image data obtained by the plurality of cameras to determine whether the person in the place has fallen. The processing unit includes: a calibration module, which calibrates all the cameras so that each camera has an appropriate vertical vector in the field of view; a data processing module, which processes the image data transmitted from the plurality of cameras, so as to obtain the personal perpendicularity of the person in the field of view of at least a part of the cameras; a calculation module, which calculates the final vertical angle of the person based on the vertical vector sent from the calibration module and the personal vertical degree sent from the data processing module; Determine whether the person has fallen or not based on the falling score obtained.

根据本发明的第一实施例，还提供了一种存储介质，其上存储有计算机可读程序，所述程序在处理器上执行时实施前述的根据本发明的第一实施例的检测方法。According to the first embodiment of the present invention, there is also provided a storage medium on which a computer-readable program is stored, and when the program is executed on a processor, the aforementioned detection method according to the first embodiment of the present invention is implemented.

根据本发明的第二实施例，提供了一种用于检测人在货架前的拾取或放回行为的检测方法，包括如下步骤：从图像数据中获取所述人的骨架的包含手部关键点的多个关键点的数据，并且从所述图像数据提取所述货架的外轮廓线，其中，所述外轮廓线包括所述货架的外部多边形以及与所述货架的真实外轮廓相对应的内部多边形，所述外部多边形在所述内部多边形的外部的接近区域中；在所述人的至少一只手的所述手部关键点被检测到进入所述外部多边形的情况下，针对所述人的进入所述外部多边形的每只手分别执行用于检测所述手部关键点的附近的物品的进入物品检测；在所述人的至少一只手的所述手部关键点被检测到退出所述外部多边形的情况下，针对所述人的退出所述外部多边形的每只手分别执行用于检测所述手部关键点的附近的物品的退出物品检测；基于所述进入物品检测的结果和所述退出物品检测的结果来判定所述人在所述货架前的所述拾取或放回行为。优选地，在执行所述进入物品检测与进行所述退出物品检测之间还包括步骤：针对所述人的进入所述外部多边形的每只手，分别记录所述手部关键点附近的所述物品在所述外部多边形与所述内部多边形之间的轨迹。According to a second embodiment of the present invention, there is provided a detection method for detecting a person's picking or putting back behavior in front of a shelf, comprising the following steps: acquiring data of a plurality of key points of a skeleton of the person including hand key points from image data, and extracting the outer contour line of the shelf from the image data, wherein the outer contour line includes an outer polygon of the shelf and an inner polygon corresponding to the real outer contour of the shelf, the outer polygon is in a proximity area outside the inner polygon; In the case that the key points of the hand of at least one hand of the person are detected to enter the outer polygon, for each hand of the person that enters the outer polygon, an incoming object detection for detecting objects near the key points of the hand is respectively performed; in the case where the key points of the hand of the at least one hand of the person are detected to exit the outer polygon, for each hand of the person that exits the outer polygon, Exit object detection for detecting objects near the key points of the hand is respectively performed; based on the result of the incoming object detection and the The pick-up or put-back behavior of the person in front of the shelf is determined by the result of the exit item detection. Preferably, between performing the detection of the entering object and the detection of the exiting object, a step is further included: for each hand of the person entering the outer polygon, respectively record the trajectory of the object near the key point of the hand between the outer polygon and the inner polygon.

根据本发明的第二实施例，还提供了一种用于检测人在货架前的拾取或放回行为的检测设备，包括：至少一台相机或图像传感器，用于获取图像数据；处理单元，所述处理单元根据前述的用于检测人在货架前的拾取或放回行为的检测方法对所述图像数据进行处理。According to the second embodiment of the present invention, there is also provided a detection device for detecting a person's picking up or putting back behavior in front of a shelf, including: at least one camera or image sensor for acquiring image data; a processing unit, which processes the image data according to the aforementioned detection method for detecting a person's picking up or putting back behavior in front of a shelf.

根据本发明的第二实施例，还提供了一种存储介质，其上存储有计算机可读程序，所述程序在处理器上执行时实施前述的根据本发明第二实施例的检测方法。According to the second embodiment of the present invention, there is also provided a storage medium on which a computer-readable program is stored, and when the program is executed on a processor, the aforementioned detection method according to the second embodiment of the present invention is implemented.

技术效果technical effect

根据本发明，由于在获取视频或图片数据后就立即对这些数据进行了简化提取处理，在后续的所有步骤中均只需要传输和处理提取出的图片中的关键点的数据，因此大大减少了数据处理量，能够快速准确地实现顾客的行为的检测和识别。此外，由于不会对顾客进行面部识别，并且在经过最初的数据简化提取处理后，存储和传输的数据中将只包含顾客骨架关键点的相关信息，因此保护了顾客隐私权。According to the present invention, since the video or picture data is immediately simplified and extracted, in all subsequent steps, it is only necessary to transmit and process the data of the key points in the extracted picture, thus greatly reducing the amount of data processing, and can quickly and accurately realize the detection and identification of customer behavior. In addition, customers' privacy is protected because customers' faces will not be recognized, and after the initial data simplification extraction process, the stored and transmitted data will only contain relevant information on key points of customers' skeletons.

应当理解，本发明的有益效果不限于上述效果，而可以是本文中说明的任何有益效果。It should be understood that the beneficial effects of the present invention are not limited to the above effects, but may be any beneficial effects described herein.

附图说明Description of drawings

构成本申请的一部分的说明书附图用来提供对本发明的进一步理解，本发明的示意性实施例及其说明用于解释本发明，并不构成对本发明的不当限定。在附图中：The accompanying drawings constituting a part of the present application are used to provide a further understanding of the present invention, and the schematic embodiments and descriptions of the present invention are used to explain the present invention, and do not constitute an improper limitation of the present invention. In the attached picture:

图1是图示了根据本发明第一实施例的用于检测和识别顾客的摔倒的方法的主要步骤的框图；1 is a block diagram illustrating the main steps of a method for detecting and identifying falls of a customer according to a first embodiment of the present invention;

图2是图示了根据本发明第一实施例的相机的标定的示意图；2 is a schematic diagram illustrating calibration of a camera according to a first embodiment of the present invention;

图3是图示了根据本发明第一实施例的从图像数据中提取的顾客的骨架的关键点的示意图；3 is a schematic diagram illustrating key points of a skeleton of a customer extracted from image data according to a first embodiment of the present invention;

图4是图示了根据本发明第一实施例的在相机的拍摄视场中的顾客的个人垂直度的示意图；4 is a schematic diagram illustrating a customer's personal verticality in a photographing field of view of a camera according to a first embodiment of the present invention;

图5的a和b分别图示了根据本发明第一实施例的在不同的拍摄视场中的顾客的个人垂直度以及垂直角度的示意图；a and b of Fig. 5 respectively illustrate the schematic diagrams of personal verticality and vertical angle of customers in different shooting fields of view according to the first embodiment of the present invention;

图6的a至e分别图示了根据本发明第一实施例的在相机的拍摄视场中当顾客站立时的垂直角度的示例；a to e of FIG. 6 respectively illustrate examples of vertical angles when a customer is standing in the shooting field of view of the camera according to the first embodiment of the present invention;

图7的a至d分别图示了根据本发明第一实施例的当同一顾客摔倒时在不同相机的拍摄视场中的不同垂直角度的示意图；a to d of FIG. 7 respectively illustrate schematic diagrams of different vertical angles in the shooting fields of view of different cameras when the same customer falls according to the first embodiment of the present invention;

图8图示了根据本发明第一实施例的最终垂直角度与摔倒分值的转关系的示意图；8 illustrates a schematic diagram of the relationship between the final vertical angle and the fall score according to the first embodiment of the present invention;

图9的a和b图示了根据本发明第一实施例的顾客的最终垂直角度和摔倒分值的转换以及顾客是否摔倒的判定的示例图；A and b of Fig. 9 illustrate the conversion of the customer's final vertical angle and fall score and the judgment of whether the customer has fallen according to the first embodiment of the present invention;

图10是图示了根据本发明第一实施例的用于检测和识别顾客的摔倒的设备的示意性框图；10 is a schematic block diagram illustrating an apparatus for detecting and identifying a fall of a customer according to a first embodiment of the present invention;

图11是图示了根据本发明第二实施例的用于检测和识别顾客的拾取或放回行为的方法的主要步骤的示意性框图；11 is a schematic block diagram illustrating the main steps of a method for detecting and identifying customer pick-up or put-back behaviors according to a second embodiment of the present invention;

图12是图示了顾客在货架前的主要拾取或放回行为的分类示意图；Fig. 12 is a classification schematic diagram illustrating the main pick-up or put-back behaviors of customers in front of shelves;

图13的a和b图示了在根据本发明的第二实施例的检测方法中的物品存在的判定条件的示例。a and b of Fig. 13 illustrate examples of determination conditions for the presence of an item in the detection method according to the second embodiment of the present invention.

图14图示了在根据本发明的第二实施例的检测方法中的物品的轨迹比较的示意图；Fig. 14 illustrates a schematic diagram of trajectory comparison of items in the detection method according to the second embodiment of the present invention;

图15图示了在根据本发明的第二实施例的检测方法中的第一个FSM运行时的状态判定的示例性流程图；Fig. 15 illustrates an exemplary flow chart of state determination when the first FSM is running in the detection method according to the second embodiment of the present invention;

图16图示了在根据本发明的第二实施例的检测方法中的第二个FSM运行时的状态判定的示例性流程图；Fig. 16 illustrates an exemplary flow chart of state determination during the operation of the second FSM in the detection method according to the second embodiment of the present invention;

图17图示了在根据本发明的第二实施例的检测方法中的仅有一个FSM运行的情况下获得的FSM的状态表；Fig. 17 illustrates the state table of the FSM obtained under the condition that only one FSM operates in the detection method according to the second embodiment of the present invention;

图18中的a和b图示了在根据本发明的第二实施例的检测方法中的两个FSM运行的情况下获得的FSM的状态表；A and b in Fig. 18 illustrate the state table of the FSM obtained under the situation that two FSMs in the detection method according to the second embodiment of the present invention operate;

图19图示了在根据本发明的第二实施例的检测方法中的将图18中所示的两个FSM状态表相结合后的最终状态表；Fig. 19 illustrates the final state table after combining the two FSM state tables shown in Fig. 18 in the detection method according to the second embodiment of the present invention;

图20图示了在根据本发明的第二实施例的检测方法中的考虑基于多组图像数据的检测结果时的物品存在的判定条件的示例。FIG. 20 illustrates an example of determination conditions for the presence of an item in consideration of detection results based on multiple sets of image data in the detection method according to the second embodiment of the present invention.

具体实施方式Detailed ways

下面，将参照附图详细说明根据本发明的各具体实施例。需要强调的是，附图中的所有尺寸、形状、位置关系等仅是示意性的，并且为了图示方便而不一定是按照真实比例图示的，因而不具有限定性。此外，在以下所述的实施方案中，相同的部件、配置和步骤等由相同的附图标记表示，并且将省略对它们的重复说明。Hereinafter, specific embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It should be emphasized that all dimensions, shapes, positional relationships, etc. in the drawings are only schematic, and are not necessarily shown in true scale for the convenience of illustration, and thus are not limiting. Furthermore, in the embodiments described below, the same components, configurations, steps, and the like are denoted by the same reference numerals, and their repeated descriptions will be omitted.

此外，将根据以下项的顺序来说明本发明。In addition, the present invention will be described according to the order of the following items.

1、第一实施例(用于检测和识别顾客的摔倒的方法和设备)1. First Embodiment (Method and Device for Detecting and Recognizing a Customer's Fall)

2、第二实施例(用于检测和识别顾客在货架前的拾取或放回行为的方法和设备)2. The second embodiment (method and device for detecting and identifying customers' pick-up or put-back behavior in front of shelves)

1、第一实施例1. The first embodiment

1.1、用于检测和识别顾客的摔倒的方法和设备的概述1.1. Overview of Methods and Devices for Detecting and Recognizing Customer Falls

首先，将说明根据本发明第一实施例的用于检测和识别顾客的摔倒的方法和设备的概况。图1示出了根据本发明第一实施例的用于检测和识别顾客的摔倒的方法的主要步骤。First, an overview of a method and apparatus for detecting and recognizing a fall of a customer according to a first embodiment of the present invention will be explained. Fig. 1 shows the main steps of a method for detecting and recognizing a fall of a customer according to a first embodiment of the invention.

在诸如无人超市、自助服务商店等无人智能售卖环境中，由于没有服务人员巡场和值守，所以当顾客在这样的环境中例如因突发疾病或意外事故而摔倒时，就存在着因无法及时发现而耽误救治的隐患。为了解决上述问题，根据本发明第一实施例，在这样的无人售卖环境中分布有能够拍摄获取环境的图像数据的相机。为了能够准确地识别出环境中顾客的姿态，首先需要对所有的相机进行标定处理，以使每一台相机都能够真实地反映出各自的拍摄视场中的垂直向量。经过标定后的相机获取无人售卖环境的环境图像数据，并对环境的图像数据执行数据提取处理。当提取到的环境图像数据中存在顾客图像时，则能够通过数据提取处理将图像数据中的顾客图像简化为骨架(skeleton)图像，进而提取出的顾客的关键点数据信息。后续的处理都是基于这些不会透露顾客隐私信息的关键点数据信息进行的。随后，通过使用这些关键点信息估算出顾客在各个相机的拍摄视场中的个人垂直度。容易理解地，这样的个人垂直度例如可以通过关键点数据中从代表顾客足部的关键点到代表顾客的头部的关键点的向量来表示。然后，针对各个相机的拍摄视场，基于在前面步骤中求出的垂直向量和个人垂直度计算出各拍摄视场中的顾客的垂直角。容易理解地，由于各个相机的拍摄视角不同，因此需要从不同拍摄视场内的垂直角中选择出最能够反映顾客真实身体姿态的垂直角。换言之，需要对在某时刻由各个相机拍摄的垂直角进行数据聚合，并从中获得最终的能够反映顾客真实身体姿态的垂直角。显然，当人站立时和摔倒在地时，垂直角显然存在巨大差异。因此，随后就能够基于由垂直角得出的摔倒分值来确定被拍摄到的顾客是否摔倒。In unmanned smart sales environments such as unmanned supermarkets, self-service stores, etc., because there are no service personnel to patrol and be on duty, when customers fall in such an environment, for example due to sudden illness or accident, there is a hidden danger of delaying treatment due to failure to find out in time. In order to solve the above problems, according to the first embodiment of the present invention, cameras capable of capturing image data of the acquisition environment are distributed in such an unattended vending environment. In order to accurately recognize the customer's posture in the environment, it is first necessary to calibrate all the cameras so that each camera can truly reflect the vertical vector in its respective field of view. The calibrated camera acquires the environmental image data of the unmanned vending environment, and performs data extraction processing on the environmental image data. When there is a customer image in the extracted environment image data, the customer image in the image data can be simplified into a skeleton image through data extraction processing, and then key point data information of the customer can be extracted. Subsequent processing is based on these key point data information that will not disclose customer privacy information. Then, the customer's personal verticality in the field of view of each camera is estimated by using these key point information. It is easy to understand that such personal verticality can be represented, for example, by a vector in the keypoint data from a keypoint representing the customer's feet to a keypoint representing the customer's head. Then, for the imaging field of view of each camera, the vertical angle of the customer in each imaging field of view is calculated based on the vertical vector obtained in the previous step and the personal perpendicularity. It is easy to understand that since the shooting angles of each camera are different, it is necessary to select the vertical angle that can best reflect the customer's real body posture from the vertical angles in different shooting fields of view. In other words, it is necessary to aggregate the vertical angles captured by various cameras at a certain moment, and obtain the final vertical angle that can reflect the customer's real body posture. Apparently there is a huge difference in the vertical angle when the person is standing and when they are on the ground. Therefore, it is then possible to determine whether or not the photographed customer has fallen based on the fall score derived from the vertical angle.

上述的根据本发明第一实施例的用于检测和识别顾客的摔倒的方法中的除了获取环境的图像数据的步骤是由各相机执行之外，其余的各步骤的处理均可通过与各相机通信连接的中央处理器等处理器或数据处理芯片执行。或者，优选地，这些步骤中的处理也可以由集成在各相机中的具有AI处理功能的传感器执行。这样的传感器兼具数据处理能力和数据存储能力，无需额外的硬件即可执行机器学习驱动的计算机视觉处理任务。In the above-mentioned method for detecting and identifying a customer's fall according to the first embodiment of the present invention, except that the step of acquiring the image data of the environment is performed by each camera, the processing of the remaining steps can be performed by a processor such as a central processing unit or a data processing chip that is communicatively connected with each camera. Alternatively, preferably, the processing in these steps may also be performed by a sensor with an AI processing function integrated in each camera. Such sensors combine data processing and data storage capabilities to perform machine learning-driven computer vision processing tasks without the need for additional hardware.

在下文中，将详细说明根据本发明第一实施例的用于检测和识别顾客的摔倒的方法中的各个步骤中的处理。Hereinafter, the processing in each step in the method for detecting and recognizing a customer's fall according to the first embodiment of the present invention will be explained in detail.

1.2、相机的标定1.2. Calibration of the camera

图2示出了根据本发明第一实施例的用于检测和识别顾客的摔倒的方法中的相机的标定处理的示意图。Fig. 2 shows a schematic diagram of the camera calibration process in the method for detecting and identifying falls of customers according to the first embodiment of the present invention.

由于相机往往以不同的角度分布在无人超市等场所中的不同位置处，因此，如图2中的短实线所示，在各个相机的拍摄视场中，垂直向量的分布状态取决于拍摄相机的拍摄角度和图像中的物体所处的位置。换言之，在不同的相机的拍摄视场中，垂直向量的分布状态通常是不相同的。因而，需要对各相机进行垂直向量的标定处理，以使各相机能够真实地反映出所拍摄的视场中不同位置处的垂直向量方向，从而能够在后续的步骤中为判断拍摄到的顾客是否站立提供基准。Since the cameras are often distributed at different positions in unmanned supermarkets and other places at different angles, as shown by the short solid line in Figure 2, in the field of view of each camera, the distribution state of the vertical vector depends on the shooting angle of the camera and the location of the object in the image. In other words, in the shooting fields of view of different cameras, the distribution states of the vertical vectors are usually different. Therefore, it is necessary to calibrate the vertical vector of each camera, so that each camera can truly reflect the direction of the vertical vector at different positions in the captured field of view, so as to provide a benchmark for judging whether the captured customer is standing in the subsequent steps.

相机的标定(calibration)简单来说就是从世界坐标系转换为相机坐标系，再由相机坐标系转换为图像坐标系的过程。换言之，也就是求出相机的最终的投影矩阵的过程。具体而言，世界坐标系(world coordinate system)是指用户定义的三维世界的坐标系，为了描述目标物在真实世界里的位置而被引入；相机坐标系(camera coordinate system)是指在相机上建立的坐标系，为了从相机的角度描述物***置而定义，作为沟通世界坐标系和图像/像素坐标系的中间一环；图像坐标系(image coordinate system)是指为了描述成像过程中物体从相机坐标系到图像坐标系的投影透射关系而引入，方便进一步得到像素坐标系下的坐标。从世界坐标系到相机坐标系的转换是从三维点到三维点的转换，需要使用旋转矩阵R和平移矢量t等相机外参(camera extrinsic information)。从相机坐标系到图像坐标系的转换是从三维点到二维点的转换，需要使用焦距、像主点坐标、畸变参数等相机内参。例如，可以通过使用棋盘格图片等作为标定图片，并利用OpenGL、OpenGL等软件工具中的相机标定功能实现上述的相机的标定。Camera calibration is simply the process of converting from the world coordinate system to the camera coordinate system, and then converting from the camera coordinate system to the image coordinate system. In other words, it is the process of finding the final projection matrix of the camera. Specifically, the world coordinate system (world coordinate system) refers to the coordinate system of the three-dimensional world defined by the user, which is introduced to describe the position of the target object in the real world; the camera coordinate system (camera coordinate system) refers to the coordinate system established on the camera, which is defined in order to describe the position of the object from the perspective of the camera, as an intermediate link between the world coordinate system and the image/pixel coordinate system; The projection-transmission relationship is introduced to facilitate further obtaining the coordinates in the pixel coordinate system. The conversion from the world coordinate system to the camera coordinate system is a conversion from a three-dimensional point to a three-dimensional point, which requires the use of camera extrinsic information such as the rotation matrix R and the translation vector t. The transformation from the camera coordinate system to the image coordinate system is the transformation from a 3D point to a 2D point, which requires the use of internal camera parameters such as focal length, image principal point coordinates, and distortion parameters. For example, the above-mentioned camera calibration can be realized by using a checkerboard picture or the like as a calibration picture, and using a camera calibration function in software tools such as OpenGL and OpenGL.

1.3、顾客的骨架及其关键点数据的提取1.3. Customer skeleton and key point data extraction

在根据本发明第一实施例的用于检测和识别顾客的摔倒的方法中，在经过标定的相机拍摄到图像数据之后，在进行后续的处理之前，会对图像数据进行数据提取处理。在拍摄到顾客的情况下，这样的数据提取处理会将原始的图像数据转换为仅包含图像中所需的诸如顾客的骨架信息等简化图像数据，从而极大地减少了后续数据传输和处理的数据量，并且避免了侵犯顾客隐私权的风险。需要说明的是，现有技术中已经存在大量的用于从多种类型的包含人物图像的图像数据中提取人体骨架特征点的技术，本文中为了突出说明重点，在此就不再进行详述。可以采用诸如Hrnet等任何合适的已知的专用软件和算法来进行本步骤中的人体骨架及其关键点的数据提取。例如，如图3所示，可以提取出包括人的眼部、耳朵、鼻子、肩部、肘部、腕部、胯部、膝部、脚踝等部位在内的17个部位的图像数据作为人体骨架的关键点信息。In the method for detecting and identifying a customer's fall according to the first embodiment of the present invention, after the image data is captured by the calibrated camera, data extraction processing is performed on the image data before subsequent processing. In the case of photographing a customer, such data extraction process will convert the original image data into simplified image data that only contains the required information in the image, such as the customer's skeleton information, thereby greatly reducing the amount of data for subsequent data transmission and processing, and avoiding the risk of violating the customer's privacy. It should be noted that there are already a large number of technologies for extracting human body skeleton feature points from various types of image data including person images in the prior art. In order to highlight the key points of the description, details will not be described here. Any suitable known special software and algorithm such as Hrnet can be used to extract the data of the human skeleton and its key points in this step. For example, as shown in Figure 3, the image data of 17 parts including human eyes, ears, nose, shoulders, elbows, wrists, crotch, knees, ankles and other parts can be extracted as the key point information of the human skeleton.

需要特别说明的是，在用于图示本实施例的附图中，在示出提取出的顾客骨架关键点信息之外，为了便于理解和说明，也示出了完整的图片信息。但在实际的使用过程中，在提取步骤之后的各步骤中，均不需要传输、存储和处理完整的图片或视频数据。It should be noted that in the drawings used to illustrate this embodiment, in addition to showing the extracted key point information of the customer skeleton, complete picture information is also shown for ease of understanding and explanation. However, in actual use, there is no need to transmit, store and process complete picture or video data in each step after the extraction step.

1.4、顾客的个人垂直度的估算1.4. Estimation of Customer's Personal Perpendicularity

图4示出了基于提取出的顾客的人体骨骼关键点信息来获得顾客在相应的相机拍摄视场中的个人垂直度的示意图。FIG. 4 shows a schematic diagram of obtaining the customer's personal verticality in the field of view of the corresponding camera based on the extracted key point information of the customer's human skeleton.

如图4中的箭头所示，可以通过计算从表示人的脚部的关键点到表示人的头部的关键点的身体向量来获得个人垂直度信息。具体地，如上文中提到的，提取出的与人的头部相关的关键点例如可以包括人的眼部、耳朵、鼻子等部位的关键点；提取出的与人的脚部相关的关键点例如可以包括人的踝部等部位的关键点。另外，由于不同的相机具有不同的拍摄视场，可以想象的是，处于某一特定位置和姿态下的人对于不同的相机而言，其可拍摄到的关键点可能是不同的。因此，可以使用对于相应的相机而言是可见的顾客的眼部、耳朵、鼻子的关键点信息的平均值来代表在该相机拍摄的图片中的顾客的头部位置信息，使用对于相应的相机而言是可见的顾客的踝部的关键点信息的平均值来代表在该相机拍摄的图片中的顾客的脚部位置信息。例如，图5中的a示出了在某相机的拍摄视场中，顾客的一个踝部是不可见的，因此仅使用可见的那一个踝部的关键点信息来计算在该拍摄视场中在该拍摄时刻的顾客的个人垂直度。图5的b示出了在另一相机的拍摄视场中，顾客的一个踝部不可见，且眼部和鼻子也不可见。因此，使用双耳的关键点信息的平均值以及可见的那一个踝部的关键点信息来计算该拍摄视场中在该拍摄时刻的顾客的个人垂直度。As shown by the arrows in FIG. 4 , personal perpendicularity information can be obtained by calculating a body vector from a key point representing a person's feet to a key point representing a person's head. Specifically, as mentioned above, the extracted key points related to a person's head may include, for example, key points of a person's eyes, ears, nose, etc.; the extracted key points related to a person's feet may include, for example, key points of a person's ankle. In addition, since different cameras have different shooting fields of view, it is conceivable that different cameras may capture different key points of a person in a certain position and posture. Therefore, the average value of the key point information of the customer's eyes, ears, and nose that are visible to the corresponding camera can be used to represent the customer's head position information in the picture taken by the camera, and the customer's foot position information in the picture captured by the camera can be represented by the average value of the key point information of the customer's ankle that is visible to the corresponding camera. For example, a in FIG. 5 shows that in the shooting field of view of a certain camera, one ankle of the customer is invisible, so only the key point information of the visible ankle is used to calculate the personal verticality of the customer at the shooting moment in the shooting field of view. Fig. 5b shows that in the shooting field of view of another camera, one ankle of the customer is not visible, and the eyes and nose are also not visible. Therefore, the average value of the key point information of both ears and the key point information of the visible ankle is used to calculate the customer's personal verticality in the shooting field of view at the shooting moment.

此外，还可以定义置信因子c＝N _v/N _t，其中，N _v表示从某相机拍摄的图片数据中(例如，特定帧中)提取的顾客的可见的关键点数量，N _t表示定义的关键点总数。例如，可以设定当c＜0.5(也即是，当有一半以上的关键点不可见时)时，我们认为从该相机拍摄的该图片数据中提取的顾客的关键点信息不足(例如，在此时顾客并没有完全走入该相机的拍摄视场中)，该相机在此情况下是无效相机。那么，将不使用这样的无效相机所拍摄的图片中的关键点信息来计算顾客的个人垂直度。置信因子的置信阈值不限于0.5，可以根据需要而更改和设定。应当理解的是，随着顾客在环境中的移动，在不同的拍摄时刻(例如，不同帧)，某RBG相机的有效/无效状态可能会发生转变。 In addition, a confidence factor c=N _v /N _t can also be defined, wherein N _v represents the number of visible key points of customers extracted from image data taken by a certain camera (for example, in a specific frame), and N _t represents the total number of defined key points. For example, it can be set that when c<0.5 (that is, when more than half of the key points are not visible), we believe that the key point information of the customer extracted from the image data captured by the camera is insufficient (for example, at this time, the customer does not completely walk into the camera's field of view), and the camera is an invalid camera in this case. Then, the customer's personal verticality will not be calculated using the keypoint information in the picture taken by such an invalid camera. The confidence threshold of the confidence factor is not limited to 0.5, and can be changed and set as required. It should be understood that as the customer moves in the environment, the active/inactive state of a certain RBG camera may change at different shooting moments (eg, different frames).

1.5、顾客的垂直角度的确定1.5. Determination of the customer's vertical angle

如图5的a和b中的箭头所示，通过将在上一步骤中求出的表示个人垂直度的关键点向量与所在拍摄视场的对应的垂直向量进行比较，就可以获得这两个向量之间的角度α _v，作为表示顾客的姿态的垂直角度。关于顾客在不同姿态下的垂直角度，容易想象的是，如图6的a至e所示，当顾客站立时，对于所有相机而言，垂直角度α _v都接近于0。 As shown by the arrows in a and b of Figure 5, by comparing the key point vector representing the personal verticality calculated in the previous step with the corresponding vertical vector in the shooting field of view, the angle _αv between these two vectors can be obtained as the vertical angle representing the customer's posture. Regarding the vertical angle of the customer in different postures, it is easy to imagine that, as shown in a to e of Figure 6, when the customer is standing, the vertical angle α _v is close to 0 for all cameras.

然而，在顾客摔倒在地的情况下，如图7的a至d所示，由于不同相机的拍摄视场不同，基于各相机所拍摄的图片信息所求出的反映同一顾客相同姿态的垂直角度却可能存在差异。换言之，在顾客摔倒的情况下，往往每个相机所拍摄到的图片信息只能反映出顾客姿态的真实信息的一部分，因此需要对从所有有效相机拍摄的图片信息中提取出的数据进行数据聚合(data aggregation)，以获得反映真实信息的全局结果。例如，这里的数据的聚合可以定义为：However, in the case of a customer falling to the ground, as shown in a to d of Figure 7, due to the different shooting fields of view of different cameras, the vertical angles obtained based on the picture information captured by each camera to reflect the same posture of the same customer may be different. In other words, in the case of a customer falling down, often the picture information captured by each camera can only reflect a part of the real information of the customer's posture, so it is necessary to perform data aggregation on the data extracted from the picture information captured by all valid cameras to obtain a global result that reflects the real information. For example, the aggregation of the data here can be defined as:

其中，c表示有效相机。Among them, c represents the effective camera.

由上式(1)可知，可以使用基于各有效相机拍摄的图片信息而求出的垂直角度中的最大值作为反映拍摄时刻顾客的姿态的最终垂直角度。当然，可以采用其他的数据聚合方式获得最终垂直角度。例如，也可以采用基于各有效相机拍摄的图片信息而求出的垂直角度中的第二大值作为反映拍摄时刻顾客的姿态的最终垂直角度。或者，也可以采用基于各有效相机拍摄的图片信息而求出的垂直角度中的去掉最大值和最小值之后的其余垂直角度的平均值作为反映拍摄时刻顾客的姿态的最终垂直角度。It can be seen from the above formula (1) that the maximum value among the vertical angles obtained based on the image information captured by each valid camera can be used as the final vertical angle reflecting the posture of the customer at the time of shooting. Of course, other data aggregation methods can be used to obtain the final vertical angle. For example, the second largest value among the vertical angles obtained based on the picture information captured by each valid camera may be used as the final vertical angle reflecting the posture of the customer at the time of shooting. Alternatively, the average value of the remaining vertical angles after removing the maximum value and the minimum value among the vertical angles calculated based on the picture information captured by each valid camera may be used as the final vertical angle reflecting the customer's posture at the time of shooting.

1.6、基于摔倒分值确定顾客是否摔倒1.6. Determine whether the customer has fallen based on the fall score

由于在最终的判定结果中，顾客仅存在“站立”和“摔倒”两种姿态判定结果，因此，可以将最终垂直角度转换为摔倒分值作为最终判定顾客是否摔倒的基准。Since in the final judgment result, the customer only has two posture judgment results of "standing" and "falling", therefore, the final vertical angle can be converted into a fall score as the benchmark for finally judging whether the customer has fallen.

例如，可以如下定义摔倒分值。For example, a fall score may be defined as follows.

若α _v＜T _l，则s _f＝0； If α _v <T _l , then s _f =0;

若T _l＜α _v＜T _h，则

If T _l <α _v <T _h , then

若α _v＞T _h，则s _f＝1 式(2) If α _v ＞T _h , then s _f ＝1 Formula (2)

其中，T _l为设定的垂直角度的下限，T _h为设定的垂直角度的上限，s _f为摔倒分值。T _l和T _h可以根据需要而进行设定和调整，例如T _l可以设定为40度，T _h可以设定为80度。 Among them, T _l is the lower limit of the set vertical angle, _Th is the upper limit of the set vertical angle, and s _f is the fall score. T _l and T _h can be set and adjusted according to needs, for example, T _l can be set to 40 degrees, and T _h can be set to 80 degrees.

根据上面的式(2)可知，摔倒分值可以被看成是一个模糊逻辑值。其如图8所示，具有在0与1(包括端点)之间浮动变化的值，以反映“站立”和“摔倒”状态。可以根据需要，设定摔倒分值的判定阈值s _T，从而能够基于摔倒分值与判定阈值的比较，准确地对顾客的“站立”和“摔倒”状态进行判定。例如，若s _f＞s _T，则判定顾客处于“摔倒”状态；否则，则判定顾客处于“站立”状态。s _T的值可以根据安全需求等因素而进行设定。优选地，s _T的值可以被设定为0.5与0.8之间(包含端点)的值。例如，s _T的值可以设定为0.5。 According to the above formula (2), it can be seen that the fall score can be regarded as a fuzzy logic value. As shown in Figure 8, it has a value that floats between 0 and 1 (inclusive) to reflect the "standing" and "falling" states. The judgment threshold s _T of the fall score can be set as needed, so that based on the comparison between the fall score and the judgment threshold, the customer's "standing" and "fall" states can be accurately judged. For example, if s _f >s _T , it is determined that the customer is in a "falling"state; otherwise, it is determined that the customer is in a "standing" state. The value of s _T can be set according to factors such as security requirements. Preferably, the value of s _T may be set to a value between 0.5 and 0.8 (inclusive). For example, the value of s _T can be set to 0.5.

图9的a和b示出了在T _l设定为40度，T _h设定为80度，且s _T设定为0.5的情况下，基于某顾客的所有有效相机拍摄的图片信息获得的最终垂直角度α _v与摔倒分值s _f的转换关系以及最终的状态判定结果。 Figure 9a and b show the conversion relationship between the final vertical angle α _v and the fall score s _f obtained based on the picture information taken by all valid cameras of a certain customer and the final state judgment result when T _l is set to 40 degrees, T _h is set to 80 degrees, and s _T is set to 0.5.

1.7、用于检测和识别顾客的摔倒的设备的示例1.7. Example of a device for detecting and recognizing a customer's fall

图10图示了根据本发明第一实施例的用于检测和识别顾客的摔倒的设备的构成的示意性框图。例如，根据本发明第一实施例的用于检测和识别顾客的摔倒的检测设备1可以包括多个相机101和处理单元102。Fig. 10 illustrates a schematic block diagram of the composition of the device for detecting and recognizing a customer's fall according to the first embodiment of the present invention. For example, the detection device 1 for detecting and recognizing a customer's fall according to the first embodiment of the present invention may include a plurality of cameras 101 and a processing unit 102 .

在诸如无人超市等无人零售场所内可以布置有一个或多个相机101，其具有不同的视野并且能够获得包括该场所内的顾客的图像数据。相机101可以是RGB相机，也可以是RGB-D相机等其它类型的相机，以获取诸如IR图像、RGB图像或激光图像等类型的图像信息。One or more cameras 101 can be arranged in an unmanned retail place such as an unmanned supermarket, which have different fields of view and can obtain image data including customers in the place. The camera 101 may be an RGB camera, or another type of camera such as an RGB-D camera, so as to acquire image information such as an IR image, an RGB image or a laser image.

处理单元102接收由多个相机拍摄获得的图像信息，并包含多个数据处理模块。处理单元102能够通过多个组成模块执行作为软件或固件而被存储在其中的存储元件中或与其互连的存储器或数据存储器中的应用程序或例程，从而执行上文中所述的根据本发明第一实施例的用于检测和识别顾客的摔倒的方法。这里的处理单元102例如由中央处理单元(CPU)和存储元件构成。例如，处理单元102可以包括一个或多个通用处理器、控制器、现场可编程门阵列(FPGA)、图形处理单元(GPU)、专用集成电路(ASIC)或它们的组合等，作为与各个相机101数据通信的专用数据处理器或数据处理芯片。或者，处理单元102也可以是集成在各相机101中的彼此能够互相进行数据交互的具有AI处理功能的传感器。这样的传感器兼具数据处理能力和数据存储能力，无需额外的硬件即可与相机101配合执行根据本发明第一实施例的用于检测和识别顾客的摔倒的方法。The processing unit 102 receives image information captured by multiple cameras and includes multiple data processing modules. The processing unit 102 is capable of executing an application program or a routine stored as software or firmware in a storage element therein or in a memory or data storage interconnected therewith through a plurality of constituent modules, thereby performing the method for detecting and identifying a customer's fall according to the first embodiment of the present invention described above. The processing unit 102 here is constituted by, for example, a central processing unit (CPU) and a storage element. For example, the processing unit 102 may include one or more general-purpose processors, controllers, Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), Application Specific Integrated Circuits (ASICs), or combinations thereof, as dedicated data processors or data processing chips in data communication with each camera 101. Alternatively, the processing unit 102 may also be a sensor with an AI processing function integrated in each camera 101 and capable of data interaction with each other. Such a sensor has both data processing capability and data storage capability, and can cooperate with the camera 101 to execute the method for detecting and identifying a customer's fall according to the first embodiment of the present invention without additional hardware.

例如，处理单元102包括：标定模块1021，其用于对多个相机101进行标定，以使各相机101的视野中具有正确的垂直向量；数据处理模块1022，其用于处理从所述多个相机101发送来的图像数据，以获得在多个相机101的至少一部分相机的拍摄视场中的顾客的个人垂直度；计算模块1023，用于基于从标定模块1021发送的垂直向量和从数据处理模块1022发送的个人垂直度来计算顾客在至少一部分相机的拍摄视场中的垂直角度，并对各个垂直角度进行聚合以获得顾客的最终垂直角度；和判定模块1024，用于将最终垂直角度转换为摔倒分数，并根据摔倒分数判定顾客是否摔倒。其中，数据处理模块1022可以包括：提取模块10221，其从图像数据中提取出顾客的骨架的关键点的数据；和估算模块10222，其通过使用关键点的数据估算出顾客的个人垂直度。其中，计算模块1023可以包括角度计算模块10231和聚合模块10232。For example, the processing unit 102 includes: a calibration module 1021, which is used to calibrate a plurality of cameras 101, so that there is a correct vertical vector in the field of view of each camera 101; a data processing module 1022, which is used to process the image data sent from the plurality of cameras 101, so as to obtain the personal verticality of the customer in the shooting field of view of at least a part of the cameras 101; Personal verticality to calculate the vertical angle of the customer in at least a part of the camera's field of view, and aggregate each vertical angle to obtain the customer's final vertical angle; and a decision module 1024, which is used to convert the final vertical angle into a fall score, and determine whether the customer has fallen according to the fall score. Wherein, the data processing module 1022 may include: an extraction module 10221, which extracts the key point data of the customer's skeleton from the image data; and an estimation module 10222, which estimates the customer's personal verticality by using the key point data. Wherein, the calculation module 1023 may include an angle calculation module 10231 and an aggregation module 10232 .

估算模块，通过使用来自所述提取模块的所述关键点的数据估算所述人的所述个人垂直度。an estimating module for estimating the personal verticality of the person by using the keypoint data from the extraction module.

根据本发明的第一实施例，由于在获取视频或图片数据后就立即对这些数据进行了提取关键点信息的数据简化处理，在后续的所有步骤中均只需要传输和处理提取出的人骨架的关键点数据，因此大大减少了数据处理量，能够快速、准确地识别出无人购物场所中顾客的摔倒状态和站立状态。此外，由于不需要对顾客进行面部识别，并且在经过最初的数据简化处理后，存储和传输的数据中将只包含顾客的骨架关键点数据，因此保护了顾客隐私权。According to the first embodiment of the present invention, since the data simplification process of extracting key point information is performed on these data immediately after the video or picture data is acquired, in all subsequent steps, only the key point data of the extracted human skeleton need to be transmitted and processed, thus greatly reducing the amount of data processing, and the falling state and standing state of customers in unmanned shopping places can be quickly and accurately identified. In addition, since facial recognition of customers is not required, and after the initial data simplification process, only the skeleton key point data of customers will be included in the stored and transmitted data, thus protecting customer privacy.

2、第二实施例2. The second embodiment

2.1、用于检测和识别顾客的拾取和放回行为的方法和设备的概述2.1. Overview of methods and devices for detecting and identifying customer pick-and-put behavior

首先，将说明根据本发明第二实施例的用于检测和识别顾客的拾取和放回物品行为的方法和设备的概况。图11示出了根据本发明第二实施例的用于检测和识别顾客的拾取和放回物品行为的方法的主要步骤。First, an overview of a method and apparatus for detecting and recognizing a customer's act of picking up and putting back an item according to a second embodiment of the present invention will be described. Fig. 11 shows the main steps of the method for detecting and identifying customer's behavior of picking up and putting back items according to the second embodiment of the present invention.

在诸如无人超市、自助服务商店等无人智能售卖环境中，由于没有服务人员巡场和值守，所以需要借助计算机视觉手段确定顾客对货架上陈列的商品的取放行为，这样不仅能够辅助自助结算***更快速地进行商品结算，还可以更有效地防止偷盗行为。因此，根据本发明第二实施例，提供了用于检测和识别顾客的拾取和放回行为的方法和设备。In unmanned smart sales environments such as unmanned supermarkets and self-service stores, since there are no service personnel to patrol and be on duty, it is necessary to use computer vision to determine the customer's pick-and-place behavior of the goods displayed on the shelf. This will not only assist the self-service settlement system to settle goods more quickly, but also prevent theft more effectively. Therefore, according to a second embodiment of the present invention, a method and apparatus for detecting and identifying customer pick-up and put-back behavior is provided.

首先，将概述根据本发明第二实施例的用于检测和识别顾客的拾取和放回行为的方法的主要步骤。需要说明的是，在开始执行根据本发明第二实施例的用于检测和识别顾客的拾取和放回行为的方法之前，需要获得布置有货架的购物场所的图像数据。这样的图像数据可以是由布置在无人购物场所中的至少一台相机拍摄的，也可以是已经被传输至并存储在用于执行根据本发明第二实施例的用于检测和识别顾客的拾取和放回行为的方法的诸如AI传感器等处理器中。因此，这样的图像数据获取步骤既可以被看做是根据本发明第二实施例的用于检测和识别顾客的拾取和放回行为的方法中的一个步骤，也可以被看做是根据本发明第二实施例的用于检测和识别顾客的拾取和放回行为的方法的准备步骤。First, the main steps of the method for detecting and recognizing pick-up and put-back behaviors of customers according to the second embodiment of the present invention will be outlined. It should be noted that before starting to execute the method for detecting and identifying customers' pick-up and put-back behaviors according to the second embodiment of the present invention, it is necessary to obtain image data of a shopping place where shelves are arranged. Such image data may be taken by at least one camera arranged in an unmanned shopping place, or may have been transmitted to and stored in a processor such as an AI sensor for performing the method for detecting and identifying customers' pick-up and put-back behaviors according to the second embodiment of the present invention. Therefore, such an image data acquisition step can be regarded as either a step in the method for detecting and recognizing customer's picking and putting back behavior according to the second embodiment of the present invention, or it can be regarded as a preparatory step of the method for detecting and recognizing customer's picking and putting back behavior according to the second embodiment of the present invention.

在获得图像数据后，执行从图像数据中提取后续处理所需的数据的数据提取处理。在本实施例中，这样的数据提取处理包括两个部分：货架的轮廓数据的提取和顾客的包括手部、腕部等骨架的关键点数据的提取。顾客的包括手部、腕部等骨架的关键点数据的提取与第一实施例中的信息提取步骤类似。因此，同样地，极大地减少了后续数据传输和处理的数据量，并且避免了侵犯顾客隐私权的风险。需要说明的是，现有技术中已经存在大量的用于从多种类型的包含人物图像的图像数据中提取人体骨架特征点的技术，本文中为了突出说明重点，在此就不再进行详述。可以采用诸如Hrnet等任何合适的已知的专用软件来进行本步骤中的人体骨架及其关键点的数据提取。此外，货架的轮廓数据的提取也可以采用诸如LabelMe等任何已知的适合的软件来进行。考虑到货架的实际形状，例如可以将货架图像定义并提取为包括至少两个多边形轮廓。其中，内部多边形是基于货架的真实外部轮廓提取出的多边形轮廓线，其表示货架的实际边界；处于内部多边形之外的外部多边形是用于定义内部多边形的接近区域的范围的轮廓线。After the image data is obtained, data extraction processing of extracting data required for subsequent processing from the image data is performed. In this embodiment, such data extraction process includes two parts: the extraction of the outline data of the shelf and the extraction of key point data of the customer's skeleton including hands and wrists. The extraction of key point data of the customer's skeleton including hands and wrists is similar to the steps of information extraction in the first embodiment. Therefore, likewise, the amount of data for subsequent data transmission and processing is greatly reduced, and the risk of violating the privacy rights of customers is avoided. It should be noted that there are already a large number of technologies for extracting human body skeleton feature points from various types of image data including person images in the prior art. In order to highlight the key points of the description, details will not be described here. Any suitable known special software such as Hrnet can be used to extract the data of the human skeleton and its key points in this step. In addition, the extraction of the outline data of the shelf can also be performed using any known suitable software such as LabelMe. Taking into account the actual shape of the shelf, the shelf image can eg be defined and extracted to include at least two polygonal contours. Wherein, the inner polygon is a polygon contour extracted based on the real outer contour of the shelf, which represents the actual boundary of the shelf; the outer polygon outside the inner polygon is a contour for defining the range of the approach area of the inner polygon.

此后，在检测到顾客的手的手部关键点进入到外部多边形的情况下，针对顾客的每只手分别执行手部关键点附近是否存在物品的进入物品检测，从而确定所拍摄到的货架前的顾客的手在进入货架前是否持握有物品以及在持有物品的情况下物品的种类、数量等。需要说明的是，对于手部关键点附近的物品的检测可以通过借助诸如YoloX等工具采用任何已知的适合的方法来进行。随后，在检测到顾客的手的手部关键点退出外部多边形的情况下，针对顾客的每只手再分别执行手部关键点附近是否存在物品的退出物品检测，从而确定所拍摄到的货架前的顾客的手在离开货架时是否持握有物品以及持有物品的情况下物品的种类、数量等。Afterwards, when it is detected that the hand key points of the customer's hand enter the external polygon, for each hand of the customer, the detection of whether there is an item near the key point of the hand is performed respectively, so as to determine whether the captured hand of the customer in front of the shelf is holding an item before entering the shelf, and the type and quantity of the item when holding the item. It should be noted that the detection of objects near the key points of the hand can be performed by using any known suitable method with the help of tools such as YoloX. Subsequently, when it is detected that the hand key points of the customer's hand exit the external polygon, for each hand of the customer, the exit item detection of whether there is an item near the key point of the hand is performed respectively, so as to determine whether the captured hand of the customer in front of the shelf is holding an item when leaving the shelf, and the type and quantity of the item when holding the item.

可选地，在检测到顾客的两只手都分别进入外部多边形的情况下，还可以分别记录手部关键点附近的各物品在外部多边形与内部多边形之间轨迹。通过对不同物品的轨迹进行比对，例如能够有助于分辨出某物品是否是被顾客的双手持握。最后，根据进入物品检测结果和退出物品检测结果，优选地，再结合手部关键点附近的物品的轨迹，能够判定顾客在货架前的拾取或放回行为。例如，这样的判定过程可以通过基于各前序步骤中的检测结果，对顾客的双手分别使用有限状态机(FSM)来实现。Optionally, when it is detected that both hands of the customer have entered the outer polygon, the trajectory of each item near the key point of the hand between the outer polygon and the inner polygon can also be recorded respectively. By comparing the trajectories of different items, for example, it can be helpful to distinguish whether an item is held by both hands of the customer. Finally, according to the detection results of the incoming items and the outgoing items, preferably, combined with the trajectory of the items near the key points of the hand, it is possible to determine the customer's picking or putting behavior in front of the shelf. For example, such a determination process can be realized by using a finite state machine (FSM) for each of the customer's hands based on the detection results in the respective preceding steps.

根据本发明的第二实施例，发明人定义了位于代表货架的实际轮廓线的内部多边形外部的接近区域的外部多边形，从而使用在内部多边形与外部多边形之间的区域内获得的图像数据作为进行检测和识别的基础，并且以顾客的手进出外部多边形的时间点来作为检测和识别的触发点。通过这样的方法，能够有效地消除顾客的手在货架附近的多余或复杂动作对拾取或放回动作识别的干扰，从而更加准确有效地进行检测和识别。另外，容易理解的是，在真实情况下，顾客的手是在货架内(即，货架的内部多边形内)进行的物品的拾取或放回行为。然而，由于货架的遮挡，我们通常无法基于手部关键点及其附近的区域在内部多边形内的图像数据来直接判定出顾客在货架前可能出现的拾取或放回行为。因此，通过这样的方法，还能够有效消除货架的遮挡对于拾取或放回动作识别的影响。According to the second embodiment of the present invention, the inventor defines an outer polygon located in an approach area outside the inner polygon representing the actual outline of the shelf, thereby using the image data obtained in the area between the inner polygon and the outer polygon as the basis for detection and recognition, and taking the time point when the customer's hand enters and exits the outer polygon as the trigger point for detection and recognition. Through such a method, it is possible to effectively eliminate the interference of the redundant or complicated actions of the customer's hands near the shelf on the recognition of picking or putting back actions, so as to perform detection and identification more accurately and effectively. In addition, it is easy to understand that in a real situation, the customer's hand is inside the shelf (ie, within the inner polygon of the shelf) to pick up or put back the item. However, due to the occlusion of the shelf, we usually cannot directly determine the customer's possible pick-up or put-back behavior in front of the shelf based on the image data of the key points of the hand and its nearby areas within the internal polygon. Therefore, through such a method, the influence of the occlusion of the shelf on the recognition of the pick-up or put-back action can also be effectively eliminated.

优选地，根据本实施例的检测方法在在所述步骤S2之后，在顾客的手进入货架的所述外部多边形并针对顾客的每只手分别执行进入物品检测之后，还包括判定所述人的所述手部关键点是否进入所述内部多变形的步骤。在此情况下，只有在确定所述人的所述手部关键点进入所述内部多边形的情况下，根据本实施例的检测方法才继续进行后续步骤。例如，只有在顾客的手部关键点在获取的图像数据的至少3个连续帧中都处于货架的内部多边形内，则该手部关键点才被判定为进入了货架的内部多边形。Preferably, the detection method according to this embodiment, after step S2, after the customer's hand enters the outer polygon of the shelf and the entering object detection is performed for each hand of the customer, further includes the step of determining whether the key point of the person's hand enters the inner polygon. In this case, only when it is determined that the key point of the hand of the person enters the internal polygon, the detection method according to this embodiment proceeds to the subsequent steps. For example, only if the key point of the hand of the customer is within the internal polygon of the shelf in at least 3 consecutive frames of the acquired image data, then the key point of the hand is determined to have entered the internal polygon of the shelf.

在上述的根据本发明第二实施例的用于检测和识别顾客的拾取或放回行为的方法中，除了获取环境的图像数据的步骤是由各相机执行之外，其余的各步骤的处理均可通过与各相机通信连接的诸如电脑、CPU、TPU、GPU、FPGA等处理器或专用数据处理芯片执行。或者，优选地，这些步骤中的处理也可以由具有AI处理功能的传感器执行。这样的传感器兼具数据处理能力和数据存储能力，无需额外的硬件即可执行机器学习驱动的计算机视觉处理任务。这样的传感器例如可以集成设置在相机中。In the above-mentioned method for detecting and identifying customers’ pick-up or put-back behavior according to the second embodiment of the present invention, except that the step of acquiring the image data of the environment is performed by each camera, the processing of the remaining steps can be performed by processors such as computers, CPUs, TPUs, GPUs, FPGAs, etc. that are communicatively connected with each camera or dedicated data processing chips. Alternatively, preferably, the processing in these steps may also be performed by a sensor having an AI processing function. Such sensors combine data processing and data storage capabilities to perform machine learning-driven computer vision processing tasks without the need for additional hardware. Such a sensor can be integrated in a camera, for example.

下面，将详细地说明如何利用有限状态机基于进入物品检测结果和物品检测结果等来判定顾客在货架前的拾取或放回行为。Next, how to use the finite state machine to determine the customer's pick-up or put-back behavior in front of the shelf based on the incoming item detection result and the item detection result will be described in detail.

与第一实施例中类似地，在用于图示本实施例的附图中，在示出提取出的顾客骨架关键点信息、物品信息以及货架轮廓信息之外，为了便于理解和说明，也示出了完整的图片信息。但在实际的使用过程中，在提取步骤之后的各步骤中，均不需要传输、存储和处理完整的图片或视频数据。Similar to the first embodiment, in the drawings used to illustrate this embodiment, in addition to showing the extracted customer skeleton key point information, item information and shelf outline information, complete picture information is also shown for easy understanding and explanation. However, in actual use, there is no need to transmit, store and process complete picture or video data in each step after the extraction step.

2.2、顾客在货柜前的拾取或放回行为的简要分析2.2. A brief analysis of customers' pick-up or put-back behavior in front of the container

为了便于理解，这里先将顾客在货架前可能出现的拾取或放回行为进行简要说明和分类。For ease of understanding, here is a brief description and classification of the possible pick-up or put-back behaviors of customers in front of the shelves.

图12示意性地示出了顾客在货架期的主要四种拾取或放回行为。Figure 12 schematically shows the main four types of pick-up or put-back behaviors of customers during the shelf life.

(1)如图12右上部分所示，只有远离相机或图形传感器那侧的一只手进入货架的外部多边形，并且继而进入内部多边形放回和/或拾取一件物品。在此情况下，顾客两只手的关键点及其附近的区域均没有被遮挡，因此能够容易地检测到手部的关键点的动作以及手部的关键点附近的区域的物品。需要注意的是，在此情况下，两只手对于手部的关键点附近区域的物品(即，被手持握的物品)的遮挡情况是不一样的，因此在使用软件检测物品时，针对两只手的不同遮挡情况设定不同的检测阈值是优选的。(1) As shown in the upper right part of Figure 12, only one hand on the side away from the camera or image sensor enters the outer polygon of the shelf, and then enters the inner polygon to put back and/or pick up an item. In this case, neither the key points of the customer's hands nor the surrounding areas are blocked, so the movement of the key points of the hands and the items in the area near the key points of the hands can be easily detected. It should be noted that in this case, the two hands have different occlusion conditions for the items in the vicinity of the key points of the hands (that is, the items being held by the hand). Therefore, when using software to detect items, it is preferable to set different detection thresholds for different occlusion conditions of the two hands.

(2)如图12左上部分所示，只有靠近相机或图形传感器那侧的一只手进入货架的外部多边形，并且继而进入内部多边形放回和/或拾取一件物品。在此情况下，顾客的远侧的另一手的关键点及其附近的区域很可能被遮挡。因此，在根据本发明第二实施例的方法中，为了便于进行运算中，将这种情况看做是下文中将要说明的第四种情况进行处理。(2) As shown in the upper left part of Figure 12, only one hand near the camera or image sensor side enters the outer polygon of the shelf, and then enters the inner polygon to put back and/or pick up an item. In this case, the key point of the customer's far side of the other hand and the surrounding area are likely to be occluded. Therefore, in the method according to the second embodiment of the present invention, for the convenience of calculation, this situation is treated as the fourth situation to be described below.

(3)如图12左下部分所示，顾客的两只手均进入货架的外部多边形，在此情况下，如果检测到两只手的关键点附近的区域中的物品是不同的物品，则认为两只手分别放回和/或拾取了一件物品。也即是，该顾客放回和/或拾取了两件物品。(3) As shown in the lower left part of Figure 12, both hands of the customer enter the outer polygon of the shelf. In this case, if it is detected that the items in the area near the key points of the two hands are different items, it is considered that the two hands have put back and/or picked up an item respectively. That is, the customer puts back and/or picks up two items.

(4)如图12右下部分所示，顾客的两只手均进入货架的外部多边形，在此情况下，如果检测到两只手的关键点附近的区域中的物品是相同的物品，则认为两只手共同放回和/或拾取一件物品。也即是，该顾客一共放回和/或拾取了一件物品。(4) As shown in the lower right part of Figure 12, both hands of the customer enter the outer polygon of the shelf. In this case, if it is detected that the items in the area near the key points of the two hands are the same item, it is considered that the two hands jointly put back and/or pick up an item. That is, the customer put back and/or picked up one item in total.

需要注意的是，在顾客的两只手均进入货架的外部多边形的上述第(3)和第(4)种情况下，存在着两只手先后在货架上放回和/或拾取物品以及换手持握物品等复杂情况。但是，无论在放回和/或拾取物品的过程中的动作如何复杂，只要当双手退出货架的外部多边形时分别针对双手的手部关键点附近的物品进行检测并基于下文中说明的记录的物品轨迹判定是否为同一物体，即可简化放回和/或拾取物品行为的判定。It should be noted that in the above cases (3) and (4) where both hands of the customer enter the outer polygon of the shelf, there are complex situations such as putting back and/or picking up items on the shelf and changing hands to hold items with both hands. However, no matter how complicated the actions in the process of putting back and/or picking up items are, as long as the items near the key points of the hands of both hands are detected when the hands exit the outer polygon of the shelf and whether they are the same object is determined based on the recorded item trajectories described below, the determination of the behavior of putting back and/or picking up items can be simplified.

根据以上分析，更加具体地，上述4种主要拾取或放回行为可以进一步扩展为包括如下12种情况。According to the above analysis, more specifically, the above four main pick-up or put-back behaviors can be further expanded to include the following 12 situations.

只有一只手进入货架的情况下：Where only one hand enters the shelf:

(1)一只手放回一件物品；(1) Putting back an item with one hand;

(2)一只手拾取一件物品；(2) pick up an item with one hand;

(3)一只手放回一件物品并且拾取一件物品；(3) One hand puts back an item and picks up an item;

两只手均进入货架的情况下：With both hands entering the shelf:

(4)两只手放回一件物品；(4) put back an item with both hands;

(5)两只手放回两件物品；(5) put back two items with both hands;

(6)两只手拾取一件物品；(6) pick up an item with two hands;

(7)两只手拾取两件物品；(7) Pick up two items with two hands;

(8)两只手放回一件物品，一只手拾取一件物品；(8) put back an item with both hands, and pick up an item with one hand;

(9)两只手放回一件物品，两只手拾取一件物品；(9) put back an item with both hands, and pick up an item with both hands;

(10)两只手放回一件物品，两只手拾取两件物品；(10) put back one item with two hands, and pick up two items with two hands;

(11)两只手放回两件物品，一只手拾取一件物品；(11) Put back two items with both hands, and pick up an item with one hand;

(12)两只手放回两件物品，两只手拾取两件物品。(12) Put back two items with both hands and pick up two items with both hands.

因此，在下文的关于在本实施例中使用的有限状态机的说明中，将根据此处分析的顾客在货架前的可能存在的行为状态类别对有限状态机的参数进行定义和使用。Therefore, in the following description about the finite state machine used in this embodiment, the parameters of the finite state machine will be defined and used according to the possible behavior state categories of customers in front of the shelves analyzed here.

2.3、物品存在的判定条件的设定2.3, the setting of the judgment condition of the existence of the item

在上文中已经提及，在根据本发明的第二实施例的用于检测和识别顾客的拾取或放回行为的方法中，需要采用诸如YoloX等任何已知的适合的软件对手部关键点附近的物品进行进入物品检测和退出物品检测等物品检测。As mentioned above, in the method for detecting and identifying the customer’s pick-up or put-back behavior according to the second embodiment of the present invention, it is necessary to use any known suitable software such as YoloX to perform item detection such as entering item detection and exiting item detection for items near key points of the hand.

需要说明的是，为了使物品检测更加的准确，需要设定判定条件。例如，只有在所拍摄或获取的图像数据中的具有检测到的物品的帧数等于或大于预定的最小帧数，并且具有检测到的所述物品的帧数与图像数据的总帧数的比值等于或大于预定的最小比值的情况下，才确定为检测出进入物品。这里，进入物品是指顾客的手在进入货架时就持有的物品。类似地，只有在所拍摄或获取的图像数据中的具有检测到的物品的帧数等于或大于预定的最小帧数，并且具有检测到的所述物品的帧数与图像数据的总帧数的比值等于或大于预定的最小比值的情况下，才确定为检测出退出物品。这里，退出物品是指顾客的手在退出货架时持有的物品。这里的最小帧数和最小比值可以根据需要设定。例如，最小帧数可以为2帧，最小比值可以为0.03。或者，最小帧数可以为5帧，最小比值可以为0.06。It should be noted that, in order to make the item detection more accurate, it is necessary to set the determination condition. For example, only when the number of frames with detected items in the photographed or acquired image data is equal to or greater than a predetermined minimum number of frames, and the ratio of the number of frames with detected items to the total number of frames of image data is equal to or greater than a predetermined minimum ratio, it is determined that an incoming item is detected. Here, the entry item refers to the item that the customer's hand holds when entering the shelf. Similarly, only when the number of frames with the detected item in the photographed or acquired image data is equal to or greater than a predetermined minimum number of frames, and the ratio of the number of frames with the detected item to the total number of frames of the image data is equal to or greater than the predetermined minimum ratio, it is determined that an ejected item is detected. Here, withdrawn items refer to items held by the customer's hand when withdrawn from the shelf. The minimum number of frames and the minimum ratio here can be set as required. For example, the minimum number of frames may be 2 frames, and the minimum ratio may be 0.03. Alternatively, the minimum number of frames may be 5 frames, and the minimum ratio may be 0.06.

例如，图13的a和b是图示了根据本发明的第二实施例的物品存在的判定条件的示例的列表。如图13的a所示，在针对顾客的左手进行的进入物品检测过程中，在10个帧(时间戳)中，检测到物品的帧数为5。因此，判定左手在进入时持有物品(存在)。如图13的b所示，在针对顾客的左手进行的退出物品检测过程中，在10个帧(时间戳)中，检测到物品的帧数为1。因此，判定左手在退出时未持有物品(不存在)。For example, a and b of FIG. 13 are lists illustrating examples of determination conditions for the existence of an item according to the second embodiment of the present invention. As shown in a of FIG. 13 , in the process of detecting an incoming item for the customer's left hand, among 10 frames (time stamps), the number of frames in which the item is detected is 5. Therefore, it is determined that the left hand holds the item (existence) at the time of entry. As shown in b of FIG. 13 , in the process of detecting the withdrawn item for the left hand of the customer, among 10 frames (time stamps), the number of frames in which the item is detected is 1. Therefore, it is determined that the left hand does not hold the item (does not exist) at the time of exit.

2.4、本实施例中使用的有限状态机的说明2.4. Description of the finite state machine used in this example

有限状态机(FSM)是一种用于进行对象行为建模的工具，其作用主要是描述对象在它的生命周期内所经历的状态序列，以及如何响应来自外界的各种事件。在计算机科学中，有限状态机已经被广泛用于建模应用行为、硬件电路***设计、软件工程，编译器、网络协议、和计算与语言的研究。因此，有限状态机非常适合用来协助顾客在货架前的拾取或放回行为。Finite state machine (FSM) is a tool for object behavior modeling, its role is mainly to describe the state sequence that the object experiences in its life cycle, and how to respond to various events from the outside world. In computer science, finite state machines have been widely used to model application behavior, hardware circuit system design, software engineering, compilers, network protocols, and computing and language research. Therefore, finite state machines are well suited to assist customers with picking or putting back at the shelf.

简而言之，有限状态机可归纳为4个要素，即现态、条件、动作、次态。“现态”和“条件”是因，“动作”和“次态”是果。具体定义如下：In short, the finite state machine can be summarized into 4 elements, namely present state, condition, action and next state. "Current state" and "condition" are the cause, and "action" and "next state" are the effect. The specific definition is as follows:

现态：是指对象当前所处的状态。例如，在本实施例中，顾客的手部进入外部多边形时是否持握有物品的状态可以作为FSM的现态。Current state: refers to the current state of the object. For example, in this embodiment, when the customer's hand enters the outer polygon, the state of whether the customer is holding the item can be used as the current state of the FSM.

条件：又称为“事件”。当一个条件被满足，将会触发一个动作，或者执行一次状态的迁移。Conditions: Also known as "events". When a condition is met, an action will be triggered, or a state transition will be performed.

动作：条件满足后执行的动作。动作执行完毕后，可以迁移到新的状态，也可以仍旧保持原状态。动作不是必需的，当条件满足后，也可以不执行任何动作，直接迁移到新状态。Action: The action to be executed after the condition is met. After the action is executed, it can migrate to a new state, or it can remain in the original state. Actions are not required. When the conditions are met, you can directly migrate to the new state without performing any actions.

例如，在本实施例中，顾客的手部在货柜内的拾取或放回动作就可以作为FSM的“事件”或者“条件满足后执行的动作”，其导致迁移至新状态。For example, in this embodiment, the action of picking up or putting back the customer's hand in the container can be used as an "event" or "action performed after the condition is met" of the FSM, which leads to a transition to a new state.

次态：条件满足后要迁往的新状态。“次态”是相对于“现态”而言的，“次态”一旦被激活，就转变成新的“现态”了。例如，在本实施例中，顾客的手部退出外部多边形时是否持握有物品的状态可以作为FSM的次态。Next state: The new state to move to after the condition is met. The "secondary state" is relative to the "present state". Once the "secondary state" is activated, it will transform into a new "present state". For example, in this embodiment, the state of whether the customer's hand is holding the item when it exits the outer polygon can be used as the second state of the FSM.

在本实施例中，当用于一只手的一个FSM结束时，能够询问该FSM的当前估计，从而判断与这只手相关的物品的拾取或放回行为。例如，为了实施根据本发明的第二实施例的上述检测方法，可以为各FSM定义并获取用于顾客的各只手的内部变量accumulate_entry、accumulate_exit和accumulate_inside。其中，accumulate_entry是当手部关键点进入货架的外部多边形时的帧的列表，该内部变量在上述进入物品检测时用于表达是否存在检测到的物品；accumulate_exit是当手部关键点退出货架的外部多边形时的帧的列表，该内部变量在上述退出物品检测时用于表达是否存在检测到的物品；accumulate_inside是当手部关键点位于货架内部(即，位于货架的内部多边形)时的帧的列表，该内部变量用于确认顾客的手部是否实际进行了放回或拾取动作，并且如前文所述，例如以至少连续的3帧作为判断基础。各FSM能够通过调用前文所述的在进入状态期间的进入物品检测和在退出状态期间的退出物品检测的检测结果，来设置和修改这些内部参数。FSM根据进入物品检测的检测结果(即，FSM的现态)和退出物品检测的检测结果(即，FSM的次态)就能判断出顾客在货架前的拾取或放回动作(即，FSM的事件)。In this embodiment, when an FSM for a hand ends, the current estimate of that FSM can be interrogated to determine the pick or put back behavior of items associated with that hand. For example, to implement the above detection method according to the second embodiment of the present invention, internal variables accumulate_entry, accumulate_exit and accumulate_inside for each hand of the customer can be defined and obtained for each FSM. Among them, accumulate_entry is a list of frames when the key point of the hand enters the outer polygon of the shelf. This internal variable is used to express whether there is a detected item during the above-mentioned entry item detection; accumulate_exit is a list of frames when the key point of the hand exits the outer polygon of the shelf, and this internal variable is used to express whether there is a detected item during the above-mentioned exit item detection; shape), this internal variable is used to confirm whether the customer's hand has actually performed a put-back or pick-up action, and as described above, for example, at least 3 consecutive frames are used as the basis for judgment. Each FSM can set and modify these internal parameters by invoking the detection results of the incoming item detection during the entry state and the exiting item detection during the exit state as described above. According to the detection results of the incoming item detection (i.e., the current state of the FSM) and the detection results of the exiting item detection (i.e., the next state of the FSM), the FSM can determine the customer's pick-up or put-back action (i.e., the FSM event) in front of the shelf.

另外，在本实施例的检测方法中，在通过利用Java等工具使用FSM来完成状态判定的过程中，例如可以定义以下多种可能的状态：In addition, in the detection method of this embodiment, in the process of completing the state determination by using tools such as Java and using FSM, for example, the following multiple possible states can be defined:

unknown：顾客进入了货架的外部多边形，但没有进入内部多边形；unknown: the customer entered the outer polygon of the shelf, but not the inner polygon;

did_not_enter_shelf：顾客进入了货架的外部多边形但没有进入内部多边形，并随后退出了外部多边形；did_not_enter_shelf: The customer entered the outer polygon of the shelf but did not enter the inner polygon, and then exited the outer polygon;

other_hand：顾客用两只手拾取/放回物品，一只手退出了外部多边形，但另一只手还没有退出外部多边形；other_hand: The customer picks up/puts back items with two hands, one hand has exited the outer polygon, but the other hand has not exited the outer polygon;

both_hand_entry：顾客使用两只手在持有物品的情况下进入货架，仅仅是提醒；both_hand_entry: Customers use both hands to enter the shelf while holding the item, just a reminder;

both_hand_exit：顾客使用两只手在持有物品的情况下退出货架，仅仅是提醒；both_hand_exit: The customer uses both hands to exit the shelf while holding the item, just a reminder;

pick：顾客拾取了一件物品(在此状态下，无进入物品但存在退出物品)；pick: The customer picked up an item (in this state, there are no entry items but there are exit items);

release：顾客放回了一件物品(在此状态下，存在进入物品但不存在退出物品)release: The customer has put back an item (in this state, there are incoming items but no outgoing items)

no_change：顾客在退出时与进入时具有相同的状态(认为不存在进入物品也不存在退出物品)。no_change: Patron has the same state on exit as on entry (considers no entry nor exit).

需要说明的是，以上的定义仅仅是示例。FSM作为一种用于进行对象行为建模的工具，其具体实现方法在本领域内已经比较成熟，在本文中就不再赘述。It should be noted that the above definitions are only examples. As a tool for object behavior modeling, FSM has a relatively mature implementation method in this field, so it will not be repeated in this article.

2.5、基于物品轨迹的相同物品判定2.5. Judgment of the same item based on item trajectory

由前文中的说明可知，顾客在货架前的拾取/放回行为的最终判定是通过使用FSM基于进入货架时的进入物品的数量、退出货架时的退出物品的数量的比较来确定的。这又取决于进入的手的数量以及双手是否持握着相同的物品。As can be seen from the previous description, the final judgment of the customer’s pick-up/put-back behavior before the shelf is determined by using FSM based on the comparison of the number of incoming items when entering the shelf and the number of exiting items when exiting the shelf. This again depends on the number of hands coming in and whether both hands are holding the same item.

如前文所述，根据本发明的第二实施例的检测方法还包括记录顾客的手部关键点附近的物品在外部多边形与内部多边形之间的轨迹。因此，通过比较顾客的双手的手部关键点附近的进入物品的轨迹，就能够确定左手的手部关键点附近的进入物品与右手的手部关键点附近的进入物品是否为同一物品，例如图12左下和右下两部分所示的情况。例如，这样的轨迹比较是通过核查拍摄或获取的图像数据的各帧中的左手的手部关键点附近的进入物品与右手的手部关键点附近的进入物品之间的距离而进行的。例如，仅在各帧中的距离均低于预定的距离阈值的情况下，确定为轨迹类似，进而确定为两只手的手部关键点附近的进入物品为同一物品。或者，在各帧中的距离的平均值低于预定的距离阈值的情况下，确定为轨迹类似，进而确定为两只手的手部关键点附近的进入物品为同一物品。上述距离阈值可以根据不同应用场景中的物品的最小尺寸或者平均尺寸等因素而设定。例如，可以将距离阈值设定为25个像素。在这里，如果确定为是同一物品，则说明顾客将用两只手放回一件物品；如果确定为是不同的物品，则说明顾客将用两只手放回两件物品。As mentioned above, the detection method according to the second embodiment of the present invention further includes recording the trajectories of the items near the key points of the customer's hand between the outer polygon and the inner polygon. Therefore, by comparing the trajectories of the incoming items near the key points of the hands of the customer, it can be determined whether the incoming items near the key points of the left hand and the incoming items near the key points of the right hand are the same item, such as the situation shown in the lower left and lower right parts of Figure 12. For example, such trajectory comparison is performed by checking the distance between the incoming object near the key point of the hand of the left hand and the incoming object near the key point of the right hand in each frame of captured or acquired image data. For example, only when the distances in each frame are lower than a predetermined distance threshold, it is determined that the trajectories are similar, and then it is determined that the incoming items near the hand key points of the two hands are the same item. Or, when the average value of the distances in each frame is lower than a predetermined distance threshold, it is determined that the trajectories are similar, and further determined that the incoming items near the hand key points of the two hands are the same item. The above-mentioned distance threshold may be set according to factors such as minimum size or average size of items in different application scenarios. For example, the distance threshold may be set to 25 pixels. Here, if it is determined to be the same item, it means that the customer will return one item with two hands; if it is determined to be different items, it means that the customer will return two items with two hands.

类似地，通过比较顾客的双手的手部关键点附近的退出物品的轨迹，就能够确定左手的手部关键点附近的退出物品与右手的手部关键点附近的进入物品是否为同一物品。例如，这样的轨迹比较是通过核查拍摄或获取的图像数据的各帧中的左手的手部关键点附近的退出物品与右手的手部关键点附近的退出物品之间的距离而进行的。例如，仅在各帧中的距离均低于预定的距离阈值的情况下，确定为轨迹类似，进而确定为两只手的手部关键点附近的退出物品为同一物品。或者，在各帧中的距离的平均值低于预定的距离阈值的情况下，确定为轨迹类似，进而确定为两只手的手部关键点附近的退出物品为同一物品。类似地，例如可以将距离阈值设定为25个像素。在这里，如果确定为是同一物品，则说明顾客用两只手从货架上拾取了一件物品；如果确定为是不同的物品，则说明顾客用两只手从货架上拾取了两件物品。Similarly, by comparing the trajectories of exiting items near the hand key points of the customer's both hands, it can be determined whether the exiting item near the left hand key point and the incoming item near the right hand key point are the same item. For example, such trajectory comparison is performed by checking the distance between exiting items near the hand key point of the left hand and exiting items near the hand key point of the right hand in each frame of captured or acquired image data. For example, only when the distances in each frame are lower than a predetermined distance threshold, it is determined that the trajectories are similar, and then it is determined that the exiting items near the hand key points of the two hands are the same item. Or, when the average value of the distances in each frame is lower than a predetermined distance threshold, it is determined that the trajectories are similar, and it is further determined that the withdrawn items near the hand key points of the two hands are the same item. Similarly, the distance threshold may be set to 25 pixels, for example. Here, if it is determined to be the same item, it means that the customer has picked up one item from the shelf with two hands; if it is determined to be different items, it means that the customer has picked up two items from the shelf with two hands.

图14示出了上述轨迹比较的示例。如图14中所示，在图像数据的10个帧(时间戳)中，针对左手的进入物品检测共在6帧中检测到了进入物品，针对右手的进入物品检测共在7帧中检测到了进入物品。通过比较各帧中物品的坐标之间的距离，可知所有的距离均低于25个像素。因此，可以认为左手和右手在进入时是持握的相同的物品。Fig. 14 shows an example of the trajectory comparison described above. As shown in FIG. 14 , among the 10 frames (time stamps) of the image data, the entry object detection for the left hand detected entry articles in 6 frames in total, and the entry article detection for the right hand detected entry articles in 7 frames in total. By comparing the distances between the coordinates of the items in each frame, it can be seen that all the distances are below 25 pixels. Therefore, it can be considered that the left and right hands are holding the same item when entering.

另外，在进行上述的距离比较时，可以设想，如果通过比较判定左手附近的进入物品与右手附近的退出物品具有相同的轨迹，或者右手附近的进入物品与左手附近的退出物品具有相同的轨迹，则我们将难以判断这样的物品到底是被那一只手持握的。因此，当发生这样的情况时，需要去除这样的干扰点。例如，当在某帧图像数据中，左手的退出物品与右手的进入物品的距离均小于25个像素时，将这样的干扰点的数据直接去除。In addition, when performing the above-mentioned distance comparison, it can be imagined that if the entry item near the left hand and the exit item near the right hand have the same trajectory through the comparison, or the entry item near the right hand and the exit item near the left hand have the same trajectory, then it will be difficult for us to judge which hand the object is held by. Therefore, when such a situation occurs, it is necessary to remove such interference points. For example, when in a certain frame of image data, the distances between the exiting items of the left hand and the entering items of the right hand are both less than 25 pixels, the data of such interference points are directly removed.

2.6、基于FSM的拾取或放回行为的行为判定2.6. Behavior Judgment Based on FSM Pickup or Putback Behavior

下面，将结合上文中的说明，论述根据本发明第二实施例的检测方法中的基于FSM的拾取或放回行为的行为判定的情况。在下面的说明中，可以将FSM返回的状态判定信息作为顾客在货架前的拾取或放回行为的行为判定结果。In the following, in conjunction with the above description, the situation of the behavior determination of the pick-up or put-back behavior based on FSM in the detection method according to the second embodiment of the present invention will be discussed. In the following description, the state judgment information returned by the FSM can be used as the behavior judgment result of the customer's picking or putting back behavior in front of the shelf.

当至少一只手离开货架的外部多边形时，触发FSM的状态判定并且针对各帧进行核查。此时，FSM根据前文中说明的进入物品的判定结果、退出物品的判定结果和物品轨迹等数据中的至少一部分来进行拾取或放回行为的分析和判定。应当理解的是，由于FSM是针对两只手分别运行的，所以可以先针对其中的一只手(例如，先离开的那只手)进行状态判定，然后再针对另一只手(例如，后离开的那只支手)进行状态判定。When at least one hand leaves the outer polygon of the shelf, a state determination of the FSM is triggered and checked for each frame. At this time, the FSM analyzes and judges the behavior of picking up or putting back according to at least a part of the data such as the determination results of the incoming items, the determination results of the exiting items, and the trajectory of the items described above. It should be understood that since the FSM operates separately for the two hands, the state judgment can be made for one of the hands (for example, the hand that leaves first) first, and then for the other hand (for example, the hand that leaves later).

图15示出了第一个FSM运行时的状态判定的示例性流程图。如图15中所示，当FSM的编号为1时，当对应的手部离开货架的外部多边形时，基于前文说明的进入物品检测和退出物品检测进行判定。当没有进入物品(～entry)且具有退出物品(exit)时，返回状态“拾取一件物品(拾取+1)”。当具有进入物品(entry)且没有退出物品(～exit)时，返回状态“放回一件物品(放回－1)”。当没有进入物品(～entry)且没有退出物品(～exit)时，返回状态“没有变化”。当具有进入物品(entry)且具有退出物品(exit)时，返回状态“放回，拾取”。FIG. 15 shows an exemplary flow chart of status determination when the first FSM is running. As shown in FIG. 15 , when the number of the FSM is 1, when the corresponding hand leaves the outer polygon of the shelf, a judgment is made based on the detection of the incoming item and the detection of the outgoing item explained above. When there is no entry item (~entry) and there is an exit item (exit), return state "pick up an item (pickup+1)". When there is an entry item (entry) and no exit item (~exit), return the state "put back an item (put back -1)". When there is no entry (~entry) and no entry (~exit), return status "no change". Return state "put back, pick up" when there is an entry and there is an exit.

接着，如果另一只手也退出了货架的外部多边形，则进行第二个FSM的状态判定，如图16的示例性流程图所示。当FSM的编号为2时，首先检查第二只手在进入时的进入状态。当这只手在进入不持有物品(～object)时，则继续检查这只手在退出时的状态。如果退出时未持有物品(～object)，则返回状态“没有变化”；如果退出时持有物品(object)，则比较两只手的物品的退出轨迹。如果退出轨迹相同，则返回状态“双手退出”；如果退出轨迹不同，则返回状态“拾取一件物品(拾取+1)”。Then, if the other hand also exits the outer polygon of the shelf, the state judgment of the second FSM is performed, as shown in the exemplary flowchart of FIG. 16 . When the number of the FSM is 2, first check the entry status of the second hand on entry. When the hand is entering without holding an item (~object), continue to check the state of the hand when exiting. If the object (~object) is not held when exiting, return the status "no change"; if the object (object) is held when exiting, compare the exit trajectory of the object in both hands. If the exit trajectories are the same, return the state "Exit with both hands"; if the exit trajectories are different, return the state "Pick up an item (Pick +1)".

此外，在第二只手在进入时持有物品(object)的情况下，则比较两只手的物品的进入轨迹，并分别继续检查退出状态。在两只手的物品的进入轨迹相同的情况下，若第二只手的退出状态为未持有物品(～object)，则返回状态“双手进入”；若第二只手的退出状态为持有物品(object)，则比较两只手的物品的退出轨迹，如果退出轨迹相同，则返回状态“放回，拾取”，如果退出轨迹不同，则返回状态“拾取一件物品(拾取+1)”。在两只手的物品的进入轨迹不同的情况下，若第二只手的退出状态为未持有物品(～object)，则返回状态“放回一件物品(放回－1)”；若第二只手的退出状态为持有物品(object)，则比较两只手的物品的退出轨迹，如果退出轨迹不同，则返回状态“放回，拾取”，如果退出轨迹不同，则返回状态“放回一件物品(放回－1)”。Also, in the case where the second hand holds the object upon entry, then the entry trajectories of the objects of both hands are compared, and the exit status is checked separately. In the case that the entry trajectories of the items of the two hands are the same, if the exit state of the second hand is not holding an object (~object), return the state "both hands enter"; if the exit state of the second hand is holding an object (object), then compare the exit trajectories of the objects of the two hands. In the case that the entry trajectories of the items in the two hands are different, if the exit state of the second hand is not holding an object (~object), return the state "put back an object (put back -1)";

图17图示了在仅有针对一只手的FSM运行的情况下获得的FSM的状态表。可以理解的是，当仅有一只手进入货架的情况下，图17的状态表中反映出的拾取或放回行为状态即是顾客的拾取或放回行为检测最终状态分类。图18图示了在针对顾客的两只手的FSM运行的情况下获得的FSM的状态表。图19示出了通过将图18中的两个FSM的状态表相结合之后获得的最终状态表。可以理解的是，当顾客的两只手均进入货架的情况下，图19的状态表中反映出的拾取或放回行为状态是顾客的拾取或放回行为检测的最终判定情况。Fig. 17 illustrates the state table of the FSM obtained when only the FSM for one hand is run. It can be understood that when only one hand enters the shelf, the pick-up or put-back behavior status reflected in the state table in FIG. 17 is the final status classification of the customer's pick-up or put-back behavior detection. FIG. 18 illustrates the state table of the FSM obtained in the case of the FSM run for both hands of the customer. FIG. 19 shows the final state table obtained by combining the state tables of the two FSMs in FIG. 18 . It can be understood that when both hands of the customer enter the shelf, the pick-up or put-back behavior status reflected in the state table in FIG. 19 is the final determination of the customer's pick-up or put-back behavior detection.

2.7、变型例2.7. Variations

在上面的根据本发明的第二实施例的说明中，图像数据是来自于一台相机或图像传感器从一个视角拍摄的一组图像数据。然而，本发明不限于此，例如，可以使用来自多个相机或图像传感器的从不同的视角拍摄的多组图像数据。在此情况下，能够避免因拍摄视角的盲区或物体的遮挡而导致的检测错误。例如，在此情况下，只要在来自多个相机或图像传感器的多组图像数据中的具有检测到的物品的帧数之和等于或大于预定的最小帧数并且具有检测到的所述物品的帧数与总帧数的比值等于或大于预定的最小比值的情况下，进入物品检测/退出物品检测就能够确认检测出进入物品/退出物品。例如，如图20所示，在第一台相机获取的图像数据中，仅检测出在右手关键点附近的区域中存在退出物品。在第二台相机获取的图像数据中，仅检测出在左手关键点附近的区域中存在退出物品。当结合考虑基于多组图像数据的检测结果时，可以判定为在左手关键点附近的区域中和在右手关键点附近的区域中分别都检测到了退出物品。In the above description according to the second embodiment of the present invention, the image data is a set of image data taken from one viewing angle by one camera or image sensor. However, the present invention is not limited thereto, for example, multiple sets of image data taken from multiple cameras or image sensors from different viewing angles may be used. In this case, it is possible to avoid detection errors caused by blind areas of shooting angles or occlusions of objects. For example, in this case, the entry article detection/exit article detection can confirm the detection of the entry article/exit article as long as the sum of the number of frames with the detected article in the plurality of sets of image data from multiple cameras or image sensors is equal to or greater than a predetermined minimum number of frames and the ratio of the number of frames with the object detected to the total number of frames is equal to or greater than the predetermined minimum ratio. For example, as shown in FIG. 20 , in the image data acquired by the first camera, only exiting objects are detected in the area near the right-hand key point. In the image data acquired by the second camera, exit items were detected only in the area around the left-hand keypoint. When considering the detection results based on multiple sets of image data, it can be determined that exiting items are detected in the area near the left-hand key point and in the area near the right-hand key point.

另外，在上面的根据本发明的第二实施例的说明中，已经说明了使用FSM作为工具基于手部进入外部多边形时的进入物品检测的检测结果和退出外部多边形时的退出物品检测的检测结果以及物品的轨迹比较的比较结果的至少一部分来实现顾客在货柜前的拾取或放回动作的检测和识别。但是，本发明不限于此，而是可以使用任何本领域中适合的用于描述对象的状态序列以及如何响应来自外界的各种事件的工具。例如，通过上文中的说明，可以知道顾客在货柜前的拾取或放回动作是有限的，其与手部进出货架的外部多边形时的物品检测结果以及物品的轨迹比较结果的关联性也是相对固定的。因此，例如还可以预先将完全地记载有拾取或放回动作的各种情况以及上述各种检查结果与各种情况的关联性和对应关系的查找表存储在处理器的存储元件中或处理器能够访问的存储元件中。这样，能够使用查找表作为工具，基于手部进入外部多边形时的进入物品检测的检测结果和退出外部多边形时的退出物品检测的检测结果以及物品的轨迹比较的比较结果的至少一部分，能够查表的方式来实现根据本发明的第二实施例的检测方法。In addition, in the above description according to the second embodiment of the present invention, it has been explained that FSM is used as a tool to realize the detection and recognition of the customer's pick-up or put-back action in front of the container based on at least a part of the detection results of the incoming item detection when the hand enters the outer polygon, the detection results of the exiting item detection when the hand exits the outer polygon, and the comparison results of the trajectory comparison of the items. However, the present invention is not limited thereto, but any tool suitable in the art for describing the state sequence of an object and how to respond to various events from the outside can be used. For example, from the above description, it can be known that the customer’s pick-up or put-back action in front of the container is limited, and its correlation with the item detection results and item trajectory comparison results when the hand enters and exits the outer polygon of the shelf is also relatively fixed. Therefore, for example, a look-up table that fully records the various situations of the pick-up or put-back action and the correlation and correspondence between the above-mentioned various inspection results and various situations may also be stored in the storage element of the processor or in a storage element accessible to the processor in advance. In this way, the lookup table can be used as a tool, based on at least a part of the detection result of the entry object detection when the hand enters the external polygon, the detection result of the exit object detection when the hand exits the external polygon, and the comparison result of the trajectory comparison of the object, the detection method according to the second embodiment of the present invention can be implemented in a table lookup manner.

2.8、用于检测和识别顾客的拾取或放回行为的设备的示例2.8. Examples of devices used to detect and identify customer pick-up or put-back behavior

根据本发明第二实施例的用于检测和识别顾客的拾取或放回行为的检测设备例如可以包括：至少一台相机或图像传感器，以用于获取图像数据；处理单元，所述处理单元接收由所述至少一台相机或图像传感器获取的这些图像数据，并且能够采用前文中所述的根据本发明第二实施例的检测方法对所述图像数据进行处理，以检测和识别顾客在货架前的拾取或放回行为。处理单元例如能够通过多个组成模块执行作为软件或固件而被存储在其中的存储元件中或与其互连的存储器或数据存储器中的应用程序或例程，从而执行上文中所述的根据本发明第二实施例的用于检测和识别顾客在货架前的拾取或放回行为的方法。这里的处理单元例如由中央处理单元(CPU)和存储元件构成。例如，处理单元可以包括一个或多个通用处理器、控制器、现场可编程门阵列(FPGA)、图形处理单元(GPU)、专用集成电路(ASIC)或它们的组合等，作为与各个相机或图像传感器数据通信的专用数据处理器或数据处理芯片。或者，处理单元也可以是集成在各相机或图像传感器中的彼此能够互相进行数据交互的具有AI处理功能的传感器元件。这样的传感器元件兼具数据处理能力和数据存储能力，无需额外的硬件即可执行根据本发明第二实施例的用于检测和识别顾客在货架前的拾取或放回行为的方法。The detection device for detecting and identifying customers' pick-up or put-back behavior according to the second embodiment of the present invention may include, for example: at least one camera or image sensor for acquiring image data; a processing unit that receives the image data acquired by the at least one camera or image sensor, and can use the detection method described above according to the second embodiment of the present invention to process the image data, so as to detect and identify customers' pick-up or put-back behavior in front of the shelf. The processing unit can execute, for example, an application program or a routine stored in a storage element as software or firmware or in a memory or a data storage interconnected therewith through a plurality of constituent modules, thereby performing the method for detecting and identifying a customer's pick-up or put-back behavior in front of a shelf according to the second embodiment of the present invention described above. The processing unit here is constituted by, for example, a central processing unit (CPU) and a storage element. For example, the processing unit may include one or more general-purpose processors, controllers, Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), Application Specific Integrated Circuits (ASICs), or combinations thereof, as dedicated data processors or data processing chips for data communication with respective cameras or image sensors. Alternatively, the processing unit may also be a sensor element with an AI processing function that is integrated in each camera or image sensor and can perform data interaction with each other. Such a sensor element has both data processing capability and data storage capability, and can implement the method for detecting and identifying customers' pick-up or put-back behavior in front of the shelf according to the second embodiment of the present invention without additional hardware.

根据本发明的第二实施例，由于在获取视频或图片数据后就立即对这些数据进行了提取关键点信息的数据简化处理，在后续的所有步骤中均只需要传输和处理提取出的人骨架的关键点数据，因此大大减少了数据处理量，能够快速、准确地识别出无人购物场所中顾客在货架前的拾取或放回行为。此外，由于不需要对顾客进行面部识别，并且在经过最初的数据简化处理后，存储和传输的数据中将只包含顾客的骨架的包含手部在内的关键点的数据，因此保护了顾客隐私权。According to the second embodiment of the present invention, since the data simplification process of extracting key point information is performed on these data immediately after the video or picture data is acquired, only the key point data of the extracted human skeleton need to be transmitted and processed in all subsequent steps, thus greatly reducing the amount of data processing, and can quickly and accurately identify the behavior of picking up or putting back the customer in front of the shelf in an unmanned shopping place. In addition, since customers do not need to be facially recognized, and after initial data simplification, the stored and transmitted data will only contain key points of the customer's skeleton, including hands, thus protecting customer privacy.

本发明例如可以如下进行实施、构造或配置。The present invention can be implemented, constructed or arranged, for example, as follows.

(1)一种用于检测分布有相机的场所中的人的摔倒的检测方法，其特征在于，包括如下步骤：(1) A detection method for detecting a fall of a person in a place where a camera is distributed, characterized in that, comprising the steps of:

步骤S1：对所有所述相机进行标定，以使在各所述相机的视场中均具有恰当的垂直向量；Step S1: Calibrate all the cameras so that each camera has an appropriate vertical vector in its field of view;

步骤S2：通过至少一部分所述相机获得包含所述场所中的人的图像数据，并且从所述图像数据中提取出所述人的骨架的关键点的数据；Step S2: Obtaining image data containing people in the place through at least a part of the cameras, and extracting data of key points of the person's skeleton from the image data;

步骤S3：通过使用所述关键点的数据估算所述人的个人垂直度；Step S3: estimating the personal verticality of the person by using the data of the key points;

步骤S4：针对所述至少一部分相机中的各者，基于相应的相机的所述视场中的所述垂直向量和所述个人垂直度计算所述人的垂直角度；Step S4: For each of said at least a portion of cameras, calculating said person's vertical angle based on said vertical vector in said field of view of the corresponding camera and said person's verticality;

步骤S5：通过对所述人在某一时刻在所述至少一部分相机的各所述视场中的所有所述垂直角度进行聚合，获得所述人的最终垂直角度；Step S5: Obtaining the final vertical angle of the person by aggregating all the vertical angles of the person in each of the fields of view of the at least a part of the cameras at a certain moment;

步骤S6：基于由所述最终垂直角度求出的摔倒分值判定所述人是否摔倒。Step S6: Determine whether the person has fallen based on the fall score obtained from the final vertical angle.

(2)(2)

根据(1)中所述的方法，其特征在于，在所述步骤S1中，所述标定时基于各所述相机的内部参数和外部参数进行的。According to the method described in (1), it is characterized in that, in the step S1, the calibration is performed based on internal parameters and external parameters of each of the cameras.

(3)(3)

根据(1)中所述的方法，其特征在于，在所述步骤S2中，所述人的所述关键点的数据的提取是在不进行所述人的面部识别的情况下进行的。According to the method described in (1), it is characterized in that, in the step S2, the data of the key points of the person is extracted without face recognition of the person.

(4)(4)

根据(1)中所述的方法，其特征在于，在所述步骤S2中，提取所述人的17个所述关键点的数据。According to the method described in (1), it is characterized in that in the step S2, the data of the 17 key points of the person are extracted.

(5)(5)

根据(1)中所述的方法，其特征在于，在所述步骤S3中，只在有效视场中估算所述人的所述个人垂直度，所述有效视场是指其中的置信因子c等于或大于预定的置信阈值的视场，所述置信因子c如下定义：According to the method described in (1), it is characterized in that, in the step S3, the personal verticality of the person is only estimated in the effective field of view, the effective field of view refers to the field of view in which the confidence factor c is equal to or greater than a predetermined confidence threshold, and the confidence factor c is defined as follows:

c＝N _v/N _t c=N _v /N _t

其中，N _v是在相应的视场中的所述人的能够观察到的所述关键点的数量，N _t是所述人的所述关键点的总数。 Wherein, N _v is the number of key points that can be observed by the person in the corresponding field of view, and N _t is the total number of key points of the person.

(6)(6)

根据(5)中所述的方法，其特征在于，在所述步骤S3中，通过计算在所述有效视场中的所述人的从代表足部的所述关键点到代表头部的所述关键点的身体向量而获得所述人的个人垂直度。According to the method described in (5), it is characterized in that, in the step S3, the personal verticality of the person is obtained by calculating the body vector from the key point representing the foot to the key point representing the head of the person in the effective field of view.

(7)(7)

根据(6)中所述的方法，其特征在于，基于所述人的能够观察到的代表所述头部的所述关键点的平均坐标和能够观察到的代表所述足部的所述关键点的平均坐标计算所述身体向量。According to the method described in (6), it is characterized in that the body vector is calculated based on the observed average coordinates of the key points representing the head and the observed average coordinates of the key points representing the feet of the person.

(8)(8)

根据(1)中所述的方法，其特征在于，在所述步骤S5中，所述人的所述最终垂直角度等于所述人在所述至少一部分相机的各所述视场中的所述垂直角度的最大值。According to the method described in (1), it is characterized in that, in the step S5, the final vertical angle of the person is equal to the maximum value of the vertical angles of the person in each of the fields of view of the at least some cameras.

(9)(9)

根据(1)中所述的方法，其特征在于，在所述步骤S6中，所述最终垂直角度α _v与所述摔倒分值s _f之间的关系满足如下等式： According to the method described in (1), it is characterized in that, in the step S6, the relationship between the final vertical angle α _v and the fall score s _f satisfies the following equation:

若α _v＜T _l，则s _f＝0； If α _v <T _l , then s _f =0;

若T _l＜α _v＜T _h，则

If T _l <α _v <T _h , then

若α _v＞T _h，则s _f＝1， If α _v >T _h , then s _f =1,

其中，T _l为设定的所述最终垂直角度的下限，T _h为设定的所述最终垂直角度的上限，并且 Wherein, _T1 is the lower limit of the final vertical angle set, _Th is the upper limit of the final vertical angle set, and

仅在所述摔倒分值s _f大于判定阈值的情况下，才判定并检测出所述人的摔倒。 Only when the fall score s _f is greater than the judgment threshold, the person's fall is judged and detected.

(10)(10)

根据(9)中所述的方法，其特征在于，所述最终垂直角度的所述下限为40度，所述最终垂直角度的所述上限为80度，并且所述判定阈值为0.5。According to the method described in (9), it is characterized in that the lower limit of the final vertical angle is 40 degrees, the upper limit of the final vertical angle is 80 degrees, and the determination threshold is 0.5.

(11)(11)

一种用于检测场所内的人的摔倒的检测设备，其特征在于，包括：A detection device for detecting falls of people in a place, characterized in that it includes:

至少一个相机，所述至少一个相机分布在所述场所内并且具有不同的视场，所述至少一个相机能够获得包含所述场所中的人的图像数据；at least one camera, distributed within the location and having different fields of view, capable of obtaining image data comprising persons in the location;

处理单元，所述处理单元对所述多个相机获得的所述图像数据进行处理，以判定所述场所内的所述人是否摔倒，其中，所述处理单元包括：A processing unit, the processing unit processes the image data obtained by the plurality of cameras to determine whether the person in the place has fallen, wherein the processing unit includes:

标定模块，对所有所述相机进行标定，以使在各所述相机的视场中均具有恰当的垂直向量；A calibration module, which calibrates all the cameras so that there is an appropriate vertical vector in the field of view of each camera;

数据处理模块，对从所述多个相机传输来的所述图像数据进行处理，从而获得所述人在所述多个相机之中的至少一部分相机的所述视场中的个人垂直度；a data processing module, processing the image data transmitted from the plurality of cameras, so as to obtain the personal verticality of the person in the field of view of at least a part of the cameras among the plurality of cameras;

计算模块，基于从所述标定模块发送来的所述垂直向量和从所述数据处理模块发送来的所述个人垂直度计算所述人的最终垂直角度；以及a calculation module that calculates the final vertical angle of the person based on the vertical vector sent from the calibration module and the personal vertical degree sent from the data processing module; and

判定模块，基于由所述最终垂直角度求出的摔倒分值判定所述人是否摔倒。A judging module that judges whether the person has fallen based on the fall score obtained from the final vertical angle.

(12)(12)

根据(11)中所述的检测设备，其特征在于，所述数据处理模块包括：According to the detection device described in (11), it is characterized in that the data processing module includes:

提取模块，从所述图像数据中提取出所述人的骨架的关键点的数据；和an extraction module, extracting data of key points of the human skeleton from the image data; and

(13)(13)

根据(12)中所述的检测设备，其特征在于，所述计算模块包括：According to the detection device described in (12), it is characterized in that the calculation module includes:

角度计算模块，针对所述至少一部分相机的各所述视场分别基于所述垂直向量和所述个人垂直度计算所述人的垂直角度；以及an angle calculation module for calculating the vertical angle of the person based on the vertical vector and the personal verticality for each of the fields of view of the at least a portion of the cameras; and

聚合模块，通过对所述人在某一时刻在所述至少一部分相机的各所述视场中的所有所述垂直角度进行聚合，获得所述人的所述最终垂直角度。The aggregation module is configured to obtain the final vertical angle of the person by aggregating all the vertical angles of the person in each of the fields of view of the at least a part of the cameras at a certain moment.

(14)(14)

根据(11)至(13)中任一项所述的检测设备，其特征在于，所述标定模块基于所述多个相机的各者的内部参数和外部参数进行所述标定。The detection device according to any one of (11) to (13), wherein the calibration module performs the calibration based on internal parameters and external parameters of each of the plurality of cameras.

(15)(15)

根据(12)中所述的检测设备，其特征在于，所述提取模块仅将所述人的所述关键点的数据发送给所述估算模块。According to the detection device described in (12), it is characterized in that the extraction module only sends the data of the key points of the person to the estimation module.

(16)(16)

根据(12)中所述的检测设备，其特征在于，所述提取模块提取所述人的17个所述关键点的数据。According to the detection device described in (12), it is characterized in that the extraction module extracts the data of the 17 key points of the person.

(17)(17)

根据(12)中所述的检测设备，其特征在于，所述估算模块只估算所述多个相机的各所述视场之中的有效视场中的所述人的所述个人垂直度，所述有效视场是指其中的置信因子c等于或大于预定的置信阈值的视场，所述置信因子c如下定义：According to the detection device described in (12), it is characterized in that the estimation module only estimates the personal verticality of the person in the effective field of view among the respective fields of view of the plurality of cameras, the effective field of view refers to a field of view in which a confidence factor c is equal to or greater than a predetermined confidence threshold, and the confidence factor c is defined as follows:

c＝N _v/N _t c=N _v /N _t

(18)(18)

根据(17)中所述的检测设备，其特征在于，所述估算模块通过计算在所述有效视场中的所述人的从代表足部的所述关键点到代表头部的所述关键点的身体向量而获得所述人的个人垂直度。According to the detection device described in (17), it is characterized in that the estimation module obtains the personal verticality of the person by calculating a body vector of the person in the effective field of view from the key point representing the foot to the key point representing the head.

(19)(19)

根据(18)中所述的检测设备，其特征在于，所述估算模块基于所述人的能够观察到的代表所述头部的所述关键点的平均坐标和能够观察到的代表所述足部的所述关键点的平均坐标计算所述身体向量。According to the detection device described in (18), it is characterized in that the estimation module calculates the body vector based on the observed average coordinates of the key points representing the head and the observed average coordinates of the key points representing the feet of the person.

(20)(20)

根据(13)中所述的检测设备，其特征在于，所述聚合模块将所述人在所述至少一部分相机的各所述视场中的所述垂直角度的最大值设定为所述人的所述最终垂直角度。The detection device according to (13), wherein the aggregation module sets the maximum value of the vertical angles of the person in each of the fields of view of the at least a part of the cameras as the final vertical angle of the person.

(21)(twenty one)

根据(11)至(13)中任一项所述的检测设备，其特征在于，所述判定模块仅在摔倒分值s _f大于判定阈值的情况下，才判定并检测出所述人的摔倒， According to the detection device described in any one of (11) to (13), it is characterized in that the judgment module judges and detects the fall of the person only when the fall score s _f is greater than the judgment threshold,

其中，所述最终垂直角度α _v与所述摔倒分值sf之间的关系满足如下等式： Wherein, the relationship between the final vertical angle α _v and the fall score sf satisfies the following equation:

若α _v＜T _l，则s _f＝0； If α _v <T _l , then s _f =0;

若T _l＜α _v＜T _h，则

If T _l <α _v <T _h , then

若α _v＞T _h，则s _f＝1， If α _v >T _h , then s _f =1,

其中，T _l为设定的所述最终垂直角度的下限，T _h为设定的所述最终垂直角度的上限。 Wherein, T ₁ is the lower limit of the set final vertical angle, and _Th is the upper limit of the set final vertical angle.

(22)(twenty two)

根据(21)中所述的检测设备，其特征在于，所述最终垂直角度的所述下限为40度，所述最终垂直角度的所述上限为80度，并且所述判定阈值为0.5。The detection device according to (21), characterized in that the lower limit of the final vertical angle is 40 degrees, the upper limit of the final vertical angle is 80 degrees, and the determination threshold is 0.5.

(23)(twenty three)

一种存储介质，其上存储有计算机可读程序，所述程序在处理器上执行时实施如前述(1)-(10)中任一项所述的方法。A storage medium on which a computer-readable program is stored, and when the program is executed on a processor, the method as described in any one of the aforementioned (1)-(10) is implemented.

(24)(twenty four)

一种用于检测人在货架前的拾取或放回行为的检测方法，其特征在于，包括如下步骤：A detection method for detecting the behavior of picking up or putting back a person in front of a shelf, characterized in that it includes the following steps:

步骤S1：从图像数据中获取所述人的骨架的包含手部关键点的多个关键点的数据，并且从所述图像数据提取所述货架的外轮廓线，其中，所述外轮廓线包括所述货架的外部多边形以及与所述货架的真实外轮廓相对应的内部多边形，所述外部多边形在所述内部多边形的外部的接近区域中；Step S1: Acquiring the data of a plurality of key points of the human skeleton including hand key points from the image data, and extracting the outer contour line of the shelf from the image data, wherein the outer contour line includes an outer polygon of the shelf and an inner polygon corresponding to the real outer contour of the shelf, and the outer polygon is in an external proximity area of the inner polygon;

步骤S2：在所述人的至少一只手的所述手部关键点被检测到进入所述外部多边形的情况下，针对所述人的进入所述外部多边形的每只手分别执行用于检测所述手部关键点的附近的物品的进入物品检测；Step S2: In the case that the key points of the hand of at least one hand of the person are detected to enter the outer polygon, for each hand of the person that enters the outer polygon, perform an incoming object detection for detecting objects in the vicinity of the key points of the hand;

步骤S4：在所述人的至少一只手的所述手部关键点被检测到退出所述外部多边形的情况下，针对所述人的退出所述外部多边形的每只手分别执行用于检测所述手部关键点的附近的物品的退出物品检测；Step S4: In the case that the key point of the hand of at least one hand of the person is detected to exit the outer polygon, for each hand of the person that exits the outer polygon, perform exit object detection for detecting objects near the key point of the hand;

步骤S5：基于所述进入物品检测的结果和所述退出物品检测的结果来判定所述人在所述货架前的所述拾取或放回行为。Step S5: Determine the pick-up or put-back behavior of the person in front of the shelf based on the detection result of the incoming item and the detection result of the exiting item.

(25)(25)

根据(24)中所述的方法，其特征在于，在所述步骤S1中，从图像数据中获取所述人的所述多个关键点的数据包括：According to the method described in (24), it is characterized in that, in the step S1, obtaining the data of the plurality of key points of the person from the image data includes:

通过至少一部相机拍摄包含所述货架和所述货架前的所述人的所述图像数据；capturing the image data including the shelf and the person in front of the shelf by at least one camera;

从所述图像数据中提取所述人的包含所述手部关键点的所述多个关键点的数据。Data of the plurality of key points including the hand key points of the person are extracted from the image data.

(26)(26)

根据(24)中所述的方法，其特征在于，在所述步骤S2之后，还包括判定所述人的所述手部关键点是否进入所述内部多边形的步骤，其中According to the method described in (24), it is characterized in that, after the step S2, further comprising the step of determining whether the key point of the hand of the person enters the internal polygon, wherein

在确定所述人的所述手部关键点进入所述内部多变形的情况下，执行所述步骤S2的后续步骤。In the case that it is determined that the key point of the hand of the person enters the internal polymorph, the subsequent steps of the step S2 are executed.

(27)(27)

根据(26)中所述的方法，其特征在于，According to the method described in (26), characterized in that,

只有在所述人的所述手部关键点在所述图像数据的至少3个连续帧中都处于所述内部多边形内，所述手部关键点才被确定为进入所述内部多边形。The key point of the hand of the person is determined to enter the internal polygon only if the key point of the hand of the person is within the internal polygon in at least 3 consecutive frames of the image data.

(28)(28)

根据(24)中所述的方法，其特征在于，在所述步骤S5中，通过基于所述进入物品检测的结果和所述退出物品检测的结果针对所述人的每只手分别使用一个有限状态机，来判定所述人在所述货架前的所述拾取或放回行为。According to the method described in (24), it is characterized in that, in the step S5, by using a finite state machine for each hand of the person based on the result of the detection of the incoming item and the result of the detection of the exiting item, the behavior of picking up or putting back the person in front of the shelf is determined.

(29)(29)

根据(24)中所述的方法，其特征在于，在所述步骤S2与所述步骤S4之间还包括步骤S3：According to the method described in (24), it is characterized in that a step S3 is also included between the step S2 and the step S4:

针对所述人的进入所述外部多边形的每只手，分别记录所述手部关键点附近的所述物品在所述外部多边形与所述内部多边形之间的轨迹。For each hand of the person entering the outer polygon, the trajectory of the item near the key point of the hand between the outer polygon and the inner polygon is recorded respectively.

(30)(30)

根据(29)中所述的方法，其特征在于，在所述步骤S5中，通过基于所述进入物品检测的结果、所述退出物品检测的结果和所述物品的所述轨迹，针对所述人的每只手分别使用一个有限状态机，来判定所述人在所述货架前的所述拾取或放回行为。According to the method described in (29), it is characterized in that, in the step S5, by using a finite state machine for each hand of the person based on the result of the detection of the entering item, the result of the detection of the exiting item and the trajectory of the item, to determine the behavior of picking up or putting back the person in front of the shelf.

(31)(31)

根据(30)中所述的方法，其特征在于，若在所述步骤S2中针对所述人的两只手的所述进入物品检测均确认检测出进入物品，则在所述步骤S5中，通过比较所述人的左手的所述手部关键点附近的所述进入物品的所述轨迹与右手的所述手部关键点附近的所述进入物品的所述轨迹，确定左手的所述手部关键点附近的所述进入物品与右手的所述手部关键点附近的所述进入物品是否为同一物品。According to the method described in (30), it is characterized in that, if in the step S2, the detection of the incoming object for the two hands of the person confirms that the incoming object is detected, then in the step S5, by comparing the trajectory of the incoming object near the key point of the left hand of the person with the trajectory of the incoming object near the key point of the right hand, it is determined whether the incoming object near the key point of the left hand and the incoming object near the key point of the right hand are the same object.

(32)(32)

根据(31)中所述的方法，其特征在于，所述人的左手的所述手部关键点附近的所述进入物品的所述轨迹与右手的所述手部关键点附近的所述进入物品的所述轨迹的所述比较是通过核查所述图像数据的各帧中的左手的所述手部关键点附近的所述进入物品与右手的所述手部关键点附近的所述进入物品之间的距离而进行的，并且The method described in (31), wherein the comparison of the trajectory of the incoming item near the hand key point of the person's left hand with the trajectory of the incoming item near the hand key point of the right hand is performed by checking the distance between the incoming item near the hand key point of the left hand and the incoming item near the hand key point of the right hand in each frame of the image data, and

仅在各帧中的所述距离均低于预定的距离阈值的情况下，确定左手的所述手部关键点附近的所述进入物品与右手的所述手部关键点附近的所述进入物品为同一物品。Only when the distances in each frame are lower than a predetermined distance threshold, it is determined that the incoming item near the hand key point of the left hand and the incoming item near the hand key point of the right hand are the same item.

(33)(33)

根据(30)至(32)中任一项所述的方法，其特征在于，若在所述步骤S4中针对所述人的两只手的所述退出物品检测均确认检测出退出物品，则在所述步骤S5中，通过比较所述人的左手的所述手部关键点附近的所述退出物品的所述轨迹与右手的所述手部关键点附近的所述退出物品的所述轨迹，确定左手的所述手部关键点附近的所述退出物品与右手的所述手部关键点附近的所述退出物品是否为同一物品。According to the method described in any one of (30) to (32), it is characterized in that, if in the step S4, the detection of the withdrawn items for the two hands of the person confirms that the withdrawn items are detected, then in the step S5, by comparing the trajectory of the withdrawn items near the key points of the left hand of the person with the trajectory of the withdrawn items near the key points of the right hand, it is determined whether the withdrawn items near the key points of the left hand and the withdrawn items near the key points of the right hand are the same items.

(34)(34)

根据(33)中所述的方法，其特征在于，所述人的左手的所述手部关键点附近的所述退出物品的所述轨迹与右手的所述手部关键点附近的所述退出物品的所述轨迹的所述比较是通过核查所述图像数据的各帧中的左手的所述手部关键点附近的所述退出物品与右手的所述手部关键点附近的所述退出物品之间的距离而进行的，并且The method described in (33), wherein the comparison of the trajectory of the exiting item near the hand key point of the person's left hand with the trajectory of the exiting item near the hand key point of the right hand is performed by checking the distance between the exiting item near the hand key point of the left hand and the exiting item near the hand key point of the right hand in each frame of the image data, and

仅在各帧中的所述距离均低于预定的距离阈值的情况下，确定左手的所述手部关键点附近的所述退出物品与右手的所述手部关键点附近的所述退出物品为同一物品。Only when the distances in each frame are lower than a predetermined distance threshold, it is determined that the exiting item near the hand key point of the left hand and the exiting item near the hand key point of the right hand are the same item.

(35)(35)

根据(24)至(30)中任一项所述的方法，其特征在于，在所述步骤S2中，只有在所述图像数据中的具有检测到的物品的帧数等于或大于预定的最小帧数并且具有检测到的所述物品的帧数与总帧数的比值等于或大于预定的最小比值的情况下，所述进入物品检测确认检测出进入物品。According to the method described in any one of (24) to (30), it is characterized in that, in the step S2, only when the number of frames with detected items in the image data is equal to or greater than a predetermined minimum number of frames and the ratio of the number of frames with detected items to the total number of frames is equal to or greater than a predetermined minimum ratio, the incoming item detection confirms that an incoming item is detected.

(36)(36)

根据(24)至(30)中任一项所述的方法，其特征在于，在所述步骤S4中，只有在所述图像数据的具有检测到的物品的帧数等于或大于预定的最小帧数并且具有检测到的所述物品的帧数与总帧数的比值等于或大于预定的最小比值的情况下，所述退出物品检测确认检测出退出物品。According to the method described in any one of (24) to (30), it is characterized in that, in the step S4, only when the number of frames of the image data with detected items is equal to or greater than a predetermined minimum number of frames and the ratio of the number of frames with the detected items to the total number of frames is equal to or greater than a predetermined minimum ratio, the withdrawn item detection confirmation detects an exited item.

(37)(37)

根据(35)中所述的方法，其特征在于，所述图像数据包括来自多个相机或图像传感器的多组图像数据，并且The method described in (35), wherein the image data comprises multiple sets of image data from multiple cameras or image sensors, and

在所述步骤S2中，只有在来自所述多个相机或图像传感器的所述多组图像数据中的具有检测到的物品的帧数之和等于或大于预定的最小帧数并且具有检测到的所述物品的帧数与总帧数的比值等于或大于预定的最小比值的情况下，所述进入物品检测确认检测出进入物品。In the step S2, only when the sum of the number of frames with detected items in the multiple sets of image data from the plurality of cameras or image sensors is equal to or greater than a predetermined minimum number of frames and the ratio of the number of frames with the detected items to the total number of frames is equal to or greater than a predetermined minimum ratio, the incoming item detection confirms that an incoming item is detected.

(38)(38)

根据(28)或(30)所述的方法，其特征在于，所述有限状态机包括下列拾取或放回行为：无变化、拾取一个物品、拾取两个物品、放回一个物品和放回两个物品。The method according to (28) or (30), wherein the finite state machine includes the following pick or put behaviors: no change, pick one item, pick two items, put back one item, and put back two items.

(39)(39)

一种用于检测人在货架前的拾取或放回行为的检测设备，其特征在于，所述检测设备包括：A detection device for detecting the behavior of picking up or putting back a person in front of a shelf, characterized in that the detection device includes:

至少一台相机或图像传感器，获取图像数据；at least one camera or image sensor to acquire image data;

处理单元，所述处理单元根据如(24)至(38)中任一项所述的用于检测人在货架前的拾取或放回行为的检测方法对所述图像数据进行处理。A processing unit, the processing unit processes the image data according to the detection method described in any one of (24) to (38) for detecting the behavior of picking or putting back a person in front of a shelf.

(40)(40)

一种存储介质，其上存储有计算机可读程序，所述程序在处理器上执行时实施如(24)至(38)中任一项所述的方法。A storage medium on which a computer-readable program is stored, and the program implements the method described in any one of (24) to (38) when executed on a processor.

尽管在上面已经参照附图说明了根据本发明的补偿方法、成像装置和存储介质，但是本发明不限于此，且本领域技术人员应理解，在不偏离本发明随附权利要求书限定的实质或范围的情况下，可以做出各种改变、组合、次组合以及变型。Although the compensation method, imaging device and storage medium according to the present invention have been described above with reference to the accompanying drawings, the present invention is not limited thereto, and those skilled in the art should understand that various changes, combinations, sub-combinations and modifications can be made without departing from the essence or scope defined by the appended claims of the present invention.

Claims

一种用于检测分布有相机的场所中的人的摔倒的检测方法，其特征在于，包括如下步骤：A detection method for detecting falls of people in a place where cameras are distributed, characterized in that it comprises the following steps:

步骤S1：对所有所述相机进行标定，以使在各所述相机的视场中均具有恰当的垂直向量；Step S1: Calibrate all the cameras so that each camera has an appropriate vertical vector in its field of view;

步骤S2：通过至少一部分所述相机获得包含所述场所中的人的图像数据，并且从所述图像数据中提取出所述人的骨架的关键点的数据；Step S2: Obtaining image data containing people in the place through at least a part of the cameras, and extracting data of key points of the person's skeleton from the image data;

步骤S3：通过使用所述关键点的数据估算所述人的个人垂直度；Step S3: estimating the personal verticality of the person by using the data of the key points;

步骤S4：针对所述至少一部分相机中的各者，基于相应的相机的所述视场中的所述垂直向量和所述个人垂直度计算所述人的垂直角度；Step S4: For each of said at least a portion of cameras, calculating said person's vertical angle based on said vertical vector in said field of view of the corresponding camera and said person's verticality;

步骤S5：通过对所述人在某一时刻在所述至少一部分相机的各所述视场中的所有所述垂直角度进行聚合，获得所述人的最终垂直角度；Step S5: Obtaining the final vertical angle of the person by aggregating all the vertical angles of the person in each of the fields of view of the at least a part of the cameras at a certain moment;

步骤S6：基于由所述最终垂直角度求出的摔倒分值判定所述人是否摔倒。Step S6: Determine whether the person has fallen based on the fall score obtained from the final vertical angle.
根据权利要求1所述的方法，其特征在于，在所述步骤S1中，所述标定时基于各所述相机的内部参数和外部参数进行的。The method according to claim 1, characterized in that, in the step S1, the calibration is performed based on internal parameters and external parameters of each of the cameras.
根据权利要求1所述的方法，其特征在于，在所述步骤S2中，所述人的所述关键点的数据的提取是在不进行所述人的面部识别的情况下进行的。The method according to claim 1, characterized in that, in the step S2, the extraction of the data of the key points of the person is performed without facial recognition of the person.
根据权利要求1所述的方法，其特征在于，在所述步骤S2中，提取所述人的17个所述关键点的数据。The method according to claim 1, characterized in that, in the step S2, the data of the 17 key points of the person are extracted.
根据权利要求1所述的方法，其特征在于，在所述步骤S3中，只在有效视场中估算所述人的所述个人垂直度，所述有效视场是指其中的置信因子c等于或大于预定的置信阈值的视场，所述置信因子c如下定义：The method according to claim 1, characterized in that, in the step S3, the personal verticality of the person is only estimated in the effective field of view, the effective field of view refers to the field of view in which the confidence factor c is equal to or greater than a predetermined confidence threshold, and the confidence factor c is defined as follows:

c＝N _v/N _t c=N _v /N _t

其中，N _v是在相应的视场中的所述人的能够观察到的所述关键点的数量，N _t是所述人的所述关键点的总数。 Wherein, N _v is the number of key points that can be observed by the person in the corresponding field of view, and N _t is the total number of key points of the person.
根据权利要求5所述的方法，其特征在于，在所述步骤S3中，通过计算在所述有效视场中的所述人的从代表足部的所述关键点到代表头部的所述关键点的身体向量而获得所述人的个人垂直度。The method according to claim 5, characterized in that, in the step S3, the personal verticality of the person is obtained by calculating the body vector of the person in the effective field of view from the key point representing the foot to the key point representing the head.
根据权利要求6所述的方法，其特征在于，基于所述人的能够观察到的代表所述头部的所述关键点的平均坐标和能够观察到的代表所述足部的所述关键点的平均坐标计算所述身体向量。The method according to claim 6, wherein the body vector is calculated based on the observed average coordinates of the key points representing the head and the observed average coordinates of the key points representing the feet of the person.
根据权利要求1所述的方法，其特征在于，在所述步骤S5中，所述人的所述最终垂直角度等于所述人在所述至少一部分相机的各所述视场中的所述垂直角度的最大值。The method according to claim 1, characterized in that, in the step S5, the final vertical angle of the person is equal to the maximum value of the vertical angles of the person in each of the fields of view of the at least some cameras.
根据权利要求1所述的方法，其特征在于，在所述步骤S6中，所述最终垂直角度α _v与所述摔倒分值s _f之间的关系满足如下等式： The method according to claim 1, characterized in that, in the step S6, the relationship between the final vertical angle _αv and the fall score _sf satisfies the following equation:

若α _v＜T _l，则s _f＝0； If α _v <T _l , then s _f =0;

若T _l＜α _v＜T _h，则
If T _l <α _v <T _h , then

若α _v＞T _h，则s _f＝1， If α _v >T _h , then s _f =1,

其中，T _l为设定的所述最终垂直角度的下限，T _h为设定的所述最终垂直角度的上限，并且 Wherein, _T1 is the lower limit of the final vertical angle set, _Th is the upper limit of the final vertical angle set, and

仅在所述摔倒分值s _f大于判定阈值的情况下，才判定并检测出所述人的摔倒。 Only when the fall score s _f is greater than the judgment threshold, the person's fall is judged and detected.
根据权利要求9所述的方法，其特征在于，所述最终垂直角度的所述下限为40度，所述最终垂直角度的所述上限为80度，并且所述判定阈值为0.5。The method according to claim 9, wherein the lower limit of the final vertical angle is 40 degrees, the upper limit of the final vertical angle is 80 degrees, and the decision threshold is 0.5.
一种用于检测场所内的人的摔倒的检测设备，其特征在于，包括：A detection device for detecting falls of people in a place, characterized in that it includes:

至少一个相机，所述至少一个相机分布在所述场所内并且具有不同的视场，所述至少一个相机能够获得包含所述场所中的人的图像数据；at least one camera, distributed within the location and having different fields of view, capable of obtaining image data comprising persons in the location;

处理单元，所述处理单元对所述多个相机获得的所述图像数据进行处理，以判定所述场所内的所述人是否摔倒，其中，所述处理单元包括：A processing unit, the processing unit processes the image data obtained by the plurality of cameras to determine whether the person in the place has fallen, wherein the processing unit includes:

标定模块，对所有所述相机进行标定，以使在各所述相机的视场中均具有恰当的垂直向量；A calibration module, which calibrates all the cameras so that there is an appropriate vertical vector in the field of view of each camera;

数据处理模块，对从所述多个相机传输来的所述图像数据进行处理，从而获得所述人在所述多个相机之中的至少一部分相机的所述视场中的个人垂直度；a data processing module, processing the image data transmitted from the plurality of cameras, so as to obtain the personal verticality of the person in the field of view of at least a part of the cameras among the plurality of cameras;

计算模块，基于从所述标定模块发送来的所述垂直向量和从所述数据处理模块发送来的所述个人垂直度计算所述人的最终垂直角度；以及a calculation module that calculates the final vertical angle of the person based on the vertical vector sent from the calibration module and the personal vertical degree sent from the data processing module; and

判定模块，基于由所述最终垂直角度求出的摔倒分值判定所述人是否摔倒。A judging module that judges whether the person has fallen based on the fall score obtained from the final vertical angle.
根据权利要求11所述的检测设备，其特征在于，所述数据处理模块包括：The detection device according to claim 11, wherein the data processing module comprises:

提取模块，从所述图像数据中提取出所述人的骨架的关键点的数据；和an extraction module, extracting data of key points of the human skeleton from the image data; and

估算模块，通过使用来自所述提取模块的所述关键点的数据估算所述人的所述个人垂直度。an estimating module for estimating the personal verticality of the person by using the keypoint data from the extraction module.
根据权利要求12所述的检测设备，其特征在于，所述计算模块包括：The detection device according to claim 12, wherein the calculation module comprises:

角度计算模块，针对所述至少一部分相机的各所述视场分别基于所述垂直向量和所述个人垂直度计算所述人的垂直角度；以及an angle calculation module for calculating the vertical angle of the person based on the vertical vector and the personal verticality for each of the fields of view of the at least a portion of the cameras; and

聚合模块，通过对所述人在某一时刻在所述至少一部分相机的各所述视场中的所有所述垂直角度进行聚合，获得所述人的所述最终垂直角度。The aggregation module is configured to obtain the final vertical angle of the person by aggregating all the vertical angles of the person in each of the fields of view of the at least a part of the cameras at a certain moment.
根据权利要求11至13中任一项所述的检测设备，其特征在于，所述标定模块基于所述多个相机的各者的内部参数和外部参数进行所述标定。The detection device according to any one of claims 11 to 13, wherein the calibration module performs the calibration based on internal parameters and external parameters of each of the plurality of cameras.
根据权利要求12所述的检测设备，其特征在于，所述提取模块仅将所述人的所述关键点的数据发送给所述估算模块。The detection device according to claim 12, wherein the extraction module only sends the data of the key points of the person to the estimation module.
根据权利要求12所述的检测设备，其特征在于，所述提取模块提取所述人的17个所述关键点的数据。The detection device according to claim 12, wherein the extraction module extracts data of 17 key points of the person.
根据权利要求12所述的检测设备，其特征在于，所述估算模块只估算所述多个相机的各所述视场之中的有效视场中的所述人的所述个人垂直度，所述有效视场是指其中的置信因子c等于或大于预定的置信阈值的视场，所述置信因子c如下定义：The detection device according to claim 12, wherein the estimation module only estimates the personal verticality of the person in the effective field of view of each of the plurality of cameras, the effective field of view refers to a field of view in which a confidence factor c is equal to or greater than a predetermined confidence threshold, and the confidence factor c is defined as follows:

c＝N _v/N _t c=N _v /N _t

其中，N _v是在相应的视场中的所述人的能够观察到的所述关键点的数量，N _t是所述人的所述关键点的总数。 Wherein, N _v is the number of key points that can be observed by the person in the corresponding field of view, and N _t is the total number of key points of the person.
根据权利要求17所述的检测设备，其特征在于，所述估算模块通过计算在所述有效视场中的所述人的从代表足部的所述关键点到代表头部的所述关键点的身体向量而获得所述人的个人垂直度。The detection device according to claim 17, wherein the estimation module obtains the personal verticality of the person by calculating a body vector of the person in the effective field of view from the key point representing the foot to the key point representing the head.
根据权利要求18所述的检测设备，其特征在于，所述估算模块基于所述人的能够观察到的代表所述头部的所述关键点的平均坐标和能够观察到的代表所述足部的所述关键点的平均坐标计算所述身体向量。The detection device according to claim 18, wherein the estimation module calculates the body vector based on the observed average coordinates of the key points representing the head and the observed average coordinates of the key points representing the feet of the person.
根据权利要求13所述的检测设备，其特征在于，所述聚合模块将所述人在所述至少一部分相机的各所述视场中的所述垂直角度的最大值设定为所述人的所述最终垂直角度。The detection device according to claim 13, wherein the aggregation module sets the maximum value of the vertical angles of the person in each of the fields of view of the at least a part of the cameras as the final vertical angle of the person.
根据权利要求11至13中任一项所述的检测设备，其特征在于，所述判定模块仅在摔倒分值s _f大于判定阈值的情况下，才判定并检测出所述人的摔倒， The detection device according to any one of claims 11 to 13, wherein the determination module determines and detects the fall of the person only when the fall score _sf is greater than the determination threshold,

其中，所述最终垂直角度α _v与所述摔倒分值sf之间的关系满足如下等式： Wherein, the relationship between the final vertical angle α _v and the fall score sf satisfies the following equation:

若α _v＜T _l，则s _f＝0； If α _v <T _l , then s _f =0;

若T _l＜α _v＜T _h，则
If T _l <α _v <T _h , then

若α _v＞T _h，则s _f＝1， If α _v >T _h , then s _f =1,

其中，T _l为设定的所述最终垂直角度的下限，T _h为设定的所述最终垂直角度的上限。 Wherein, T ₁ is the lower limit of the set final vertical angle, and _Th is the upper limit of the set final vertical angle.
根据权利要求21所述的检测设备，其特征在于，所述最终垂直角度的所述下限为40度，所述最终垂直角度的所述上限为80度，并且所述判定阈值为0.5。The detection device according to claim 21, wherein the lower limit of the final vertical angle is 40 degrees, the upper limit of the final vertical angle is 80 degrees, and the decision threshold is 0.5.
一种存储介质，其上存储有计算机可读程序，所述程序在处理器上执行时实施如权利要求1-10中任一项所述的方法。A storage medium on which a computer-readable program is stored, the program implements the method according to any one of claims 1-10 when executed on a processor.
一种用于检测人在货架前的拾取或放回行为的检测方法，其特征在于，包括如下步骤：A detection method for detecting the behavior of picking up or putting back a person in front of a shelf, characterized in that it includes the following steps:

步骤S1：从图像数据中获取所述人的骨架的包含手部关键点的多个关键点的数据，并且从所述图像数据提取所述货架的外轮廓线，其中，所述外轮廓线包括所述货架的外部多边形以及与所述货架的真实外轮廓相对应的内部多边形，所述外部多边形在所述内部多边形的外部的接近区域中；Step S1: Acquiring the data of a plurality of key points of the human skeleton including hand key points from the image data, and extracting the outer contour line of the shelf from the image data, wherein the outer contour line includes an outer polygon of the shelf and an inner polygon corresponding to the real outer contour of the shelf, and the outer polygon is in an external proximity area of the inner polygon;

步骤S2：在所述人的至少一只手的所述手部关键点被检测到进入所述外部多边形的情况下，针对所述人的进入所述外部多边形的每只手分别执行用于检测所述手部关键点的附近的物品的进入物品检测；Step S2: In the case that the key points of the hand of at least one hand of the person are detected to enter the outer polygon, for each hand of the person that enters the outer polygon, perform an incoming object detection for detecting objects in the vicinity of the key points of the hand;

步骤S4：在所述人的至少一只手的所述手部关键点被检测到退出所述外部多边形的情况下，针对所述人的退出所述外部多边形的每只手分别执行用于检测所述手部关键点的附近的物品的退出物品检测；Step S4: In the case that the key point of the hand of at least one hand of the person is detected to exit the outer polygon, for each hand of the person that exits the outer polygon, perform exit object detection for detecting objects near the key point of the hand;

步骤S5：基于所述进入物品检测的结果和所述退出物品检测的结果来判定所述人在所述货架前的所述拾取或放回行为。Step S5: Determine the pick-up or put-back behavior of the person in front of the shelf based on the detection result of the incoming item and the detection result of the exiting item.
根据权利要求24所述的方法，其特征在于，在所述步骤S1中，从图像数据中获取所述人的所述多个关键点的数据包括：The method according to claim 24, wherein, in the step S1, obtaining the data of the plurality of key points of the person from the image data comprises:

通过至少一部相机拍摄包含所述货架和所述货架前的所述人的所述图像数据；capturing the image data including the shelf and the person in front of the shelf by at least one camera;

从所述图像数据中提取所述人的包含所述手部关键点的所述多个关键点的数据。Data of the plurality of key points including the hand key points of the person are extracted from the image data.
根据权利要求24所述的方法，其特征在于，在所述步骤S2之后，还包括判定所述人的所述手部关键点是否进入所述内部多边形的步骤，其中The method according to claim 24, characterized in that, after the step S2, further comprising the step of judging whether the key points of the hands of the person enter the internal polygon, wherein

在确定所述人的所述手部关键点进入所述内部多变形的情况下，执行所述步骤S2的后续步骤。In the case that it is determined that the key point of the hand of the person enters the internal polymorph, the subsequent steps of the step S2 are executed.
根据权利要求26所述的方法，其特征在于，The method of claim 26, wherein

只有在所述人的所述手部关键点在所述图像数据的至少3个连续帧中都处于所述内部多边形内，所述手部关键点才被确定为进入所述内部多边形。The key point of the hand of the person is determined to enter the internal polygon only if the key point of the hand of the person is within the internal polygon in at least 3 consecutive frames of the image data.
根据权利要求24所述的方法，其特征在于，在所述步骤S5中，通过基于所述进入物品检测的结果和所述退出物品检测的结果针对所述人的每只手分别使用一个有限状态机，来判定所述人在所述货架前的所述拾取或放回行为。The method according to claim 24, characterized in that, in the step S5, by using a finite state machine for each hand of the person based on the result of the detection of the incoming item and the result of the detection of the exiting item, the behavior of picking up or putting back the person in front of the shelf is determined.
根据权利要求24所述的方法，其特征在于，在所述步骤S2与所述步骤S4之间还包括步骤S3：The method according to claim 24, further comprising a step S3 between the step S2 and the step S4:

针对所述人的进入所述外部多边形的每只手，分别记录所述手部关键点附近的所述物品在所述外部多边形与所述内部多边形之间的轨迹。For each hand of the person entering the outer polygon, the trajectory of the item near the key point of the hand between the outer polygon and the inner polygon is recorded respectively.
根据权利要求29所述的方法，其特征在于，在所述步骤S5中，通过基于所述进入物品检测的结果、所述退出物品检测的结果和所述物品的所述轨迹，针对所述人的每只手分别使用一个有限状态机，来判定所述人在所述货架前的所述拾取或放回行为。The method according to claim 29, characterized in that, in the step S5, by using a finite state machine for each hand of the person based on the detection result of the incoming item, the result of the detection of the exiting item and the trajectory of the item, to determine the picking up or putting back behavior of the person in front of the shelf.
根据权利要求30所述的方法，其特征在于，若在所述步骤S2中针对所述人的两只手的所述进入物品检测均确认检测出进入物品，则在所述步骤S5中，通过比较所述人的左手的所述手部关键点附近的所述进入物品的所述轨迹与右手的所述手部关键点附近的所述进入物品的所述轨迹，确定左手的所述手部关键点附近的所述进入物品与右手的所述手部关键点附近的所述进入物品是否为同一物品。The method according to claim 30, characterized in that, if in the step S2, the detection of the incoming objects for the two hands of the person confirms the detection of incoming objects, then in the step S5, by comparing the trajectory of the incoming objects near the key points of the left hand of the person with the trajectory of the incoming objects near the key points of the right hand, it is determined whether the incoming objects near the key points of the left hand and the incoming objects near the key points of the right hand are the same object.
根据权利要求31所述的方法，其特征在于，所述人的左手的所述手部关键点附近的所述进入物品的所述轨迹与右手的所述手部关键点附近的所述进入物品的所述轨迹的所述比较是通过核查所述图像数据的各帧中的左手的所述手部关键点附近的所述进入物品与右手的所述手部关键点附近的所述进入物品之间的距离而进行的，并且The method of claim 31, wherein said comparison of said trajectory of said incoming item near said hand key point of said person's left hand with said trajectory of said incoming item near said hand key point of right hand is performed by checking the distance between said incoming item near said hand key point of left hand and said incoming item near said hand key point of right hand in each frame of said image data, and

仅在各帧中的所述距离均低于预定的距离阈值的情况下，确定左手的所述手部关键点附近的所述进入物品与右手的所述手部关键点附近的所述进入物品为同一物品。Only when the distances in each frame are lower than a predetermined distance threshold, it is determined that the incoming item near the hand key point of the left hand and the incoming item near the hand key point of the right hand are the same item.
根据权利要求30至32中任一项所述的方法，其特征在于，若在所述步骤S4中针对所述人的两只手的所述退出物品检测均确认检测出退出物品，则在所述步骤S5中，通过比较所述人的左手的所述手部关键点附近的所述退出物品的所述轨迹与右手的所述手部关键点附近的所述退出物品的所述轨迹，确定左手的所述手部关键点附近的所述退出物品与右手的所述手部关键点附近的所述退出物品是否为同一物品。According to the method of claim 30 to 32, it is characterized by the test of the exit of the two -handed item to detect the exit of the two hands in the step S4 in the step S4, and in the step S5, the trajectory of the exit and the right hand of the exit objects near the key point of the hand of the person's left hand is cleared. The trajectory of the exit items near the key point, determine whether the exit item near the hand key point of the left hand is the same items near the right point of the right hand.
根据权利要求33所述的方法，其特征在于，所述人的左手的所述手部关键点附近的所述退出物品的所述轨迹与右手的所述手部关键点附近的所述退出物品的所述轨迹的所述比较是通过核查所述图像数据的各帧中的左手的所述手部关键点附近的所述退出物品与右手的所述手部关键点附近的所述退出物品之间的距离而进行的，并且The method of claim 33, wherein said comparison of said trajectory of said exit items near said hand key point of said person's left hand with said trajectory of said exit items near said hand key point of right hand is performed by checking the distance between said exit items near said hand key point of left hand and said exit items near said hand key point of right hand in each frame of said image data, and

仅在各帧中的所述距离均低于预定的距离阈值的情况下，确定左手的所述手部关键点附近的所述退出物品与右手的所述手部关键点附近的所述退出物品为同一物品。Only when the distances in each frame are lower than a predetermined distance threshold, it is determined that the exiting item near the hand key point of the left hand and the exiting item near the hand key point of the right hand are the same item.
根据权利要求24至30中任一项所述的方法，其特征在于，在所述步骤S2中，只有在所述图像数据中的具有检测到的物品的帧数等于或大于预定的最小帧数并且具有检测到的所述物品的帧数与总帧数的比值等于或大于预定的最小比值的情况下，所述进入物品检测确认检测出进入物品。The method according to any one of claims 24 to 30, characterized in that, in the step S2, only when the number of frames with detected items in the image data is equal to or greater than a predetermined minimum number of frames and the ratio of the number of frames with detected items to the total number of frames is equal to or greater than a predetermined minimum ratio, the incoming item detection confirms that an incoming item is detected.
根据权利要求24至30中任一项所述的方法，其特征在于，在所述步骤S4中，只有在所述图像数据的具有检测到的物品的帧数等于或大于预定的最小帧数并且具有检测到的所述物品的帧数与总帧数的比值等于或大于预定的最小比值的情况下，所述退出物品检测确认检测出退出物品。The method according to any one of claims 24 to 30, wherein, in the step S4, only when the number of frames of the image data with detected items is equal to or greater than a predetermined minimum number of frames and the ratio of the number of frames with detected items to the total number of frames is equal to or greater than a predetermined minimum ratio, the exit item detection confirmation detects an exit item.
根据权利要求35所述的方法，其特征在于，所述图像数据包括来自多个相机或图像传感器的多组图像数据，并且The method of claim 35, wherein the image data comprises multiple sets of image data from multiple cameras or image sensors, and

在所述步骤S2中，只有在来自所述多个相机或图像传感器的所述多组图像数据中的具有检测到的物品的帧数之和等于或大于预定的最小帧数并且具有检测到的所述物品的帧数与总帧数的比值等于或大于预定的最小比值的情况下，所述进入物品检测确认检测出进入物品。In the step S2, only when the sum of the number of frames with detected items in the multiple sets of image data from the plurality of cameras or image sensors is equal to or greater than a predetermined minimum number of frames and the ratio of the number of frames with the detected items to the total number of frames is equal to or greater than a predetermined minimum ratio, the incoming item detection confirms that an incoming item is detected.
根据权利要求28或30所述的方法，其特征在于，所述有限状态机包括下列拾取或放回行为：无变化、拾取一个物品、拾取两个物品、放回一个物品和放回两个物品。The method of claim 28 or 30, wherein the finite state machine includes the following pick or put behaviors: no change, pick one item, pick two items, put back one item, and put back two items.
一种用于检测人在货架前的拾取或放回行为的检测设备，其特征在于，所述检测设备包括：A detection device for detecting the behavior of picking up or putting back a person in front of a shelf, characterized in that the detection device includes:

至少一台相机或图像传感器，获取图像数据；at least one camera or image sensor to acquire image data;

处理单元，所述处理单元按照根据权利要求24至38中任一项所述的用于检测人在货架前的拾取或放回行为的检测方法对所述图像数据进行处理。A processing unit, the processing unit processes the image data according to the detection method for detecting the picking or putting back behavior of a person in front of a shelf according to any one of claims 24 to 38.
一种存储介质，其上存储有计算机可读程序，所述程序在处理器上执行时实施根据权利要求24至38中任一项所述的方法。A storage medium having stored thereon a computer readable program which, when executed on a processor, implements the method according to any one of claims 24 to 38.