CN112233770A

CN112233770A - Intelligent gymnasium management decision-making system based on visual perception

Info

Publication number: CN112233770A
Application number: CN202011105653.8A
Authority: CN
Inventors: 李宗山; 李鹏举; 王涛; 王冰冰
Original assignee: Zhengzhou Normal University
Current assignee: Zhengzhou Normal University
Priority date: 2020-10-15
Filing date: 2020-10-15
Publication date: 2021-01-15
Anticipated expiration: 2040-10-15
Also published as: CN112233770B

Abstract

The invention provides an intelligent gymnasium management decision-making system based on visual perception. The system comprises an image acquisition module, a three-dimensional coordinate detection module, an equipment-occupation behavior judgment module and a density grading module. The image acquisition module builds a BIM (building information model) of the gymnasium, acquires an original image of each sub-region, projects the original images onto the BIM ground plane and stitches them together; the three-dimensional coordinate detection module detects the three-dimensional coordinates of the human body key points of each customer in the original images and outputs a human body key point three-dimensional coordinate sequence and a key point heatmap; the equipment-occupation behavior judgment module detects whether a customer in a fitness equipment area is occupying the equipment without exercising; the density grading module detects the customer density grade from the positioning point superimposed map of each fitness area. Finally, the BIM is visualized so that staff can promptly see on the Web where fitness equipment is being occupied, the customer density grade of each fitness area, and similar information.

Description

Intelligent gymnasium management decision-making system based on visual perception

Technical Field

The application relates to the field of machine vision, and in particular to an intelligent gymnasium management decision-making system based on visual perception.

Background

With the development of science and technology and the improvement of living standards, people's awareness of fitness has greatly increased, and many people choose to exercise in a gymnasium. However, some customers occupy fitness equipment without exercising on it, which wastes gymnasium resources. In addition, some fitness machines are especially popular with customers, and the number of such machines needs to be increased.

At present, detecting the occupation of fitness equipment relies mainly on patrols by gymnasium staff. This approach is costly, and manual judgment is subjective and prone to misjudgment.

Disclosure of Invention

To address the above problems, the invention provides an intelligent gymnasium management decision-making system based on visual perception. The system comprises an image acquisition module, a three-dimensional coordinate detection module, an equipment-occupation behavior judgment module and a density grading module. The image acquisition module builds a BIM of the gymnasium, acquires an original image of each sub-region, projects the original images onto the BIM ground plane and stitches them together; the three-dimensional coordinate detection module detects the three-dimensional coordinates of the human body key points of each customer in the original images and outputs a human body key point three-dimensional coordinate sequence and a key point heatmap; the equipment-occupation behavior judgment module detects whether a customer in a fitness equipment area is occupying the equipment without exercising; the density grading module detects the customer density grade from the positioning point superimposed map of each fitness area. Finally, the BIM is visualized so that staff can promptly see on the Web where fitness equipment is being occupied, the customer density grade of each fitness area, and similar information.

An intelligent gymnasium management decision making system based on visual perception, comprising:

an image acquisition module, a three-dimensional coordinate detection module, an equipment-occupation behavior judgment module and a density grading module.

The image acquisition module is used for establishing a BIM of the gymnasium, dividing the gymnasium into a plurality of sub-regions, and setting up cameras to acquire original images of the sub-regions; it projects the original images of the sub-regions onto the BIM ground plane, stitches them together, and outputs a gymnasium panorama.

The three-dimensional coordinate detection module comprises a key point detection network, a limb detection network, a key point matching module and a TCN (temporal convolutional network), and is used for acquiring the three-dimensional coordinates of the human body key points of each customer and outputting a human body key point three-dimensional coordinate sequence and a human body key point heatmap.

The equipment-occupation behavior judgment module periodically samples, within a detection period, the human body key point three-dimensional coordinate sequence of a customer in a fitness equipment area to generate a fitness action detection sequence. One group of human body key point three-dimensional coordinates in the fitness action detection sequence is taken as the comparison sequence; the similarity between each group in the fitness action detection sequence and the comparison sequence is analyzed; the time intervals t between successive groups that are similar to the comparison sequence are counted; the mean t' of t is calculated; and whether the customer is in a suspected fitness state is judged. If the customer is in a suspected fitness state, the variance s of t in the detection period is calculated to further judge whether the customer is in a fitness state. If the customer is in a non-fitness state, the time t' for which the customer remains in the non-fitness state is counted continuously, and whether equipment-occupation behavior exists is judged.

The density grading module is used for screening out the positioning points in the human body key point heatmap to obtain a human body positioning point heatmap; projecting the positioning point heatmaps of all sub-regions onto the BIM ground plane and stitching them to output a positioning point panorama; superimposing the positioning point panorama over time using a forgetting coefficient to obtain a positioning point superimposed map; setting fitness areas and outputting the positioning point superimposed map of each fitness area; and inputting the positioning point superimposed map of each fitness area into a customer density grading network to detect the customer density grade of each fitness area.

The three-dimensional coordinate detection module includes: the key point detection network, used for detecting the human body key points and positioning points in the original image and outputting a human body key point heatmap; the limb detection network, used for detecting limbs in the original image and outputting a limb affinity vector field; the key point matching module, used for combining the human body key point heatmap and the limb affinity vector field, matching the two key points at the two ends of each limb in turn by maximum-weight matching to obtain the optimal matching of the human body key points, and outputting a plurality of groups of human body key point two-dimensional coordinate sequences; and the TCN, used for predicting the three-dimensional coordinates of the human body key points and outputting a human body key point three-dimensional coordinate sequence.

The training method of the key point detection network comprises the following steps: a plurality of sub-region original images captured by the cameras are taken as the data set; the data set is labeled with the head, left shoulder, right shoulder, left elbow, right elbow, left hand center, right hand center, spine center, neck center, left hip center, right hip center, left knee, right knee, left foot center, right foot center and positioning point to generate labeled data, the positioning point being the midpoint of the line segment connecting the left foot center and the right foot center; training is performed using a mean square error loss function.

The training method of the limb detection network comprises the following steps: taking a plurality of original images of the subareas shot by a camera as a data set; labeling the data set, labeling a unit vector pointing to the direction from a key point at one end of a limb to a key point at the other end of the limb on a pixel contained in the limb of the person, and generating labeled data; training is performed using a mean square error loss function.

The training method of the TCN network comprises the following steps: taking a plurality of groups of human body key point two-dimensional coordinate sequences as a data set; marking the three-dimensional coordinates of each human body key point to generate marking data; training is performed using a mean square error loss function.

The equipment-occupation behavior judgment module operates as follows: the first group of human body key point three-dimensional coordinates in the fitness action detection sequence is taken as the comparison sequence; for every group in the fitness action detection sequence, the Euclidean distance between each human body key point a and the corresponding human body key point a' in the comparison sequence is calculated, and the distances over all human body key points are summed to obtain the total Euclidean distance L₂; the L₂ values are arranged in chronological order to obtain a Euclidean distance sequence. The Euclidean distance sequence is binarized with a Euclidean distance threshold m₁: when L₂ < m₁, the corresponding L₂ value in the sequence is set to 1; when L₂ ≥ m₁, the corresponding L₂ value is set to 0, yielding a binary sequence.

The time interval t between adjacent 1s in the binary sequence is computed, and the mean t' of t within the detection period is calculated. An empirical mean threshold m₂ is set: when t' < m₂, the customer is judged to be in a non-fitness state in the detection period; when t' ≥ m₂, the customer is judged to be in a suspected fitness state in the detection period.

For a detection period in the suspected fitness state, the variance s of t is calculated. An empirical variance threshold m₃ is set: when s < m₃, the customer is judged to be in a fitness state in the detection period; when s ≥ m₃, the customer is judged to be in a non-fitness state in the detection period.

Judging whether equipment-occupation behavior exists comprises: measuring the time t' for which the customer has been in the non-fitness state and setting an empirical time threshold m₄; when t' ≥ m₄, equipment-occupation behavior is judged to exist; when t' < m₄, equipment-occupation behavior is judged not to exist.

The training method of the customer density grading network comprises the following steps: a plurality of positioning point superimposed maps of the fitness areas are taken as the data set; the customer density grade corresponding to each positioning point superimposed map is labeled manually to generate labeled data; training is performed using a mean square error loss function.

Compared with the prior art, the invention has the following beneficial effects:

(1) The total Euclidean distance L₂ between the customer's human body key point three-dimensional coordinate sequences and the comparison sequence is computed over a period of time and binarized to obtain a binary sequence. Whether the customer is in a suspected fitness state is judged from the mean of t, and whether the customer is in a fitness state is determined from the variance of t. There is therefore no need to preset three-dimensional coordinate sequences for the postures of different fitness exercises: whether a customer is exercising can be judged regardless of which exercise is being performed. The approach is highly general and the system reaches its judgment faster.

(2) The density grading module detects the customer density in the fitness areas where the various kinds of fitness equipment are located, so that the popularity of each kind of equipment can be judged, management decisions can be made, and the resources of the gymnasium can be used more fully.

(3) The BIM is visualized, so that staff can observe conditions in the gymnasium on the Web and stop equipment-occupation behavior in time.

Drawings

Fig. 1 is a system configuration diagram.

Fig. 2 is a limb diagram.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The first embodiment is as follows:

The main purpose of the invention is to detect equipment-occupation behavior and customer density, and to manage the gymnasium according to the detection results.

In order to realize the content of the invention, the invention designs an intelligent gymnasium management decision-making system based on visual perception. The system architecture is shown in fig. 1.

The image acquisition module is used for acquiring the original image of each sub-region and the gymnasium panorama.

First, a building information model (BIM) of the gymnasium is established. BIM is a data-driven tool applied to engineering design, construction and management; it shares and transmits the gymnasium's data and information models throughout the whole life cycle of gymnasium planning, operation and maintenance. It should be noted that methods of building a BIM are various and well known, and the invention does not limit the method used.

The gymnasium is divided into a plurality of sub-areas, each area is provided with a camera with a fixed pose, images of the sub-areas are shot, and original images of the sub-areas are output after image noise is reduced through filtering processing. It should be noted that the shooting ranges of the cameras in the adjacent sub-areas should be overlapped to a certain extent, so that subsequent image splicing is facilitated.

Each sub-region original image is first projected onto the BIM ground plane through a projective transformation. The projective transformation describes the correspondence between the coordinate system of each sub-region original image and the BIM ground-plane coordinate system, and its transformation matrix is a homography matrix.

Because the pose of each camera is fixed, its intrinsic and extrinsic parameters are also fixed. From these parameters, the transformation that converts the original image into a top-down view, namely the first homography matrix H₁ used to project the original image onto the BIM ground plane, can be estimated. Let l(x, y) be a pixel in the original image and l₁(x₁, y₁) the corresponding pixel on the BIM ground plane; the coordinates of l and l₁ are related as follows:

where h₁ to h₈ are the parameters of the first homography matrix.
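The transformation formula itself is not reproduced in this text. A standard eight-parameter homography consistent with the definitions of l(x, y), l₁(x₁, y₁) and h₁ to h₈ would read as below; the row-by-row ordering of the parameters is an assumption made here for illustration, not something the text confirms.

```latex
H_1 =
\begin{pmatrix}
h_1 & h_2 & h_3 \\
h_4 & h_5 & h_6 \\
h_7 & h_8 & 1
\end{pmatrix},
\qquad
x_1 = \frac{h_1 x + h_2 y + h_3}{h_7 x + h_8 y + 1},
\qquad
y_1 = \frac{h_4 x + h_5 y + h_6}{h_7 x + h_8 y + 1}
```

Fixing the bottom-right entry of H₁ to 1 removes the overall scale ambiguity of the homography, which is why eight parameters suffice.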

After the original images of the sub-regions have been projected onto the BIM ground plane, they are stitched together using a second homography matrix H₂. Taking the stitching of the ground-plane images of two adjacent sub-regions as an example, the steps are as follows:

First, feature points are extracted from the ground-plane images of the two adjacent sub-regions. Methods of extracting feature points are various and well known; the invention does not limit the method, and an implementer can select a suitable feature point extraction method according to actual conditions. In this embodiment, the SIFT feature point detection algorithm is used to extract a plurality of feature points.

Second, the feature points extracted from the ground-plane images of the two adjacent sub-regions are matched. Methods of matching feature points are various and well known; the invention does not limit the method, and an implementer can select a suitable feature point matching method according to actual conditions. This embodiment performs feature point matching using NCC (normalized cross-correlation).

Third, the homography matrix is estimated using the RANSAC method. The specific steps are: randomly select four pairs of feature points from the matched pairs, add them to the inlier set, and compute a homography matrix; project the other feature point pairs using this matrix, and add a pair to the inlier set when its projection error is less than a certain threshold. Repeat these steps, count the number of feature point pairs in each inlier set, and select the homography matrix corresponding to the inlier set containing the most feature point pairs as the second homography matrix H₂.

A second homography matrix is calculated for every pair of adjacent sub-region images on the BIM ground plane, and the sub-regions are stitched according to these matrices. Image fusion is then carried out. Methods of image fusion are various and well known, and the invention does not limit the algorithm used. In this embodiment, feathering is used: overlapping pixels are fused using a weighted average of their color values. The result is a panorama of the gymnasium on the BIM ground plane.
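The RANSAC procedure described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used by the invention: the function names, the iteration count, the pixel threshold, the random seed, and the row-by-row ordering of the eight homography parameters are all hypothetical choices made for the example.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None  # degenerate sample, e.g. three collinear points
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_homography(pairs):
    """Fit eight parameters from exactly four ((x, y), (u, v)) pairs."""
    a, b = [], []
    for (x, y), (u, v) in pairs:
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)

def project(h, pt):
    """Apply the homography to a point; None if it maps to infinity."""
    x, y = pt
    w = h[6] * x + h[7] * y + 1.0
    if abs(w) < 1e-12:
        return None
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)

def ransac_homography(pairs, iterations=200, threshold=1.0, seed=0):
    """Keep the homography whose inlier set is largest."""
    rng = random.Random(seed)
    best_h, best_inliers = None, []
    for _ in range(iterations):
        h = fit_homography(rng.sample(pairs, 4))
        if h is None:
            continue
        inliers = []
        for src, dst in pairs:
            p = project(h, src)
            if p is None:
                continue
            err = ((p[0] - dst[0]) ** 2 + (p[1] - dst[1]) ** 2) ** 0.5
            if err < threshold:
                inliers.append((src, dst))
        if len(inliers) > len(best_inliers):
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```

In practice an implementer would more likely rely on an existing computer vision library for this step; the sketch only makes the sample, fit, project and count loop explicit.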

The three-dimensional coordinate detection module is used for acquiring the three-dimensional coordinates of the human body key points of each customer. It comprises a key point detection network, a limb detection network, a key point matching module and a TCN.

The invention uses a key point detection network to detect human body key points. In addition to the customer's posture, the invention also detects the customer's position and density. To keep the error between the detected position and the customer's real position as small as possible, the key point used for positioning should be as close to the ground as possible; the midpoint of the line connecting the left foot center key point and the right foot center key point is therefore selected as the positioning point for detecting the customer's position.

The training steps of the key point detection network are as follows: a plurality of sub-region original images are taken as the data set; the data set is labeled with the head, left shoulder, right shoulder, left elbow, right elbow, left hand center, right hand center, spine center, neck center, left hip center, right hip center, left knee, right knee, left foot center, right foot center and positioning point, 16 key points in total, to generate labeled data; training is performed using a mean square error loss function.

The original images of the sub-regions are input into the trained key point detection network, which detects the human body key points and outputs a human body key point heatmap.

The invention uses a limb detection network to detect the limb affinity vector fields of the customers, so that the key points belonging to each customer can be told apart. According to experience, the 16 key points can be connected into 16 limbs, so the limb detection network needs to detect the affinity vector fields of 16 limbs.

The training steps of the limb detection network are as follows: a plurality of sub-region original images are taken as the data set; the data set is labeled by assigning, to each pixel lying on one of the 16 limbs of a person, the unit vector pointing from one end of the limb to the other, to generate labeled data; training is performed using a mean square error loss function.

The original images of the sub-regions are input into the trained limb detection network to detect human limbs and generate the limb affinity vector field. The limbs are shown in Fig. 2; there are 16 in total.

The key point matching module matches the human body key points to the corresponding customers by combining the limb affinity vector field with the human body key point heatmap. For the two kinds of key points at the two ends of one type of limb, the pairing with the highest connection probability is found according to the affinity vector field of that type of limb, giving a plurality of key point pairs. The key points at the two ends of every limb are matched in this way, key point pairs that share a key point are merged into one group, and a plurality of groups of human body key points are finally output, each group containing 16 human body key points. The two-dimensional coordinates of the human body key points are obtained through soft-argmax, the coordinates in each group are arranged in a fixed order, and a human body key point two-dimensional coordinate sequence is output.
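The soft-argmax step mentioned above, which turns a key point heatmap into continuous two-dimensional coordinates, can be illustrated with the following sketch. It assumes one heatmap per key point stored as a list of rows; the function name and the sharpening factor `beta` are hypothetical and are not specified in the text.

```python
import math

def soft_argmax_2d(heatmap, beta=10.0):
    """Return the (x, y) expectation of a softmax taken over a 2D heatmap.

    heatmap: list of rows of floats; beta: assumed sharpening factor.
    """
    peak = max(max(row) for row in heatmap)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y
```

Unlike a hard argmax, the result is a weighted average of pixel positions and can therefore fall between pixels; a larger `beta` pulls the estimate closer to the single strongest response.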

To obtain the posture of the human body more accurately, three-dimensional coordinates of key points of the human body need to be obtained. The invention obtains the three-dimensional coordinates of the key points of the human body through the TCN network.

The training step of the TCN network comprises the following steps: taking a plurality of groups of human body two-dimensional key point coordinate sequences as a data set; marking the three-dimensional coordinates of each human body key point to generate marking data; training is performed using a mean square error loss function.

Inputting the two-dimensional coordinate sequence of the human body key points into the trained TCN network, predicting the three-dimensional coordinates of the human body key points, and outputting the three-dimensional coordinate sequence of the human body key points.

The equipment-occupation behavior judgment module is used for judging whether a customer is occupying fitness equipment without exercising.

The invention needs to detect equipment-occupation behavior in areas containing fitness equipment. A plurality of fitness equipment areas are therefore set in the original image, each containing only one piece of fitness equipment and at most one exercising customer. A fitness equipment area mask is generated and multiplied element-wise with the corresponding human body key point heatmap to obtain the human body key point heatmap of the fitness equipment area, from which it is further judged whether the customer is occupying the equipment.

Taking the detection of equipment-occupation behavior by a customer in one fitness equipment area as an example, the steps are as follows:

It should be noted that the detection period T is set differently depending on the exercise the customer performs on the fitness equipment; in this embodiment, T is set to 30 seconds.

Within the 30 seconds, one group of human body key point three-dimensional coordinates is collected every second, and the groups are arranged in chronological order to generate the fitness action detection sequence. The first group in the fitness action detection sequence is taken as the comparison sequence, and the Euclidean distance between each human body key point a in every group of the sequence and the corresponding human body key point a' in the comparison sequence is calculated.

The similarity between each group of human body key point three-dimensional coordinates in the fitness action detection sequence and the comparison sequence is then analyzed. Taking the second group in the fitness action detection sequence as an example: the Euclidean distances between the 16 human body key points a in the second group and the corresponding key points a' in the comparison sequence are calculated, and the 16 distances are summed to obtain the total Euclidean distance L₂. The L₂ values for all groups are calculated in turn and arranged in chronological order to obtain a Euclidean distance sequence.

The Euclidean distance sequence is binarized with a Euclidean distance threshold m₁. When L₂ < m₁, the human body key point three-dimensional coordinate sequence corresponding to that L₂ value is similar to the comparison sequence, and the corresponding L₂ value in the Euclidean distance sequence is set to 1; when L₂ ≥ m₁, the human body key point three-dimensional coordinate sequence corresponding to that L₂ value differs considerably from the comparison sequence, and the corresponding L₂ value is set to 0. This yields a binary sequence of the form [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, ..., 1, 0]. In this embodiment, m₁ is set to 30. For a single key point, a Euclidean distance within 2 is regarded as the same posture; with the 16 key points used in the invention, the threshold is m₁ = 16 × 2 = 32.
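The distance and binarization steps can be sketched as follows. This is a minimal illustration in which each pose is assumed to be a list of (x, y, z) tuples in a fixed key point order; the function names are hypothetical, and the threshold is passed in as a parameter rather than fixed.

```python
import math

def total_distance(pose, reference):
    """Sum of per-key-point Euclidean distances between two 3D poses (L2)."""
    return sum(math.dist(a, b) for a, b in zip(pose, reference))

def binarize_sequence(poses, m1):
    """Return 1 where a pose is similar to the first pose (L2 < m1), else 0.

    poses: chronologically ordered fitness action detection sequence;
    the first pose serves as the comparison sequence.
    """
    reference = poses[0]
    return [1 if total_distance(p, reference) < m1 else 0 for p in poses]
```

Because the first pose is compared with itself, the binary sequence always begins with 1, which matches the example sequence given above.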

The time interval t between adjacent 1s in the binary sequence is counted, the mean t' of t is calculated, and an empirical mean threshold m₂ is set. When t' < m₂, the customer's posture is essentially unchanged during the detection period and the customer is essentially static, that is, in a non-fitness state; when t' ≥ m₂, the customer's posture changes during the detection period, and the customer is judged to be in a suspected fitness state in the detection period. In this embodiment, m₂ is set to 1 second.

For a detection period T in which the customer is in a suspected fitness state, the variance s of t in that period is calculated and an empirical variance threshold m₃ is set. When s < m₃, the customer is performing regular, repetitive movements, that is, fitness exercise, and is judged to be in a fitness state in the detection period; when s ≥ m₃, the customer's movements are irregular, and the customer is judged to be in a non-fitness state in the detection period. In this embodiment, m₃ is set to 1.

If the customer is in a non-fitness state, the three-dimensional coordinate sequence of the customer in the fitness equipment area continues to be detected, the time t' for which the customer has been in the non-fitness state is counted, and an empirical time threshold m₄ is set. When t' ≥ m₄, equipment-occupation behavior is judged to exist; when t' < m₄, equipment-occupation behavior is judged not to exist. In this embodiment, m₄ is set to 300 seconds, that is, 5 minutes.

It should be noted that the period of the fitness exercise differs between different kinds of fitness equipment, and suitable values of m₁, m₂, m₃ and m₄ should be set according to actual conditions.
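The decision logic built on the binary sequence can be sketched as follows. The function names and the `step` parameter (seconds between samples) are hypothetical. Two further points are assumptions not stated in the text: a period with no repeat of the comparison posture is treated as non-fitness, and the variance is computed as a population variance.

```python
from statistics import mean, pvariance

def intervals_between_ones(binary, step=1.0):
    """Time intervals t between adjacent 1s in the binary sequence."""
    idx = [i for i, b in enumerate(binary) if b == 1]
    return [(j - i) * step for i, j in zip(idx, idx[1:])]

def classify_period(binary, m2, m3, step=1.0):
    """Return 'fitness' or 'non-fitness' for one detection period."""
    t = intervals_between_ones(binary, step)
    if not t:
        # Assumption: the comparison posture never recurs, so no period exists.
        return "non-fitness"
    if mean(t) < m2:
        # Posture essentially unchanged: the customer is static.
        return "non-fitness"
    # Suspected fitness state: regular repetition means small variance.
    return "fitness" if pvariance(t) < m3 else "non-fitness"

def is_occupying(non_fitness_seconds, m4=300.0):
    """Equipment-occupation behavior exists once the non-fitness time reaches m4."""
    return non_fitness_seconds >= m4
```

A caller would run `classify_period` once per detection period, accumulate the time spent in consecutive non-fitness periods, and pass that total to `is_occupying`.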

The density grading module is used for detecting the customer density grade from the positioning point superimposed map of each fitness area.

The positioning points in the human body key point heatmap are screened out to obtain the human body positioning point heatmap. Because the positioning point heatmaps are derived from the original images, the positioning point heatmap of each sub-region can be projected onto the BIM ground plane using the first homography matrix, and the projected heatmaps can be stitched using the second homography matrix to finally obtain the positioning point panorama.

The positioning point panorama is superimposed over time using a forgetting coefficient to obtain the positioning point superimposed map. The superposition is computed as: X'' = αX + (1 − α)X'.

Here X is the positioning point panorama of the current frame, X' is the positioning point superimposed map of the frame preceding the current frame, X'' is the superposition result for the current frame, namely the positioning point superimposed map of the current frame, and (1 − α) is the forgetting coefficient; α is 0.05 in the invention.
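The superposition with a forgetting coefficient is an exponential moving average over frames. The sketch below illustrates it on maps stored as lists of rows; the function name is hypothetical, and both maps are assumed to have the same size.

```python
def superimpose(current, previous, alpha=0.05):
    """Blend the current positioning point panorama into the running map.

    current: panorama of the current frame (list of rows of floats);
    previous: superimposed map of the preceding frame, same shape;
    returns alpha * current + (1 - alpha) * previous, element by element.
    """
    return [[alpha * c + (1.0 - alpha) * p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, previous)]
```

With α = 0.05, each new frame contributes only 5% to the result, so the superimposed map changes slowly and reflects where customers have been standing over many recent frames rather than in a single frame.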

The invention needs to detect the customer density in the areas where the various kinds of fitness equipment are located, and in their surroundings, in order to judge the popularity of each kind of equipment and decide whether to add or remove equipment. The number of customers waiting beside a piece of fitness equipment also reflects its popularity. The invention therefore sets a plurality of fitness areas. A fitness area is larger than a fitness equipment area: each fitness area contains several pieces of similar fitness equipment together with a certain range of space around them. A plurality of fitness area masks are generated and multiplied element-wise with the positioning point superimposed map to obtain the positioning point superimposed map of each fitness area. The customer density grade of each fitness area is then detected from its positioning point superimposed map by a customer density grading network. The training steps of the customer density grading network are as follows: a plurality of positioning point superimposed maps are taken as the data set; the customer density grade of each fitness area in the positioning point superimposed maps is labeled manually to generate labeled data; training is performed using a mean square error loss function.

The invention does not limit how the customer density grades are defined or how many grades there are; the implementer can adjust them according to the actual situation. This embodiment sets five customer density grades, with a higher grade indicating a higher customer density.

The positioning point overlay map of each fitness area is input into the trained customer density grading network, which outputs the customer density grade of that area. The customer density grade of each fitness area is recorded once every half hour, and the daily average grade of each area is computed to judge its popularity: the higher the average customer density grade of a fitness area, the more popular the fitness equipment in that area.
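
The daily statistic described above can be sketched as a simple average over the half-hourly records of each fitness area. The area names and grade values below are invented for illustration, and `rank_areas` is a hypothetical helper, not part of the described system.

```python
from statistics import mean


def rank_areas(records):
    """Return (area, daily mean grade) pairs, most popular area first."""
    daily_mean = {area: mean(grades) for area, grades in records.items()}
    return sorted(daily_mean.items(), key=lambda item: item[1], reverse=True)


# Hypothetical grades (1 to 5) recorded every half hour over part of a day.
records = {
    "treadmills": [3, 4, 5, 4],
    "rowing": [1, 2, 1, 2],
    "free_weights": [4, 4, 3, 3],
}

ranking = rank_areas(records)
```

Areas at the top of `ranking` are candidates for additional equipment, and areas at the bottom are candidates for a reduction.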

The BIM is visualized through WebGIS technology, so that staff can view the information in the BIM directly on the Web. When behavior of dominating fitness equipment is detected, the position of the customer dominating the equipment can be obtained from the positioning point panorama, and staff can go there promptly to stop the behavior. The customer density grade of each fitness area is also displayed, so that decisions can be made, for example adding fitness equipment to areas with a high customer density grade.

The foregoing describes only preferred embodiments of the invention and is not to be construed as limiting it. Any modifications, equivalents, or improvements made within the spirit and principles of the invention are intended to be included within its scope of protection.

Claims

1. An intelligent gymnasium management decision making system based on visual perception, comprising:

an image acquisition module, a three-dimensional coordinate detection module, a domination behavior judgment module and a density grading module;

the image acquisition module is used for establishing BIM of the gymnasium, dividing the gymnasium into a plurality of sub-regions and setting a camera to acquire original images of the sub-regions; projecting the original images of the subregions on a BIM ground plane, splicing the images, and outputting a gymnasium panorama;

the three-dimensional coordinate detection module comprises a key point detection network, a limb detection network, a key point matching module and a TCN (temporal convolutional network), and is used for acquiring the three-dimensional coordinates of the human body key points of each customer and outputting a human body key point three-dimensional coordinate sequence and a human body key point heatmap;

the domination behavior judgment module is used for periodically sampling, within a detection period, the human body key point three-dimensional coordinate sequence of a customer in a fitness equipment area to generate a fitness action detection sequence; taking one group of human body key point three-dimensional coordinate sequences in the fitness action detection sequence as a comparison sequence, analyzing the similarity between each human body key point three-dimensional coordinate sequence in the fitness action detection sequence and the comparison sequence, counting the time intervals t between the sequences that are similar to the comparison sequence, calculating the mean value t' of t, and judging whether the customer is in a suspected fitness state; if the customer is in a suspected fitness state, calculating the variance s of t in the detection period and further judging whether the customer is in a fitness state; if the customer is in a non-fitness state, continuously counting the time t' for which the customer remains in the non-fitness state, and judging whether behavior of dominating the fitness equipment exists;

2. The system of claim 1, wherein the three-dimensional coordinate detection module comprises:

the key point detection network is used for detecting the human body key points and positioning points in the original image and outputting a human body key point heatmap;

the limb detection network is used for detecting limbs in the original image and outputting a limb affinity vector field;

the key point matching module is used for combining the human body key point heatmap and the limb affinity vector field, matching in turn the two key points at the two ends of each limb by maximum weight matching to obtain the optimal matching of the human body key points, and outputting a plurality of groups of human body key point two-dimensional coordinate sequences;

and the TCN is used for predicting the three-dimensional coordinates of the key points of the human body and outputting a three-dimensional coordinate sequence of the key points of the human body.

3. The system of claim 1, wherein the method of training the keypoint detection network comprises:

taking a plurality of original images of the subareas shot by a camera as a data set;

labeling the data set by marking the head, left shoulder, right shoulder, left elbow, right elbow, left hand center, right hand center, spine center, neck center, left hip center, right hip center, left knee, right knee, left foot center point, right foot center point and the positioning point, to generate labeled data;

the positioning point is the midpoint of the line segment connecting the left foot center point and the right foot center point;

training is performed using a mean square error loss function.

4. The system of claim 1, wherein the training method of the limb detection network comprises:

labeling the data set by marking, on each pixel belonging to a limb of a person, a unit vector pointing from the key point at one end of the limb to the key point at the other end, to generate labeled data;

training is performed using a mean square error loss function.

5. The system of claim 1, wherein the method of training the TCN network comprises:

taking a plurality of groups of human body key point two-dimensional coordinate sequences as a data set;

marking the three-dimensional coordinates of each human body key point to generate marking data;

training is performed using a mean square error loss function.

6. The system of claim 1, wherein the domination behavior judgment module comprises:

taking the first group of human body key point three-dimensional coordinate sequences in the fitness action detection sequence as the comparison sequence, calculating, for every human body key point three-dimensional coordinate sequence in the fitness action detection sequence, the Euclidean distance between each human body key point a and the corresponding human body key point a' in the comparison sequence, adding the Euclidean distances of all human body key points to obtain a total Euclidean distance L₂, and arranging the L₂ values in time order to obtain a Euclidean distance sequence;

binarizing the Euclidean distance sequence: setting an empirical Euclidean distance threshold m₁; when L₂ < m₁, setting the corresponding L₂ value in the sequence to 1, and when L₂ ≥ m₁, setting the corresponding L₂ value in the sequence to 0, to obtain a binary sequence;

calculating the time interval t between adjacent 1 in the binary sequence, and calculating the mean value t' of t in a detection period;

setting an empirical mean threshold m₂; when t' < m₂, judging that the customer is in a non-fitness state in the detection period, and when t' ≥ m₂, judging that the customer is in a suspected fitness state in the detection period;

setting an empirical mean square error threshold m₃; when s < m₃, judging that the customer is in a fitness state in the detection period; and when s ≥ m₃, judging that the customer is in a non-fitness state in the detection period.

7. The system of claim 1, wherein judging whether behavior of dominating the fitness equipment exists comprises:

detecting the time t' for which the customer is in the non-fitness state and setting an empirical time threshold m₄; when t' ≥ m₄, judging that behavior of dominating the fitness equipment exists, and when t' < m₄, judging that behavior of dominating the fitness equipment does not exist.
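
The decision chain of claims 6 and 7 can be illustrated with a short sketch. All names (`total_distance`, `classify_period`, `is_dominating`), the single-key-point poses, and the threshold values are assumptions for illustration only. The spread of the intervals is computed here as a population variance, and a period with no repeated pose is treated as non-fitness, which is also an assumption.

```python
import math
from statistics import mean, pvariance


def total_distance(pose, reference):
    """Sum of per-key-point Euclidean distances between two 3D poses."""
    return sum(math.dist(a, b) for a, b in zip(pose, reference))


def classify_period(poses, timestamps, m1, m2, m3):
    """Classify one detection period as 'fitness' or 'non-fitness'."""
    reference = poses[0]  # the first pose is the comparison sequence
    binary = [1 if total_distance(p, reference) < m1 else 0 for p in poses]
    hit_times = [t for t, b in zip(timestamps, binary) if b == 1]
    intervals = [later - earlier for earlier, later in zip(hit_times, hit_times[1:])]
    if not intervals:  # assumption: the pose never recurs, so no fitness action
        return "non-fitness"
    if mean(intervals) < m2:  # poses barely change: the customer is static
        return "non-fitness"
    # Suspected fitness: regular repetition means a small spread of intervals.
    return "fitness" if pvariance(intervals) < m3 else "non-fitness"


def is_dominating(non_fitness_seconds, m4):
    """Equipment is considered dominated once idle time reaches m4."""
    return non_fitness_seconds >= m4


# Hypothetical one-key-point poses sampled once per second.
timestamps = list(range(7))
exercising = [[(x, 0.0, 0.0)] for x in (0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0)]
sitting = [[(0.0, 0.0, 0.0)] for _ in range(7)]

state_a = classify_period(exercising, timestamps, m1=0.5, m2=1.5, m3=0.5)
state_b = classify_period(sitting, timestamps, m1=0.5, m2=1.5, m3=0.5)
```

In this example the exercising customer returns to the reference pose every two seconds with no variation in the interval, while the seated customer matches the reference pose at every sample, so the mean interval falls below m₂.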

8. The system of claim 1, wherein the method of training the customer density grading network comprises:

using a plurality of positioning point overlay maps of the fitness areas as a data set;

manually labeling the customer density grade corresponding to each positioning point overlay map of a fitness area to generate labeled data;

training is performed using a mean square error loss function.