CN115841651A - Constructor intelligent monitoring system based on computer vision and deep learning - Google Patents

Constructor intelligent monitoring system based on computer vision and deep learning

Info

Publication number
CN115841651A
CN115841651A
Authority
CN
China
Prior art keywords: module, state, constructors, deep learning, analysis
Prior art date
Legal status
Granted
Application number
CN202211602196.2A
Other languages
Chinese (zh)
Other versions
CN115841651B (en)
Inventor
陈祺荣
陈科宇
杭世杰
林俊
汤序霖
陈钰开
李晨慧
李卫勇
朱东烽
杨哲
杨健明
聂勤文
张华健
邬学文
汪爽
练月荣
Current Assignee
Guangzhou Jishi Construction Group Co ltd
Guangdong Yuncheng Architectural Technology Co ltd
Hainan University
Original Assignee
Guangzhou Jishi Construction Group Co ltd
Guangdong Yuncheng Architectural Technology Co ltd
Hainan University
Priority date
Filing date
Publication date
Application filed by Guangzhou Jishi Construction Group Co ltd, Guangdong Yuncheng Architectural Technology Co ltd, Hainan University
Priority to CN202211602196.2A
Publication of CN115841651A
Application granted granted Critical
Publication of CN115841651B
Legal status: Active

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a constructor intelligent monitoring system based on computer vision and deep learning, comprising an intelligent statistical module and an intelligent monitoring module. The intelligent statistical module comprises a video acquisition module, a deep learning algorithm analysis module and an analysis result display module; the intelligent monitoring module comprises a safety helmet identification module, a work clothes identification module, a state identification module and an alarm reminding module. The invention not only realizes intelligent statistics of the numbers of personnel and vehicles entering and exiting the construction site and the totals currently on site, but also uses image recognition to identify the dress and state of constructors passing through the site entrance and to remind those who do not meet the standard. Safety accidents caused by constructors not wearing safety helmets, not wearing work clothes or being in poor mental state are thereby effectively avoided, and the construction safety of the site is effectively improved.

Description

Constructor intelligent monitoring system based on computer vision and deep learning
Technical Field
The invention relates to the technical field of intelligent statistics, in particular to a constructor intelligent monitoring system based on computer vision and deep learning.
Background
At present, construction sites are in principle required to implement closed management, with an access control system set up at the site entrances and exits. Most access control systems currently used on construction sites combine channel equipment such as tripod turnstiles, swing gates, flap gates and stainless-steel fence gates with induction card readers. Managers and constructors who have registered their faces or second-generation ID cards in the system in advance can pass freely, while unregistered personnel cannot enter the site. Surveillance cameras are also installed at the site entrances and exits, and video from a recent period is stored on a local server for review.
However, the current access control system has the following defects in practical application:
1) Personnel pass through the site entrances and exits frequently and in a disorderly flow, making on-site management difficult; 2) attendance data are recorded on paper, so actual working hours are hard to count accurately and the records are easily falsified; 3) sites involve many workers, many subcontracting units and many trades and posts, so project managers cannot clearly and promptly grasp the numbers of construction workers, workers per trade and specialized subcontracting units on site, which hinders improvement of site management efficiency; 4) when wage disputes involving construction labor arise, supervising departments find it difficult to obtain evidence and workers find it difficult to protect their rights; 5) statistics on vehicles, and on the personnel inside them, entering and exiting are difficult to obtain.
An effective solution to the problems in the related art has not been proposed yet.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides a constructor intelligent monitoring system based on computer vision and deep learning, so as to overcome the technical problems in the existing related art.
Therefore, the invention adopts the following specific technical scheme:
the system comprises an intelligent statistical module and an intelligent monitoring module;
the intelligent statistical module is used for detecting constructors passing through the construction site entrance by using a trained deep learning algorithm, tracking targets by using a tracking algorithm, counting targets when they collide with the detection line, and displaying the results in real time on a client;
the intelligent monitoring module is used for identifying the dress and state of constructors passing through the construction site entrance by using a preset image recognition technology and reminding constructors who do not meet the standard.
Furthermore, the intelligent statistical module comprises a video acquisition module, a deep learning algorithm analysis module and an analysis result display module;
the video acquisition module is used for acquiring real-time monitoring pictures from the monitoring cameras erected at each entrance and exit of the construction site, feeding them into a POE switch, and converting them to obtain the initial video material;
the deep learning algorithm analysis module is used for outputting video analysis pictures and personnel statistical data corresponding to the initial video material in real time through a trained deep learning algorithm and outputting the analysis pictures and statistical results to the client;
the analysis result display module is used for displaying the video analysis pictures and the personnel and vehicle statistics at the client, and for allowing managers to switch, as required, between the analysis pictures and statistics acquired by the cameras at different entrances and exits.
Furthermore, the deep learning algorithm analysis module comprises a detection area setting module, a video analysis module, a data statistics module and a data transmission module;
the detection area setting module is used for controlling the actual detection range in each picture by adjusting the corresponding detection range according to the picture layout of each entrance and exit, so as to set the detection area;
the video analysis module is used for analyzing the images in the initial video material through the coordinates of the line collision detection points to realize the analysis and statistics of personnel and vehicles in the detection area;
the data statistics module is used for judging and counting targets entering and exiting according to whether the target frame collides with a line and the color of the collision area of the target frame;
and the data transmission module is used for outputting the analysis picture and the statistical result to the client through the RTSP server and the HTTP push service.
Further, the setting of the detection area comprises the following steps:
the positions of all the end points are sequentially determined according to the shape of the required area and are expressed by an array, each element of the array is a binary array representing the end point of the graph, and the adjustment and the setting of the detection area are realized by the adjustment of the array.
Further, the step of analyzing the image in the initial video material through the coordinates of the line collision detection points to realize the analysis and statistics of the personnel and the vehicles in the detection area comprises the following steps:
acquiring position parameters and categories of all objects in a current image;
judging whether the offset between the geometric center of an object in the new frame and the geometric center of an object in the previous frame is within a preset offset; if so, the two objects are judged to be the same object with the same ID, and if not, a new object is judged to exist in the new frame and is assigned a new ID;
representing each object's range by a rectangle with the object's ID known, the parameters of an object's range are set as (x1, y1, x2, y2), where x1 < x2 and y1 < y2; the coordinates of the line-collision detection point are (check_point_x, check_point_y), with check_point_x = x1 and check_point_y = int[y1 + (y2 - y1) * 0.6], where int denotes rounding the operation result;
and judging whether the object line collision detection point is located in the judgment area, if so, performing statistical operation on the object, and if not, not performing statistical operation.
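A minimal Python sketch of these steps, assuming per-frame detections given as (x1, y1, x2, y2) boxes; the 50-pixel offset threshold is a hypothetical value:

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) with x1 < x2, y1 < y2
MAX_OFFSET = 50.0                 # hypothetical preset offset, in pixels
_next_id = 0

def centroid(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0

def assign_ids(prev: Dict[int, Box], boxes: List[Box]) -> Dict[int, Box]:
    """Give each new box the ID of the previous box whose geometric
    center lies within MAX_OFFSET; otherwise assign a fresh ID."""
    global _next_id
    current: Dict[int, Box] = {}
    for box in boxes:
        cx, cy = centroid(box)
        match = None
        for oid, old in prev.items():
            ox, oy = centroid(old)
            if math.hypot(cx - ox, cy - oy) <= MAX_OFFSET:
                match = oid
                break
        if match is None:
            match, _next_id = _next_id, _next_id + 1
        current[match] = box
    return current

def check_point(box: Box) -> Tuple[int, int]:
    """Line-collision detection point per the formula above:
    (check_point_x, check_point_y) = (x1, int(y1 + (y2 - y1) * 0.6))."""
    x1, y1, x2, y2 = box
    return x1, int(y1 + (y2 - y1) * 0.6)
```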
Further, the step of judging and counting the entering and exiting of the target by whether the target frame collides with the line and the color of the line colliding area of the target frame comprises the following steps:
all pictures captured by the current camera are specified as the detection area, and blue and yellow strip-shaped areas are set as the judgment areas; a target crossing the yellow line upward is recorded as in, and a target crossing the blue line downward is recorded as out;
acquiring real-time monitoring pictures acquired by monitoring cameras at each entrance and exit of a construction site, and performing size reduction processing on the acquired real-time monitoring pictures;
judging whether a target appears in the detection area of the reduced real-time monitoring picture; if not, the monitoring picture is regarded as invalid and discarded, and if so, the target is framed and output;
and detecting whether the target frame collides with a line and the color of the collision area of the target frame, and judging and counting the target's entry or exit according to that color, as sketched below.
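A sketch of this in/out decision, assuming two horizontal judgment bands at hypothetical pixel extents; the patent detects the color of the collision area on the frame itself, while testing the band coordinates directly, with direction inferred from the previous frame's check point, is an equivalent simplification used here:

```python
from typing import Dict

YELLOW_BAND = (300, 320)  # y-extent of the yellow strip: upward crossing -> "in"
BLUE_BAND = (340, 360)    # y-extent of the blue strip: downward crossing -> "out"

counts = {"in": 0, "out": 0}
_last_y: Dict[int, int] = {}  # object ID -> check-point y in the previous frame

def update_counts(obj_id: int, y: int) -> None:
    """Count an object once its check point enters a judgment band,
    using the travel direction implied by the previous frame."""
    prev = _last_y.get(obj_id)
    if prev is not None:
        if YELLOW_BAND[0] <= y <= YELLOW_BAND[1] and y < prev:
            counts["in"] += 1    # moving up through the yellow band
        elif BLUE_BAND[0] <= y <= BLUE_BAND[1] and y > prev:
            counts["out"] += 1   # moving down through the blue band
    _last_y[obj_id] = y
```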
Furthermore, the intelligent monitoring module comprises a safety helmet identification module, a work clothes identification module, a state identification module and an alarm reminding module;
the safety helmet identification module is used for identifying constructors who pass through a building site entrance and do not wear a safety helmet by using an image identification technology;
the work clothes identification module is used for identifying constructors who pass through the entrance of the construction site and do not wear work clothes by utilizing an image identification technology;
the state identification module is used for identifying the mental state of a constructor passing through a construction site entrance by utilizing an image identification technology;
the alarm reminding module is used for reminding constructors who are not wearing safety helmets, are not wearing work clothes, or whose mental state does not meet the standard.
Further, the state identification module comprises a constructor face image acquisition module, a fusion feature identification module and a state identification result output module;
the constructor face image acquisition module is used for acquiring the face images of constructors in the real-time monitoring pictures of all the entrances and exits of the construction site;
the fusion feature recognition module is used for recognizing facial images of constructors by using a facial state recognition algorithm based on independent feature fusion so as to realize recognition of mental states of the constructors entering a construction site;
and the state recognition result output module is used for outputting mental state information of constructors in real-time monitoring pictures of all entrances and exits of a construction site.
Furthermore, the fusion feature identification module comprises a global state feature extraction module, a local state feature extraction module, a state feature fusion module and a state feature analysis identification module;
the global state feature extraction module is used for extracting global state features of the face image through discrete cosine transform, and removing the correlation of the global state features by utilizing an independent component analysis technology to obtain independent global state features;
the local state feature extraction module is used for extracting the features of the eye region and the mouth region in the image sequence, and respectively carrying out Gabor wavelet transform and feature fusion on the eye region and the mouth region to obtain the dynamic multi-scale features of the two local regions as the local state features of the face image;
the state feature fusion module is used for fusing the independent global state features and the local state features, and adding local detail information into the global features to obtain face state fusion features;
the state feature analysis and identification module is used for analyzing and identifying the obtained face state fusion features through a preset classifier to obtain mental state information of the constructors, wherein the mental state of the constructors comprises a waking state, a slight fatigue state, a moderate fatigue state and a severe fatigue state.
Further, the preset classifier is obtained by selecting a subset of features through the AdaBoost algorithm, removing redundant features and training, and the calculation formula of the preset classifier is as follows:
H(X) = \operatorname{sign}\left(\sum_{t=1}^{T} a_t h_t(X)\right)

where T represents the final number of algorithm cycles, a_t represents the weight of classifier h_t(X), determined by learning of the AdaBoost algorithm, and X = (X_1, X_2, …, X_T) represents the dynamic Gabor features of the selected face image sequence.
The invention has the beneficial effects that:
1) The trained deep learning algorithm detects constructors passing through the construction site entrance, the tracking algorithm tracks targets, and targets are counted when they collide with the detection line, so the numbers of personnel and vehicles entering and exiting the construction site, and the totals currently on site, can be counted intelligently. In addition, image recognition identifies the dress and state of constructors passing through the site entrance and reminds those who do not meet the standard, effectively avoiding safety accidents caused by constructors not wearing safety helmets, not wearing work clothes or being in poor mental state, and thus effectively improving the construction safety of the site.
2) The invention achieves a recognition accuracy above 95% under various illumination conditions and, while keeping the statistical data accurate, keeps the time difference between the analysis picture and the original picture within 1 second, giving both high recognition accuracy and high recognition speed.
3) Thanks to the algorithm used by the invention and the flexibility of the UI, different functions such as face recognition and work-type recognition can be added according to different requirements, so expansibility is high. In addition, the hardware required by the invention is inexpensive, saving cost while improving site management efficiency to obtain more economic benefit.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a block diagram of a constructor intelligent monitoring system based on computer vision and deep learning according to an embodiment of the invention;
FIG. 2 is a block diagram of a deep learning algorithm analysis module in the intelligent monitoring system for constructors based on computer vision and deep learning according to an embodiment of the invention;
FIG. 3 is a block diagram of a state identification module in the intelligent monitoring system for constructors based on computer vision and deep learning according to an embodiment of the invention;
FIG. 4 is a block diagram of a fusion feature identification module in the intelligent monitoring system for constructors based on computer vision and deep learning according to an embodiment of the invention.
In the figure:
1. an intelligent statistical module; 2. an intelligent monitoring module; 11. a video acquisition module; 12. a deep learning algorithm analysis module; 121. a detection area setting module; 122. a video analysis module; 123. a data statistics module; 124. a data transmission module; 13. an analysis result display module; 21. a helmet identification module; 22. a work clothes identification module; 23. a state identification module; 231. a constructor face image acquisition module; 232. a fusion feature identification module; 2321. a global state feature extraction module; 2322. a local state feature extraction module; 2323. a state feature fusion module; 2324. a state feature analysis and identification module; 233. a state recognition result output module; 24. and an alarm reminding module.
Detailed Description
For further explanation of the various embodiments, reference is made to the accompanying drawings, which form a part of the disclosure. The drawings illustrate the embodiments and, together with the description, explain their principles of operation, enabling one skilled in the art to understand the embodiments and the advantages of the disclosure. The drawings are not to scale, and like reference numerals generally denote like elements.
According to the embodiment of the invention, the intelligent monitoring system for the constructors based on computer vision and deep learning is provided.
The invention is further described below with reference to the drawings and the detailed description. As shown in figs. 1-4, according to an embodiment of the invention, the constructor intelligent monitoring system based on computer vision and deep learning comprises an intelligent statistical module 1 and an intelligent monitoring module 2;
the intelligent statistical module 1 is used for detecting constructors passing through the construction site entrance by using a trained deep learning algorithm, tracking targets by using a tracking algorithm, counting targets when they collide with the detection line, and displaying the results in real time on a client;
the intelligent statistical module 1 comprises a video acquisition module 11, a deep learning algorithm analysis module 12 and an analysis result display module 13;
the video acquisition module 11 is used for acquiring a real-time monitoring picture according to a monitoring camera erected at each entrance and exit of a construction site, inputting the real-time monitoring picture into a POE (power over Ethernet) switch, and converting to obtain an initial video material;
the deep learning algorithm analysis module 12 is configured to output a video analysis picture and personnel statistical data corresponding to the initial video material in real time through a trained deep learning algorithm, and output the analysis picture and the statistical result to the client;
the main feature extraction network adopted by the algorithm of the embodiment is superior to a cross-stage partial network, the video memory consumption is reduced, the learning capability of the convolutional neural network is enhanced, the network is further widened, and the algorithm precision is ensured. In a training strategy of a trunk feature extraction network, in order to guarantee the identification precision in a complex environment, a plurality of images are spliced to simulate an object in the complex environment. The algorithm also carries out decision making again on the modified image, improves the weak environment of a decision boundary and improves the robustness of the system. The algorithm identifies the object by the following three steps:
first, the image is sampled 5 times using a convolution layer with a step size of 2 and a convolution kernel of 3 × 3, thereby extracting the trunk features of the image and generating five feature layers of different sizes.
Secondly, multi-scale receptive-field fusion is performed on the smaller feature layers, followed by max pooling, and the processed small feature maps are aggregated with the larger feature maps through tensor concatenation, so the algorithm remains applicable across different detection sizes.
Finally, the feature layers are divided into grids of different sizes, so that targets with large size differences can be detected without target loss. Prior boxes of multiple specifications are generated on the grids of each size; each prior box returns the probability that the object belongs to a preset object class, the prior box with the highest confidence is taken as the actual position of the object, and the confidences of all preset object classes and the actual position of the object are returned.
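As an illustrative sketch of the first step's five stride-2 downsampling stages, assuming a PyTorch implementation; the channel widths and activation are hypothetical choices, not taken from the patent:

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Five successive 3x3, stride-2 convolutions; each halves the
    spatial size and contributes one of the five feature layers."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        widths = [32, 64, 128, 256, 512]  # hypothetical channel widths
        stages, prev = [], in_ch
        for w in widths:
            stages.append(nn.Sequential(
                nn.Conv2d(prev, w, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(w),
                nn.LeakyReLU(0.1)))
            prev = w
        self.stages = nn.ModuleList(stages)

    def forward(self, x: torch.Tensor):
        features = []
        for stage in self.stages:  # downsample the image five times
            x = stage(x)
            features.append(x)     # keep all five scales for later fusion
        return features
```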
Specifically, the confidence interval is set manually, and a confidence is given after the algorithm identifies an object. In the tests, the lowest confidence value was 0.3, giving a confidence interval of (0.3, 1); within this interval the detected object types were few, natural environmental changes were small, and recognition accuracy was high. Because the optimal confidence interval is strongly influenced by environmental factors, no single interval suits all environments; the optimal interval for a given setting can only be obtained by repeatedly tuning the confidence interval in the actual environment.
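As a trivial sketch of applying such a confidence interval; the detection record layout (a dict with a "confidence" key) is a hypothetical assumption:

```python
MIN_CONFIDENCE = 0.3  # lowest accepted confidence from the tests described above

def filter_detections(detections):
    """Keep only detections whose confidence lies in the interval (0.3, 1]."""
    return [d for d in detections if d["confidence"] > MIN_CONFIDENCE]
```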
Specifically, the deep learning algorithm analysis module 12 includes a detection region setting module 121, a video analysis module 122, a data statistics module 123, and a data transmission module 124;
the detection area setting module 121 is configured to control an actual detection area in each picture by adjusting a corresponding detection area according to picture layouts of different entrances and exits, so as to set a detection area (the detection area includes a whole picture area, the determination area only includes a blue-yellow bar area, and if a person appears in the picture but does not pass through the blue-yellow bar area, the person is captured but does not count statistical data);
in this embodiment, it is found in the field test process that the video picture detection ranges in different areas need to be adjusted. Generally, all images acquired by the camera are identified by default as the detection area, however, in practical applications, it is not necessary to completely participate in the identification of the entire monitoring image, for example, a main person access passage of a project site tested in a development process, and the passage image mainly includes a gate passage and a right-side guard duty room. The algorithm can identify and detect the personnel in the whole picture under the default condition, if some personnel go back and forth in the duty room, the personnel can be judged as entering or leaving by the algorithm, and the misjudgment can cause huge errors on the system statistical result. Therefore, in the process of landing the project on site, the system needs to adjust the corresponding background algorithm detection range according to the picture layouts of different entrances and exits. The actual detection range in each picture is controlled by setting the detection area in the algorithm code. Any point on the image can be determined through a binary array, if a detection area needs to be set, the positions of all the end points can be sequentially determined according to the shape of the required area, the detection area is expressed through an array, each element of the array is the binary array representing the end point of the graph, and therefore the purpose of adjusting the geometric area is achieved through the array. All people entering the channel picture can be identified, but only people passing through the detection area in the picture can be counted into the statistical data, and people moving outside the detection area can not influence the statistical data.
The video analysis module 122 is used for analyzing the images in the initial video material through the coordinates of the line collision detection points, so as to realize the analysis and statistics of personnel and vehicles in the detection area;
algorithm in the present embodimentAfter initialization is completed, the object type and range in the image can be identified and given, the algorithm uses a rectangular box to determine the position of the image, and in a plane, the determination of a rectangular position only needs to know one diagonal coordinate, such as the vertex coordinate value (x) of the upper right corner 1 ,y 1 ) And the coordinate value (x) of the vertex of the lower left corner 2 ,y 2 ) The position of the rectangle in the image can be determined by the four parameters, so that the algorithm only needs to return the four parameters and the object type. The four parameters returned by the algorithm are used for determining the line-strike detection point and drawing a rectangle. After the position parameters and the types of all objects in the current image are obtained, if the offset between the geometric center of an object in the new frame of image and the geometric center of an object in the previous frame of image is within the set offset, the two objects are regarded as the same object and have the same ID. If a new object exists in the new frame, a new ID is given to the object. After the above processing, an image in which the object range is represented by a rectangle and the object ID is known is obtained. Now let the parameter of a certain object range be x 1 ,y 1 ,x 2 ,y 2 Wherein x is 1 <x 2 ,y 1 <y 2 The coordinates of the impact detection point are (check _ point _ x, check _ point _ y), check _ point _ x = x 1 ,check_point_y=int[y 1 +(y 2 -y 1 )*0.6]Int refers to rounding the result of the operation. And only when a certain object line collision detection point is positioned in the judgment area, the algorithm carries out statistical operation, otherwise, the statistical operation is not carried out.
The data statistics module 123 is configured to determine and count entry and exit of a target according to whether the target frame collides with a line and a color of a line collision area of the target frame;
in the embodiment, a more intuitive and accurate judgment mode is adopted when the personnel counting function is realized. Namely, a judgment area is preset in the algorithm, and the blue and yellow strip-shaped areas are preset as the judgment areas. And when the target is in the upper line and is in the upper line, the target is in the lower line and is in the upper line, and when the target is in the lower line, the target is in the upper line and is in the lower line, and therefore the number of the pedestrians entering and exiting is calculated. When the algorithm runs, the video is processed, the size is reduced firstly, whether a target appears in the picture is detected, if no target appears in the picture, the picture is ignored and cleaned by the algorithm as an invalid picture, when the target exists in the picture, the algorithm frames the target and outputs the frame, and finally, whether the target frame is collided with a line and whether the area collided by the target frame is blue or yellow is detected as a judgment basis for the target to enter and exit.
The data transmission module 124 is configured to output the analysis picture and the statistical result to the client through the RTSP server and the HTTP push service.
After the video material has been identified and analyzed by the deep learning algorithm, the analysis picture needs to be output to the client. To minimize the delay between the real-time monitoring picture and the analysis picture displayed in the client, this embodiment transmits through an RTSP (Real-Time Streaming Protocol) server. First, a video stream interface supporting the RTSP protocol is written into the algorithm; then the RTSP stream is obtained from information such as the camera's IP address and port number and the device's user name and password. The analysis picture can then be transmitted to the client by feeding the stream address into the interface preset in the algorithm.
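A minimal sketch of pulling and processing such an RTSP stream with OpenCV; the stream address is hypothetical, and its exact path format varies by camera vendor:

```python
import cv2

# Hypothetical address assembled from the device user name and password,
# the camera IP address and port number, and a vendor-specific path.
RTSP_URL = "rtsp://admin:password@192.168.1.64:554/stream1"

cap = cv2.VideoCapture(RTSP_URL)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # ... run the deep-learning analysis on `frame`, then push the
    # annotated picture to the stream interface preset in the algorithm ...
cap.release()
```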
Similar to the output of the real-time analysis picture, the data obtained through statistics in the deep learning algorithm must also be transmitted to the client in real time; this embodiment uses HTTP (Hypertext Transfer Protocol) for data transmission, which has the advantages of being simple, flexible and easy to extend. First, an HTTP program is written on the server side, i.e. in the deep learning algorithm; then a TCP connection from the client to the server is created and a data request message is composed at the client, so that an HTTP request is issued when the client sends the message. On receiving the request, the server organizes a response according to the request content and returns it to the client over the same TCP connection; the response content is parsed and read in the client and finally displayed in real time on the client's user interface.
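A minimal sketch of the server side of this HTTP exchange, assuming a Flask service; the patent does not name a framework, and the route, port and statistics layout are hypothetical:

```python
from flask import Flask, jsonify

app = Flask(__name__)
stats = {"in": 0, "out": 0, "on_site": 0}  # updated by the analysis loop

@app.route("/stats")
def get_stats():
    """Answer a client HTTP request with the latest statistics."""
    return jsonify(stats)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

A client could then poll this route, for example with requests.get("http://<server>:8080/stats").json(), and display the parsed result on its user interface.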
The analysis result display module 13 is used for displaying video analysis pictures and statistical data of people and vehicles at the client, and is also used for switching the analysis pictures and the statistical data acquired by the cameras at different entrances and exits according to requirements by managers.
To present the real-time video analysis picture more intuitively to field managers and let them obtain analysis pictures of different entrances and exits as required, this embodiment combines the deep learning algorithm with Unreal Engine 4, adds the RTSP and HTTP push-stream services into the algorithm, displays the obtained video analysis pictures and personnel statistics on the client in real time, and adds further functions to the client.
The functions of the client mainly comprise:
(1) The personnel and vehicle counting function, comprising the numbers of personnel and vehicles entering and exiting and the totals currently on site;
(2) The view-switching function: when multiple cameras are erected at different entrances and exits, the client can switch between cameras and obtain the personnel and vehicle data recorded by each in real time.
The intelligent monitoring module 2 is used for identifying the dress and state of constructors passing through the construction site entrance by using a preset image recognition technology and reminding constructors who do not meet the standard.
The intelligent monitoring module 2 comprises a safety helmet identification module 21, a work clothes identification module 22, a state identification module 23 and an alarm reminding module 24;
the safety helmet identification module 21 is used for identifying constructors who do not wear safety helmets and pass through a building site entrance by using an image identification technology;
the work clothes recognition module 22 is used for recognizing constructors who pass through the entrance of the construction site and do not wear work clothes by using image recognition technology;
the state identification module 23 is used for identifying the mental state of the constructor passing through the construction site entrance by using an image identification technology;
specifically, the state identification module 23 includes a constructor face image acquisition module 231, a fusion feature identification module 232 and a state identification result output module 233;
the constructor face image acquisition module 231 is used for acquiring face images of constructors in real-time monitoring pictures of all entrances and exits of a construction site;
the fusion feature recognition module 232 is configured to recognize facial images of constructors by using a facial state recognition algorithm based on independent feature fusion, so as to recognize mental states of constructors entering a construction site;
the fusion feature identification module 232 includes a global state feature extraction module 2321, a local state feature extraction module 2322, a state feature fusion module 2323 and a state feature analysis identification module 2324;
the global state feature extraction module 2321 is configured to extract a global state feature of the face image through Discrete Cosine Transform (DCT), and remove a correlation of the global state feature by using an Independent Component Analysis (ICA) technique to obtain an independent global state feature;
discrete cosine transform is a commonly used image data compression method, and for an MxN digital image (x, y), the 2D discrete cosine transform is defined as:
C(u,v) = a(u)\,a(v) \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) \cos\!\left[\frac{(2x+1)u\pi}{2M}\right] \cos\!\left[\frac{(2y+1)v\pi}{2N}\right]

a(u) = \begin{cases}\sqrt{1/M}, & u = 0\\ \sqrt{2/M}, & 1 \le u \le M-1\end{cases} \qquad a(v) = \begin{cases}\sqrt{1/N}, & v = 0\\ \sqrt{2/N}, & 1 \le v \le N-1\end{cases}

wherein f(x, y) denotes the M×N digital image, u = 0, 1, 2, …, M−1 and v = 0, 1, 2, …, N−1;
The discrete cosine transform has the property that when the frequency-domain factors u and v are large, the DCT coefficient C(u, v) is small; the C(u, v) of larger value are mainly distributed in the upper-left corner region where u and v are small, which is also the concentrated region of useful information.
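A sketch of the global DCT feature extraction, assuming SciPy's orthonormal DCT-II; keeping only the k × k upper-left coefficient block exploits the concentration of useful information noted above, and k = 8 is a hypothetical choice:

```python
import numpy as np
from scipy.fftpack import dct

def dct_global_features(gray: np.ndarray, k: int = 8) -> np.ndarray:
    """2-D DCT of a grayscale face image; keep the k x k upper-left
    block, where the large (information-bearing) coefficients sit."""
    coeffs = dct(dct(gray.astype(np.float64), axis=0, norm="ortho"),
                 axis=1, norm="ortho")
    return coeffs[:k, :k].ravel()
```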
Independent component analysis (ICA) is an effective method for solving the blind signal separation problem: through a transformation matrix, ICA can separate mutually independent source signals from mixed signals. In fatigue feature extraction, this property is used to reduce the dimensionality of the fatigue feature vector and to reduce the high-order correlation among its components.
Besides the second-order statistical correlation that the PCA method can remove, high-order statistical correlations also account for a large component of facial expression images, so removing the high-order correlations with the ICA method yields features with stronger discrimination capability. The basic idea of the ICA algorithm is to represent a series of random variables with a set of basis functions while assuming the components are statistically independent, or as independent as possible.
In this embodiment, ICA is adopted to separate mutually independent features from the global DCT features of the face image sequence through the transformation matrix, which not only reduces the dimension of the feature vector but also reduces the high-order correlation among its components, yielding independent global features with stronger discrimination capability.
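A sketch of that decorrelation step with scikit-learn's FastICA, applied to the per-frame global DCT features (one row per frame); the component count is a hypothetical value:

```python
import numpy as np
from sklearn.decomposition import FastICA

def independent_global_features(dct_feats: np.ndarray,
                                n_components: int = 20) -> np.ndarray:
    """Separate mutually independent components from the global DCT
    features of a face image sequence (rows = frames), reducing both
    dimensionality and high-order correlation."""
    ica = FastICA(n_components=n_components, random_state=0)
    return ica.fit_transform(dct_feats)
```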
The local state feature extraction module 2322 is configured to extract features of the eye region and the mouth region in the image sequence, and to perform Gabor wavelet transform and feature fusion on the eye region and mouth region respectively, obtaining the dynamic multi-scale features of the two local regions as the local state features of the face image;
the expression characteristics of the fatigue of the human face have different scales, the integral larger scale and the fine smaller scale, so that all important characteristics of the fatigue expression of the human face are difficult to extract by analyzing in a single scale, however, the characteristics in the scale and the direction containing more fatigue information are extracted by utilizing multi-scale decomposition to analyze, the visual information of the face can be effectively analyzed, and for a face video image sequence, the fatigue information needs to be analyzed by a multi-scale method because the scales of different facial movements in the fatigue process are different.
The Gabor wavelet is a powerful tool for multi-scale analysis. Compared with the DCT, the Gabor transform achieves optimal localization in both the time and frequency domains; its transform coefficients describe the gray-level features of the area near a given position on the image and are insensitive to illumination, position and the like, making them suitable for representing the local features of the human face.
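A sketch of multi-scale Gabor filtering of one local region (eye or mouth) with OpenCV; the kernel sizes, orientation count and filter parameters are hypothetical choices:

```python
import cv2
import numpy as np

def gabor_features(region: np.ndarray,
                   ksizes=(5, 9, 13),    # hypothetical kernel sizes (scales)
                   n_orient: int = 4) -> np.ndarray:
    """Multi-scale Gabor responses of a local face region; the mean
    magnitude of each filtered image serves as one feature."""
    feats = []
    for ksize in ksizes:
        for i in range(n_orient):
            theta = i * np.pi / n_orient
            # getGaborKernel(ksize, sigma, theta, lambd, gamma, psi)
            kern = cv2.getGaborKernel((ksize, ksize), 4.0, theta,
                                      10.0, 0.5, 0)
            resp = cv2.filter2D(region.astype(np.float32), cv2.CV_32F, kern)
            feats.append(float(np.abs(resp).mean()))
    return np.array(feats)
```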
The state feature fusion module 2323 is configured to fuse the independent global state feature and the local state feature, and add local detail information to the global feature to obtain a face state fusion feature;
state characteristic analysis recognition module 2324 is used for fusing the characteristic through the facial state that predetermined classifier obtained and analyzes and discerns, obtains constructor's mental state information, wherein, constructor's mental state includes that waking state (eyes normally strive for, eyeball state is active, the head is upright, attention is concentrated, the eyebrow is flat and spread), light fatigue state (eyeball activity degree descends, the bad port of vision, the flagging trend appears in the eyebrow, the forehead is tight crumpled, the head rotation frequency increases, the spirit is shakiness), moderate fatigue state (the eyes appear closed, beat and lack, the phenomenon such as nod, the eyebrow seriously droops, facial muscle warp seriously) and heavy fatigue state (the closed trend of eyes aggravates, and the continuous eye-closing phenomenon appears, attention is lost).
The preset classifier is obtained by selecting a subset of features through the AdaBoost algorithm, removing redundant features and training, and its calculation formula is as follows:
H(X) = \operatorname{sign}\left(\sum_{t=1}^{T} a_t h_t(X)\right)

where T represents the final number of algorithm cycles, a_t represents the weight of classifier h_t(X), determined by learning of the AdaBoost algorithm, and X = (X_1, X_2, …, X_T) represents the dynamic Gabor features of the selected face image sequence.
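In code, this strong classifier is simply a weighted vote of the weak classifiers; a minimal binary sketch follows (how the four fatigue levels map onto such binary classifiers, e.g. one-vs-rest, is an assumption, as the text does not specify the multi-class scheme):

```python
import numpy as np

def adaboost_predict(weak_classifiers, weights, X):
    """H(X) = sign(sum_t a_t * h_t(X)): each weak classifier h_t returns
    +1 or -1, and a_t is the weight learned by AdaBoost training."""
    score = sum(a_t * h_t(X) for h_t, a_t in zip(weak_classifiers, weights))
    return np.sign(score)
```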
The state recognition result output module 233 is configured to output mental state information of the constructors in the real-time monitoring pictures of the entrances and exits of the construction site.
The alarm reminding module 24 is used for reminding constructors who do not wear safety helmets, do not wear working clothes and do not meet the standards in mental states.
In summary, according to the technical scheme of the invention, the trained deep learning algorithm detects constructors passing through the construction site entrance, the tracking algorithm tracks targets, and targets are counted when they collide with the detection line, so that intelligent statistics of the numbers of constructors and vehicles entering and exiting the construction site, and of the totals currently on site, can be realized.
Meanwhile, the invention achieves a recognition accuracy above 95% under various illumination conditions while keeping the statistical data accurate and the time difference between the analysis picture and the original picture within 1 second, and thus has the advantages of high recognition accuracy and high recognition speed.
Meanwhile, thanks to the algorithm used by the invention and the flexibility of the UI, different functions such as face recognition and work-type recognition can be added according to different requirements, so expansibility is high; in addition, the hardware required by the invention is inexpensive, saving cost while improving site management efficiency to obtain more economic benefit.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (10)

1. The intelligent monitoring system for constructors based on computer vision and deep learning is characterized by comprising an intelligent statistical module (1) and an intelligent monitoring module (2);
the intelligent statistical module (1) is used for detecting constructors passing through the construction site entrance by using a trained deep learning algorithm, tracking targets by using a tracking algorithm, counting targets when they collide with the detection line, and displaying in real time by using a client;
the intelligent monitoring module (2) is used for identifying the dress and state of constructors passing through the construction site entrance by using a preset image recognition technology and reminding constructors who do not meet the standard.
2. The intelligent monitoring system for constructors based on computer vision and deep learning of claim 1, characterized in that the intelligent statistical module (1) comprises a video acquisition module (11), a deep learning algorithm analysis module (12) and an analysis result display module (13);
the video acquisition module (11) is used for acquiring real-time monitoring pictures according to monitoring cameras erected at all entrances and exits of a construction site, inputting the real-time monitoring pictures into the POE switch, and converting the real-time monitoring pictures to obtain initial video materials;
the deep learning algorithm analysis module (12) is used for outputting a video analysis picture and personnel statistical data corresponding to the initial video material in real time through a trained deep learning algorithm and outputting the analysis picture and a statistical result to a client;
the analysis result display module (13) is used for displaying video analysis pictures and statistical data of personnel and vehicles at the client and also used for switching the analysis pictures and the statistical data acquired by the cameras at different entrances and exits by managers according to requirements.
3. The intelligent monitoring system for constructors based on computer vision and deep learning of claim 2, characterized in that the deep learning algorithm analysis module (12) comprises a detection region setting module (121), a video analysis module (122), a data statistics module (123) and a data transmission module (124);
the detection area setting module (121) is used for controlling the actual detection range in each picture by adjusting the corresponding detection range according to the picture layout of different entrances and exits, so as to realize the setting of the detection area;
the video analysis module (122) is used for analyzing the images in the initial video material through the coordinates of the line collision detection points to realize the analysis and statistics of personnel and vehicles in the detection area;
the data statistics module (123) is used for judging and counting the entering and exiting of the target according to whether the target frame collides with the line and the color of the line collision area of the target frame;
and the data transmission module (124) is used for outputting the analysis picture and the statistical result to the client through the RTSP server and the HTTP push service.
4. The intelligent monitoring system for constructors based on computer vision and deep learning of claim 3, wherein the setting of the detection area comprises the following steps:
the positions of all the end points are sequentially determined according to the shape of the required area and are expressed by an array, each element of the array is a binary array representing the end point of the graph, and the adjustment and the setting of the detection area are realized by the adjustment of the array.
5. The intelligent monitoring system for constructors based on computer vision and deep learning of claim 3, wherein the analysis and statistics of personnel and vehicles in the detection area by analyzing the images in the initial video material through the coordinates of the line-impact detection points comprises the following steps:
acquiring position parameters and categories of all objects in a current image;
judging whether the offset between the geometric center of an object in the new frame and the geometric center of an object in the previous frame is within a preset offset; if so, the two objects are judged to be the same object with the same ID, and if not, a new object is judged to exist in the new frame and is assigned a new ID;
representing each object's range by a rectangle with the object's ID known, the parameters of an object's range are set as (x1, y1, x2, y2), where x1 < x2 and y1 < y2; the coordinates of the line-collision detection point are (check_point_x, check_point_y), with check_point_x = x1 and check_point_y = int[y1 + (y2 - y1) * 0.6], where int denotes rounding the operation result;
and judging whether the object line collision detection point is located in the judgment area, if so, performing statistical operation on the object, and if not, not performing statistical operation.
6. The intelligent monitoring system for constructors based on computer vision and deep learning of claim 5, wherein the determination and statistics of the entering and exiting of the target through whether the target frame is in collision with the line and the color of the collision area of the target frame comprises the following steps:
all pictures captured by the current camera are specified as the detection area, and blue and yellow strip-shaped areas are set as the judgment areas; a target crossing the yellow line upward is recorded as in, and a target crossing the blue line downward is recorded as out;
acquiring real-time monitoring pictures acquired by monitoring cameras at each entrance and exit of a construction site, and performing size reduction processing on the acquired real-time monitoring pictures;
judging whether a target appears in the detection area of the reduced real-time monitoring picture; if not, the monitoring picture is regarded as invalid and discarded, and if so, the target is framed and output;
and detecting whether the line of the target frame is collided and the color of the line collision area of the target frame, and judging and counting the entering and exiting of the target according to the color of the line collision area of the target frame.
7. The intelligent monitoring system for constructors based on computer vision and deep learning of claim 1, characterized in that the intelligent monitoring module (2) comprises a safety helmet identification module (21), a work clothes identification module (22), a state identification module (23) and an alarm reminding module (24);
the safety helmet identification module (21) is used for identifying constructors who do not wear safety helmets and pass through a building site entrance by using an image identification technology;
the work clothes identification module (22) is used for identifying constructors who do not wear work clothes and pass through a construction site entrance by utilizing an image identification technology;
the state recognition module (23) is used for recognizing the mental state of the constructor passing through the construction site entrance by using an image recognition technology;
the alarm reminding module (24) is used for reminding constructors who do not wear safety helmets, do not wear working clothes and do not meet the standards in mental state.
8. The intelligent monitoring system for constructors based on computer vision and deep learning of claim 7, wherein the state recognition module (23) comprises a constructor facial image acquisition module (231), a fusion feature recognition module (232) and a state recognition result output module (233);
the constructor face image acquisition module (231) is used for acquiring the face images of constructors in the real-time monitoring pictures of all entrances and exits of a construction site;
the fusion feature identification module (232) is used for identifying the facial images of the constructors by using a facial state identification algorithm based on independent feature fusion so as to realize identification of the mental states of the constructors entering a construction site;
and the state recognition result output module (233) is used for outputting mental state information of constructors in real-time monitoring pictures of all entrances and exits of a construction site.
9. The intelligent monitoring system for constructors based on computer vision and deep learning of claim 8, wherein the fusion feature identification module (232) comprises a global state feature extraction module (2321), a local state feature extraction module (2322), a state feature fusion module (2323) and a state feature analysis identification module (2324);
the global state feature extraction module (2321) is used for extracting global state features of the facial image through discrete cosine transform, and removing the correlation of the global state features by using an independent component analysis technology to obtain independent global state features;
the local state feature extraction module (2322) is used for extracting the features of the eye region and the mouth region in the image sequence, and respectively carrying out Gabor wavelet transform and feature fusion on the eye region and the mouth region to obtain the dynamic multi-scale features of the two local regions as the local state features of the face image;
the state feature fusion module (2323) is used for fusing the independent global state feature and the local state feature, and adding local detail information into the global feature to obtain a face state fusion feature;
the state feature analysis and identification module (2324) is used for analyzing and identifying the obtained face state fusion features through a preset classifier to obtain mental state information of the constructors, wherein the mental state of the constructors comprises a waking state, a slight fatigue state, a moderate fatigue state and a severe fatigue state.
10. The intelligent monitoring system of constructors based on computer vision and deep learning of claim 9, wherein the preset classifier is obtained by selecting a subset of features through the AdaBoost algorithm, removing redundant features and training, and the preset classifier has a calculation formula as follows:
H(X) = \operatorname{sign}\left(\sum_{t=1}^{T} a_t h_t(X)\right)

wherein T represents the final number of algorithm cycles, a_t represents the weight of classifier h_t(X), determined by learning of the AdaBoost algorithm, and X = (X_1, X_2, …, X_T) represents the dynamic Gabor features of the selected face image sequence.
CN202211602196.2A 2022-12-13 2022-12-13 Constructor intelligent monitoring system based on computer vision and deep learning Active CN115841651B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211602196.2A CN115841651B (en) 2022-12-13 2022-12-13 Constructor intelligent monitoring system based on computer vision and deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211602196.2A CN115841651B (en) 2022-12-13 2022-12-13 Constructor intelligent monitoring system based on computer vision and deep learning

Publications (2)

Publication Number Publication Date
CN115841651A true CN115841651A (en) 2023-03-24
CN115841651B CN115841651B (en) 2023-08-22

Family

ID=85578558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211602196.2A Active CN115841651B (en) 2022-12-13 2022-12-13 Constructor intelligent monitoring system based on computer vision and deep learning

Country Status (1)

Country Link
CN (1) CN115841651B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894378A (en) * 2010-06-13 2010-11-24 南京航空航天大学 Moving target visual tracking method and system based on double ROI (Region of Interest)
CN202013603U (en) * 2011-03-03 2011-10-19 苏州市慧视通讯科技有限公司 Statistical device for passenger flow information
CN104618685A (en) * 2014-12-29 2015-05-13 国家电网公司 Intelligent image analysis method for power supply business hall video monitoring
CN107545224A (en) * 2016-06-29 2018-01-05 珠海优特电力科技股份有限公司 The method and device of transformer station personnel Activity recognition
CN108052882A (en) * 2017-11-30 2018-05-18 广东云储物联视界科技有限公司 A kind of operating method of intelligent safety defense monitoring system
WO2020034902A1 (en) * 2018-08-11 2020-02-20 昆山美卓智能科技有限公司 Smart desk having status monitoring function, monitoring system server, and monitoring method
CN109447168A (en) * 2018-11-05 2019-03-08 江苏德劭信息科技有限公司 A kind of safety cap wearing detection method detected based on depth characteristic and video object
US20220254161A1 (en) * 2019-07-24 2022-08-11 Dynamic Crowd Measurement Pty Ltd Real-time crowd measurement and management systems and methods thereof
CN111950399A (en) * 2020-07-28 2020-11-17 福建省漳州纵文信息科技有限公司 Intelligent integrated equipment for face recognition and monitoring of computer room
WO2022022368A1 (en) * 2020-07-28 2022-02-03 宁波环视信息科技有限公司 Deep-learning-based apparatus and method for monitoring behavioral norms in jail

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PUSHKAR SATHE et al.: "Helmet Detection And Number Plate Recognition Using Deep Learning", 2022 IEEE Region 10 Symposium, pages 1-6
汤序霖 et al.: "Axial compression behavior of a new type of concrete-filled steel tube inclined-column transfer joint", Journal of South China University of Technology (Natural Science Edition), vol. 48, no. 3, pages 44-54

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116167545A (en) * 2023-04-24 2023-05-26 青建集团股份公司 BIM intelligent building site management platform system and method
CN117519948A (en) * 2023-12-11 2024-02-06 广东筠诚建筑科技有限公司 Method and system for realizing computing resource adjustment under building construction based on cloud platform
CN117519948B (en) * 2023-12-11 2024-04-26 广东筠诚建筑科技有限公司 Method and system for realizing computing resource adjustment under building construction based on cloud platform

Also Published As

Publication number Publication date
CN115841651B (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN109819208B (en) Intensive population security monitoring management method based on artificial intelligence dynamic monitoring
EP2310981B1 (en) Apparatus and method of classifying movement of objects in a monitoring zone
CN107704805B (en) Method for detecting fatigue driving, automobile data recorder and storage device
CN111414887B (en) Secondary detection mask face recognition method based on YOLOV3 algorithm
CN104951773B (en) A kind of real-time face recognition monitoring system
CN115841651A (en) Constructor intelligent monitoring system based on computer vision and deep learning
Lin et al. Estimation of number of people in crowded scenes using perspective transformation
Bobick et al. The recognition of human movement using temporal templates
Gupta et al. Implementation of motorist weariness detection system using a conventional object recognition technique
CN109886241A (en) Driver fatigue detection based on shot and long term memory network
CN108229297A (en) Face identification method and device, electronic equipment, computer storage media
CN107103266B (en) The training of two-dimension human face fraud detection classifier and face fraud detection method
CN107483894A (en) Judge to realize the high ferro station video monitoring system of passenger transportation management based on scene
CN112183472A (en) Method for detecting whether test field personnel wear work clothes or not based on improved RetinaNet
CN109145812A (en) Squatter building monitoring method and device
CN114821753A (en) Eye movement interaction system based on visual image information
Ko et al. Rectified trajectory analysis based abnormal loitering detection for video surveillance
KR101467307B1 (en) Method and apparatus for counting pedestrians using artificial neural network model
CN117523612A (en) Dense pedestrian detection method based on Yolov5 network
CN113593099A (en) Gate control method, device and system, electronic equipment and storage medium
Wanjale et al. Use of haar cascade classifier for face tracking system in real time video
CN114246767B (en) Blind person intelligent navigation glasses system and device based on cloud computing
Shahid et al. Eye-gaze and augmented reality framework for driver assistance
Lagorio et al. Automatic detection of adverse weather conditions in traffic scenes
CN113158744A (en) Security protection management system based on face recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant