CN114267082A - Bridge side falling behavior identification method based on deep understanding - Google Patents

Bridge side falling behavior identification method based on deep understanding


Publication number
CN114267082A
Authority
CN
China
Prior art keywords
falling
bridge
person
channel
personnel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111088471.9A
Other languages
Chinese (zh)
Other versions
CN114267082B (en)
Inventor
朱家祥
成孝刚
张博
汪兆斌
高波
倪杰
蔡聪聪
徐风雷
Current Assignee
Nanjing Municipal Public Security Bureau
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Municipal Public Security Bureau
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing Municipal Public Security Bureau, Nanjing University of Posts and Telecommunications filed Critical Nanjing Municipal Public Security Bureau
Priority to CN202111088471.9A priority Critical patent/CN114267082B/en
Publication of CN114267082A publication Critical patent/CN114267082A/en
Application granted granted Critical
Publication of CN114267082B publication Critical patent/CN114267082B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A — TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 50/00 — TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a bridge-side falling behavior recognition method based on deep understanding, which uses a camera that continuously monitors a bridge to capture the signs of a person falling from the bridge side and to send an alarm signal, so that the fallen person can be rescued in time. A computer vision algorithm is embedded into a camera on a cross-river bridge. The system comprises a railing-climbing behavior monitoring module, a person-falling monitoring module, a falling-splash monitoring module, a person-floating detection module, and a rescue-region prediction module. Through cross-validation among the first three modules, the system judges whether a person has climbed the cross-river bridge railing and fallen from the bridge side; if so, it raises an alarm and calls for rescue in time, so that the optimal rescue window is not missed. The latter two modules predict the position of the person in the water and notify the rescue team, facilitating the rescue work.

Description

Bridge side falling behavior identification method based on deep understanding
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a bridge side falling behavior identification method based on deep understanding.
Background
In real life, accidents in which a pedestrian falls from a bridge are frequently reported, but because such events are random and sporadic, they are difficult to detect and respond to at the first moment. At present, China relies mainly on a combination of manual patrols and alarms raised by passers-by, which is inefficient. Developing a bridge-side falling behavior recognition system that monitors 24 hours a day, 7 days a week with high accuracy therefore has great social significance.
Target detection algorithms fall broadly into two categories: "two-stage" and "one-stage" methods. A two-stage method comprises separate detection and recognition stages and is based on region proposals; representative algorithms include R-CNN and Fast R-CNN. A one-stage method is based on regression and directly regresses the class probability and position coordinates of an object; representative algorithms include the YOLO series and the SSD series. One-stage methods detect faster than two-stage methods, which suits the real-time requirement of the present method, so a one-stage method is used here.
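One-stage detectors such as YOLO emit many overlapping candidate boxes per object; a standard post-processing step (not spelled out in the text, but universal in this family of detectors) is greedy non-maximum suppression. A minimal NumPy sketch, purely illustrative:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    order = scores.argsort()[::-1]  # indices, highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the winning box with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap the winner at or above the threshold
        order = order[1:][iou < iou_threshold]
    return keep
```

Boxes that overlap a higher-scoring detection too strongly are suppressed, leaving one box per object.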
An Attention Model (AM) simulates how the brain processes information. It has become an important component of network architectures in computer vision and has been widely applied to image classification, target detection, and other fields. The Attention Mechanism is a resource-allocation scheme that screens useful information out of a large volume of information: it focuses on the required information, devotes more attention to those regions to obtain detailed information about the target, and ignores unimportant areas. For example, when reading an article, we first look at the title to see what kind of article it is, and then at the heading of each chapter to grasp its overall structure; this is how humans, with limited attention, quickly screen high-value information out of a large amount of information. In computer vision, the attention mechanism learns a weight distribution and applies it to the original features, obtaining more detailed information about the target of interest while suppressing useless information.
Attention mechanisms can be divided into three categories: channel-domain, spatial-domain, and mixed-domain. Channel-domain attention tends to ignore the local information within each channel, while spatial-domain attention tends to ignore the differences among channels at the same spatial position. A mixed-domain attention model combines the two ideas, scoring channel attention and spatial attention simultaneously and effectively integrating the advantages of both. The most representative such module is the Convolutional Block Attention Module (CBAM).
For scenes with many target classes, target detection aims to accurately determine the class and position of each target in an image, and two-stage methods can solve this problem. Researchers mainly generate candidate boxes with a region-proposal method and then perform coordinate regression from those candidates. Ross Girshick et al. adopted a CNN to extract image features, advancing feature representation from the experience-driven hand-crafted paradigms of HOG and SIFT to data-driven representation learning, and used supervised pre-training on large samples followed by fine-tuning on small samples to alleviate problems such as small-sample training difficulty and overfitting, improving detection accuracy to a certain extent. Ross Girshick et al. then proposed Fast R-CNN, a fast region-proposal-based convolutional network method for target detection. Building on earlier work, Fast R-CNN employs deep convolutional networks to classify objects more efficiently; compared with previous work, it introduces multiple innovations that improve detection precision as well as training and testing speed.
Chinese patent publication CN112487920A discloses a climbing-behavior recognition method based on a convolutional neural network, applied in the field of target recognition, addressing the low detection precision of prior-art methods for recognizing a pedestrian climbing over a railing. That patent overcomes the low real-time performance and the unusable bounding-box sizes of traditional target detection methods by drawing a bounding box of the same size as the figure; it predicts image feature classes with a YOLO target-detection network and tracks the target with a GOTURN network; finally, it uses a prior-knowledge method on the relative position of the railing and the set of track points to judge quickly whether a crossing behavior has occurred, and if so outputs a crossing label and raises a warning.
Although that patent can accurately identify the behavior of a pedestrian crossing a railing, its aim is to predict, from video frames collected in real time, whether a pedestrian may cross the railing next. If that algorithm were used to detect falling behavior at the bridge side, the false-detection rate would be high, because it does not combine detection results from the different behavioral stages of the pedestrian; moreover, it can only detect railing-crossing behavior, not a person falling or entering the water, and is therefore unsuitable for detecting bridge-side falling behavior. It is thus necessary to provide a deep-learning-based detection method that realizes automatic detection of bridge-side falling behavior, so as to win the golden five minutes for rescue.
Disclosure of Invention
The invention aims to provide a bridge-side falling behavior recognition method based on deep understanding, which can effectively detect bridge-side falling behavior and prevent the tragedies that occur when a pedestrian falls at the bridge side but rescue does not arrive in time.
In order to achieve the purpose, the invention adopts the technical scheme that:
a bridge side falling behavior identification method based on depth understanding comprises the following steps:
s1, collecting video data of a panoramic camera of the monitoring bridge beside the bridge at the edge of the river in real time, and preprocessing the video data;
s2, pre-judging whether a pedestrian falls from the side of the bridge or not by using the pre-processed video data; using the fence at the bridge edge and the periphery of the fence as an interest domain, identifying whether a person crosses the fence by using a trained YOLO-Attention model, and verifying whether the person crosses the fence in an auxiliary way by using a monitoring camera on the bridge floor and a warning region algorithm; if the bridge recognizes that the person crosses the fence, a railing boundary crossing signal is generated, and the step S3 is entered; otherwise, returning to the step S1;
s3, detecting whether a person falls in the bridge edge fence or not; taking the area under the bridge and the river as the interest areas, detecting whether a person falls by using a trained YOLO-Attention model, if the person falls, sending a person falling signal, and entering the step S4;
s4, detecting whether falling water bloom exists on the river surface under the bridge; setting the river surface under the bridge as an interest area, detecting whether falling water bloom generated after falling of a person occurs by using a trained YOLO-Attention model, if the falling water bloom occurs, judging that the person falls into water, sending a falling water bloom signal, and entering the step S5;
s5, detecting whether a person floats on the river surface by using the trained YOLO-Attention model, and if detecting that the person floats, sending the position of the floating person to a rescue worker; if the person is not detected to float, judging that the person sinks into the river, and entering step S6;
S6, construct a water-flow model from the flow speed in the river and the position of the falling splash to predict the approximate position of the person in the water, and send the predicted position information to the rescuers.
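The cascaded signal flow of steps S1–S6 can be sketched as a small Python function. The per-stage detectors are stand-ins for the trained YOLO-Attention models described above and are injected as callables; all names here are illustrative assumptions, not the patent's implementation:

```python
def bridge_fall_pipeline(frame, detect_climb, detect_fall, detect_splash,
                         detect_float, predict_position):
    """Cascade the S2-S6 checks; each later stage runs only if the
    earlier stage raised its signal (the cross-validation in the text)."""
    if not detect_climb(frame):           # S2: railing out-of-range signal?
        return "no_event"
    if not detect_fall(frame):            # S3: person-falling signal?
        return "climb_false_alarm"
    if not detect_splash(frame):          # S4: falling-splash signal?
        return "fall_false_alarm"
    pos = detect_float(frame)             # S5: floating person's position, or None
    if pos is not None:
        return ("alert_rescue", pos)
    return ("alert_rescue", predict_position(frame))  # S6: drift prediction
```

Each stage gates the next, so a single spurious detection cannot by itself trigger an alarm.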
Specifically, in step S1 the video data are preprocessed as follows: an adaptive defogging algorithm decides whether a video image needs defogging. If the total bounded variation (TBV) of the image is judged to be larger than a set threshold, the image does not need defogging; otherwise, it does.
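The patent does not define TBV precisely; assuming it is the image's total variation (sum of absolute differences between neighboring pixels, which fog lowers by reducing contrast), the adaptive decision can be sketched as:

```python
import numpy as np

def total_variation(img):
    """Sum of absolute horizontal and vertical neighbor differences
    of a grayscale image (one common total-variation definition)."""
    dx = np.abs(np.diff(img, axis=1)).sum()
    dy = np.abs(np.diff(img, axis=0)).sum()
    return float(dx + dy)

def needs_defogging(img, threshold):
    # Foggy frames are low-contrast, so their total variation is small;
    # only frames at or below the threshold go to the defogger.
    return total_variation(img) <= threshold
```

A sharp, high-contrast frame yields a large total variation and skips the (expensive) defogging network, which is what saves system resources.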
The invention provides a defogging method based on DeblurGAN-v2, which can effectively remove fog on the river surface and makes the subsequent bridge-side falling behavior detection more accurate. DeblurGAN-v2 is the core of the defogging algorithm: a Feature Pyramid Network (FPN) is adopted as the core module of the generator. The low-level features extracted by the feature pyramid contain little semantic information but localize the target accurately, while the high-level features are semantically rich but localize the target coarsely; the high-level features are fused with the low-level features through upsampling, and a prediction is made independently after the features of each level are fused.
The generator backbone selects the more complex Inception-ResNet-v2, which combines the Inception module with the ResNet structure. After the input enters an Inception module, several paths are available and the network can learn which filters to use, so sparse or non-sparse features at the same level can be captured well. ResNet is a stack of residual modules in which the neurons learn the difference between the objective function and the input; as network depth increases, this greatly accelerates the convergence of the neural network, reduces training error, and improves accuracy.
The generator loss function is a weighted sum of a pixel-level loss, a perceptual loss, and an adversarial loss:

L_G = 0.5 × L_pix + 0.006 × L_p + 0.01 × L_adv

where L_pix is the pixel-level mean-squared-error loss; L_p is the perceptual loss, computed as the Euclidean distance between feature maps extracted by a 3 × 3 convolution of the VGG19 network; L_adv is the adversarial (local) loss on patches of size 70 × 70; and L_G is the total generator loss. This combination of losses ensures that the network's convergence takes into account both local image detail and overall image style.
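The weighted combination above is straightforward to express in code. A minimal sketch, assuming the perceptual and adversarial terms are computed elsewhere (by the VGG19 feature distance and the 70×70-patch discriminator respectively):

```python
import numpy as np

def mse(a, b):
    """Pixel-level loss L_pix: mean squared error between two images."""
    return float(np.mean((a - b) ** 2))

def generator_loss(pred, target, l_perc, l_adv):
    """L_G = 0.5*L_pix + 0.006*L_p + 0.01*L_adv (weights from the text).

    l_perc and l_adv are assumed precomputed scalars, standing in for
    the VGG19 perceptual loss and the patch adversarial loss."""
    return 0.5 * mse(pred, target) + 0.006 * l_perc + 0.01 * l_adv
```

The small weights on the perceptual and adversarial terms mean the pixel loss dominates early training while the other terms shape texture and style.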
The discriminator of DeblurGAN-v2 adopts a double-discriminator structure: it keeps the PatchGAN structure as a local discriminator judging patches of size 70 × 70, and additionally introduces a global discriminator judging the whole image. The discriminator can thus find a balance between global and local image information.
The discriminator loss uses RaGAN-LS, a relativistic-average modification of the Least Squares GAN (LSGAN) loss, which helps the network converge more smoothly and efficiently:
L_D = E_{x∼p_data(x)}[(D(x) − E_{s∼p_s(s)}[D(G(s))] − 1)²] + E_{s∼p_s(s)}[(D(G(s)) − E_{x∼p_data(x)}[D(x)] + 1)²]

where D(·) is the discriminator function, G(·) is the generator function, L_D is the discriminator loss, E is the mathematical expectation, x∼p_data(x) denotes a data sample x following the data distribution p_data(x), and s∼p_s(s) denotes a noise sample s following the noise prior distribution p_s(s).
Specifically, in steps S2 and S5 the person occupies very few pixels in the monitoring image while climbing over or floating; in step S3 the person falls very fast; and in step S4 splashes occasionally appear on the river surface whose characteristics differ from those of a falling splash. To meet these challenges, a mixed-domain attention mechanism is added to the YOLO-Attention model to improve detection accuracy.
Further, the invention adopts the CBAM mixed-domain attention mechanism, which combines a channel attention mechanism with a spatial attention mechanism: the feature map passes through two sub-models in sequence, first the channel attention model and then the spatial attention model, after which a reconstructed feature map is output. Mimicking human attention, CBAM reassigns the weights of the feature map through continual self-learning so as to emphasize heavily weighted features and suppress useless ones, thereby improving network performance.
Further, the channel-domain attention mechanism in the CBAM module learns and assigns a weight distribution over the channels according to their differing importance, focusing on the important feature channels and weakening the influence of the others so as to improve network performance. The channel-wise weight distribution is assigned to the feature map through a three-step operation, implemented as follows:
In the first step, the squeeze operation (Squeeze) compresses the two-dimensional feature (H × W) of each channel into a single real number through global pooling, a feature compression along the spatial dimensions. Because this real number is computed from all values of the two-dimensional feature, it has, to some extent, a global receptive field; the number of channels remains unchanged, so the output after the squeeze operation has size 1 × 1 × C. The operation is:

z_c = F_sq(u_c) = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} u_c(i, j)

where F_sq(·) is the squeeze function; W and H are, respectively, the width and height of the feature map to be processed; u_c(i, j) is the element of the c-th channel of the feature map at coordinate (i, j); and z_c is the output feature of the c-th channel after squeezing. After the squeeze operation, a one-dimensional tensor whose length equals the number of channels is formed.
In the second step, the excitation operation (Excitation) generates a weight for each feature channel through a parameter W, outputting as many weights as there are input feature channels. The operation is:

s = F_ex(z, W) = σ(g(z, W)) = σ(W₂ δ(W₁ z))

where F_ex denotes the excitation operation; z is the output of the squeeze operation, a tensor of size 1 × 1 × C, with C the number of channels of the feature map; W₁ ∈ R^{(C/r)×C} and W₂ ∈ R^{C×(C/r)} are weights, with r a scaling parameter that reduces the number of channels and thus the amount of computation; δ denotes the ReLU activation function and σ the Sigmoid activation function; and s, the output of the excitation function, describes the weights of the feature map. Reading from the last equality: multiplying z by W₁ is a fully connected operation whose result has dimension 1 × 1 × (C/r); this passes through a ReLU layer with the dimension unchanged, is then multiplied by W₂ in a second fully connected operation, restoring the dimension to 1 × 1 × C, and finally passes through a Sigmoid function to obtain s. This s is the core of the SE module: it characterizes the weight of each channel of the feature map, learned through the preceding fully connected and nonlinear layers.
In the third step, feature re-weighting (Scale) applies the weights obtained from the excitation operation to the channel features, multiplying channel by channel, which completes the introduction of the attention mechanism in the channel dimension:

x̃_c = F_scale(u_c, s_c) = s_c · u_c

where F_scale(·, ·) denotes the channel-wise re-weighting function, x̃_c denotes the c-th channel of the output feature map, s_c is the weight of the c-th channel, and u_c is the c-th channel of the input feature map.
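The three steps above (squeeze, excitation, scale) can be sketched as a single NumPy forward pass. This is an illustrative sketch of the SE-style channel attention, not the patent's trained network; `w1` and `w2` stand for the two fully connected weight matrices W₁ and W₂:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_channel_attention(feat, w1, w2):
    """Channel attention on a (C, H, W) feature map.

    w1: (C//r, C) and w2: (C, C//r) are the two FC weights."""
    c = feat.shape[0]
    # Squeeze: global average pooling -> one real number per channel (1x1xC)
    z = feat.reshape(c, -1).mean(axis=1)
    # Excitation: FC -> ReLU -> FC -> Sigmoid gives the per-channel weights s
    s = sigmoid(w2 @ np.maximum(0.0, w1 @ z))
    # Scale: re-weight each channel of the input feature map by s_c
    return feat * s[:, None, None]
```

With zero weights every channel receives the neutral weight sigmoid(0) = 0.5, which makes the behavior easy to verify.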
Further, the spatial-domain attention mechanism in the CBAM module exploits the spatial structure of the features, modeling their spatial relationships. First, channel-wise maximum pooling and average pooling are applied to the feature map, yielding two W × H × 1 descriptions, which are then concatenated along the channel dimension to form a valid feature descriptor. A convolutional layer is then applied to generate a spatial attention map M_S(F) ∈ R^{H×W}, whose weight coefficients encode the locations to attend to or suppress. The two feature maps produced by the two pooling operations represent, in turn, the average-pooled feature and the max-pooled feature, and the result is passed through an activation function to obtain the final map:

M_S(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)])) = σ(f^{7×7}([F_avg^S; F_max^S]))

where σ denotes the Sigmoid activation function; f^{7×7} is a convolutional layer with kernel size 7 × 7; M_S(F) is the resulting spatial attention map; AvgPool(F) and MaxPool(F) are, respectively, the channel-wise average and maximum pooling of the feature map; F_max^S is the max-pooled feature; and F_avg^S is the average-pooled feature.
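A minimal NumPy sketch of the spatial attention path follows. To stay self-contained it replaces the 7 × 7 convolution with a 1 × 1 mixing of the two pooled maps (a simplifying assumption; the structure of pool → mix → sigmoid → re-weight is the same):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat, mix_weights):
    """Spatial attention over a (C, H, W) feature map.

    mix_weights: length-2 array mixing the two pooled descriptors,
    standing in for the 7x7 convolution of the text."""
    avg_pool = feat.mean(axis=0)   # F_avg: channel-wise average, shape (H, W)
    max_pool = feat.max(axis=0)    # F_max: channel-wise maximum, shape (H, W)
    mixed = mix_weights[0] * avg_pool + mix_weights[1] * max_pool
    m_s = sigmoid(mixed)           # spatial attention map M_S(F), shape (H, W)
    return feat * m_s[None, :, :]  # broadcast the map over all channels
```

Every spatial position gets one weight, shared across channels, which is exactly the complement of the channel attention above.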
Further, in step S2, because a person occupies few pixels in the monitoring image, surrounding pedestrians interfere with the model's detection and cause false detections. An alert-zone algorithm is therefore introduced in step S2, and the detection algorithm of the bridge-side fall pre-judgment module is also embedded in a camera on the bridge roadway. The alert-zone algorithm and the bridge-deck camera detection serve as auxiliary verification and, combined with the YOLO-Attention detection, form a cross-validation that improves the accuracy of the algorithm.
Specifically, the alert-zone algorithm in step S2 is implemented as follows:
the first step is as follows: setting a background template of the alert area:
the method comprises the steps of setting an image of a bridge floor guardrail region without a person as a background template, inputting a feature map of the background template region into a deep learning detection network for training, and enabling the network to adaptively find out an alarm ring region through a certain amount of training.
Second step: set the early-warning features.
An image difference operation is performed between frames of the bridge-deck guardrail with a person present and the background-template frame; the difference information between the person-present image and the background template is extracted as an early-warning feature map and input into the detection network for training, giving the network an early-warning capability. When someone climbs the guardrail beside the bridge, the features of the alert zone change accordingly into early-warning features, and the detection model raises an early warning.
Third step: screen out the influence factors.
Owing to optical flow and rain, when garbage, debris, or flying objects such as birds thrown or passing from the bridge deck cross the alert zone, the image features change and may be taken by the network as early-warning features, causing false detections. The method therefore defines the features of optical flow, rain, garbage and other debris, and of flying objects such as birds passing through the alert zone as influence factors that must be screened out. A difference operation is first performed between the frame in which an influence-factor feature appears in the alert zone and the background-template frame; the difference information between the influence-factor image and the background template is extracted as an influence-factor feature map and input into the detection network for training, so that the network acquires the ability to recognize influence factors. When the network detects that the video features have changed but only due to an influence factor, the influence factor is screened out.
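The core of all three steps is background differencing over a masked region. A minimal sketch with NumPy, where a simple changed-pixel-area rule stands in for the detection network the patent trains on the difference maps (thresholds and names are illustrative assumptions):

```python
import numpy as np

def alert_zone_signal(frame, background, mask, diff_threshold, area_threshold):
    """Return (flag, feature_map) for an alert zone.

    frame, background: grayscale images; mask: boolean alert-zone region.
    flag is True when enough pixels inside the zone differ from the
    background template; feature_map is the masked difference image that
    the patent would feed to the detection network."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    changed = (diff > diff_threshold) & mask
    return bool(changed.sum() >= area_threshold), diff * mask
```

In the patent this difference map is classified by the trained network (early warning vs. influence factor); here the area rule only illustrates the signal path.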
Specifically, in step S3, if no person-falling signal is generated, the method returns to step S2 to detect again whether a person has crossed the fence; if the railing out-of-range signal is no longer generated, the earlier out-of-range signal is judged to be a false detection. If the out-of-range signal is still generated, step S3 is executed again to detect whether a person is falling; if no person-falling signal is then generated, the railing out-of-range signal is judged to be a false detection.
Specifically, in step S4, if no falling-splash signal is generated, the method returns to step S3 to detect again whether a person is falling; if no falling person is detected, the earlier falling signal is judged to be a false detection. If the person-falling signal is still generated, step S4 is executed again to detect whether a falling splash is present; if no splash signal is then generated, the person-falling signal is judged to be a false detection.
Specifically, in step S6, the invention adopts a rescue-region prediction algorithm to predict the approximate position of the person in the water by predicting the drift trajectory of the drifting person. Although the flow in the river appears straight, the trajectory of a person in the water is not, owing to the influence of various factors, and a simplistic prediction would be inaccurate. The method therefore takes the wind speed and direction at the point of entry and the flow speed and direction of the water into account, and establishes the equation of motion of the drifting target:

x(t + Δt) = x(t) + (V_c + V_w) · Δt

where V_c is the wind velocity field, V_w is the water-flow velocity field, x(t) is the position of the person in the water at time t, and x(t + Δt) is the position after a time Δt. The method obtains drift data under various parameter settings by performing simulated floating experiments in the river with a dummy, and fits the drift trajectory to the data obtained. The wind force and flow velocity are recorded as discrete data, and the recorded data are limited; to further improve the accuracy of the predicted trajectory, the method divides the sampling of the wind and water-flow velocity fields into smaller intervals and applies Lagrange interpolation within them to obtain the unknown data, further reducing the prediction error.
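Lagrange interpolation, used above to densify the sparse wind- and flow-speed records, is a standard construction and can be sketched in a few lines (the sample points below are illustrative, not measured data from the patent):

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the Lagrange polynomial through points (xs[i], ys[i]).

    Used here to estimate wind/flow speed between recorded samples."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                # Basis polynomial: 1 at xs[i], 0 at every other sample point
                term *= (x - xj) / (xi - xj)
        total += term
    return total
```

Through three samples of y = x² the interpolant reproduces the quadratic exactly, so intermediate values such as x = 1.5 come out right.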
Because manual experiments are difficult and costly to carry out, a Monte Carlo simulation is used to simulate the drift trajectories of persons, building on the fit to the manual-experiment data and the Lagrange interpolation. The Monte Carlo method sets up a random process, continually generates a time series, and studies the distribution of the process by computing statistics of that series. Concretely, the drifting person is abstracted as a particle, and each particle is given an influence factor reflecting factors such as the wind speed over the river and the water-flow speed; the particles are then massively replicated to generate the drift of a particle swarm; finally, the drift trajectories of part of the particle swarm are taken as the prediction of the person's drift trajectory.
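The particle-swarm idea can be sketched with the standard library alone. All parameters (noise model, constant velocity fields, mean-of-endpoints summary) are illustrative assumptions, not the patent's calibrated model; the update rule is the equation of motion x(t + Δt) = x(t) + (V_c + V_w)·Δt from the text:

```python
import random

def simulate_drift(x0, y0, v_water, v_wind, dt, steps, n_particles,
                   noise, seed=0):
    """Monte Carlo drift prediction: replicate the person as particles,
    each with a random influence factor perturbing the current/wind
    effect, and advance each one by the drift equation of motion."""
    rng = random.Random(seed)
    endpoints = []
    for _ in range(n_particles):
        x, y = x0, y0
        # per-particle influence factor for river wind and current
        fx = 1.0 + rng.uniform(-noise, noise)
        fy = 1.0 + rng.uniform(-noise, noise)
        for _ in range(steps):
            x += fx * (v_water[0] + v_wind[0]) * dt
            y += fy * (v_water[1] + v_wind[1]) * dt
        endpoints.append((x, y))
    # predicted search position: mean of the particle cloud
    mx = sum(p[0] for p in endpoints) / n_particles
    my = sum(p[1] for p in endpoints) / n_particles
    return mx, my, endpoints
```

With noise set to zero the swarm collapses to the deterministic drift equation, which gives a simple sanity check; with nonzero noise the spread of `endpoints` delineates the rescue search region.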
Compared with the prior art, the invention has the following beneficial effects. (1) The DeblurGAN-v2-based defogging method improves the restoration of video image detail texture, of color differences over the river-surface area, and of local artifacts; it effectively removes fog on the river surface and makes the subsequent bridge-side falling behavior detection more accurate. The defogging algorithm is adaptive and is applied only when the river is foggy, greatly saving system resources. (2) In the detection stage of recognizing bridge-side falling behavior, warning lines are established at the edges of the bridge-deck railings to detect out-of-range persons; the many pedestrians and tourists on the bridge cause considerable interference and would otherwise generate a large number of false alarms. To reduce false alarms and effectively discover potential target persons, a large number of training samples are built from measured data and manual simulation, an attention mechanism is fused into the YOLO network, and climbing behavior is detected; persons who may fall from the bridge side are screened out through cross-validation among the alert-zone algorithm, the bridge-deck camera, and YOLO-Attention. The method improves the accuracy to 75%. (3) The invention uses the joint detection of the three modules of person out-of-range, person falling, and falling splash, and establishes a strict signal-transmission and regression mechanism among them, greatly reducing the probability of false alarms; meanwhile, using the measured data, the drift trajectory of the person in the water is analyzed with mathematical rigor to predict the person's approximate position, providing great convenience for the rescuers' search.
Drawings
Fig. 1 is a flow chart of the bridge side falling behavior recognition method based on deep understanding according to the present invention.
FIG. 2 is a schematic structural diagram of the DeblurGAN-v2 generator in the defogging algorithm according to the embodiment of the invention.
Fig. 3 is a schematic structural diagram of a CBAM module according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a backbone network structure of the YOLO algorithm with an attention mechanism added in the embodiment of the present invention.
FIG. 5 is a schematic diagram of a pedestrian crossing the railing in an embodiment of the present invention.
Fig. 6 is a schematic view of a bridge surveillance zone in an embodiment of the present invention.
FIG. 7 is a schematic diagram of auxiliary verification of a bridge deck highway camera in the embodiment of the invention.
Fig. 8 is a schematic diagram of a person falling in an embodiment of the present invention.
Fig. 9 is a schematic view of a falling splash in the embodiment of the invention.
FIG. 10 is a schematic view of the floating of a person in an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the present embodiment provides a bridge side falling behavior identification method based on deep understanding, in which a strict cross-validation method and a signal-transmission mechanism accurately identify the behavior of a person falling from the bridge side. The method includes the following steps:
s1, collecting in real time the video data of the bridge-side panoramic camera that monitors the river bridge, and preprocessing the video data;
s2, pre-judging, using the preprocessed video data, whether a pedestrian may fall from the side of the bridge: the fence at the bridge edge and its periphery are taken as the region of interest, the trained YOLO-Attention model identifies whether a person crosses the fence, and the monitoring cameras on the bridge deck together with the warning-region algorithm assist in verifying the crossing. If a person is recognized crossing the fence, a railing boundary-crossing signal is generated and the method enters step S3; otherwise it returns to step S1;
s3, detecting whether a person falls in the bridge edge fence or not; taking the area under the bridge and the river as the interest areas, detecting whether a person falls by using a trained YOLO-Attention model, if the person falls, sending a person falling signal, and entering the step S4;
s4, detecting whether there is a falling splash on the river surface under the bridge: the river surface under the bridge is set as the region of interest, and the trained YOLO-Attention model detects whether the splash generated when a person falls has occurred; if the falling splash occurs, it is judged that the person has fallen into the water, a falling splash signal is sent, and the method enters step S5;
s5, detecting whether a person floats on the river surface by using the trained YOLO-Attention model, and if detecting that the person floats, sending the position of the floating person to a rescue worker; if the person is not detected to float, judging that the person sinks into the river, and entering step S6;
and S6, constructing a water flow model to predict the approximate position of the person falling into the water according to the water flow speed in the river and the position of falling water bloom, and sending the predicted position information to the rescue personnel.
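The staged chain of steps S1-S6 can be sketched as a simple signal-passing pipeline, in which each stage runs only after the previous stage has raised its signal; the detector callables below are hypothetical stand-ins for the trained YOLO-Attention models, and the state names are illustrative.

```python
# Minimal sketch of the S1-S6 signal-passing chain. Each later stage only
# runs after the earlier stage has raised its signal, which suppresses
# false alarms; the detector functions are hypothetical stand-ins for the
# trained YOLO-Attention models described in the text.
def run_pipeline(frame, detectors):
    if not detectors["crossing"](frame):    # S2: railing boundary-crossing signal
        return "no_event"
    if not detectors["falling"](frame):     # S3: person-falling signal
        return "crossing_false_alarm"
    if not detectors["splash"](frame):      # S4: falling-splash signal
        return "falling_false_alarm"
    if detectors["floating"](frame):        # S5: floating person found
        return "report_floating_position"
    return "predict_drift_position"         # S6: person submerged, predict drift

# Usage: simulate a confirmed fall with no visible floating person.
dets = {"crossing": lambda f: True, "falling": lambda f: True,
        "splash": lambda f: True, "floating": lambda f: False}
result = run_pipeline(None, dets)
```

If any intermediate signal is missing, the pipeline reports the earlier signal as a false alarm, mirroring the regression mechanism of steps S3 and S4.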
Specifically, in step S1, the video data are preprocessed as follows: an adaptive defogging algorithm judges whether the video image needs defogging processing; if the TBV (total bounded variation) of the image is larger than a set threshold value, the image does not need defogging processing, otherwise the image needs defogging processing. In this embodiment, cameras are provided both beside the bridge and on the bridge deck, and the bridge deck cameras can assist in detecting boundary crossing, improving the validity and completeness of the detection.
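The adaptive trigger can be sketched with a simple total-variation measure: fog flattens the image, lowering its variation. The threshold value and synthetic frames below are illustrative assumptions, not the patent's actual TBV computation or setting.

```python
import numpy as np

def total_variation(img):
    # Sum of absolute vertical and horizontal gradients; a foggy frame is
    # flatter than a clear one, so its total variation is lower.
    dh = np.abs(np.diff(img, axis=0)).sum()
    dv = np.abs(np.diff(img, axis=1)).sum()
    return float(dh + dv)

def needs_defogging(img, threshold):
    # Per the method: if the variation exceeds the threshold, the frame is
    # clear enough to skip defogging, saving system resources.
    return total_variation(img) <= threshold

# Usage with synthetic frames: a high-contrast "clear" frame vs a flat "foggy" one.
clear = np.tile(np.array([0.0, 1.0] * 8), (16, 1))
foggy = np.full((16, 16), 0.5)
```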
On a river bridge with abundant water vapor, fog is a very common phenomenon; however, when fog covers the river bridge, recognition by the camera becomes difficult, so when fog occurs, the video captured by the camera needs to be defogged. The invention provides a defogging method based on DeblurGAN-v2, which can effectively remove fog on the river surface and makes the subsequent bridge side falling behavior detection more accurate. DeblurGAN-v2 is the core of the defogging algorithm and adopts a Feature Pyramid Network (FPN) structure as the core module of the generator: the low-layer features extracted by the feature pyramid contain less semantic information but locate the target accurately, while the high-layer features are semantically rich but locate the target coarsely; high-layer features are fused with low-layer features through upsampling, and a prediction is made independently after the features of each layer are fused.
As shown in fig. 2, the generator backbone selects the more complex Inception-ResNet-v2, which combines the Inception module with the structure of ResNet. After the input enters the Inception module, several paths are available and the network learns which filters to use, so that both sparse and non-sparse features on the same layer are captured well; ResNet is a stack of residual modules in which neurons learn the difference between the objective function and the input, which, as network depth increases, greatly accelerates the convergence of the neural network, reduces the training error and improves the accuracy of the network.
The generator loss function consists of a weighted sum of the pixel-level loss, the perceptual loss and the adversarial (local) loss:

L_G = 0.5 × L_pix + 0.006 × L_p + 0.01 × L_adv

wherein L_pix is the minimum mean square error; L_p is the perceptual loss, calculated as the Euclidean distance between feature maps extracted by the 3 × 3 convolution kernels of the VGG19 network; L_adv is the adversarial loss with patch size 70 × 70; L_G is the generator loss. Such a combination of losses ensures that the convergence of the network takes into account both the local image details and the overall image style.
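The weighted combination can be sketched as follows; the component losses are taken as precomputed scalars, since evaluating L_p and L_adv would require the VGG19 feature extractor and the discriminator, which are outside this sketch.

```python
# Sketch of the DeblurGAN-v2 generator loss combination used here:
# L_G = 0.5*L_pix + 0.006*L_p + 0.01*L_adv, with the three component
# losses passed in as precomputed scalar values.
def generator_loss(l_pix, l_perceptual, l_adv):
    return 0.5 * l_pix + 0.006 * l_perceptual + 0.01 * l_adv

# Usage: equal unit component losses combine to 0.516.
lg = generator_loss(1.0, 1.0, 1.0)
```

The small weights on L_p and L_adv keep the pixel-level term dominant, so training favours faithful reconstruction while the other terms refine style and realism.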
The discriminator of DeblurGAN-v2 adopts a double-discriminator structure: it keeps the PatchGAN structure as a local discriminator judging 70 × 70 patches, and also introduces a global discriminator judging the whole image. The discriminator can therefore find a balance point between global and local image information, taking both into account.
The discriminator loss function uses RaGAN-LS, a modification of the least-squares GAN (LSGAN) loss, which helps the network converge more smoothly and efficiently:

L_D = E_{x~p_data(x)}[(D(x) − E_{s~p_s(s)} D(G(s)) − 1)²] + E_{s~p_s(s)}[(D(G(s)) − E_{x~p_data(x)} D(x) + 1)²]

wherein D(·) is the discriminator function, G(·) is the generator function, L_D is the discriminator loss, E is the mathematical expectation, x~p_data(x) denotes a data sample x obeying the data distribution p_data(x), and s~p_s(s) denotes a noise sample s obeying the noise prior distribution p_s(s).
Specifically, in step S2 and step S5, the person occupies only a small part of the monitoring picture; in step S3, the person falls very quickly; in step S4, besides the splash of a falling person, the river surface may occasionally show other splashes whose characteristics differ from those of a falling-person splash. To address these challenges, a mixed-domain attention mechanism is added to the YOLO-Attention model to improve the detection accuracy.
Further, as shown in fig. 3, the invention employs the CBAM mixed-domain attention mechanism, which combines a channel attention mechanism and a spatial attention mechanism: the feature map first passes through the channel attention module, then through the spatial attention module, and the reconstructed feature map is output. CBAM mimics human attention, reassigning the weights of the feature map through continuous self-learning so as to emphasize heavily weighted features and suppress useless ones, thereby improving network performance.
Further, the channel-domain attention mechanism in the CBAM module learns and assigns a weight distribution over the different channels according to their importance, focusing on important feature channels and weakening the influence of the others so as to improve network performance; the channel-wise weight distribution of the obtained feature map is assigned through a three-step operation. The specific implementation is as follows:
In the first step, the squeeze operation (Squeeze) compresses the two-dimensional feature (H × W) of each channel into one real number through global pooling (Global Pooling), a feature compression along the spatial dimension; because the real number is calculated from all values of the two-dimensional feature, it has, to some extent, a global receptive field. The number of channels remains unchanged, so the feature map becomes 1 × 1 × C after the squeeze operation. The specific formula is:

z_c = F_sq(u_c) = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} u_c(i, j)

wherein F_sq(·) is the squeeze function, W and H are respectively the width and height of the feature map to be processed, u_c(i, j) is the element of the c-th channel of the feature map at coordinate (i, j), and z_c denotes the output feature of the squeezed c-th channel. After the squeeze operation, a one-dimensional tensor with length equal to the number of channels is formed;
In the second step, the excitation operation (Excitation) generates a weight value for each feature channel through the parameter W and outputs as many weight values as there are input feature channels. The specific formula is:

s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z))

wherein F_ex denotes the excitation operation; z is the output of the squeeze operation, a tensor of size 1 × 1 × C, with C the number of channels of the feature map; W_1 ∈ R^{(C/r)×C} and W_2 ∈ R^{C×(C/r)} are weights, where r is a scaling parameter used to reduce the number of channels and hence the amount of computation, and R denotes a linear space over the real number field; δ denotes the ReLU activation function and σ the Sigmoid activation function; s is the output of the excitation function, describing the weights of the feature map. Reading from the last equality: multiplying z by W_1 is a fully connected operation whose result has dimension 1 × 1 × (C/r); the ReLU layer leaves the dimension unchanged; multiplying by W_2 is another fully connected operation, changing the dimension back to 1 × 1 × C; finally s is obtained through the Sigmoid function. This s is the core of the SE module, used to characterize the weights of the feature map; the weights are learned through the preceding fully connected and nonlinear layers.
Thirdly, feature weight calibration (Scale), weighting the weight value obtained by the excitation operation on each channel feature, multiplying the weight coefficient by the channel by channel to finish introducing an attention mechanism into the channel dimension, wherein the specific operation formula is as follows:
Figure RE-GDA0003473598370000113
wherein, Fscale(. represents)The function is identified and, in response to the identification,
Figure RE-GDA0003473598370000114
representing the output layer c channel characteristics, scRepresents the weight of the c-th channel, ucRepresenting the features of the c-th channel of the input feature map.
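The three-step squeeze-excite-scale operation can be sketched in NumPy; the weight matrices here are random stand-ins for learned parameters, and the reduction ratio r = 2 is an illustrative choice.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(u, w1, w2):
    """Squeeze-excite-scale channel attention as in the CBAM channel branch.
    u: feature map of shape (C, H, W); w1: (C//r, C) reduction weights;
    w2: (C, C//r) expansion weights (illustrative stand-ins for learned
    parameters)."""
    # Squeeze: global average pooling compresses each H x W channel to one real.
    z = u.mean(axis=(1, 2))                    # shape (C,)
    # Excitation: two fully connected layers, ReLU then Sigmoid.
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # shape (C,), values in (0, 1)
    # Scale: re-weight each channel of the input feature map.
    return u * s[:, None, None]

# Usage: 4 channels, reduction ratio r = 2, fixed random weights.
rng = np.random.default_rng(0)
u = rng.normal(size=(4, 8, 8))
w1 = rng.normal(size=(2, 4))
w2 = rng.normal(size=(4, 2))
out = channel_attention(u, w1, w2)
```

Because the Sigmoid keeps every channel weight in (0, 1), the output can only attenuate channels, never amplify them, which is exactly the suppression behaviour described above.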
Further, the spatial-domain attention mechanism in the CBAM module exploits the spatial structure of the features, modelling their spatial relationships. First, channel-wise maximum pooling and average pooling are performed on the feature map to obtain two W × H × 1 channel descriptions, which are then concatenated along the channel dimension to form a valid feature descriptor. A convolutional layer is then applied to generate a spatial attention map M_s(F) ∈ R^{H×W}, whose weight coefficients encode the locations requiring attention or suppression; the two pooling operations produce the average-pooling feature and the maximum-pooling feature, which are passed through the convolution and an activation function to obtain the final result. The specific formula is:

M_s(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)])) = σ(f^{7×7}([F_avg^s; F_max^s]))

where σ denotes the Sigmoid activation function and f^{7×7} a convolutional layer with kernel size 7 × 7; M_s(F) is the resulting spatial attention map; AvgPool(F) and MaxPool(F) are respectively the channel-wise average-pooling and maximum-pooling operations on the feature map; F_max^s is the maximum-pooling feature and F_avg^s is the average-pooling feature.
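The spatial branch can likewise be sketched in NumPy with a naive 7 × 7 convolution over the stacked average- and max-pooling descriptors; the kernel weights are random stand-ins for learned parameters.

```python
import numpy as np

def spatial_attention(f, kernel):
    """CBAM spatial-attention branch sketch. f: feature map (C, H, W);
    kernel: (7, 7, 2) convolution weights over the stacked [avg; max]
    channel descriptors (illustrative stand-ins for learned weights).
    Returns the H x W attention map M_s(F) with values in (0, 1)."""
    avg = f.mean(axis=0)                   # channel-wise average pooling, (H, W)
    mx = f.max(axis=0)                     # channel-wise max pooling, (H, W)
    desc = np.stack([avg, mx], axis=-1)    # concatenated descriptor, (H, W, 2)
    h, w = avg.shape
    pad = np.pad(desc, ((3, 3), (3, 3), (0, 0)))  # zero padding for the 7x7 kernel
    out = np.empty((h, w))
    for i in range(h):                     # naive 7x7 convolution
        for j in range(w):
            out[i, j] = np.sum(pad[i:i + 7, j:j + 7] * kernel)
    return 1.0 / (1.0 + np.exp(-out))      # Sigmoid activation

# Usage: small random feature map and kernel.
rng = np.random.default_rng(1)
f = rng.normal(size=(4, 8, 8))
kernel = rng.normal(size=(7, 7, 2)) * 0.1
m = spatial_attention(f, kernel)
```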
Further, in step S2, a pedestrian intending to end his or her life usually climbs over the guardrail at the edge of the bridge before jumping, and most people hesitate on the guardrail before jumping down, as shown in fig. 5. At this point the method performs personnel crossing detection to judge whether someone intends to attempt suicide, so that an early warning can be given; if personnel crossing is detected, the system generates a personnel crossing signal. The method uses YOLO with the added attention mechanism to detect out-of-bounds personnel. Because a person crossing the guardrail is close to the side sidewalk, a roadside pedestrian standing near the guardrail easily causes misjudgment; therefore, as shown in fig. 7, the cameras that monitor traffic on the bridge deck cooperate with the long-range camera for auxiliary verification, and a mixed-domain attention mechanism is added to the detection model. In addition, the method adds a warning-region algorithm to the personnel crossing detection: the guardrail area at the side of the bridge is found adaptively through network training and marked in the image as the warning region, as shown in fig. 6; whether a person intends to cross the guardrail is judged by feature comparison, and if someone crosses, an early warning is given. The warning-region algorithm, the bridge deck camera auxiliary verification and YOLO with the mixed-domain attention mechanism together form a cross-validation scheme, which greatly improves the accuracy of personnel crossing detection.
The invention uses the cameras monitoring the bridge 7 × 24 hours to capture personnel falling signals and send an alarm, striving for the golden five minutes of rescue and ensuring that fallen personnel can be rescued and treated in time. The system embeds computer vision algorithms in the cameras on the river bridge, comprising: 1) a personnel railing-crossing behavior monitoring module; 2) a personnel falling monitoring module; 3) a falling splash monitoring module; 4) a personnel floating detection module; 5) a rescue area prediction module.
In the invention, in a monitoring module for the behavior of people crossing a railing, three sub-modules are adopted for cross validation to judge whether the people cross the railing, and the three sub-modules are respectively as follows: a) a bridge deck roadside camera target detection submodule; b) an end-to-end detection submodule; c) and a warning region target detection submodule.
The end-to-end detection submodule and the warning-region target detection submodule collect image data with panoramic cameras able to photograph the whole bridge deck and water surface region (for example, when applied to the Nanjing Yangtze River Bridge, the panoramic cameras can be installed on the Nanbao and Beibao forts of the bridge). The end-to-end detection submodule identifies whether there is a person outside the bridge deck railing, and the warning-region target detection submodule identifies whether a person crosses the bridge deck railing; the recognition network is realized with the YOLO-Attention algorithm. The input of the end-to-end detection submodule is the whole picture and the output is whether a person is present; if so, a box is drawn to frame the person. The warning-region target detection submodule compares front and rear frames to identify whether a person climbs over the bridge railing: a box is drawn at the position where a person would stand outside the bridge railing, the area of this box is taken as the region of interest, the input of the network is an image of the whole region of interest, and the output is whether a person is detected (i.e. whether a person has crossed the railing).
The bridge deck roadside cameras are mounted on the bridge deck lamp posts (these cameras are mounted low, can see a range of about 3-5 meters, and each sees only part of the guardrail crossings); because they are closer to the bridge guardrail they capture clearer pictures, so combining the bridge deck roadside cameras with the panoramic camera reduces the recognition error rate. In the specific implementation of this embodiment, several bridge deck roadside cameras work jointly.
The invention judges whether a person climbs over the bridge railing and falls from the bridge side through cross-validation of the first three modules; if a person is detected crossing the railing and falling from the bridge side, an alarm is raised in time and rescue is called, so that the optimal rescue window is not missed. The position of the person in the water is predicted by the last two modules and reported to the rescue team, creating convenience for the rescue work, allowing the team to reach the person in the shortest time, and improving the survival rate of persons falling into the water.
Specifically, the alert zone algorithm adopted by the alert zone target detection submodule is specifically implemented as follows:
the first step is as follows: setting a background template of the alert area:
An image of the bridge deck guardrail region without any person is set as the background template; the feature map of the background template region is input into the deep learning detection network for training, and through a certain amount of training the network adaptively finds the warning region.
The second step is that: setting early warning characteristics:
An image difference operation is performed between image frames of the bridge deck guardrail with a person present and the image frames of the background template; the difference information between the image with a person and the background template is extracted as the early warning feature map and input into the detection network for training, giving the network an early warning capability. When someone climbs the guardrail beside the bridge, the features of the warning region change correspondingly into early warning features, and the detection model then gives an early warning.
The third step: screening out influence factors:
Changes of image features in the warning region can also be produced by light and rain, by occasional falling debris, by the heads or bags of tourists leaning over the railings, and by birds flying through the warning region; treating these as early warning features would cause false detections by the network. The method therefore defines features such as light, rainwater, debris, tourists' heads or bags appearing over the railings, and flying birds as influence factors that need to be screened out. First, a difference operation is performed between the image frame in which an influence-factor feature appears in the warning region and the image frame of the background template; the difference information between the influence-factor image and the background template is extracted as an influence-factor feature map and input into the detection network for training, so that the network acquires the ability to recognize influence factors. When the network detects that the video features have changed but the change is an influence factor, the influence factor is screened out.
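The background-template differencing of the first two steps can be sketched as follows; the threshold and pixel-count values are illustrative tuning assumptions, and influence-factor screening is left to the trained network as the text describes.

```python
import numpy as np

def early_warning_feature(frame, background):
    # Step two: absolute difference between the current frame and the
    # unmanned background template of the warning region.
    return np.abs(frame.astype(float) - background.astype(float))

def alert(frame, background, threshold, min_pixels):
    """Raise an early warning when enough warning-region pixels differ
    strongly from the background template. threshold and min_pixels are
    illustrative values; screening of influence factors (birds, rain,
    glare) is handled by the trained detection network in the method."""
    diff = early_warning_feature(frame, background)
    return int((diff > threshold).sum()) >= min_pixels

# Usage: a person-sized blob appears against an empty background template.
bg = np.zeros((10, 10))
person = bg.copy()
person[2:8, 4:7] = 200.0
```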
Specifically, in step S3, if the personnel falling signal is not generated, the method returns to step S2 to detect again whether a person crosses the fence; if the railing boundary-crossing signal is not generated again, the earlier railing boundary-crossing signal is judged to have been a misjudgment. If the railing boundary-crossing signal is generated again, step S3 is executed once more to detect whether a person falls; if the personnel falling signal is still not generated, the railing boundary-crossing signal is judged to be a misjudgment.
When a person jumps into the river over the guardrail, there is a free-fall process, as shown in fig. 8; if it is detected and alarmed at this point, a rescue team can usually arrive and save the person quickly. Because a falling person moves very fast, the method uses a detection model with an added spatial-domain attention mechanism for personnel falling detection; if a personnel falling event is detected, the system sends a personnel falling signal and proceeds to splash detection. If a personnel crossing signal has been received and the person has disappeared from the crossing area, but no personnel falling is detected, a missed detection must be considered: the previous video stream is read again and personnel falling detection is repeated; if the falling event is still not detected, the personnel crossing detection is considered a false detection.
Specifically, in step S4, if the falling splash signal is not generated, the method returns to step S3 to detect again whether a person falls; if personnel falling is not detected again, the earlier personnel falling signal is judged to have been a misjudgment. If the personnel falling signal is generated again, step S4 is executed once more to detect whether there is a falling splash; if the falling splash signal is still not generated, the personnel falling signal is judged to be a misjudgment.
When a person falls into the river, a large splash is usually raised, as shown in fig. 9, so the method also detects the splash generated when a person falls into the river. Splashes on the river surface can also be caused by other factors; although their characteristics clearly differ from those of a falling-person splash, the possibility of false detection still exists, so to reduce false detections and save system resources, falling splash detection is performed only on the premise that the system has already received a personnel falling signal. During splash detection, some spray may occasionally appear on the river surface, but its characteristics clearly differ from a falling-person splash, so a channel-domain attention mechanism is added to the detection model for river-surface splash detection. If a falling splash is detected after a personnel fall, the system immediately calibrates the splash position as the starting point for rescue-area prediction, immediately raises an alarm to request rescue from the rescue forces on the river, and proceeds to the next step, personnel floating detection.
Specifically, in step S5, after a person falls into the water, some sink, some float on the surface and struggle, and some float on the surface after drowning. To facilitate subsequent rescue, the method detects persons floating on the river surface to determine the position to be rescued and reports it to the rescuers. Since only a few body parts of a fallen person usually float above the surface, their features are relatively sparse, as shown in fig. 10, and the person also drifts with the current, so the system uses a detection model with a mixed-domain attention mechanism for floating detection to improve accuracy. If no floating person is detected after the fall, the person is judged to be submerged, and the next step, rescue-area prediction, is required.
Specifically, in step S6, the invention adopts a rescue-area prediction algorithm to predict the approximate position of the person in the water by predicting the drift trajectory. Although the river current appears to flow straight, the drift trajectory of a person in the water is not straight, being affected by many factors, so naive prediction has low accuracy. The method therefore takes the wind speed and direction and the water flow speed and direction at the fall location into account to establish the motion equation of the drifting target:

x(t + Δt) = x(t) + (V_w + V_c) · Δt

wherein V_c is the wind speed field and V_w is the water flow velocity field; x(t) is the position of the person in the water at time t, and x(t + Δt) is the position after time Δt. The method obtains drift data under various parameter settings from simulated floating experiments with a dummy in the river, and fits the personnel drift trajectory to the obtained data. Wind and flow speeds are recorded as discrete data, and the recorded data are limited; to further improve the accuracy of the predicted trajectory, the method divides the sampling intervals of the wind speed field and the water flow velocity field into smaller intervals and applies Lagrange interpolation within them to obtain the unrecorded values, further reducing the prediction error.
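The interpolation over a small sampling interval can be sketched as follows; the hourly flow-speed samples are hypothetical values, not measured data.

```python
def lagrange_interpolate(xs, ys, x):
    """Lagrange interpolation over a small sampling interval, used to fill
    in unrecorded wind-speed / flow-speed values between the discrete
    measurements (illustrative implementation)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if i != j:
                term *= (x - xj) / (xi - xj)  # Lagrange basis polynomial factor
        total += term
    return total

# Usage: flow speed sampled hourly; estimate the speed at t = 1.5 h.
times = [0.0, 1.0, 2.0, 3.0]
speeds = [1.2, 1.5, 1.4, 1.1]   # hypothetical measured flow speeds (m/s)
est = lagrange_interpolate(times, speeds, 1.5)
```

By construction the interpolant passes exactly through every recorded sample, so it only fills the gaps between measurements without altering them.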
Because a manual experiment is difficult and time-consuming to carry out, a Monte Carlo simulation method is used to simulate personnel drift trajectories on the basis of the fit to the manual experiment data and the Lagrange interpolation. The Monte Carlo method sets up a random process, continuously generates time series, and studies the distribution of the process by computing statistics over those series. Concretely, a drifting person is abstracted as a particle, and each particle is given influence factors reflecting the river wind speed, water flow speed and similar factors. The particle is then replicated massively to generate the drift of a particle swarm, and finally the drift trajectory of the centre of the particle swarm is taken as the prediction of the personnel drift trajectory.
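The particle-swarm simulation can be sketched as follows; the motion update follows x(t + Δt) = x(t) + (V_w + V_c)·Δt plus a random influence factor, and the noise level, particle count and velocities are illustrative assumptions rather than the calibrated values fitted from the dummy experiments.

```python
import numpy as np

def simulate_drift(x0, v_water, v_wind, dt, steps, n_particles, sigma, seed=0):
    """Monte Carlo sketch of drift prediction: each particle advances by
    (V_w + V_c)*dt plus Gaussian noise standing in for the influence
    factors, and the swarm centre is the predicted position."""
    rng = np.random.default_rng(seed)
    pos = np.tile(np.asarray(x0, dtype=float), (n_particles, 1))
    for _ in range(steps):
        noise = rng.normal(0.0, sigma, size=pos.shape)
        pos += (np.asarray(v_water) + np.asarray(v_wind) + noise) * dt
    return pos.mean(axis=0)    # centre of the particle swarm

# Usage: splash point at the origin, flow 1.0 m/s downstream, light wind,
# predicted position after 60 one-second steps.
pred = simulate_drift([0.0, 0.0], [1.0, 0.0], [0.1, 0.05], dt=1.0,
                      steps=60, n_particles=500, sigma=0.05)
```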
In this embodiment, the four modules of personnel crossing detection, personnel falling detection, falling splash detection and personnel floating detection all use the YOLO algorithm with the mixed attention mechanism; the YOLO algorithm is the core technology of the method, as shown in fig. 4, and is summarized as follows:
1) principle of operation
The input image of the detection model has a fixed size (the method uses a 608 × 608 input). Features are extracted through the DarkNet-53 network structure, detection is carried out on feature maps at three scales to obtain three prediction outputs y1, y2 and y3, the final prediction result is obtained through the Non-Maximum Suppression (NMS) algorithm, and the detected target positions and category information are output. The basic component of YOLO is the CBL, i.e. a Conv layer, a BN (Batch Normalization) layer and a Leaky ReLU activation layer; the entire network contains no pooling layers and no fully connected layers. The Res unit is a residual unit block, which alleviates the degradation problem of the network model. DarkNet-53 serves as the backbone of the detection model; its main component is ResX, composed of one CBL and X residual components, and it is also a large component of YOLO. The CBL in front of each Res module performs downsampling, so after the 5 Res stages the feature map sizes are 608 × 608 -> 304 × 304 -> 152 × 152 -> 76 × 76 -> 38 × 38 -> 19 × 19. The residual components in ResX borrow the residual structure of the ResNet network, allowing the network to be built deeper. Upsampling uses nearest-neighbour interpolation by default and serves to enlarge the feature map so as to obtain prediction feature maps of different scales. Concat is a tensor-splicing operation that splices the upsampled result of a DarkNet middle layer with a later layer for dimension expansion; add, unlike Concat, directly adds two tensors without expanding the dimension.
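The NMS step that merges the three scale outputs can be sketched as the standard greedy algorithm; the boxes, scores and IoU threshold below are illustrative.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box, drop
    all boxes that overlap it above the IoU threshold, and repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Usage: two heavily overlapping detections of one person and one distant box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```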
2) Feature extraction network
As the backbone network of YOLO, the network consists essentially of a series of 1 × 1 and 3 × 3 convolutional layers, each followed by a BN layer and a Leaky ReLU layer, for a total of 53 layers, hence the name DarkNet-53. The network borrows the idea of ResNet residuals and uses a large number of residual "skip-layer connections"; each residual module consists of a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a skip connection. This solves the training difficulty brought by deep networks, and, to reduce the negative effect of pooling on gradients, the YOLO used in the method abandons pooling and uses the stride of the convolutions to realize downsampling.
3) Loss function
The loss function is particularly important for target detection; the loss function of YOLO as used in the method is analyzed below.
Suppose the number of grid cells is S and the number of candidate boxes generated by each grid cell is B; each candidate box finally obtains a corresponding bounding box through the network, so the number of bounding boxes is S × B.

First, the meaning of 1_ij^obj: if the j-th anchor box of the i-th grid cell is responsible for the current object, then 1_ij^obj = 1, otherwise it is 0; correspondingly, 1_ij^noobj means that the j-th anchor box of the i-th grid cell is not responsible for the target.

The parameter confidence C_ij^* is used in training and represents the true value; the value of C_ij^* depends on whether the bounding box of the grid cell is responsible for predicting a certain object: if it is responsible, C_ij^* = 1, otherwise C_ij^* = 0.
Next, each term of the loss function is analyzed, first the center coordinate error, as follows:
β_coord Σ_{i=1}^{S} Σ_{j=1}^{B} 1_ij^obj [(x_ij − x_ij^*)² + (y_ij − y_ij^*)²]

The meaning of the formula is that when the j-th anchor box of the i-th grid cell is responsible for a certain real target, the centre coordinates of the prediction box are compared with those of the real box to obtain the centre coordinate error; wherein x_ij denotes the predicted value of the centre x coordinate and x_ij^* the true value of the centre x coordinate; y_ij denotes the predicted value of the centre y coordinate and y_ij^* the true value of the centre y coordinate.
Next is the width-height error, as follows:
$$L_{wh} = \beta_{coord} \sum_{i=0}^{S-1} \sum_{j=0}^{B-1} 1_{ij}^{obj} \left[ \left(\sqrt{w_{ij}} - \sqrt{w_{ij}^*}\right)^2 + \left(\sqrt{h_{ij}} - \sqrt{h_{ij}^*}\right)^2 \right]$$
The meaning of the formula is that when the jth anchor of the ith grid is responsible for a certain real target, the width and height of the generated prediction box are compared with those of the real box to compute the width-height error. Here $w_{ij}$ represents the predicted anchor box width and $w_{ij}^*$ the actual anchor box width; $h_{ij}$ represents the predicted anchor box height and $h_{ij}^*$ the actual anchor box height.
Next is the confidence error, which is expressed using cross entropy; whether or not the anchor is responsible for a certain target, a confidence error is calculated, as follows:
$$L_{conf}^{obj} = -\alpha_{obj} \sum_{i=0}^{S-1} \sum_{j=0}^{B-1} 1_{ij}^{obj} \left[ C_{ij}^* \log C_{ij} + \left(1 - C_{ij}^*\right) \log\left(1 - C_{ij}\right) \right]$$

$$L_{conf}^{noobj} = -\alpha_{noobj} \sum_{i=0}^{S-1} \sum_{j=0}^{B-1} 1_{ij}^{noobj} \left[ C_{ij}^* \log C_{ij} + \left(1 - C_{ij}^*\right) \log\left(1 - C_{ij}\right) \right]$$
wherein $C_{ij}$ represents the predicted confidence and $C_{ij}^*$ the true confidence; $\alpha_{noobj}$ denotes the weight when no target exists, and $\alpha_{obj}$ the weight when a target exists.
Next is the classification error, which also uses cross entropy as the loss function. When the jth anchor box of the ith grid is responsible for a real target, the bounding box generated by that anchor box contributes to the classification loss, as shown in the following formula:
$$L_{cls} = -\sum_{i=0}^{S-1} \sum_{j=0}^{B-1} 1_{ij}^{obj} \sum_{c \in classes} P_{ij}^*(c) \log P_{ij}(c)$$
wherein $c \in classes$ denotes a certain class c among the overall classes, $P_{ij}$ represents the predicted classification probability, and $P_{ij}^*$ the true classification probability;
In summary, the overall loss function of the YOLO used in this method is obtained as follows:
$$\begin{aligned} L ={}& \beta_{coord} \sum_{i=0}^{S-1} \sum_{j=0}^{B-1} 1_{ij}^{obj} \left[ \left(x_{ij} - x_{ij}^*\right)^2 + \left(y_{ij} - y_{ij}^*\right)^2 + \left(\sqrt{w_{ij}} - \sqrt{w_{ij}^*}\right)^2 + \left(\sqrt{h_{ij}} - \sqrt{h_{ij}^*}\right)^2 \right] \\ &- \alpha_{obj} \sum_{i=0}^{S-1} \sum_{j=0}^{B-1} 1_{ij}^{obj} \left[ C_{ij}^* \log C_{ij} + \left(1 - C_{ij}^*\right) \log\left(1 - C_{ij}\right) \right] \\ &- \alpha_{noobj} \sum_{i=0}^{S-1} \sum_{j=0}^{B-1} 1_{ij}^{noobj} \left[ C_{ij}^* \log C_{ij} + \left(1 - C_{ij}^*\right) \log\left(1 - C_{ij}\right) \right] \\ &- \sum_{i=0}^{S-1} \sum_{j=0}^{B-1} 1_{ij}^{obj} \sum_{c \in classes} P_{ij}^*(c) \log P_{ij}(c) \end{aligned}$$
wherein $x_{ij}$ represents the predicted center x coordinate and $x_{ij}^*$ its true value; $y_{ij}$ represents the predicted center y coordinate and $y_{ij}^*$ its true value; $w_{ij}$ represents the predicted anchor box width and $w_{ij}^*$ the actual anchor box width; $h_{ij}$ represents the predicted anchor box height and $h_{ij}^*$ the actual anchor box height; $C_{ij}$ represents the predicted confidence and $C_{ij}^*$ the true confidence; $c \in classes$ denotes a certain class c among the overall classes, $P_{ij}$ represents the predicted classification probability and $P_{ij}^*$ the true classification probability; $\beta_{coord}$ represents the coordinate weight, $\alpha_{noobj}$ the weight when there is no target, and $\alpha_{obj}$ the weight when there is a target.
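To make the combined loss concrete, here is a minimal pure-Python sketch of a YOLO-style loss over a flat list of anchors. The function name and the default weight values are illustrative assumptions, not values from the patent; the sketch only mirrors the structure of the terms above (squared coordinate errors for responsible anchors, cross-entropy for confidence and class).

```python
import math

def yolo_loss(pred, truth, obj_mask, b_coord=5.0, a_obj=1.0, a_noobj=0.5):
    """Toy YOLO-style loss over a flat list of anchor predictions.

    pred / truth: lists of dicts with keys x, y, w, h, conf, cls (cls is a
    probability list); obj_mask[k] is 1 if anchor k is responsible for a
    target.  The weights b_coord, a_obj, a_noobj are illustrative defaults.
    """
    eps = 1e-9
    loss = 0.0
    for p, t, obj in zip(pred, truth, obj_mask):
        if obj:
            # centre-coordinate error and width/height error (sqrt on w, h)
            loss += b_coord * ((p["x"] - t["x"]) ** 2 + (p["y"] - t["y"]) ** 2)
            loss += b_coord * ((math.sqrt(p["w"]) - math.sqrt(t["w"])) ** 2
                               + (math.sqrt(p["h"]) - math.sqrt(t["h"])) ** 2)
            # confidence and classification cross-entropy for responsible anchors
            loss += -a_obj * math.log(p["conf"] + eps)
            loss += -sum(tc * math.log(pc + eps)
                         for pc, tc in zip(p["cls"], t["cls"]))
        else:
            # only the no-object confidence term applies
            loss += -a_noobj * math.log(1.0 - p["conf"] + eps)
    return loss
```

A perfect prediction drives every term to (numerically) zero, while shifting the centre or lowering the confidence raises the loss, matching the behaviour the formulas above describe.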
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. A bridge side falling behavior identification method based on depth understanding is characterized by comprising the following steps:
s1, collecting video data of a panoramic camera of the monitoring bridge beside the bridge at the edge of the river in real time, and preprocessing the video data;
s2, pre-judging whether a pedestrian falls from the side of the bridge or not by using the pre-processed video data; using the fence at the bridge edge and the periphery of the fence as an interest domain, identifying whether a person crosses the fence by using a trained YOLO-Attention model, and cross-verifying whether the person crosses the fence by using a monitoring camera on a street lamp of the bridge floor and a warning region algorithm; if the people are recognized to cross the fence, generating a railing boundary crossing signal, and entering the step S3; otherwise, returning to the step S1;
s3, detecting whether a person falls in the bridge edge fence or not; taking the area under the bridge and the river as the interest areas, detecting whether a person falls by using a trained YOLO-Attention model, if the person falls, sending a person falling signal, and entering the step S4;
s4, detecting whether falling water bloom exists on the river surface under the bridge; setting the river surface under the bridge as an interest area, detecting whether falling water bloom generated after falling of a person occurs by using a trained YOLO-Attention model, if the falling water bloom occurs, judging that the person falls into water, sending a falling water bloom signal, and entering the step S5;
s5, detecting whether a person floats on the river surface by using the trained YOLO-Attention model, and if detecting that the person floats, sending the position of the floating person to a rescue worker; if the person is not detected to float, judging that the person sinks into the river, and entering step S6;
and S6, constructing a water flow model to predict the approximate position of the person falling into the water according to the water flow speed in the river and the position of falling water bloom, and sending the predicted position information to the rescue personnel.
2. The method for identifying bridge side falling behavior based on depth understanding of claim 1, wherein in step S1 the video data is preprocessed as follows: an adaptive defogging algorithm judges whether each video image needs defogging; if the TBV (total bounded variation) of the image is judged to be greater than a set threshold, the image does not need defogging, otherwise defogging is applied.
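As a rough illustration of a bounded-variation criterion (the patent does not give the exact TBV formula, so the definition below, a plain sum of neighbour differences, is an assumption): hazy frames are smoother and score lower, matching the claim's logic of defogging only when the score falls below the threshold.

```python
def total_variation(img):
    """Total-variation proxy for the TBV criterion: sum of absolute
    differences between horizontally and vertically adjacent pixels.
    This is an illustrative stand-in, not the patent's exact definition."""
    h = sum(abs(row[j + 1] - row[j]) for row in img for j in range(len(row) - 1))
    v = sum(abs(img[i + 1][j] - img[i][j])
            for i in range(len(img) - 1) for j in range(len(img[0])))
    return h + v

sharp = [[0, 255], [255, 0]]     # high-contrast patch: large variation
hazy  = [[120, 130], [125, 128]] # smooth, low-contrast patch: small variation
# total_variation(sharp) > total_variation(hazy), so only `hazy` would be defogged
```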
3. The method for identifying the bridge-side falling behavior based on depth understanding of claim 1, wherein a mixed domain Attention mechanism is added to the YOLO-Attention model in steps S2 to S5 for improving the detection accuracy.
4. The method as claimed in claim 3, wherein a mixed domain attention mechanism is used, which combines a channel attention mechanism and a spatial attention mechanism, and in this mechanism, the feature map passes through a total of two models, namely the channel attention model and the spatial attention model, and then the reconstructed feature map is output.
5. The bridge side falling behavior recognition method based on depth understanding of claim 4, wherein the channel domain attention mechanism in the hybrid attention model is implemented as follows:
the first step, extrusion operation, compressing the two-dimensional features of each channel into a real number through global pooling, and the specific operation formula is as follows:
$$z_c = F_{sq}(u_c) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} u_c(i, j)$$
wherein $F_{sq}(\cdot)$ is the squeeze function, W and H are respectively the width and height of the feature map to be processed, $u_c(i, j)$ is the element of the c-th channel of the feature map at coordinate (i, j), and $z_c$ denotes the output feature of the c-th channel after squeezing;
and secondly, exciting operation, namely generating a weight value for each characteristic channel through the parameter W, and outputting the weight values with the same number as the input characteristics, wherein the specific operation formula is as follows:
s=Fex(z,W)=σ(g(z,W))=σ(W2δ(W1z))
wherein $F_{ex}$ represents the excitation operation; z is the output of the squeeze operation, a tensor of size 1 × 1 × C, where C is the number of channels of the feature map; $W_1$ and $W_2$ are weights; δ denotes the ReLU activation function and σ the Sigmoid activation function; s is the output of the excitation function, describing the weight of the feature map;
thirdly, calibrating feature weight, weighting the weight value obtained by the excitation operation to each channel feature, multiplying the weight coefficient by channels one by one, and completing an attention mechanism in the channel dimension, wherein a specific operation formula is as follows:
$$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$$

wherein $F_{scale}(\cdot)$ represents the channel-wise re-calibration (scaling) function, $\tilde{x}_c$ represents the output feature of the c-th channel, $s_c$ represents the weight of the c-th channel, and $u_c$ represents the feature of the c-th channel of the input feature map.
6. The method for identifying bridge side falling behavior based on depth understanding of claim 4, wherein the spatial domain attention mechanism in the hybrid attention model is implemented as follows:
forming a feature map by using the spatial structure of the features, and modeling by using the relation of the features on the space; firstly, performing maximum pooling and average pooling operations based on channel dimensions on feature maps to obtain two feature maps which respectively represent maximum pooling features and average pooling features; then applying the convolution layer and the activation function to obtain a final space attention diagram; the specific operation formula is as follows:
$$M_S(F) = \sigma\left(f^{7 \times 7}\left(\left[AvgPool(F); MaxPool(F)\right]\right)\right) = \sigma\left(f^{7 \times 7}\left(\left[F_{avg}^{s}; F_{max}^{s}\right]\right)\right)$$

where σ denotes the Sigmoid activation function and $f^{7 \times 7}$ a convolutional layer with kernel size 7 × 7; $M_S(F)$ is the final spatial attention feature map; AvgPool(F) and MaxPool(F) are, respectively, the average pooling and maximum pooling operations on the feature map along the channel dimension; $F_{max}^{s}$ is the maximum-pooled feature and $F_{avg}^{s}$ the average-pooled feature.
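The two channel-wise pooling maps of this claim can be sketched in a few lines of plain Python (the 7 × 7 convolution and sigmoid that follow are omitted; the function name and the toy feature map are illustrative):

```python
def channel_pool(feat):
    """Channel-wise max and average pooling of a C x H x W feature map,
    producing the two H x W maps that are concatenated before the 7x7 conv."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    max_map = [[max(feat[c][i][j] for c in range(C)) for j in range(W)]
               for i in range(H)]
    avg_map = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)]
               for i in range(H)]
    return max_map, avg_map

# Two 2x2 channels (toy values).
feat = [[[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 0.0], [1.0, 2.0]]]
mx, av = channel_pool(feat)
# mx == [[5.0, 2.0], [3.0, 4.0]], av == [[3.0, 1.0], [2.0, 3.0]]
```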
7. The method for identifying the bridge side falling behavior based on depth understanding of claim 1, wherein in step S2, the warning region algorithm is as follows:
the first step is as follows: setting a background template of the alert area:
setting an image of the bridge floor guardrail region without a person as the background template, and inputting the feature map of the background template region into the deep learning detection network for training, so that the network can adaptively find the warning region;
the second step is that: setting early warning characteristics:
performing image difference operation on image frames of people on the bridge floor guardrail and image frames of the background template, extracting difference information of the image templates and the background template of people on the guardrail as an early warning characteristic diagram, and inputting the early warning characteristic diagram into a detection network for training to enable the network to obtain an early warning effect; when someone climbs the bridge guardrail, the characteristics of the warning area correspondingly change into early warning characteristics, and at the moment, the detection model gives early warning;
the third step: screening out influence factors:
due to the influence of optical flow and rainwater, when garbage or debris thrown from the bridge floor, or birds or other flying objects, pass through the warning area, the image features change, and the network may take these as early-warning features, causing false detections; therefore, the optical-flow-and-rain features, the garbage-and-debris features, and the features of birds or other flying objects crossing the warning area are defined as influence factors and need to be screened out; first, a difference operation is performed between the image frames in which the influence-factor features appear in the warning area and the background template frames, the difference information between the influence-factor images and the background template is extracted as an influence-factor feature map, and this is input into the detection network for training, so that the network gains the ability to recognize influence factors; when the network detects that the video features have changed but only as an influence factor, the influence factor is screened out.
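The image-difference step at the heart of this warning-region scheme can be sketched as a simple thresholded background subtraction. This is a hedged stand-in for the extraction of the early-warning feature map, not the patent's trained network; the function name and threshold value are illustrative.

```python
def difference_mask(frame, background, threshold=30):
    """Absolute frame-vs-background difference, thresholded to a binary mask.
    Pixels that differ strongly from the person-free template light up when
    something enters the warning region."""
    return [[1 if abs(f - b) > threshold else 0
             for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

background = [[100, 100], [100, 100]]  # person-free template (toy grayscale)
frame      = [[100, 180], [100, 100]]  # one pixel changed, e.g. a climber
mask = difference_mask(frame, background)
# mask == [[0, 1], [0, 0]]
```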
8. The bridge-side falling behavior recognition method based on depth understanding of claim 1, wherein in step S4, if no falling splash signal is generated, the method returns to step S3 to detect whether a person falls again, and if no person falls is detected, the person falling signal is determined to be misjudged; if the personnel falling signal is still generated, the step S4 is executed again to detect whether falling splash exists, and if the personnel falling signal is not generated, the personnel falling signal is judged to be misjudgment.
9. The bridge side falling behavior recognition method based on depth understanding of claim 1, wherein in step S6, the method for predicting the personnel drifting trajectory is as follows:
establishing the equation of motion of the drifting target, taking into account the wind speed and the water-flow velocity at the moment the person falls into the water:
$$x(t + \Delta t) = x(t) + \left(V_w + V_c\right) \Delta t$$
wherein $V_c$ is the wind speed field and $V_w$ is the water-flow velocity field; x(t) is the position of the person in the water at time t, and x(t + Δt) is the position after a time Δt; drift data under various parameter settings are obtained by carrying out simulated floating experiments in the river with a dummy, and the person's drift trajectory is fitted from the obtained data; unknown values of the wind speed field and the water-flow velocity field are obtained by applying Lagrange interpolation, further reducing the prediction error; on this basis, a Monte Carlo simulation method is applied to simulate the person's drift trajectory.
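The Monte Carlo drift simulation of this claim can be sketched as repeated noisy position updates driven by the water-current and wind velocity fields. Everything below (function name, constant velocity fields, noise level) is an illustrative assumption; the patent fits its trajectory from dummy-float experiments and Lagrange-interpolated field values.

```python
import random

def simulate_drift(x0, v_water, v_wind, dt, steps, sigma, trials, seed=0):
    """Monte Carlo sketch of step S6: the position is advanced by the
    water-current and wind velocities, each perturbed with Gaussian noise to
    model field uncertainty; the trial mean is the predicted position."""
    rng = random.Random(seed)
    finals = []
    for _ in range(trials):
        x, y = x0
        for _ in range(steps):
            x += (v_water[0] + v_wind[0] + rng.gauss(0.0, sigma)) * dt
            y += (v_water[1] + v_wind[1] + rng.gauss(0.0, sigma)) * dt
        finals.append((x, y))
    n = len(finals)
    return (sum(p[0] for p in finals) / n, sum(p[1] for p in finals) / n)

# Toy fields: 1.0 m/s downstream current, light cross-wind (illustrative).
pred = simulate_drift(x0=(0.0, 0.0), v_water=(1.0, 0.2), v_wind=(0.3, 0.0),
                      dt=1.0, steps=10, sigma=0.05, trials=200)
# pred is close to the noise-free drift (13.0, 2.0)
```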
CN202111088471.9A 2021-09-16 2021-09-16 Bridge side falling behavior identification method based on depth understanding Active CN114267082B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111088471.9A CN114267082B (en) 2021-09-16 2021-09-16 Bridge side falling behavior identification method based on depth understanding


Publications (2)

Publication Number Publication Date
CN114267082A true CN114267082A (en) 2022-04-01
CN114267082B CN114267082B (en) 2023-08-11

Family

ID=80824625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111088471.9A Active CN114267082B (en) 2021-09-16 2021-09-16 Bridge side falling behavior identification method based on depth understanding

Country Status (1)

Country Link
CN (1) CN114267082B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115512098A (en) * 2022-09-26 2022-12-23 重庆大学 Electronic bridge inspection system and inspection method
CN115937506A (en) * 2023-03-09 2023-04-07 南京邮电大学 Method, system, device and medium for positioning bridge side falling point hole position information
CN116740649A (en) * 2023-08-07 2023-09-12 山东科技大学 Deep learning-based real-time detection method for behavior of crewman falling into water beyond boundary

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012032198A2 (en) * 2010-09-11 2012-03-15 Nieto Leon Jose Set of elements and parts for the assembly, extension and rapid modular conversion of vessels, rafts, floating gangways and bridges and temporary floating structures with multiple floats, in particular for aquatic emergencies
CN106828387A (en) * 2017-02-19 2017-06-13 谢永航 Automatic seeking help device and method are quickly positioned based on GPRS network vehicle water falling
US20180307912A1 (en) * 2017-04-20 2018-10-25 David Lee Selinger United states utility patent application system and method for monitoring virtual perimeter breaches
CN109410497A (en) * 2018-11-20 2019-03-01 江苏理工学院 A kind of monitoring of bridge opening space safety and alarm system based on deep learning
CN110148283A (en) * 2019-05-16 2019-08-20 安徽天帆智能科技有限责任公司 It is a kind of to fall water monitoring system in real time based on convolutional neural networks
CN110778265A (en) * 2019-10-08 2020-02-11 赵奕焜 Child safety protection artificial intelligence door and window system based on deep learning model
CN111626162A (en) * 2020-05-18 2020-09-04 江苏科技大学苏州理工学院 Overwater rescue system based on space-time big data analysis and drowning warning situation prediction method
CN111898514A (en) * 2020-07-24 2020-11-06 燕山大学 Multi-target visual supervision method based on target detection and action recognition
AU2020102906A4 (en) * 2020-10-20 2020-12-17 Zhan, Jinyu MISS A drowning detection method based on deep learning
CN113044184A (en) * 2021-01-12 2021-06-29 桂林电子科技大学 Deep learning-based water rescue robot and drowning detection method
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
CN113449675A (en) * 2021-07-12 2021-09-28 西安科技大学 Coal mine personnel border crossing detection method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FENGYUAN JIANG: "Collision failure risk analysis of falling object on subsea pipelines based on machine learning scheme", 《ENGINEERING FAILURE ANALYSIS》, pages 1 - 22 *
成孝刚 等: "一种基于有界变分的树叶锯齿特征提取算法研究", 《数据采集与处理》, vol. 34, no. 1, pages 167 - 174 *
陈晗 等: "一种基于倒影图像检测的水域落水人员判断方法", 《电脑知识与技术》, vol. 14, no. 26, pages 175 - 180 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115512098A (en) * 2022-09-26 2022-12-23 重庆大学 Electronic bridge inspection system and inspection method
CN115512098B (en) * 2022-09-26 2023-09-01 重庆大学 Bridge electronic inspection system and inspection method
CN115937506A (en) * 2023-03-09 2023-04-07 南京邮电大学 Method, system, device and medium for positioning bridge side falling point hole position information
CN116740649A (en) * 2023-08-07 2023-09-12 山东科技大学 Deep learning-based real-time detection method for behavior of crewman falling into water beyond boundary
CN116740649B (en) * 2023-08-07 2023-11-03 山东科技大学 Deep learning-based real-time detection method for behavior of crewman falling into water beyond boundary

Also Published As

Publication number Publication date
CN114267082B (en) 2023-08-11

Similar Documents

Publication Publication Date Title
CN111368687B (en) Sidewalk vehicle illegal parking detection method based on target detection and semantic segmentation
Aboah A vision-based system for traffic anomaly detection using deep learning and decision trees
CN114267082B (en) Bridge side falling behavior identification method based on depth understanding
KR20200071799A (en) object recognition and counting method using deep learning artificial intelligence technology
CN111186379B (en) Automobile blind area dangerous object alarm method based on deep learning
CN107220603A (en) Vehicle checking method and device based on deep learning
CN111274886B (en) Deep learning-based pedestrian red light running illegal behavior analysis method and system
KR102122850B1 (en) Solution for analysis road and recognition vehicle license plate employing deep-learning
Chebrolu et al. Deep learning based pedestrian detection at all light conditions
CN110569755B (en) Intelligent accumulated water detection method based on video
CN116343077A (en) Fire detection early warning method based on attention mechanism and multi-scale characteristics
CN111178178B (en) Multi-scale pedestrian re-identification method, system, medium and terminal combined with region distribution
KR102186974B1 (en) Smart cctv system for analysis of parking
Lin Automatic recognition of image of abnormal situation in scenic spots based on Internet of things
CN112861762B (en) Railway crossing abnormal event detection method and system based on generation countermeasure network
CN113920585A (en) Behavior recognition method and device, equipment and storage medium
CN104200202B (en) A kind of upper half of human body detection method based on cumulative perceptron
Agarwal et al. Camera-based smart traffic state detection in india using deep learning models
CN109711313A (en) It is a kind of to identify the real-time video monitoring algorithm that sewage is toppled over into river
BOURJA et al. Real time vehicle detection, tracking, and inter-vehicle distance estimation based on stereovision and deep learning using YOLOv3
KR102143073B1 (en) Smart cctv apparatus for analysis of parking
CN111339934A (en) Human head detection method integrating image preprocessing and deep learning target detection
JP2019192201A (en) Learning object image extraction device and method for autonomous driving
CN113963438A (en) Behavior recognition method and device, equipment and storage medium
CN113793069A (en) Urban waterlogging intelligent identification method of deep residual error network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant