CN117953470B - Expressway event identification method and device of panoramic stitching camera


Info

Publication number: CN117953470B
Application number: CN202410346321.0A
Authority: CN (China)
Prior art keywords: expressway, event, training, time, monitoring
Legal status: Active (granted)
Other versions: CN117953470A
Inventors: 郭斌, 颜世航, 陈洁涵, 王伟丽, 诸水清, 陈明
Assignee (original and current): HANGZHOU GANXIANG TECHNOLOGY CO LTD
Priority and filing date: 2024-03-26
Publication of CN117953470A: 2024-04-30
Publication of CN117953470B (grant): 2024-06-18

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02T: Climate change mitigation technologies related to transportation
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems


Abstract

The invention provides an expressway event identification method for a panoramic stitching camera, which comprises the following steps: S1, acquiring expressway monitoring video streams as training samples, wherein each monitoring video stream is video material with a panoramic 360-degree field of view formed by stitching a plurality of monitoring cameras; S2, constructing a fusion module for target feature vectors and position vectors, together with a space-time probability prediction module, as the expressway event recognition model; S3, training the expressway event recognition model, wherein each training sample is first input into the fusion module of target feature vectors and position vectors to realize fusion, and the space-time probability prediction module is then used to predict and recognize the trajectory of each class of target, realizing semantic analysis of expressway events and yielding a trained expressway event recognition model; S4, acquiring a monitoring video stream to be detected and inputting it into the trained expressway event recognition model to identify expressway events, thereby obtaining the expressway event recognition result.

Description

Expressway event identification method and device of panoramic stitching camera
Technical Field
The invention belongs to the technical field of computer vision, and relates to a highway event identification method and device for a panoramic stitching camera.
Background
Most existing expressway event detection systems depend on traditional monitoring cameras and are applied to traffic flow monitoring, traffic violation detection, accident monitoring, congestion monitoring, tunnel safety monitoring and the like. However, the following problems exist: (1) limited field of view: conventional monitoring cameras generally monitor only a specific, fixed area, with a limited field of view, and cannot provide an all-round, panoramic monitoring view; (2) low resolution: the resolution of traditional monitoring cameras is relatively low, and image details are not clear enough, which affects the ability to identify and analyze small or distant objects; (3) monitoring blind areas: because the camera's viewing angle is limited, monitoring blind areas exist, meaning that events in a blind area cannot be captured, which poses a certain risk to safety monitoring.
Disclosure of Invention
In order to overcome the problems of limited field of view, low resolution and blind areas of traditional monitoring cameras, the invention provides an expressway event identification method and device for a panoramic stitching camera, which can provide more comprehensive information and help traffic management departments discover and handle various events in time; the efficiency and accuracy of event detection are greatly improved, and the pressure of manual monitoring is reduced.
The technical scheme adopted by the invention is as follows:
the highway event identification method of the panoramic spliced camera comprises the following specific steps:
step S1, obtaining at least 10 sections of monitoring video streams of each type of expressways including congestion and slow running, abnormal parking, abnormal running, pedestrian non-motor vehicles and road surface casting objects as training samples, wherein the monitoring video streams are video materials of panoramic 360-degree vision formed by splicing a plurality of monitoring cameras;
Step S2, constructing a fusion module comprising a target feature vector and a position vector and a space-time probability prediction module as a highway event recognition model;
Step S3, inputting the training samples in the step S1 into the expressway event recognition model in the step S2 for training, wherein each training sample is firstly input into a fusion module of a target feature vector and a position vector to realize fusion, and then a space-time probability prediction module is adopted to predict and recognize the track of various targets so as to realize semantic analysis of expressway events and obtain a trained expressway event recognition model;
and S4, acquiring a monitoring video stream to be detected, inputting the monitoring video stream into the expressway event recognition model trained in the step S3 for identifying expressway events, and further obtaining the recognition result of the expressway events.
Further, the training samples in step S1 need to be pre-trained, specifically including:
extracting vehicle targets corresponding to trucks, cars and buses in each video image frame through a feature extraction network;
the vehicle object is marked and the image of the marked vehicle object is pre-trained to identify the vehicle object from the video image frames.
Further, the feature extraction network in step S1 adopts YoloV: a training sample V_T with time length T is input into the feature extraction network, which outputs the extracted features; the width and height of the input video image frames are set to W×H.
Further, the construction of the expressway event recognition model in step S2 is specifically as follows:
The feature maps output by Conv2, Conv3, Conv4 and Conv5 of the YoloV network are C2, C3, C4 and C5 respectively; the subsequent stage of the YoloV network turns these output feature maps into a multi-scale feature vector per camera, θ = [θ_1, θ_2, θ_3, θ_4], the image frames of the 4 monitoring cameras being stitched together, where θ_i denotes the feature vector of monitoring camera i (i = 1 to 4);
screening and enhancing features of vehicles, non-motor vehicles and pedestrians in overlapping areas of adjacent monitoring cameras by adopting a cross-attention algorithm:
θ′_i = f(θ_{i-1}, θ_i, θ_{i+1})
where θ′_i is the enhanced feature vector and f is the cross-attention mechanism algorithm, specifically implemented as:
f(θ_{i-1}, θ_i, θ_{i+1}) = softmax(θ_i · θ_{i+1}^T / √d_k) · θ_{i+1}
where θ_{i+1}^T is the feature transpose of θ_{i+1}, and d_k denotes the dimension of the vector matrix of vehicles, non-motor vehicles and pedestrians related to expressway events; dividing by √d_k prevents the dot product θ_i · θ_{i+1}^T from becoming too large;
Meanwhile, position awareness is carried out among the different camera views, marked with a position embedding vector ω = [ω_1, ω_2, ω_3, ω_4], where ω_i denotes the position vector of monitoring camera i (i = 1 to 4);
Constructing a space-time probability prediction model:
The space-time trajectory of a target within the discrete time t = (m, ..., n) (i.e. from time m to time n, with m < n) is defined as:
P_j = (x_j^m, x_j^{m+1}, ..., x_j^n), j ∈ I_j
where I_j denotes the feature set comprising vehicles, non-motor vehicles and pedestrians; P_j denotes the feature space-time vector after the fusion operation of the feature vector and the position vector acquired by the monitoring cameras; x_j^m, ..., x_j^n denote the space-time vectors of each class of target from time m to time n, with m < n; and x_j^m denotes the space-time vector of the targets appearing at time m;
After the feature space-time vectors are obtained, classification is carried out according to the movement-trajectory probabilities of the vehicle, non-motor-vehicle and pedestrian targets in different expressway events, realizing identification of the expressway event category; the trajectory probability of a class-q event is calculated as:
ŷ_q^N = (1/N) Σ_{t=m}^{m+N} y_q(P_j^t)
where y_i(P_j) is a probability function, trained in the training stage on known expressway event samples; ŷ_q^N denotes the trajectory probability of a class-q event occurring within the time span N, i.e. the probability of detecting a class-q event within the time span N; y_q(P_j) is the prediction probability; P_j is the feature space-time vector; and i is the index of the monitoring camera;
Meanwhile, in the detection stage, events are matched against the expressway event classes in the weight file obtained from training, and the identified expressway event type, i.e. the expressway event recognition result, is finally output, expressed as:
C_{t,q} = W_T · ŷ_{t′,q}
where C_{t,q} is the detection result, i.e. an event of class q is detected within time t;
ŷ_{t′,q} is the prediction probability, i.e. the occurrence probability of a class-q event detected at time t′;
W_T is the weight obtained through training over the sequence length T.
Further, the monitoring video stream to be detected in step S4 is a video material based on a panoramic 360-degree field of view formed by splicing a plurality of monitoring cameras.
The highway event recognition device of the panoramic stitching camera is used for executing the highway event recognition method of the panoramic stitching camera, and comprises a processor, wherein the processor is internally provided with:
the training sample acquisition module, used for acquiring at least 10 monitoring video-stream segments for each type of expressway event, including congestion/slow traffic, abnormal parking, abnormal driving, pedestrians and non-motor vehicles, and objects thrown onto the road surface, as training samples, wherein the monitoring video streams are video material based on a panoramic 360-degree field of view formed by stitching a plurality of monitoring cameras;
The model building module is used for building a fusion module comprising a target feature vector and a position vector and a space-time probability prediction module as a highway event recognition model;
The model training module is used for inputting training samples into the expressway event recognition model for training, wherein each training sample is firstly input into the fusion module of the target feature vector and the position vector for realizing fusion, and then a space-time probability prediction module is used for predicting and recognizing the tracks of various targets to realize semantic analysis of expressway events, so that the expressway event recognition model after training is obtained;
The recognition result output module is used for inputting the monitoring video stream to be detected into the training-completed expressway event recognition model to recognize the expressway event, so as to obtain the recognition result of the expressway event.
Further, a feature extraction module is also provided in the processor, used for pre-training on the training samples and extracting features from the images of marked vehicle targets.
Further, a memory for storing data or instructions is connected to the processor.
Further, the processor is connected with a transmission device for transmitting data with an external device.
Further, the processor is connected with an input/output device for inputting or outputting information.
The invention has the following beneficial effects. The panoramic stitching camera can provide a high-resolution, panoramic viewing angle by fusing the images of multiple cameras, effectively monitoring and recording the various traffic events and conditions occurring on an expressway.

First, the panoramic stitching camera realizes all-round monitoring of a whole road section. By stitching the images of multiple cameras, a panoramic view covering a wider area is obtained. This is very helpful for detecting expressway events such as traffic accidents, traffic violations, congestion and construction. Compared with a traditional camera that can only monitor one fixed area, the panoramic stitching camera provides more comprehensive information and helps traffic management departments discover and handle events in time.

Second, the panoramic stitching camera provides high-definition images and video. The stitched images of multiple cameras offer higher resolution and finer detail, making the identification and analysis of traffic events more accurate and reliable. This is of great importance for investigating accident causes, evidencing illegal behavior, and providing and examining evidence.

In addition, the panoramic stitching camera can be combined with image processing and intelligent algorithms to realize automatic event detection and identification. By analyzing the stitched images, events such as traffic accidents, vehicle violations and congestion can be detected automatically, and alarms can be sent to the relevant departments in time. This greatly improves the efficiency and accuracy of event detection and reduces the pressure of manual monitoring.

In conclusion, the panoramic stitching camera has important application value in expressway event detection. It provides an all-round monitoring view, high-definition images and video, and automatic event detection and identification, offering powerful support to traffic management departments and improving the efficiency and accuracy of event handling.
Drawings
FIG. 1 is a schematic diagram of the construction logic of the highway event recognition model of the present invention;
Fig. 2 is a schematic diagram of a method for constructing a highway event recognition model according to the present invention.
Detailed Description
The invention will be further illustrated with reference to the following specific examples, without limiting the invention to these specific embodiments. It will be appreciated by those skilled in the art that the invention encompasses all alternatives, modifications and equivalents as may be included within the scope of the claims.
Example 1
Referring to fig. 1 and 2, the invention provides a highway event identification method of a panoramic stitching camera, which comprises the following specific steps:
S1, acquiring at least 10 monitoring video-stream segments for each type of expressway event, including congestion/slow traffic, abnormal parking, abnormal driving, pedestrians and non-motor vehicles, and objects thrown onto the road surface, as training samples, wherein the monitoring video streams are video material with a panoramic 360-degree field of view formed by stitching a plurality of monitoring cameras;
The training samples need to be pre-trained, and specifically comprise:
extracting vehicle targets corresponding to trucks, cars and buses in each video image frame through a feature extraction network;
the vehicle object is marked and the image of the marked vehicle object is pre-trained so that the vehicle object can be identified from the video image frames.
The feature extraction network in this embodiment adopts YoloV: a training sample V_T with time length T is input into the feature extraction network, which outputs the extracted features; the width and height of the input video image frames are set to W×H (412×412). The output of each stage used by the YoloV backbone network is shown in Table 1 below:
Table 1: Output of each stage of the image segmentation network
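To make this stage concrete, the following is a minimal PyTorch sketch in the spirit of the table above; the TinyBackbone module, its channel widths, and the global-average pooling into a camera vector are illustrative assumptions, not the patent's actual YoloV configuration.

    # Minimal sketch: a stand-in backbone producing C2..C5 feature maps at
    # strides 4/8/16/32 and collapsing them into one multi-scale camera
    # feature vector theta_i. Channel widths are assumptions.
    import torch
    import torch.nn as nn

    class TinyBackbone(nn.Module):
        def __init__(self):
            super().__init__()
            chans = [3, 32, 64, 128, 256, 512]
            self.stages = nn.ModuleList(
                nn.Sequential(
                    nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                    nn.BatchNorm2d(chans[i + 1]),
                    nn.SiLU(),
                )
                for i in range(5)
            )

        def forward(self, x):
            feats = []
            for i, stage in enumerate(self.stages):
                x = stage(x)
                if i >= 1:          # keep C2..C5 (strides 4, 8, 16, 32)
                    feats.append(x)
            return feats

    def camera_feature_vector(backbone, frame):
        # Collapse C2..C5 into one multi-scale vector theta_i per camera.
        with torch.no_grad():
            feats = backbone(frame)
        return torch.cat([f.mean(dim=(2, 3)) for f in feats], dim=1)

    backbone = TinyBackbone().eval()
    frame = torch.rand(1, 3, 412, 412)                 # one 412x412 frame, as above
    theta_i = camera_feature_vector(backbone, frame)   # shape (1, 64+128+256+512)

Stitching the frames of the four cameras then yields one such vector per camera, giving θ = [θ_1, θ_2, θ_3, θ_4].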
S2, constructing a fusion module comprising a target feature vector and a position vector and a space-time probability prediction module as a highway event recognition model;
the highway event recognition model is constructed specifically as follows:
The feature maps output by Conv2, Conv3, Conv4 and Conv5 of the YoloV network are C2, C3, C4 and C5 respectively; the multi-scale features generated in the subsequent stage of the YoloV network for each camera are combined into an overall feature vector θ = [θ_1, θ_2, θ_3, θ_4]. In order to better acquire panoramic images, the invention stitches the image frames of 4 monitoring cameras, with θ_i denoting the feature vector of monitoring camera i (i = 1 to 4);
because panoramic stitching is employed, adjacent cameras have overlapping areas; in order to effectively extract the characteristics of the overlapping area of the adjacent monitoring pictures, the cross attention algorithm is adopted to screen and strengthen the characteristics of vehicles, non-motor vehicles and pedestrians in the overlapping area of the adjacent monitoring cameras:
θ′_i = f(θ_{i-1}, θ_i, θ_{i+1})
where θ′_i is the enhanced feature vector and f is the cross-attention mechanism algorithm, specifically implemented as:
f(θ_{i-1}, θ_i, θ_{i+1}) = softmax(θ_i · θ_{i+1}^T / √d_k) · θ_{i+1}
where θ_{i+1}^T is the feature transpose of θ_{i+1}, and d_k denotes the dimension of the vector matrix of vehicles, non-motor vehicles and pedestrians related to expressway events; dividing by √d_k prevents the dot product θ_i · θ_{i+1}^T from becoming too large;
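To illustrate the fusion step, here is a small numpy sketch of the cross-attention enhancement θ′_i = f(θ_{i-1}, θ_i, θ_{i+1}) in the scaled dot-product form given above; representing each camera's features as a matrix of token rows, and summing the attention over both overlapping neighbours, are assumptions made for illustration.

    # Sketch of cross-attention over adjacent cameras' overlapping features.
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def cross_attend(theta_q, theta_kv, d_k):
        # Scaled dot-product attention of camera i's features over a neighbour's;
        # dividing by sqrt(d_k) keeps the dot products from growing too large.
        scores = theta_q @ theta_kv.T / np.sqrt(d_k)
        return softmax(scores, axis=-1) @ theta_kv

    def fuse(theta_prev, theta_i, theta_next):
        # Enhanced feature theta'_i = f(theta_{i-1}, theta_i, theta_{i+1}).
        d_k = theta_i.shape[1]
        return (theta_i
                + cross_attend(theta_i, theta_prev, d_k)
                + cross_attend(theta_i, theta_next, d_k))

    rng = np.random.default_rng(0)
    theta = [rng.standard_normal((16, 64)) for _ in range(4)]  # 4 cameras
    theta_2_enhanced = fuse(theta[0], theta[1], theta[2])      # theta'_2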
Meanwhile, the stitching-based panoramic camera needs to perform position awareness among the different camera views; this is marked with a position embedding vector ω = [ω_1, ω_2, ω_3, ω_4], where ω_i denotes the position vector of monitoring camera i (i = 1 to 4);
Expressway event identification based on panoramic stitching cameras must combine the feature vectors and position vectors extracted by each camera and make a comprehensive judgment according to changes in the space-time trajectories of vehicles, non-motor vehicles and pedestrians; a space-time probability prediction model is therefore constructed:
The space-time trajectory of a target within the discrete time t = (m, ..., n) (i.e. from time m to time n, with m < n) is defined as:
P_j = (x_j^m, x_j^{m+1}, ..., x_j^n), j ∈ I_j
where I_j denotes the feature set comprising vehicles, non-motor vehicles and pedestrians; P_j denotes the feature space-time vector after the fusion operation of the feature vector and the position vector acquired by the monitoring cameras; x_j^m, ..., x_j^n denote the space-time vectors of each class of target from time m to time n, with m < n; and x_j^m denotes the space-time vector of the targets appearing at time m;
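The trajectory construction can be sketched as follows; fusing the per-frame feature with the position embedding by concatenation is an assumption, since the text does not fix the fusion operator.

    # Sketch: build the feature space-time vector P_j over t = (m, ..., n)
    # by fusing per-frame enhanced features with the camera position vector.
    import numpy as np

    def spacetime_trajectory(theta_seq, omega, cam):
        # P_j = (x_j^m, ..., x_j^n) for targets seen by camera `cam`;
        # each x_j^t here is the concatenation of theta'_cam(t) and omega_cam.
        return np.stack([np.concatenate([theta_t[cam], omega[cam]])
                         for theta_t in theta_seq])

    rng = np.random.default_rng(1)
    omega = rng.standard_normal((4, 8))        # position vectors omega_1..omega_4
    theta_seq = [rng.standard_normal((4, 64)) for _ in range(10)]  # 10 time steps
    P_j = spacetime_trajectory(theta_seq, omega, cam=1)            # shape (10, 72)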
After the feature space-time vectors are obtained, classification is carried out according to the movement-trajectory probabilities of the vehicle, non-motor-vehicle and pedestrian targets in different expressway events, realizing identification of the expressway event category; the trajectory probability of a class-q event is calculated as:
ŷ_q^N = (1/N) Σ_{t=m}^{m+N} y_q(P_j^t)
where y_i(P_j) is a probability function, trained in the training stage on known expressway event samples; ŷ_q^N denotes the trajectory probability of a class-q event occurring within the time span N, i.e. the probability of detecting a class-q event within the time span N; y_q(P_j) is the prediction probability; P_j is the feature space-time vector; and i is the index of the monitoring camera;
Meanwhile, in the detection stage, events are matched against the expressway event classes in the weight file obtained from training, and the identified expressway event type, i.e. the expressway event recognition result, is finally output, expressed as:
C_{t,q} = W_T · ŷ_{t′,q}
where C_{t,q} is the detection result, i.e. an event of class q is detected within time t;
ŷ_{t′,q} is the prediction probability, i.e. the occurrence probability of a class-q event detected at time t′;
W_T is the weight obtained through training over the sequence length T. The invention thus comprises two parts, training and recognition application: the weights are obtained through training, and recognition is then carried out using these weights.
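A minimal sketch of this detection step follows, assuming an element-wise weighting of the predicted class probabilities by W_T and borrowing the 70% decision threshold from Example III below; the class list and weight values are hypothetical.

    # Sketch of the detection stage C_{t,q} = W_T * y_hat: weight the
    # predicted per-class probabilities and report the best class, if any.
    import numpy as np

    EVENT_CLASSES = ["congestion", "abnormal_parking", "abnormal_driving",
                     "pedestrian_or_non_motor_vehicle", "road_debris"]

    def detect_event(y_pred, W_T, threshold=0.7):
        scores = W_T * y_pred            # C_{t,q} for each class q
        q = int(np.argmax(scores))
        return EVENT_CLASSES[q] if scores[q] >= threshold else None

    W_T = np.array([1.0, 0.9, 0.9, 1.1, 1.2])          # weights learned over length T
    y_pred = np.array([0.82, 0.05, 0.04, 0.06, 0.03])  # probabilities at time t'
    print(detect_event(y_pred, W_T))                   # -> "congestion"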
S3, inputting the training samples in the step S1 into the expressway event recognition model in the step S2 for training, wherein each training sample is firstly input into a fusion module of a target feature vector and a position vector to realize fusion, and then a space-time probability prediction module is adopted to predict and recognize the track of various targets so as to realize semantic analysis of expressway events and obtain a trained expressway event recognition model;
S4, acquiring a monitoring video stream to be detected, inputting the monitoring video stream into the expressway event recognition model trained in the step S3 for identifying expressway events, and further obtaining the recognition result of the expressway events. The monitoring video stream to be detected is a video material based on a panoramic 360-degree visual field formed by splicing a plurality of monitoring cameras.
According to the invention, feature vector and position vector information in the pictures is collected from the stitched panoramic camera; trajectory prediction and detection are then performed for targets including vehicles, non-motor vehicles and pedestrians based on the space-time probability model, realizing the recognition and detection of expressway events.
Example two
The embodiment provides a highway event recognition device of a panoramic stitching camera, which is used for executing the highway event recognition method of the panoramic stitching camera, and comprises a processor, wherein the processor is internally provided with:
the training sample acquisition module, used for acquiring at least 10 monitoring video-stream segments for each type of expressway event, including congestion/slow traffic, abnormal parking, abnormal driving, pedestrians and non-motor vehicles, and objects thrown onto the road surface, as training samples, wherein the monitoring video streams are video material based on a panoramic 360-degree field of view formed by stitching a plurality of monitoring cameras;
the feature extraction module, used for pre-training on the training samples and extracting features from images of marked vehicle targets;
The model building module is used for building a fusion module comprising a target feature vector and a position vector and a space-time probability prediction module as a highway event recognition model;
The model training module is used for inputting training samples into the expressway event recognition model for training, wherein each training sample is firstly input into the fusion module of the target feature vector and the position vector for realizing fusion, and then a space-time probability prediction module is used for predicting and recognizing the tracks of various targets to realize semantic analysis of expressway events, so that the expressway event recognition model after training is obtained;
The recognition result output module is used for inputting the monitoring video stream to be detected into the training-completed expressway event recognition model to recognize the expressway event, so as to obtain the recognition result of the expressway event.
The processor of this embodiment may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing embodiments of the present application.
The processor in this embodiment is connected with a memory for storing data or instructions. The memory may include mass storage for data or instructions. By way of example, and not limitation, the memory may comprise a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, a magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory may include removable or non-removable (or fixed) media, where appropriate. The memory may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory is a Non-Volatile memory. In particular embodiments, the memory includes Read-Only Memory (ROM) and Random Access Memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), an Electrically Alterable ROM (EAROM), or a FLASH memory, or a combination of two or more of these. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode DRAM (FPMDRAM), an Extended Data Out DRAM (EDODRAM), a Synchronous DRAM (SDRAM), or the like, as appropriate.
The memory may be used to store or cache various data files that need to be processed and/or communicated, as well as possible computer program instructions for execution by the processor. The processor reads and executes the computer program instructions stored in the memory.
The processor in this embodiment is connected with a transmission device for transmitting data with an external device. The transmission device may be used to receive or send data via a network. Specific examples of the network may include a wired or wireless network provided by a communication provider of the electronic device. In one example, the transmission device includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device may be a Radio Frequency (RF) module used to communicate with the internet wirelessly.
The processor in this embodiment is connected with an input/output device for inputting or outputting information. In this embodiment, the input information may be a monitoring video stream, and the output information may be the current traffic state of an expressway toll station, and the like.
Example III
The expressway event identification method of the panoramic stitching camera provided by the embodiment is different from that of the first embodiment in that:
The method for identifying the expressway event is applied to identifying the expressway traffic jam event, and specifically comprises the following steps:
Congestion detection is combined with scene panorama stitching. First, the images acquired by each camera are stitched into a panoramic picture, with the width and height of each video set to 412×412; feature maps C2, C3, C4 and C5 are then output through Conv2, Conv3, Conv4 and Conv5 of the YoloV network, and the multi-scale features generated in the subsequent stage of the YoloV network for each camera are combined into a global feature vector θ; the θ vector is in (412/32) × (412/32) format, as shown in Table 2 below:
Table 2: Output of each stage of the congestion image segmentation network
Features of the vehicles are then screened and enhanced, and position embedding vectors are used for marking.
In this embodiment, the stitching-based panoramic camera needs to perform position awareness among the different camera views, i.e. a position embedding vector ω = [ω_1, ω_2, ω_3, ω_4] is adopted for labeling, where ω_i represents the position vector of monitoring camera i (i = 1 to 4).
Then, whether congestion is currently occurring is judged from the running trajectories of the vehicles in each monitoring picture; that is, for the congestion event, C_{t,q} = W_T · ŷ_{t′,q}, where the time span N = 180 seconds is the length of the sliding judgment window, and C_{t,q} is the detection result for the congestion event type, i.e. a congestion event is detected within 180 seconds from time t. ŷ_{t′,q} is the prediction probability, i.e. the occurrence probability of a congestion-class event detected at time t′ (when this probability is greater than 70%, the event is determined to have occurred).
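A minimal sketch of this sliding-window decision follows; the window length of 180 seconds and the 70% threshold come from this embodiment, while aggregating the per-second probabilities by their mean is an assumption.

    # Sketch: congestion fires when the mean probability over a sliding
    # 180-second window exceeds 0.70.
    from collections import deque

    N = 180            # sliding-window length in seconds
    THRESHOLD = 0.70   # decision threshold from this embodiment

    window = deque(maxlen=N)

    def update(prob_congestion):
        # Feed one per-second congestion probability; True when the event fires.
        window.append(prob_congestion)
        return len(window) == N and sum(window) / N > THRESHOLD

    stream = [0.2] * 60 + [0.9] * 300     # a ramp-up toward congestion
    fired_at = next((t for t, p in enumerate(stream) if update(p)), None)
    print(fired_at)                        # first second the window mean exceeds 70%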
The foregoing examples illustrate only a few embodiments of the application, which are described in greater detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (9)

1. The expressway event identification method of the panoramic stitching camera is characterized by comprising the following specific steps:
step S1, obtaining at least 10 monitoring video-stream segments for each type of expressway event, including congestion/slow traffic, abnormal parking, abnormal driving, pedestrians and non-motor vehicles, and objects thrown onto the road surface, as training samples, wherein the monitoring video streams are video material with a panoramic 360-degree field of view formed by stitching a plurality of monitoring cameras;
step S2, constructing a fusion module comprising a target feature vector and a position vector and a space-time probability prediction module as a highway event recognition model; the construction of the expressway event recognition model is specifically as follows:
The feature maps output by Conv2, Conv3, Conv4 and Conv5 of the YoloV network are C2, C3, C4 and C5 respectively; the subsequent stage of the YoloV network turns these output feature maps into a multi-scale feature vector per camera, θ = [θ_1, θ_2, θ_3, θ_4], the image frames of the 4 monitoring cameras being stitched together, where θ_i denotes the feature vector of monitoring camera i (i = 1 to 4);
screening and enhancing features of vehicles, non-motor vehicles and pedestrians in overlapping areas of adjacent monitoring cameras by adopting a cross-attention algorithm:
θ′_i = f(θ_{i-1}, θ_i, θ_{i+1})
where θ′_i is the enhanced feature vector and f is the cross-attention mechanism algorithm, specifically implemented as:
f(θ_{i-1}, θ_i, θ_{i+1}) = softmax(θ_i · θ_{i+1}^T / √d_k) · θ_{i+1}
where θ_{i+1}^T is the feature transpose of θ_{i+1}, and d_k denotes the dimension of the vector matrix of vehicles, non-motor vehicles and pedestrians related to expressway events; dividing by √d_k prevents the dot product θ_i · θ_{i+1}^T from becoming too large;
Meanwhile, position awareness is carried out among the different camera views, marked with a position embedding vector ω = [ω_1, ω_2, ω_3, ω_4], where ω_i denotes the position vector of monitoring camera i (i = 1 to 4);
Constructing a space-time probability prediction model:
The space-time trajectory of a target within the discrete time t = (m, ..., n) is defined as:
P_j = (x_j^m, x_j^{m+1}, ..., x_j^n), j ∈ I_j
where I_j denotes the feature set comprising vehicles, non-motor vehicles and pedestrians; P_j denotes the feature space-time vector after the fusion operation of the feature vector and the position vector acquired by the monitoring cameras; x_j^m, ..., x_j^n denote the space-time vectors of each class of target from time m to time n, with m < n; and x_j^m denotes the space-time vector of the targets appearing at time m;
After the feature space-time vectors are obtained, classification is carried out according to the movement-trajectory probabilities of the vehicle, non-motor-vehicle and pedestrian targets in different expressway events, realizing identification of the expressway event category; the trajectory probability of a class-q event is calculated as:
ŷ_q^N = (1/N) Σ_{t=m}^{m+N} y_q(P_j^t)
where y_i(P_j) is a probability function, trained in the training stage on known expressway event samples; ŷ_q^N denotes the trajectory probability of a class-q event occurring within the time span N, i.e. the probability of detecting a class-q event within the time span N; y_q(P_j) is the prediction probability; P_j is the feature space-time vector; and i is the index of the monitoring camera;
Meanwhile, in the detection stage, events are matched against the expressway event classes in the weight file obtained from training, and the identified expressway event type, i.e. the expressway event recognition result, is finally output:
C_{t,q} = W_T · ŷ_{t′,q}
where C_{t,q} is the detection result, i.e. an event of class q is detected within time t;
ŷ_{t′,q} is the prediction probability, i.e. the occurrence probability of a class-q event detected at time t′;
W_T is the weight obtained through training over the sequence length T;
Step S3, inputting the training samples in the step S1 into the expressway event recognition model in the step S2 for training, wherein each training sample is firstly input into a fusion module of a target feature vector and a position vector to realize fusion, and then a space-time probability prediction module is adopted to predict and recognize the track of various targets so as to realize semantic analysis of expressway events and obtain a trained expressway event recognition model;
and S4, acquiring a monitoring video stream to be detected, inputting the monitoring video stream into the expressway event recognition model trained in the step S3 for identifying expressway events, and further obtaining the recognition result of the expressway events.
2. The highway event recognition method of the panoramic stitching camera according to claim 1, wherein: the training samples in step S1 need to be pre-trained, specifically including:
extracting vehicle targets corresponding to trucks, cars and buses in each video image frame through a feature extraction network;
the vehicle object is marked and the image of the marked vehicle object is pre-trained to identify the vehicle object from the video image frames.
3. The highway event recognition method of the panoramic stitching camera according to claim 2, wherein: the feature extraction network in step S1 adopts YoloV, a training sample V_T with time length T being input into the feature extraction network, which outputs the extracted features, and the width and height of the input video image frames being set to W×H.
4. The method for identifying highway events by using the panoramic stitching camera according to claim 1, wherein: the monitoring video stream to be detected in the step S4 is a video material based on a panoramic 360-degree view formed by splicing a plurality of monitoring cameras.
5. A highway event recognition device of a panoramic stitching camera for performing the highway event recognition method of a panoramic stitching camera as recited in any one of claims 1-4, the recognition device comprising a processor, wherein: the processor is internally provided with:
the training sample acquisition module, used for acquiring at least 10 monitoring video-stream segments for each type of expressway event, including congestion/slow traffic, abnormal parking, abnormal driving, pedestrians and non-motor vehicles, and objects thrown onto the road surface, as training samples, wherein the monitoring video streams are video material based on a panoramic 360-degree field of view formed by stitching a plurality of monitoring cameras;
The model building module is used for building a fusion module and a space-time probability prediction module which comprise target feature vectors and position vectors to serve as a highway event recognition model;
The model training module is used for inputting training samples into the expressway event recognition model for training, wherein each training sample is firstly input into the fusion module of the target feature vector and the position vector for realizing fusion, and then a space-time probability prediction module is used for predicting and recognizing the tracks of various targets to realize semantic analysis of expressway events, so that the expressway event recognition model after training is obtained;
The recognition result output module is used for inputting the monitoring video stream to be detected into the training-completed expressway event recognition model to recognize the expressway event, so as to obtain the recognition result of the expressway event.
6. The highway event recognition device of the panoramic stitching camera of claim 5, wherein: the processor is also internally provided with a feature extraction module, used for pre-training on the training samples and extracting features from the image of a marked vehicle target.
7. The highway event recognition device of the panoramic stitching camera of claim 5, wherein: the processor is connected with a memory for storing data or instructions.
8. The highway event recognition device of the panoramic stitching camera of claim 5, wherein: the processor is connected with a transmission device for transmitting data with an external device.
9. The highway event recognition device of the panoramic stitching camera of claim 5, wherein: the processor is connected with an input/output device for inputting or outputting information.
Priority Applications (1)

CN202410346321.0A (priority and filing date 2024-03-26): Expressway event identification method and device of panoramic stitching camera; granted as CN117953470B, status Active.

Publications (2)

CN117953470A, published 2024-04-30
CN117953470B, granted 2024-06-18

Family ID: 90792401

Country Status (1): CN, CN117953470B




Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant