Disclosure of Invention
Accordingly, the invention provides a vehicle detection method, a vehicle detection device, an electronic device and a storage medium, which can improve the effectiveness of vehicle detection in an image.
In a first aspect, a vehicle detection method is provided, including:
acquiring a traffic image in a traffic scene acquired by a traffic camera;
copying a first number of the traffic images, sampling and dividing each traffic image at intervals of a second number of rows or columns to divide each traffic image into a third number of image blocks, merging the image blocks according to a preset merging rule, and generating a detection image;
and extracting feature information from the detection image, and detecting vehicle information of the traffic image according to the feature information.
Optionally, the extracting feature information from the detection image and detecting vehicle information of the traffic image according to the feature information includes:
extracting feature information of the detection image;
performing feature recombination according to the feature information to generate image features;
and determining the vehicle information according to the image features.
Optionally, the performing feature recombination according to the feature information to generate an image feature includes:
respectively carrying out up-sampling and down-sampling on the feature information for a plurality of times to obtain a plurality of up-sampling features and down-sampling features;
and sequentially connecting the up-sampling feature and the down-sampling feature according to a preset connection rule to generate the image feature.
Optionally, the pixel gray value after the up-sampling and down-sampling transformation is equal to the average of the gray values of the two input pixels closest to the transformed pixel.
Optionally, the determining the vehicle information according to the image feature includes:
dividing the traffic image into a preset number of grids;
acquiring prediction parameters of at least two target frames in each grid according to the image features, wherein each target frame comprises at least two target categories, the prediction parameters comprise the coordinates of the center point of the target frame, the width and height, and the confidence, and the target categories are used for indicating whether the grid content belongs to a vehicle or not;
filtering out target frames whose confidence is smaller than a preset threshold value from all the target frames;
and carrying out non-maximum suppression processing on the retained target frames, and determining the position of the target frame with the highest confidence of each target category as the vehicle information.
Optionally, after acquiring the traffic image in the traffic scene acquired by the traffic camera, the method further includes:
acquiring the original brightness of each pixel point of the traffic image, and calculating the average brightness over all the pixel points of the traffic image;
calculating, for each pixel point, the difference between its original brightness and the average brightness, and the sum of a preset enhancement value and one;
multiplying the difference by the sum to obtain a product, and summing the product and the average brightness to obtain a brightness value for each pixel point;
and adjusting the original brightness of each pixel point according to the brightness value obtained by each pixel point.
Optionally, after acquiring the traffic image in the traffic scene acquired by the traffic camera, the method further includes:
and taking each pixel point of the traffic image as a central pixel point, selecting a filtering window of a preset size, removing the maximum-value pixel point and the minimum-value pixel point in the filtering window, obtaining the mean pixel value of the remaining pixel points and the weight of each remaining pixel point, normalizing the weights, carrying out a weighted summation of the pixel value of each remaining pixel point with its corresponding weight, and taking the resulting pixel value as the filtered pixel value of the central pixel point of the filtering window.
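By way of illustration only, and not as part of the claimed subject matter, the window filtering step above can be sketched in Python. The text does not specify how the weights of the remaining pixels are chosen, so uniform weights are assumed here, in which case the filter reduces to a min/max-trimmed mean; the function name is illustrative:

```python
def trimmed_weighted_filter(window):
    """Filter one window of pixel values per the step above."""
    # Remove one maximum-value and one minimum-value pixel from the window.
    vals = sorted(window)[1:-1]
    # Weighting scheme is unspecified in the text; assume uniform weights.
    weights = [1.0] * len(vals)
    total = sum(weights)
    weights = [w / total for w in weights]  # normalization step
    # Weighted sum of the remaining pixels = filtered central pixel value.
    return sum(v * w for v, w in zip(vals, weights))
```

With uniform weights, `trimmed_weighted_filter([1, 2, 3, 4, 100])` discards 1 and 100 and returns the mean of the rest, 3.0, showing how extreme (noisy) pixels are suppressed.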
In a second aspect, there is provided a vehicle detection apparatus comprising:
the acquisition module is used for acquiring a traffic image;
the processing module is used for processing the traffic image according to a preset processing rule to generate a detection image, wherein the processing includes sampling, dividing, and merging;
and the execution module is used for extracting the feature information in the detection image and detecting the vehicle information of the traffic image according to the feature information.
Optionally, the vehicle detection device further includes:
the first processing submodule is used for extracting the feature information of the detection image;
the second processing submodule is used for carrying out feature recombination according to the feature information to generate image features;
and the first execution submodule is used for determining the vehicle information according to the image features.
Optionally, the vehicle detection device further includes:
the third processing submodule is used for respectively carrying out up-sampling and down-sampling on the feature information for a plurality of times to obtain a plurality of up-sampling features and down-sampling features;
and the fourth processing submodule is used for sequentially connecting the up-sampling feature and the down-sampling feature according to a preset connection rule to generate the image feature.
Optionally, in the vehicle detection apparatus, the pixel gray value after the up-sampling and down-sampling transformation is equal to the average of the gray values of the two input pixels closest to the transformed pixel.
Optionally, the vehicle detection device further includes:
the fifth processing submodule is used for dividing the traffic image into a preset number of grids;
the first detection submodule is used for detecting the prediction parameters of at least two target frames in each grid according to the image features, wherein each target frame comprises at least two target categories, the prediction parameters comprise the coordinates of the center point of the target frame, the width and height, and the confidence, and the target categories are used for indicating whether the grid content belongs to a vehicle or not;
the first filtering submodule is used for filtering out target frames whose confidence is smaller than a preset threshold value from all the target frames;
and the second execution submodule is used for carrying out non-maximum suppression processing on the retained target frames and determining the position of the target frame with the highest confidence of each target category as the vehicle information.
Optionally, the vehicle detection device further includes:
the first obtaining submodule is used for obtaining the original brightness of each pixel point of the traffic image and calculating the average brightness over all the pixel points of the traffic image;
the first calculation submodule is used for calculating, for each pixel point, the difference between its original brightness and the average brightness, and the sum of a preset enhancement value and one;
the second calculation submodule is used for multiplying the difference by the sum to obtain a product and summing the product and the average brightness to obtain a brightness value for each pixel point;
and the third execution submodule is used for adjusting the original brightness of each pixel point according to the brightness value obtained by each pixel point.
Optionally, the vehicle detection device further includes:
and the sixth processing submodule is used for selecting a filtering window of a preset size by taking each pixel point of the traffic image as a central pixel point, removing the maximum-value pixel point and the minimum-value pixel point in the filtering window, acquiring the mean pixel value of the remaining pixel points and the weight of each remaining pixel point, normalizing the weights, carrying out a weighted summation of the pixel value of each remaining pixel point with its corresponding weight, and taking the resulting pixel value as the filtered pixel value of the central pixel point of the filtering window.
In a third aspect, an electronic device is provided, including a processor and a memory for storing processor-executable instructions; wherein the memory has stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of any of the vehicle detection methods described above.
In a fourth aspect, a non-transitory computer readable storage medium is provided, the storage medium having stored therein computer readable instructions, which when executed by one or more processors, perform the steps of the vehicle detection method of any of the above.
According to the vehicle detection method, the vehicle detection device, the electronic device and the storage medium, the traffic image is processed, the detection image is generated by sampling, dividing and merging, and feature extraction is performed on the detection image. Compared with using the traffic image directly for feature recognition, using the sampled, divided and merged detection image as the input image can effectively improve the perception precision of the model on the input image and the comprehensiveness of the features the model extracts. The vehicle information in the traffic image is then detected according to these more comprehensive features, thereby improving the effectiveness of vehicle detection in the traffic image.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
In some of the flows described in the present specification and claims and in the above figures, a number of operations are included that occur in a particular order, but it should be clearly understood that these operations may be performed out of order or in parallel as they occur herein, with the order of the operations being indicated as 101, 102, etc. merely to distinguish between the various operations, and the order of the operations by themselves does not represent any order of performance. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any inventive step, are within the scope of the present invention.
As will be appreciated by those skilled in the art, "terminal" as used herein includes devices that are wireless signal receivers, devices that have only wireless signal receivers without transmit capability, and devices that include receive and transmit hardware capable of performing two-way communication over a two-way communication link. Such a device may include: a cellular or other communication device with a single-line display, a multi-line display, or no multi-line display; a PCS (personal communications service) device, which may combine voice, data processing, facsimile and/or data communications capabilities; a PDA (personal digital assistant), which may include a radio frequency receiver, a pager, internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (global positioning system) receiver; a conventional laptop and/or palmtop computer or other device having and/or including a radio frequency receiver. As used herein, a "terminal" or "terminal device" may be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or situated and/or configured to operate locally and/or in a distributed fashion at any other location(s) on earth and/or in space. As used herein, a "terminal device" may also be a communication terminal, a web terminal, or a music/video playing terminal, such as a PDA, an MID (mobile internet device), and/or a mobile phone with a music/video playing function, and may also be a smart TV, a set-top box, and the like.
Referring to fig. 1, fig. 1 is a basic flow chart of the vehicle detection method according to the embodiment.
As shown in fig. 1, a vehicle detection method includes:
and S1100, acquiring a traffic image in a traffic scene acquired by a traffic camera.
Image information in the traffic scene is collected in real time by the traffic camera, and an image is captured at a certain time interval (for example, 0.1 s) as a traffic image. Alternatively, a continuous video recording is made, and one frame is extracted from the video at a certain time interval (for example, 0.1 s) or every certain number of frames (for example, 5 frames) as a traffic image; each frame of the video may also be used as a traffic image.
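For illustration, frame sampling at a fixed time interval can be sketched as follows; the function name and parameters are illustrative only (a real implementation would read frames with a video library):

```python
def sample_indices(fps, interval_s, total_frames):
    """Indices of the frames to keep when sampling every `interval_s`
    seconds from a video recorded at `fps` frames per second."""
    # Number of frames between consecutive samples (at least one frame).
    step = max(1, round(fps * interval_s))
    return list(range(0, total_frames, step))
```

For a 30 fps video sampled every 0.1 s, every third frame is kept, matching the "certain time interval" option above; setting `step` to a fixed count (e.g., 5) gives the "certain number of frames" option.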
S1200, copying a first number of the traffic images, sampling and dividing each traffic image at intervals of a second number of rows or columns so as to divide each traffic image into a third number of image blocks, and merging the image blocks according to a preset merging rule to generate a detection image.
Before the copying, the traffic image is first adjusted to a preset image size, for example, a resolution of 640 × 640, 608 × 608, or 320 × 320, but not limited thereto; the size may be set according to actual requirements. This embodiment takes adjusting the resolution of the traffic image to 608 × 608 as an example.
The resized traffic image is then copied. The first number of copies can be adjusted according to the actual application requirements, for example 4 copies, but not limited thereto; when the accuracy of feature extraction needs to be increased, the value of the first number can be increased, for example to 9 copies. This example takes 4 copies as an example, each copy being 608 × 608 × 3 (resolution 608 × 608, 3 channels).
Each traffic image is sampled and divided into a third number of image blocks. This embodiment takes division into 16 image blocks as an example, equally dividing the traffic image into a 4 × 4 grid of 16 image blocks.
The preset merging rule is a rule for merging image blocks into a new image, such as merging image blocks of the same region, or interleaved merging, but not limited thereto. This application takes interleaved merging as an example: as shown in fig. 2, the left side of fig. 2 is a schematic diagram of dividing a traffic image into 16 image blocks, and image blocks are stitched together every other block to obtain the four images on the right side of fig. 2. The numbers in fig. 2 are provided to clarify the positional relationship between the stitched new images and the original traffic image. Each of the first number of copied traffic images is processed in the same manner. In this embodiment, each 608 × 608 × 3 traffic image (resolution 608 × 608, 3 channels) is processed into a 304 × 304 × 12 image (resolution 304 × 304, 12 channels), and 4 images of 304 × 304 × 12 are obtained as the detection images.
In some embodiments, after the traffic image is processed into 304 × 304 × 12 images, the images are passed through a convolution layer with 32 convolution kernels, finally becoming 304 × 304 × 32 feature maps.
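As an illustrative sketch (not a limitation of the embodiment), the interleaved division of one image channel can be written as follows; nested Python lists stand in for an image array, and stacking the four resulting sub-images along the channel axis is what turns 3 channels into 12:

```python
def interleaved_split(img):
    """img: H x W grid (one channel) as nested lists.
    Sampling every other row and every other column (second number = 1)
    yields four sub-images, each 1/4 the size of the original."""
    return [
        [row[c0::2] for row in img[r0::2]]
        for r0 in (0, 1)
        for c0 in (0, 1)
    ]
```

For a 2 × 2 channel `[[1, 2], [3, 4]]`, the four 1 × 1 sub-images are `[[1]]`, `[[2]]`, `[[3]]`, `[[4]]`; applied per channel to a 608 × 608 × 3 image, this produces the 304 × 304 × 12 detection image described above.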
The image segmentation is not limited to the above-described method, and may also be performed by an interlaced segmentation method, specifically including:
When dividing each traffic image, an interlaced sampling division method may be adopted; in this embodiment, a row or a column refers to a row or a column of image pixels. Specifically, taking the second number as 1 as an example, sampling is performed every other row and every other column, and the image is divided accordingly, yielding 4 images each 1/4 the size of the original. The same operation is performed on each channel of each image.
The value of the second number may be adjusted according to the actual usage scene, and it determines the number of images obtained after division: when the second number is 1, sampling is performed every other row or column, and each image is divided into 4 images each 1/4 of the original; when the second number is 2, sampling is performed every two rows or columns, and each image is divided into 9 images each 1/9 of the original. An image obtained after sampling and division is defined as an image block.
S1300, extracting the feature information in the detection image, and detecting the vehicle information of the traffic image according to the feature information.
The detection image is input into a vehicle detection model trained to convergence; the feature extraction part of the vehicle detection model extracts feature information, and the vehicle detection model then extracts vehicle information according to the feature information and the traffic image.
The vehicle detection model is a neural network model that extracts and integrates features of an input image and processes them to finally obtain the vehicle information in the image. In this application, the vehicle detection model comprises a feature extraction part and a feature processing part: the feature extraction part is used for extracting feature information from the detection image, and the feature processing part is used for detecting vehicle information in the traffic image according to the feature information. The vehicle information includes a vehicle type (e.g., a car or a truck), vehicle position information, and the like. The vehicle position information may be represented as pixel coordinate values in the image, for example, the coordinates of the vehicle in the traffic image with the lower left corner of the traffic image as the origin. The vehicle position information may also be represented in the form of a bounding box, for example, a rectangular frame indicating the position of the vehicle in the image; when a bounding box is adopted, a certain point of the box (e.g., a corner point or the center point) may be used as a coordinate reference point and combined with the length and width values of the box to determine the position of the box in the traffic image and thereby represent the vehicle position information. All possible vehicle position information detected in the traffic image by the vehicle detection model is taken as the vehicle information of the traffic image.
When the vehicle detection model is trained, a certain number (for example, 100,000) of detection images are used as a training sample set, each detection image being annotated with the vehicle information therein. The training sample set is input into a preset neural network model, and the weights in the neural network model are adjusted according to the detection and classification results of the neural network model and the annotation data of the detection images until the value of the loss function of the model on the verification set no longer decreases, at which point the vehicle detection model is determined to have been trained to convergence.
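The convergence criterion above (stop once the validation loss no longer decreases) can be sketched as a generic early-stopping loop; the function names, the `patience` parameter, and the callback interfaces are illustrative assumptions, not part of the described model:

```python
def train_until_converged(model, train_epoch, val_loss, patience=3):
    """Train until the validation loss has failed to improve for
    `patience` consecutive epochs, then return the model."""
    best, stalled = float("inf"), 0
    while stalled < patience:
        train_epoch(model)          # one pass over the training sample set
        loss = val_loss(model)      # loss function on the verification set
        if loss < best:
            best, stalled = loss, 0  # still improving: reset the counter
        else:
            stalled += 1             # no decrease this epoch
    return model
```

Using a patience window rather than stopping on the first non-decreasing epoch is a common safeguard against noisy validation losses.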
The traffic image is processed, a detection image is generated in a sampling segmentation and combination mode, and feature extraction is performed according to the detection image. Compared with the method that the traffic image is directly used for feature recognition, the detection image subjected to sampling segmentation and merging is used as the input image, the sensing precision of the model on the input image can be effectively improved, and the comprehensiveness of the model for extracting features is improved. And then the vehicle information in the traffic image is detected according to the characteristics with higher comprehensiveness, so that the effectiveness of vehicle detection in the traffic image is improved.
As shown in fig. 3, S1300, extracting feature information in the detection image and detecting vehicle information of the traffic image according to the feature information specifically includes:
S1310, extracting feature information of the detection image;
The detection image is input into a preset vehicle detection model, which is a neural network model that extracts and integrates the features of the input image and processes them to finally obtain the vehicle information in the image. In this application, the vehicle detection model comprises a feature extraction part and a feature processing part: the feature extraction part is used for extracting feature information from the detection image, and the feature processing part is used for detecting vehicle information in the traffic image according to the feature information.
After the detection image is input into the vehicle detection model, the feature information of the detection image is extracted according to the feature extraction part of the vehicle detection model.
S1320, performing feature recombination according to the feature information to generate image features;
and the characteristic processing part of the vehicle detection model performs up-and-down sampling on the extracted characteristic information, and then performs characteristic recombination on the characteristics obtained by up-and-down sampling in sequence to generate a new characteristic diagram as an image characteristic. The up-down sampling may be bilinear interpolation, nearest neighbor interpolation, or the like, but is not limited thereto.
S1330, determining the vehicle information according to the image features;
The feature processing part of the vehicle detection model detects vehicle information in the traffic image by combining the image features and the traffic image.
In this application, the vehicle information includes a vehicle type (e.g., a car or a truck), vehicle position information, and the like. The vehicle position information may be represented as pixel coordinate values in the image, for example, the coordinates of the vehicle in the traffic image with the lower left corner of the traffic image as the origin. The vehicle position information may also be represented in the form of a bounding box, for example, a rectangular frame indicating the position of the vehicle in the image; when a bounding box is adopted, a certain point of the box (e.g., a corner point or the center point) may be used as a coordinate reference point and combined with the length and width values of the box to determine the position of the box in the traffic image and thereby represent the vehicle position information. All possible vehicle position information detected in the traffic image by the vehicle detection model is taken as the vehicle information of the traffic image.
In some embodiments, as shown in fig. 4, S1320, performing feature reorganization according to the feature information to generate an image feature, specifically including:
S1321, respectively performing up-sampling and down-sampling on the feature information a plurality of times to obtain a plurality of up-sampling features and down-sampling features;
The feature information is up-sampled and down-sampled a plurality of times, the numbers of up-sampling and down-sampling operations being equal. This embodiment takes two up-samplings and two down-samplings as an example, obtaining two up-sampling features and two down-sampling features. The number of up- and down-samplings can be adjusted according to the actual application requirements; for example, when the extraction effect of the image features needs to be improved, the number can be increased.
In this application, up-sampling refers to the process of enlarging a small image into a large image, while down-sampling refers to the process of reducing a large image into a small image.
In up-sampling, the image is expanded to 2 times the original in both the width and length directions, and the newly added rows and columns are filled by interpolation; for example, in the nearest neighbor interpolation method, the gray value of a new pixel equals the gray value of the pixel nearest to it. Through two up-samplings, the image is successively enlarged from bottom to top to obtain two up-sampling features: the original image is recorded as D, the feature obtained by the first up-sampling as U1, and the feature obtained by the second up-sampling as U2, with D smaller than U1 and U1 smaller than U2 in size.
In down-sampling, the image is reduced to 1/2 in both the length and width directions, so the generated down-sampling feature is 1/4 of the original image. Down-sampling can directly delete all even rows and columns to generate the 1/4-size down-sampling feature, or can use Gaussian filtering to take a weighted average of the image, each pixel being a weighted average of itself and the other pixel values in its neighborhood. Through two down-samplings, the image is successively reduced from top to bottom to obtain two down-sampling features: with the original image recorded as D, the feature obtained by the first down-sampling is S1 and the feature obtained by the second down-sampling is S2, with D larger than S1 and S1 larger than S2 in size.
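The two-pass pyramid described above (two nearest-neighbor up-samplings and two decimating down-samplings of D) can be sketched as follows; nested lists stand in for single-channel feature maps, and the function names are illustrative:

```python
def upsample2(img):
    """Nearest-neighbour 2x enlargement: every pixel becomes a 2x2 block."""
    out = []
    for row in img:
        wide = [p for p in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                   # duplicate each row
    return out

def downsample2(img):
    """2x reduction by deleting every other row and column."""
    return [row[::2] for row in img[::2]]

def two_pass_pyramid(d):
    """From feature map D, build U1, U2 (up) and S1, S2 (down), so that
    S2 < S1 < D < U1 < U2 in size."""
    u1 = upsample2(d)
    u2 = upsample2(u1)
    s1 = downsample2(d)
    s2 = downsample2(s1)
    return u1, u2, s1, s2
```

Starting from a 4 × 4 map D, the sizes come out as U1 = 8 × 8, U2 = 16 × 16, S1 = 2 × 2, S2 = 1 × 1, matching the ordering stated above.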
S1322, sequentially connecting the up-sampling feature and the down-sampling feature according to a preset connection rule to generate the image feature;
The connection rule laterally connects, from top to bottom, high-level features with low resolution but high semantic information to low-level features with high resolution but low semantic information, so that the features at all sizes carry rich semantics. In this embodiment, after up-sampling the features are, from bottom to top, D, U1 and U2, and after down-sampling they are, from top to bottom, D, S1 and S2; they are connected from top to bottom, that is, D is connected to S2, U1 is connected to S1, and D is laterally connected to U2, realizing fusion of the feature maps and obtaining a new feature map. Each feature in the feature information is processed as above, and all the new features obtained are taken as the image features.
In order to improve the semantic richness after feature fusion, the new features generated by fusion can be up- and down-sampled and fused again, and the new feature map formed after a certain number of iterations is used as the image feature.
In some embodiments, the pixel gray value after the up-sampling or down-sampling transformation is equal to the mean of the gray values of the two input pixels closest to the transformed pixel. Sampling in this way retains more information of the original image while reducing the amount of calculation and improving the calculation speed.
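One literal reading of this rule, sketched for a single row of gray values (the function name and coordinate mapping are illustrative assumptions), is to map each output position back into input coordinates and average the two nearest input pixels:

```python
def resample_two_nearest(row, out_len):
    """Resample a 1-D row of gray values so each output pixel is the
    mean of the two input pixels closest to its mapped position."""
    n = len(row)
    out = []
    for j in range(out_len):
        # Map output index j into input coordinates [0, n-1].
        pos = j * (n - 1) / max(out_len - 1, 1)
        lo = min(int(pos), n - 2) if n > 1 else 0  # nearer input pixel pair
        hi = min(lo + 1, n - 1)
        out.append((row[lo] + row[hi]) / 2)
    return out
```

For example, resampling `[0, 2, 4]` to length 5 yields `[1.0, 1.0, 3.0, 3.0, 3.0]`: each output is the average of the two closest input gray values, which is cheaper than full bilinear interpolation while still mixing neighboring pixels.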
The method for detecting the vehicle information in the traffic image is not limited to direct classification by a neural network model. For example, in some embodiments, as shown in fig. 5, S1330, determining the vehicle information according to the image features, specifically includes:
S1331, dividing the traffic image into a preset number of grids;
The traffic image is adjusted to a preset size and normalized, for example to 608 × 608 × 3 (608 pixels in each of the horizontal and vertical directions, with red, green and blue channels superimposed), and is then divided into a preset number of grids. In this embodiment the grid is 7 × 7 as an example; in practical applications the number of grids can be adjusted as needed, for example increased (such as to 9 × 9, but not limited thereto) when detection accuracy needs to be improved.
S1332, detecting, based on the image features, prediction parameters of at least two target frames in each grid, wherein each target frame comprises at least two target categories, the prediction parameters comprise the coordinates of the center point of the target frame, the width and height, and the confidence, and the target categories are used for indicating whether the grid content belongs to a vehicle;
In this embodiment, there are four target categories, for example car, truck, motorcycle, and bus. The numbers and types of the target frames and target categories may be adjusted according to requirements in practical applications; for example, the target categories may further include three-wheeled vehicles.
Each grid predicts the prediction parameters of two target frames, and the grid content is characterized by the probability that the grid belongs to each target category (for convenience of description, denoted pr; pr is 1 when the grid contains the target category, and 0 when it does not). In this embodiment, the image feature corresponds to the target categories of the grid; when there are four target categories, for example, the image feature includes the probability that the grid belongs to each of the four. Specifically, the prediction parameters include the center point coordinates, the width and height of the target frame, and its confidence, where the confidence S is the product of pr and the IOU (degree of overlap); since pr is 1 or 0, S takes only two values, 0 or the IOU itself. The predicted center point coordinates are offsets relative to the coordinate origin of the image (for example, but not limited to, the upper left corner of the image), denoted (x, y), and the predicted width and height of the target frame are ratios relative to the whole image. Each grid is detected by this method, and the corresponding prediction parameters of its target frames are output.
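The confidence definition S = pr × IOU can be made concrete with a small sketch; boxes are taken in the (center x, center y, width, height) format used above, and the function names are illustrative:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (cx, cy, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union else 0.0

def confidence(pr, box_pred, box_truth):
    """S = pr * IOU; since pr is 0 or 1, S is 0 or the IOU itself."""
    return pr * iou(box_pred, box_truth)
```

When the grid contains no target (pr = 0) the confidence is 0 regardless of overlap; otherwise it equals the overlap between the predicted frame and the ground-truth frame.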
In a specific embodiment, the traffic image is input into a preset neural network model. The input layer of the neural network model normalizes the image and fixes its size to 608 × 608 × 3. The normalized image then passes through 24 convolution layers, which essentially extract the feature information for subsequent classification and localization; the convolution kernels can be set to 3 × 3 and 1 × 1, where the 1 × 1 kernels reduce the number of channels and thus the number of parameters generated by the network. Pooling layers are arranged between the convolution layers and down-sample the input data in feature space: the feature matrix is divided into blocks of a set granularity according to spatial position, and within each small block a new feature value is calculated to replace the information in the original block. Two fully connected layers are arranged between the last pooling layer and the output layer, converting the two-dimensional feature matrix into a vector and connecting all inputs with the network parameters. The last layer of the neural network model is the output layer, which classifies and outputs the one-dimensional vector produced by the fully connected layers; the number of output feature maps is the number of target classes. In the above embodiment, the output is a 7 × 7 × (2 × 5 + 4) one-dimensional vector, where 7 × 7 is the number of grids, 2 is the number of target frames per grid, 5 is the number of prediction parameters per frame, and 4 is the number of target classes to be predicted.
S1333, filtering out, among all the target frames, the frames whose confidence is smaller than a preset threshold;
A confidence threshold is set in the system and used to filter out target frames with low confidence so as to reduce the data volume. The confidence threshold can be adjusted according to actual application requirements; for example, when a smaller amount of data needs to be retained, the confidence threshold can be raised.
S1334, performing non-maximum suppression on the retained target frames, and determining the position of the target frame with the highest confidence for each target category as the vehicle information;
the filtered and protected target frames (i.e., frames with confidence greater than the confidence threshold) are subjected to Non-maximum suppression processing (NMS), which may be processed by an NMS algorithm, and finally, the positions of the target frames with the highest confidence of each target category, i.e., the coordinates and length and width of the center point corresponding to each target frame, are output as vehicle position information. And outputting all the vehicle position information and the corresponding vehicle types in the traffic image as vehicle information.
In a specific embodiment, after acquiring a traffic image in a traffic scene acquired by a traffic camera, the method further includes performing contrast enhancement on the traffic image, which includes:
acquiring the original brightness of each pixel point of the traffic image;
calculating the average brightness over the pixel points of the traffic image;
acquiring a preset enhancement value;
calculating, for each pixel point, the difference between its original brightness and the average brightness, and calculating the sum of the enhancement value and one;
multiplying the difference by the sum to obtain a product value;
adding the product value to the average brightness to obtain a brightness value for each pixel point;
and adjusting the original brightness of each pixel point to the brightness value obtained for it, thereby realizing the contrast enhancement of the traffic image.
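The steps above amount to scaling each pixel's deviation from the average brightness by (enhancement value + 1); a minimal sketch over a flat list of brightness values:

```python
def enhance_contrast(brightness, enhancement):
    # new = average + (original - average) * (enhancement + 1):
    # pixels brighter than the average get brighter, darker ones get darker.
    avg = sum(brightness) / len(brightness)
    return [avg + (b - avg) * (enhancement + 1) for b in brightness]
```

With an enhancement value of 0 the image is unchanged; positive values stretch the brightness range around the average.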
In a specific embodiment, after acquiring a traffic image in a traffic scene acquired by a traffic camera, the method further includes performing weighted mean filtering on the traffic image;
the weighted mean filtering process includes:
Taking each pixel point of the traffic image in turn as a central pixel point, a filtering window of preset size is selected; the maximum pixel point and the minimum pixel point in the filtering window are filtered out; the mean pixel value of the remaining pixel points and the weight of each remaining pixel point are obtained, and the weights are normalized; the pixel value of each remaining pixel point is then weighted and summed with its corresponding weight; and the resulting pixel value is taken as the filtered pixel value of the central pixel point of the filtering window.
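For a single filtering window, the described process might look as follows; the weighting scheme (closeness to the window mean) is an assumption made for illustration, since the embodiment does not specify how the weights are derived:

```python
def filter_window(pixels, eps=1e-6):
    # Weighted mean filter for one window: drop one max and one min pixel,
    # weight the rest (here, hypothetically, by closeness to the window mean),
    # normalize the weights, and return the weighted sum as the filtered
    # value of the window's central pixel.
    rest = sorted(pixels)[1:-1]
    mean = sum(rest) / len(rest)
    weights = [1.0 / (abs(p - mean) + eps) for p in rest]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalization step
    return sum(p * w for p, w in zip(rest, weights))
```

Dropping the extremes makes the filter robust to isolated noise spikes before the weighted averaging is applied.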
In order to solve the technical problem, the embodiment of the invention further provides a vehicle detection device. Referring to fig. 6, fig. 6 is a block diagram of a basic structure of the vehicle detecting device according to the present embodiment.
As shown in fig. 6, the vehicle detection device includes: an acquisition module 2100, a processing module 2200, and an execution module 2300. The acquisition module is used for acquiring a traffic image; the processing module is used for processing the traffic image according to a preset processing rule to generate a detection image; the execution module is used for extracting the characteristic information in the detection image and detecting the vehicle information of the traffic image according to the characteristic information.
The traffic image is processed, a detection image is generated by sampling, segmentation and merging, and feature extraction is performed on the detection image. Compared with performing feature recognition directly on the traffic image, using the sampled, segmented and merged detection image as the input image can effectively improve the perception precision of the model on the input image and the comprehensiveness of the features the model extracts. The vehicle information in the traffic image is then detected from these more comprehensive features, which improves the effectiveness of vehicle detection in the traffic image.
In some embodiments, the vehicle detection device further comprises: the device comprises a first processing submodule, a second processing submodule and a first execution submodule. The first processing submodule is used for extracting the characteristic information of the detection image; the second processing submodule is used for carrying out feature recombination according to the feature information to generate image features; the first execution submodule is used for determining the vehicle information according to the image characteristics.
In some embodiments, the vehicle detection device further comprises: a third processing submodule and a fourth processing submodule. The third processing submodule is used for respectively carrying out up-sampling and down-sampling on the feature information for a plurality of times to obtain a plurality of up-sampling features and down-sampling features; and the fourth processing submodule is used for sequentially connecting the up-sampling feature and the down-sampling feature according to a preset connection rule to generate the image feature.
In some embodiments, the up-sampled and down-sampled transformed pixel gray values are equal to the mean of the gray values of the two input pixels closest to the transformed pixel.
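A one-dimensional sketch of this transformation rule, assuming a sampling factor of 2: when down-sampling, each output pixel averages the two input pixels it covers; when up-sampling, each inserted pixel averages its two nearest input neighbours (pixels aligned with an input position keep that input's gray value).

```python
def downsample(row):
    # Halve a 1-D row of gray values: each output pixel is the mean of the
    # two closest (covered) input pixels.
    return [(row[2 * i] + row[2 * i + 1]) / 2 for i in range(len(row) // 2)]

def upsample(row):
    # Double a 1-D row of gray values: each inserted pixel is the mean of
    # its two nearest input neighbours; original pixels are kept.
    out = []
    for i in range(len(row) - 1):
        out.append(row[i])
        out.append((row[i] + row[i + 1]) / 2)
    out.append(row[-1])
    return out
```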
In some embodiments, the vehicle detection device further comprises: a fifth processing submodule, a first detection submodule, a first filtering submodule and a second execution submodule. The fifth processing submodule is used for dividing the traffic image into a preset number of grids; the first detection submodule is used for detecting the prediction parameters of at least two target frames in each grid according to the image features, wherein each target frame corresponds to at least two target categories, the prediction parameters include the center-point coordinates, the width and height, and the confidence of the target frame, and the target categories are used for indicating whether the grid content belongs to a vehicle; the first filtering submodule is used for filtering out, among all the target frames, the frames whose confidence is smaller than a preset threshold; and the second execution submodule is used for performing non-maximum suppression on the retained target frames and determining the position of the target frame with the highest confidence for each target category as the vehicle information.
In some embodiments, the vehicle detection device further comprises: the device comprises a first obtaining submodule, a first calculating submodule, a second calculating submodule and a third executing submodule. The first obtaining submodule is used for obtaining the original brightness of each pixel point of the traffic image and calculating the average brightness of each pixel point of the traffic image; the first calculation submodule is used for calculating the difference value of each original brightness and the average brightness of each pixel point and the sum value of a preset enhancement value and one; the second calculation submodule is used for solving a product value for the difference value and the sum value, and summing the product value and the average brightness to obtain a brightness value of each pixel point; and the third execution submodule is used for adjusting the original brightness of each pixel point according to the brightness value obtained by each pixel point.
In some embodiments, the vehicle detection device further includes a sixth processing submodule, configured to take each pixel point of the traffic image as a central pixel point and select a filtering window of preset size, filter out the maximum pixel point and the minimum pixel point in the filtering window, obtain the mean pixel value of the remaining pixel points and the weight of each remaining pixel point, normalize the weights, perform a weighted summation of the pixel value of each remaining pixel point with its corresponding weight, and take the resulting pixel value as the filtered pixel value of the central pixel point of the filtering window.
In order to solve the above technical problem, an embodiment of the present invention further provides an electronic device. Referring to fig. 7, fig. 7 is a block diagram of a basic structure of the electronic device according to the embodiment.
Fig. 7 schematically illustrates the internal structure of the electronic device. As shown in fig. 7, the electronic device includes a processor, a non-volatile storage medium, a memory and a network interface connected through a system bus. The non-volatile storage medium of the electronic device stores an operating system, a database and computer-readable instructions; the database can store control information sequences, and the computer-readable instructions, when executed by the processor, can cause the processor to implement a vehicle detection method. The processor of the electronic device provides calculation and control capability and supports the operation of the entire electronic device. The memory of the electronic device may store computer-readable instructions that, when executed by the processor, may cause the processor to perform a vehicle detection method. The network interface of the electronic device is used for connecting and communicating with a terminal. It will be appreciated by those skilled in the art that the configuration shown in the figure is a block diagram of only part of the configuration relevant to the present application and does not constitute a limitation on the electronic devices to which the present application may be applied; a particular electronic device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The electronic equipment can be vehicle detection equipment and further comprises a traffic camera arranged in a traffic scene and used for collecting traffic images in the traffic scene.
The present invention also provides a storage medium having stored thereon computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the vehicle detection method of any of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-only memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several embodiments of the present invention, and their description is relatively specific and detailed, but this should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.