CN116645516A - Multi-category target counting method and system based on multi-perception feature fusion - Google Patents


Info

Publication number
CN116645516A
Authority
CN
China
Prior art keywords
feature
features
convolution
channel
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310513969.8A
Other languages
Chinese (zh)
Inventor
张莉
魏祥一
赵雷
王邦军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN202310513969.8A
Publication of CN116645516A
Legal status: Pending

Classifications

    • G06V10/40: Extraction of image or video features
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/08: Learning methods
    • G06V10/52: Scale-space analysis, e.g. wavelet analysis
    • G06V10/806: Fusion of extracted features (combining data at the feature extraction level)
    • G06V10/82: Image or video recognition or understanding using neural networks


Abstract

The application relates to a multi-category target counting method and system based on multi-perception feature fusion, wherein the method comprises the following steps: step S1: acquiring an image containing multiple categories of targets, and extracting a feature map of the image; step S2: extracting multi-scale features and spatial features from the feature map, and extracting channel features from the multi-scale features; step S3: fusing the spatial features and the channel features to obtain fused features; step S4: inputting the fused features into a counting network, and outputting a density map through the counting network; step S5: integrating the density map to obtain the number of targets of each category. The application can count vehicles and pedestrians effectively and achieves good results.

Description

Multi-category target counting method and system based on multi-perception feature fusion
Technical Field
The application relates to the technical field of multi-target counting, in particular to a multi-category target counting method and system based on multi-perception feature fusion.
Background
In recent years, more and more researchers have focused on the important role of automatic crowd counting in public monitoring and intelligent transportation systems. Analysis of crowd behavior and traffic density has a significant impact on the efficiency of public transportation. In the construction of smart cities, whether an intelligent transportation system can accurately and efficiently obtain crowd behavior information and vehicle information from public surveillance is a very important link in realizing effective traffic planning, and crowd counting and vehicle counting are the basic tasks of crowd behavior analysis and traffic flow analysis. Crowd counting, for example, has already been widely used in public safety, video surveillance and other fields. There are generally two approaches to the target counting problem: counting the objects directly, where the input is an image and the output is the total number of people in the image; and predicting a density map, where the input is an image, the output is a predicted crowd density map, and the density map is then integrated to obtain the crowd count.
In earlier years, researchers studied the target counting problem with either detection-based or regression-based methods. Detection-based methods treat the whole object in the image as a detectable entity, extract hand-crafted features of the object using image processing techniques, and then simply count the detection results; however, this assumption does not always hold in practice, especially when objects are very dense and severely occluded. Regression-based methods use global features such as texture and gradients, and then apply machine learning methods, such as support vector machines or ridge regression, to learn a regression model that maps the hand-crafted features to the number of people in the image. Detection-based methods suffer from severe occlusion, perspective distortion and similar problems, and their performance on high-density images is clearly degraded; regression-based methods use global features for counting, which alleviates the occlusion and perspective problems, but they still perform poorly on high-density target images.
With the rapid development of deep learning, more and more computer vision researchers have turned to convolutional neural networks. Convolutional neural networks (CNNs) have a strong capability for automatic feature extraction, and recent studies on object counting show that CNN-based methods outperform all traditional methods. While traditional object counting methods typically output only the number of people in an image, CNN-based methods typically predict density maps, which contain density information that traditional count-only predictions cannot provide. However, in the field of intelligent transportation there is still a lack of counting datasets containing common vehicles and of methods for counting common vehicles and pedestrians together.
Disclosure of Invention
The technical problem to be solved by the application is to overcome the lack of a method for counting vehicles and pedestrians in the prior art.
In order to solve the technical problems, the application provides a multi-category target counting method based on multi-perception feature fusion, which comprises the following steps:
step S1: acquiring an image containing multiple categories of targets, and extracting a feature map of the image;
step S2: extracting multi-scale features and spatial features from the feature map, and extracting channel features from the multi-scale features;
step S3: fusing the spatial features and the channel features to obtain fused features;
step S4: inputting the fused features into a counting network, and outputting a density map through the counting network;
step S5: integrating the density map to obtain the number of targets of each category.
In one embodiment of the present application, the step S1 extracts a feature map of the image through a convolutional neural network;
the neural network comprises a first convolution unit, a second convolution unit, a third convolution unit and a fourth convolution unit which are sequentially connected;
wherein a pooling layer is arranged between the first convolution unit and the second convolution unit, between the second convolution unit and the third convolution unit and between the third convolution unit and the fourth convolution unit;
the first convolution unit comprises two convolution layers with 64 channels, the second convolution unit comprises two convolution layers with 128 channels, the third convolution unit comprises three convolution layers with 256 channels, and the fourth convolution unit comprises three convolution layers with 512 channels.
In one embodiment of the present application, the step S2 extracts multi-scale features of the feature map, and the method includes:
inputting the feature map F_0 into a multi-scale feature extraction network, extracting four features of different scales from the feature map F_0 through the multi-scale feature extraction network, and splicing the four features of different scales to obtain a multi-scale feature F;
wherein the multi-scale feature extraction network comprises four parallel convolution units;
the first convolution unit comprises a first expanded convolution layer, and an average pooling layer is arranged in front of the first expanded convolution layer;
the second convolution unit comprises a second expansion convolution layer, a third expansion convolution layer and a fourth expansion convolution layer which are sequentially connected;
the third convolution unit comprises a fifth expansion convolution layer and a sixth expansion convolution layer which are sequentially connected;
the fourth convolution unit includes a seventh expanded convolution layer.
In one embodiment of the present application, the method of extracting the channel features of the multi-scale feature in the step S2 includes:
extracting initial channel features C(F)_initial of the multi-scale feature through a channel feature extraction network;
multiplying the initial channel features C(F)_initial element-wise with the multi-scale feature to obtain the channel features C(F);
wherein the initial channel features C(F)_initial of the multi-scale feature are extracted by the channel feature extraction network according to the formula:
C(F)_initial = concat(σ(L(r(L(v, W_0), W_1)))(F))
wherein F denotes the multi-scale feature, concat denotes matrix concatenation, σ denotes the sigmoid function, v denotes the feature vector obtained through the average pooling layer, W_0 and W_1 are the parameters of the two linear layers, L is a linear layer and r is the ReLU activation function.
In one embodiment of the present application, the step S2 extracts spatial features of the feature map, and the method includes:
extracting initial spatial features S(F_0)_initial of the feature map through a spatial feature extraction network;
multiplying the initial spatial features S(F_0)_initial element-wise with the feature map to obtain the spatial features S(F_0);
wherein the initial spatial features S(F_0)_initial of the feature map are extracted by the spatial feature extraction network according to the formula:
S(F_0)_initial = σ(Conv(MAP(F_0), θ)) ⊙ F_0
wherein F_0 denotes the feature map, σ denotes the sigmoid function, Conv denotes a 7×7 convolution layer, MAP denotes maximum pooling and average pooling, θ denotes the parameters of the spatial feature extraction network, and ⊙ denotes element-wise multiplication.
In one embodiment of the present application, in the step S3, the spatial feature and the channel feature are fused to obtain a fused feature, where the formula is:
Output_MA = C(F) + S(F_0)
wherein C(F) denotes the channel features and S(F_0) denotes the spatial features.
In one embodiment of the present application, the counting network in the step S4 includes five convolution layers connected in sequence, wherein the third convolution layer is a deconvolution layer; the fifth convolution layer has a convolution kernel size of 1 x 1 for outputting the density map.
In order to solve the technical problems, the application provides a multi-category target counting system based on multi-perception feature fusion, which comprises:
a first extraction module: used for acquiring an image containing multiple categories of targets and extracting a feature map of the image;
a second extraction module: used for extracting multi-scale features and spatial features from the feature map, and extracting channel features from the multi-scale features;
a fusion module: used for fusing the spatial features and the channel features to obtain fused features;
a density map construction module: used for inputting the fused features into a counting network and outputting a density map through the counting network;
a quantity prediction module: used for integrating the density map to obtain the number of targets of each category.
In order to solve the technical problems, the application provides electronic equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the multi-category target counting method based on multi-perception feature fusion when executing the computer program.
To solve the above technical problem, the present application provides a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the multi-class target counting method based on multi-perception feature fusion.
Compared with the prior art, the technical scheme of the application has the following advantages:
the application realizes effective extraction and fusion of features by constructing a convolutional neural network, a multi-scale feature extraction network, a channel feature extraction network and a spatial feature extraction network, and finally realizes effective counting of common vehicles and pedestrians;
the application can provide important support for the public monitoring and intelligent transportation fields.
Drawings
In order that the application may be more readily understood, a more particular description of the application will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings.
FIG. 1 is a flow chart of the method of the present application;
FIG. 2 is a schematic diagram of a convolutional neural network in an embodiment of the present application;
FIG. 3 is a schematic diagram of a multi-scale feature extraction network in accordance with an embodiment of the application;
FIG. 4 is a schematic diagram of a channel feature extraction network according to an embodiment of the present application;
fig. 5 is a schematic diagram of a spatial feature extraction network according to an embodiment of the present application.
Detailed Description
The present application will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the application and practice it.
Example 1
Referring to fig. 1, the application relates to a multi-category target counting method based on multi-perception feature fusion, which comprises the following steps:
step S1: acquiring an image containing multiple categories of targets, and extracting a feature map of the image;
step S2: extracting multi-scale features and spatial features from the feature map, and extracting channel features from the multi-scale features;
step S3: fusing the spatial features and the channel features to obtain fused features;
step S4: inputting the fused features into a counting network, and outputting a density map through the counting network;
step S5: integrating the density map to obtain the number of targets of each category.
The present embodiment is described in detail below:
In this embodiment, a large-scale multi-category target counting dataset is collected and annotated. The dataset is divided into 8 categories and contains 2521 images with 274199 annotation points in total, which makes it suitable for verifying the performance of the present application. The specific implementation steps are as follows:
in the research, the problem of unbalanced categories is generally found to be needed to be considered in multi-category counting, so that a category self-adaptive weight distribution loss function is designed in the embodiment, so that model loss is reduced in the model training process, and model accuracy is improved. The class adaptive weight allocation loss function formula is as follows:
wherein m is the class number, n is the number of test samples, D ij Andrespectively, are image X i True and estimated density map with j-th class, m p Is a model parameter of the entire network. />And T ij Respectively representing predicted values X of class j i And actual truth count, γ is the weight of a conditional reflection difficulty sample on total loss. In the experiment, this example was empirically set to 0.01 during training.
Referring to fig. 2, step S1 extracts the feature map of the image through a convolutional neural network. The convolutional neural network of this embodiment shows excellent local feature extraction capability because it stacks many layers with small convolution kernels. The neural network comprises a first convolution unit, a second convolution unit, a third convolution unit and a fourth convolution unit connected in sequence; a pooling layer is arranged between the first and second convolution units, between the second and third convolution units, and between the third and fourth convolution units. The first convolution unit comprises two convolution layers with 64 channels, the second convolution unit comprises two convolution layers with 128 channels, the third convolution unit comprises three convolution layers with 256 channels, and the fourth convolution unit comprises three convolution layers with 512 channels. The feature map output by the convolutional neural network for an input image is 1/8 of the original input size.
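For illustration only, a minimal PyTorch sketch of a front end consistent with the above description is given below; the 3×3 kernels, padding of 1 and ReLU activations are assumptions, since the text only specifies the number of convolution layers and channels per unit and the placement of the pooling layers.

```python
import torch.nn as nn

class BackboneCNN(nn.Module):
    """Front-end feature extractor of step S1: four convolution units (2x64, 2x128,
    3x256, 3x512) with a pooling layer between consecutive units, so the output
    feature map is 1/8 of the input size."""
    def __init__(self, in_channels=3):
        super().__init__()
        cfg = [(2, 64), (2, 128), (3, 256), (3, 512)]   # (number of convs, channels) per unit
        layers, prev = [], in_channels
        for i, (n, c) in enumerate(cfg):
            for _ in range(n):
                layers += [nn.Conv2d(prev, c, kernel_size=3, padding=1),
                           nn.ReLU(inplace=True)]
                prev = c
            if i < len(cfg) - 1:                        # pooling only between units
                layers.append(nn.MaxPool2d(2))
        self.features = nn.Sequential(*layers)

    def forward(self, x):
        return self.features(x)                         # shape (B, 512, H/8, W/8)
```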
Further, in order to address the large differences in the size and shape of targets in the multi-category target counting task, this embodiment constructs a multi-scale feature extraction network, which consists of four groups of convolution branches with different receptive fields and dilation rates; this embodiment uses dilated convolution (i.e., atrous convolution) for multi-scale feature perception. Dilated convolution can enlarge the receptive field while maintaining spatial resolution. Unlike the multi-column convolution structures typically used in counting tasks, this embodiment also places an average pooling layer in one of the convolution branches to improve the performance of the model.
Specifically, in step S2, the multi-scale features of the feature map are extracted as follows: the feature map is input into the multi-scale feature extraction network, four features of different scales are extracted from the feature map through the multi-scale feature extraction network, and the four features of different scales are spliced to obtain the multi-scale feature. Referring to fig. 3, the multi-scale feature extraction network comprises four parallel convolution units. The first convolution unit comprises a first expansion convolution layer (K=1, D=1, P=1), and an average pooling layer is arranged before the first expansion convolution layer; the second convolution unit comprises a second expansion convolution layer (K=1, D=1, P=1), a third expansion convolution layer (K=3, D=3, P=3) and a fourth expansion convolution layer (K=3, D=3, P=3) connected in sequence; the third convolution unit comprises a fifth expansion convolution layer (K=1, D=1, P=1) and a sixth expansion convolution layer (K=3, D=2, P=2) connected in sequence; the fourth convolution unit comprises a seventh expansion convolution layer (K=1, D=1, P=1).
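A minimal PyTorch sketch of the four parallel branches described above follows. It assumes that the 1×1 convolutions use padding 0 and that the average pooling layer is a stride-1 3×3 pool, so that all branches keep the input resolution and can be concatenated; the per-branch output width (one quarter of the input channels) is also an assumption, as the text does not give channel numbers.

```python
import torch
import torch.nn as nn

class MultiScaleExtractor(nn.Module):
    """Multi-scale feature extraction network of Fig. 3: four parallel branches with
    different receptive fields whose outputs are concatenated into the multi-scale feature F."""
    def __init__(self, channels=512):
        super().__init__()
        c = channels // 4
        self.branch1 = nn.Sequential(                    # average pooling + 1x1 convolution
            nn.AvgPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(channels, c, 1), nn.ReLU(inplace=True))
        self.branch2 = nn.Sequential(                    # 1x1 conv + two dilation-3 convs
            nn.Conv2d(channels, c, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=3, dilation=3), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=3, dilation=3), nn.ReLU(inplace=True))
        self.branch3 = nn.Sequential(                    # 1x1 conv + one dilation-2 conv
            nn.Conv2d(channels, c, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=2, dilation=2), nn.ReLU(inplace=True))
        self.branch4 = nn.Sequential(                    # plain 1x1 convolution
            nn.Conv2d(channels, c, 1), nn.ReLU(inplace=True))

    def forward(self, f0):
        branches = (self.branch1, self.branch2, self.branch3, self.branch4)
        return torch.cat([b(f0) for b in branches], dim=1)   # multi-scale feature F
```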
Further, in the multi-category target counting task the feature maps that the model needs to focus on are relatively complex, so this embodiment introduces a channel feature extraction network, which extracts the basic input features by using the feature detector corresponding to each channel of the feature map.
Specifically, in step S2, the channel features of the multi-scale feature are extracted as follows: the initial channel features C(F)_initial of the multi-scale feature are extracted through the channel feature extraction network; the initial channel features C(F)_initial are multiplied element-wise with the multi-scale feature to obtain the channel features C(F); wherein the initial channel features C(F)_initial of the multi-scale feature are extracted by the channel feature extraction network according to the formula:
C(F)_initial = concat(σ(L(r(L(v, W_0), W_1)))(F))
wherein F denotes the multi-scale feature, concat denotes matrix concatenation, σ denotes the sigmoid function, v denotes the feature vector obtained through the average pooling layer, W_0 and W_1 are the parameters of the two linear layers, L is a linear layer and r is the ReLU activation function.
The principle of the formula is as follows: the channel feature extraction network aggregates the spatial information of the multi-scale feature F through an average pooling layer to generate spatial context descriptors, obtains the channel feature vector v, and ranks the importance of the channels through the two linear layers. To limit the complexity of the channel feature extraction network, this embodiment encodes the channel feature vector through a bottleneck formed by the two linear layers around a nonlinearity. The encoded channel feature vector is then normalized to [0, 1] by a sigmoid operation.
Referring to fig. 4, the channel feature extraction network in this embodiment includes an average pooling layer, a first Linear layer (Linear), a ReLU activation function, a second Linear layer (Linear), and a sigmoid function, which are sequentially connected.
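As an illustration of this structure, a minimal PyTorch sketch is given below; the reduction ratio of 16 inside the two-linear-layer bottleneck is an assumption, as the text only states that the two linear layers form a bottleneck around a nonlinearity.

```python
import torch.nn as nn

class ChannelFeatureExtractor(nn.Module):
    """Channel feature extraction network of Fig. 4: average pooling, two linear layers with
    a ReLU in between, a sigmoid, and element-wise re-weighting of the multi-scale feature F."""
    def __init__(self, channels=512, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # aggregates spatial information into v
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # L(v, W_0)
            nn.ReLU(inplace=True),                       # r(.)
            nn.Linear(channels // reduction, channels),  # L(., W_1)
            nn.Sigmoid())                                # sigma, maps the weights to [0, 1]

    def forward(self, f):
        b, c, _, _ = f.shape
        w = self.fc(self.pool(f).view(b, c)).view(b, c, 1, 1)
        return w * f                                     # channel features C(F)
```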
Specifically, in step S2, the spatial features of the feature map are extracted as follows: the initial spatial features S(F_0)_initial of the feature map are extracted through the spatial feature extraction network; the initial spatial features S(F_0)_initial are multiplied element-wise with the feature map to obtain the spatial features S(F_0); wherein the initial spatial features S(F_0)_initial of the feature map are extracted by the spatial feature extraction network according to the formula:
S(F_0)_initial = σ(Conv(MAP(F_0), θ)) ⊙ F_0
wherein F_0 denotes the feature map, σ denotes the sigmoid function, Conv denotes a 7×7 convolution layer, MAP denotes maximum pooling and average pooling, θ denotes the parameters of the spatial feature extraction network, and ⊙ denotes element-wise multiplication.
The principle of the formula is as follows: the role of the spatial feature extraction network is to discover the critical parts of the feature map and weight them spatially. Not all regions of an image contribute equally to the task; only the regions relevant to the task are of interest, and the spatial feature extraction network finds these essential parts. The input to the spatial feature extraction network is the feature map F_0. This embodiment obtains the results of maximum pooling and average pooling, concatenates them into one feature map, learns from it using a convolution layer, and then applies a sigmoid operation.
Referring to fig. 5, the spatial feature extraction network in this embodiment includes a max-pooling and average-pooling layer, a convolution layer with k=7 and p=3, and a sigmoid function connected in sequence.
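A minimal PyTorch sketch of this spatial branch follows; it implements the channel-wise max and average pooling, the 7×7 convolution with padding 3 and the sigmoid re-weighting described above.

```python
import torch
import torch.nn as nn

class SpatialFeatureExtractor(nn.Module):
    """Spatial feature extraction network of Fig. 5: channel-wise max and average pooling,
    concatenation, a 7x7 convolution (padding 3), a sigmoid, and re-weighting of F_0."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, f0):
        max_map, _ = torch.max(f0, dim=1, keepdim=True)      # max pooling over channels
        avg_map = torch.mean(f0, dim=1, keepdim=True)        # average pooling over channels
        attention = torch.sigmoid(self.conv(torch.cat([max_map, avg_map], dim=1)))
        return attention * f0                                # spatial features S(F_0)
```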
Further, in step S3, the spatial features and the channel features are fused to obtain the fused features according to the formula:
Output_MA = C(F) + S(F_0)
wherein C(F) denotes the channel features and S(F_0) denotes the spatial features.
Further, the counting network in step S4 comprises five convolution layers connected in sequence, wherein the parameters of the first, second and fourth convolution layers are (K=3, D=1, P=1) and the third convolution layer is a deconvolution layer (K=9, S=2, P=1); the fifth convolution layer has a convolution kernel size of 1×1 and is used to output the density map.
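For illustration, a minimal PyTorch sketch of the counting head and of the integration in step S5 is given below. The fused feature is taken as the element-wise sum Output_MA = C(F) + S(F_0); the intermediate channel widths (256/128/64) and the choice of one output channel per target class are assumptions not specified in the text.

```python
import torch.nn as nn

class CountingHead(nn.Module):
    """Counting network of step S4: five convolution layers, the first, second and fourth
    with (K=3, D=1, P=1), the third a transposed convolution (K=9, S=2, P=1), and the
    fifth a 1x1 convolution that outputs the density maps."""
    def __init__(self, in_channels=512, num_classes=8):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 128, kernel_size=9, stride=2, padding=1),  # deconvolution
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, kernel_size=1))       # one density map per class

    def forward(self, fused):
        return self.layers(fused)                            # shape (B, num_classes, H', W')


def count_per_class(density_maps):
    """Step S5: integrate (sum) each class's density map to obtain that class's object count."""
    return density_maps.sum(dim=(2, 3))                      # shape (B, num_classes)
```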
The experimental comparison results are as follows:
In order to verify the performance of the present application, this embodiment performs experiments on the multi-category target counting dataset collected by the present application using the method provided above, and compares it extensively with existing methods. Measured by mean absolute error (MAE) and mean squared error (MSE), the results in Table 1 show that the present application achieves better performance.
Table 1 Comparison of experimental results

Method                     MAE     MSE
C_CNN                      26.94   50.69
MCNN                       28.31   56.25
CSRNet                     32.10   52.86
CANet                      26.82   47.86
Res101_SFCN                31.16   54.02
SCAR                       27.50   48.05
DM-Count                   35.28   61.62
The present application    26.03   46.11
Example two
The embodiment provides a multi-category target counting system based on multi-perception feature fusion, which comprises:
a first extraction module: used for acquiring an image containing multiple categories of targets and extracting a feature map of the image;
a second extraction module: used for extracting multi-scale features and spatial features from the feature map, and extracting channel features from the multi-scale features;
a fusion module: used for fusing the spatial features and the channel features to obtain fused features;
a density map construction module: used for inputting the fused features into a counting network and outputting a density map through the counting network;
a quantity prediction module: used for integrating the density map to obtain the number of targets of each category.
Example III
The present embodiment provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the multi-category target counting method based on multi-perception feature fusion of embodiment one when executing the computer program.
Example IV
The present embodiment provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the multi-category target counting method based on multi-perception feature fusion of embodiment one.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the present application can be implemented in various computer languages, for example the object-oriented programming language Java or the scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications of the present application will be apparent to those of ordinary skill in the art in light of the foregoing description. It is not necessary here nor is it exhaustive of all embodiments. And obvious variations or modifications thereof are contemplated as falling within the scope of the present application.

Claims (10)

1. A multi-category target counting method based on multi-perception feature fusion, characterized by comprising the following steps:
step S1: acquiring an image containing multiple categories of targets, and extracting a feature map of the image;
step S2: extracting multi-scale features and spatial features from the feature map, and extracting channel features from the multi-scale features;
step S3: fusing the spatial features and the channel features to obtain fused features;
step S4: inputting the fused features into a counting network, and outputting a density map through the counting network;
step S5: integrating the density map to obtain the number of targets of each category.
2. The multi-class object counting method based on multi-perception feature fusion according to claim 1, wherein: step S1, extracting a feature map of the image through a convolutional neural network;
the neural network comprises a first convolution unit, a second convolution unit, a third convolution unit and a fourth convolution unit which are sequentially connected;
wherein a pooling layer is arranged between the first convolution unit and the second convolution unit, between the second convolution unit and the third convolution unit and between the third convolution unit and the fourth convolution unit;
the first convolution unit comprises two convolution layers with 64 channels, the second convolution unit comprises two convolution layers with 128 channels, the third convolution unit comprises three convolution layers with 256 channels, and the fourth convolution unit comprises three convolution layers with 512 channels.
3. The multi-class object counting method based on multi-perception feature fusion according to claim 1, wherein: the step S2 is to extract multi-scale features of the feature map, and the method comprises the following steps:
inputting the feature map F_0 into a multi-scale feature extraction network, extracting four features of different scales from the feature map F_0 through the multi-scale feature extraction network, and splicing the four features of different scales to obtain a multi-scale feature F;
wherein the multi-scale feature extraction network comprises four parallel convolution units;
the first convolution unit comprises a first expanded convolution layer, and an average pooling layer is arranged in front of the first expanded convolution layer;
the second convolution unit comprises a second expansion convolution layer, a third expansion convolution layer and a fourth expansion convolution layer which are sequentially connected;
the third convolution unit comprises a fifth expansion convolution layer and a sixth expansion convolution layer which are sequentially connected;
the fourth convolution unit includes a seventh expanded convolution layer.
4. The multi-class object counting method based on multi-perception feature fusion according to claim 3, wherein: step S2 extracts the channel features of the multi-scale feature as follows:
extracting initial channel features C(F)_initial of the multi-scale feature through a channel feature extraction network;
multiplying the initial channel features C(F)_initial element-wise with the multi-scale feature to obtain the channel features C(F);
wherein the initial channel features C(F)_initial of the multi-scale feature are extracted by the channel feature extraction network according to the formula:
C(F)_initial = concat(σ(L(r(L(v, W_0), W_1)))(F))
wherein F denotes the multi-scale feature, concat denotes matrix concatenation, σ denotes the sigmoid function, v denotes the feature vector obtained through the average pooling layer, W_0 and W_1 are the parameters of the two linear layers, L is a linear layer and r is the ReLU activation function.
5. The multi-class object counting method based on multi-perception feature fusion according to claim 1, wherein: step S2 extracts the spatial features of the feature map as follows:
extracting initial spatial features S(F_0)_initial of the feature map through a spatial feature extraction network;
multiplying the initial spatial features S(F_0)_initial element-wise with the feature map to obtain the spatial features S(F_0);
wherein the initial spatial features S(F_0)_initial of the feature map are extracted by the spatial feature extraction network according to the formula:
S(F_0)_initial = σ(Conv(MAP(F_0), θ)) ⊙ F_0
wherein F_0 denotes the feature map, σ denotes the sigmoid function, Conv denotes a 7×7 convolution layer, MAP denotes maximum pooling and average pooling, θ denotes the parameters of the spatial feature extraction network, and ⊙ denotes element-wise multiplication.
6. The multi-class object counting method based on multi-perception feature fusion according to claim 1, wherein: in the step S3, the spatial feature and the channel feature are fused to obtain a fusion feature, where the formula is:
Output_MA = C(F) + S(F_0)
wherein C(F) denotes the channel features and S(F_0) denotes the spatial features.
7. The multi-class object counting method based on multi-perception feature fusion according to claim 1, wherein: the counting network in the step S4 comprises five convolution layers which are connected in sequence, wherein a third convolution layer is a deconvolution layer; the fifth convolution layer has a convolution kernel size of 1 x 1 for outputting the density map.
8. A multi-category target counting system based on multi-perception feature fusion, characterized by comprising:
a first extraction module: used for acquiring an image containing multiple categories of targets and extracting a feature map of the image;
a second extraction module: used for extracting multi-scale features and spatial features from the feature map, and extracting channel features from the multi-scale features;
a fusion module: used for fusing the spatial features and the channel features to obtain fused features;
a density map construction module: used for inputting the fused features into a counting network and outputting a density map through the counting network;
a quantity prediction module: used for integrating the density map to obtain the number of targets of each category.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized by: the processor, when executing the computer program, implements the steps of the multi-class object counting method based on multi-perceptual feature fusion as defined in any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program, when executed by a processor, implements the steps of the multi-class object counting method based on multi-perceptual feature fusion as defined in any one of claims 1 to 7.
CN202310513969.8A 2023-05-09 2023-05-09 Multi-category target counting method and system based on multi-perception feature fusion Pending CN116645516A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310513969.8A CN116645516A (en) 2023-05-09 2023-05-09 Multi-category target counting method and system based on multi-perception feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310513969.8A CN116645516A (en) 2023-05-09 2023-05-09 Multi-category target counting method and system based on multi-perception feature fusion

Publications (1)

Publication Number Publication Date
CN116645516A true CN116645516A (en) 2023-08-25

Family

ID=87642647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310513969.8A Pending CN116645516A (en) 2023-05-09 2023-05-09 Multi-category target counting method and system based on multi-perception feature fusion

Country Status (1)

Country Link
CN (1) CN116645516A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200074186A1 (en) * 2018-08-28 2020-03-05 Beihang University Dense crowd counting method and apparatus
CN109948553A (en) * 2019-03-20 2019-06-28 北京航空航天大学 A kind of multiple dimensioned dense population method of counting
CN110188685A (en) * 2019-05-30 2019-08-30 燕山大学 A kind of object count method and system based on the multiple dimensioned cascade network of double attentions
CN112132023A (en) * 2020-09-22 2020-12-25 上海应用技术大学 Crowd counting method based on multi-scale context enhanced network
CN113762009A (en) * 2020-11-18 2021-12-07 四川大学 Crowd counting method based on multi-scale feature fusion and double-attention machine mechanism
CN113011329A (en) * 2021-03-19 2021-06-22 陕西科技大学 Pyramid network based on multi-scale features and dense crowd counting method
US11631238B1 (en) * 2022-04-13 2023-04-18 Iangxi Electric Power Research Institute Of State Grid Method for recognizing distribution network equipment based on raspberry pi multi-scale feature fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孙俊 等 (Sun Jun et al.): "基于无人机图像的多尺度感知麦穗计数方法" (Multi-scale perception method for wheat ear counting based on UAV images), 农业工程学报 (Transactions of the Chinese Society of Agricultural Engineering), 31 December 2021 (2021-12-31) *
潘诚 (Pan Cheng): "密集场景下的人群计数算法研究" (Research on crowd counting algorithms in dense scenes), 中国优秀硕士学位论文全文数据库 信息科技辑 (China Master's Theses Full-text Database, Information Science and Technology), 15 February 2023 (2023-02-15) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination