CN117238299B - Method, system, medium and equipment for optimizing bird voice recognition model of power transmission line
- Publication number
- CN117238299B CN117238299B CN202311506466.4A CN202311506466A CN117238299B CN 117238299 B CN117238299 B CN 117238299B CN 202311506466 A CN202311506466 A CN 202311506466A CN 117238299 B CN117238299 B CN 117238299B
- Authority
- CN
- China
- Prior art keywords
- bird
- model
- sound
- background sound
- voice recognition
- Legal status
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method, a system, a medium and equipment for optimizing a bird voice recognition model of a power transmission line, and relates to the technical field of power transmission line monitoring. The method comprises the following steps:
- obtain the background sound of the current scene over a period of time;
- perform classification detection on the background sound, first with a front-end server and then with a cloud analysis server, to obtain a classification detection result, and generate a preliminary sample set from that result;
- expand the samples using cached sound records, preset samples and the preliminary sample set, and generate a training sample set online;
- fine-tune the bird voice recognition model running on the terminal device with the expanded sample set.

The invention alleviates the shortage of model training resources across different scenes. It allows hazard identification models with high false alarm rates in particular scenes to be updated promptly, and it reduces both false alarms and missed alarms.
Description
Technical Field
The invention relates to the technical field of power transmission line monitoring, and in particular to a method, system, medium and equipment for optimizing a bird voice recognition model of a power transmission line.
Background
In recent years, power grid operators have vigorously promoted the construction of visualized transmission lines, using visual and intelligent means to improve the efficiency of transmission line inspection. Visualization of transmission corridors plays an important role in finding hidden hazards and issuing early warnings in time. It has also contributed substantially to the safety, reliability and intelligence of transmission lines.

Transmission corridor visualization and the related artificial intelligence technologies have advanced in recent years, but they are still at an early stage. There is a long way to go before they meet the requirements of transmission operation and maintenance. To achieve safe, reliable and intelligent transmission lines, research should cover not only vision but also the related technologies of hearing, touch and smell.
Bird activity is an important cause of transmission line faults, and more and more measures against bird damage are being taken, such as strengthened inspection and bird-repellent devices. However, the area requiring bird prevention is wide, the towers are numerous, and birds are difficult to recognize. As a result, general measures rarely achieve targeted prevention. In the transmission line environment, the following are important methods for preventing and controlling damage by large birds such as storks:
- accurate recognition of bird images;
- classification of bird sounds;
- image and audio retrieval.
Deep-learning-based bird song classification algorithms on terminals currently work as follows. Sample data are accumulated and labeled on a cloud server, and the model is trained there. After testing, quantization and compression, the model is deployed to the mobile terminal for inference.

However, as the number of deployed monitoring devices grows, the monitored sites become increasingly diverse. A detection model trained in the cloud and deployed to monitoring terminals therefore lacks robustness when recognizing bird sounds against different backgrounds. Given the limited resources of the cloud training server, training a separate model for the installation scene of every monitoring device is practically impossible.

Each transmission line monitoring device observes a fixed scene for a long time, and the bird species active in that scene change little over a given period. A sound classification algorithm tuned to a fixed scene can therefore perform well. When the scene changes, debugging the algorithm and adjusting its parameters in time can restore high detection performance. Online tuning of the deep learning recognition model on the device side is therefore of great practical significance.
At present, bird sounds monitored on transmission lines are mainly analyzed on a cloud server. As the number of monitoring devices keeps growing, the monitoring density keeps increasing. Continuous sound data stream to the cloud analysis server over the wireless network, placing heavy pressure on both the 4G network and the cloud server. The analysis server cannot process this massive volume of audio in time, which has the following consequences:
- a large backlog builds up and alarms are delayed;
- users cannot be notified of hidden hazards promptly;
- the safety of the transmission line is seriously threatened.
Migrating hazard analysis to the monitoring equipment for front-end analysis is an inevitable trend in transmission line hazard identification. However, front-end equipment has low computing power and must operate at low power consumption. Common deep neural network models are too large for front-end hazard identification on the terminal, so only lightweight deep neural network models can be deployed. A lightweight network represents audio features less fully, and its recognition accuracy falls short of that of the cloud analysis server.

In addition, transmission line scenes vary widely. Using a single cloud-trained model to identify hazards in all transmission scenes leaves the network poorly adapted to each specific monitoring scene. This causes many false positives and lowers recognition precision.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a method, system, medium and equipment for optimizing a bird voice recognition model of a power transmission line. From the large number of front-end analysis devices in operation, it selects the subset of devices whose models need an iterative upgrade, which avoids iterative upgrade training of every device. Real-time model fine-tuning largely solves the shortage of model training resources across different scenes. It also allows hazard identification models with high false alarm rates in particular scenes to be updated in time, and it reduces false alarms and missed alarms.
To achieve the above object, the invention adopts the following technical solutions.
A first aspect of the invention provides a method for optimizing a bird voice recognition model of a power transmission line, comprising the following steps:
acquiring the background sound of the current scene over a period of time and preprocessing it;
performing classification detection on the background sound, first with a front-end server and then with a cloud analysis server, to obtain a classification detection result, and generating a preliminary sample set according to the result;
expanding the samples using cached sound records, preset samples and the preliminary sample set to generate a training sample set online;
and fine-tuning the bird voice recognition model running on the terminal device with the expanded sample set. The fine-tuning comprises freezing the parameter layers of the bird voice recognition model, initializing a new layer as the fine-tuning layer, and training only the fine-tuning layer on the expanded sample set.
Further, preprocessing the background sound comprises:
performing strong-change detection on all collected background sounds. Each sound is examined from three angles (sound pressure, time domain and frequency domain) by computing three feature values: the sound pressure level, the maximum frame energy and the average frequency-domain energy. These values are compared with preset thresholds. The background sound is retained when all three feature values exceed their thresholds; otherwise it is filtered out.
Further, classification detection of the background sound, first with the front-end server and then with the cloud analysis server, comprises:
comparing the preprocessed background sound with preset alarm sounds on the front-end server to classify it, and judging whether the model optimization criterion is met;
when the preprocessed background sound meets the model optimization criterion, performing secondary classification detection on it with the cloud analysis server to determine its alarm type.
Further, secondary classification detection with the cloud analysis server comprises:
applying pre-emphasis and uniform sliding-window segmentation to the bird song signal, converting it into corresponding image feature information by a sound-to-image conversion method, feeding that image feature information into a trained bird song recognition model, and finally predicting the bird species from its song.
Further, the bird voice recognition model is trained as follows:
bird sound features are first extracted from the bird sound data and concatenated with regional features to form new features, which are input to the model and used to train it. After the model structure is built, its internal parameters are randomly initialized. They are then updated iteratively during training by forward and backward propagation on the data, so that the model learns parameters that fit the distribution of sound data in transmission scenes. The model structure and the learned parameters together perform bird sound recognition.
Further, the alarm sounds are set according to the bird sounds common in each scene environment.
Further, the bird sound feature is a Mel spectrogram. Each region is assigned a numeric code, which is encoded to obtain a regional feature. The regional feature and the sound feature are concatenated into a new feature that serves as the model input.
Further, after the classification detection result of the background sound is obtained, it is checked and confirmed manually through the monitoring platform.
Further, expanding the samples using the cached sound records, the preset samples and the preliminary sample set comprises:
mixing the labeled samples with the ambient sound recordings in the cached sound records for augmentation, using a confusion (mixing) method;
randomly inserting labeled target-type fragments from the preset samples into the samples of the preliminary sample set to further augment the sample data.
Further, the preset samples are a dataset of known, labeled target bird sound fragments.
Further, the bird voice recognition model adopts an EfficientNet network structure.
A second aspect of the invention provides a system for optimizing a bird voice recognition model of a power transmission line, comprising:
a data acquisition module configured to acquire the background sound of the current scene over a period of time and preprocess it;
a classification detection module configured to perform classification detection on the background sound, first with the front-end server and then with the cloud analysis server, to obtain a classification detection result, and to generate a preliminary sample set from that result;
a sample expansion module configured to expand the samples using the cached sound records, the preset samples and the preliminary sample set, and to generate a training sample set online; and
a model fine-tuning module configured to fine-tune the bird voice recognition model running on the terminal device with the expanded sample set. The fine-tuning comprises freezing the parameter layers of the bird voice recognition model, initializing a new layer as the fine-tuning layer, and training only the fine-tuning layer on the expanded sample set.
A third aspect of the invention provides a medium storing a program which, when executed by a processor, performs the steps of the method for optimizing a bird voice recognition model of a power transmission line according to the first aspect of the invention.
A fourth aspect of the invention provides an apparatus comprising a memory, a processor and a program stored in the memory and executable on the processor. When executing the program, the processor implements the steps of the method for optimizing a bird voice recognition model of a power transmission line according to the first aspect of the invention.
One or more of the above technical solutions have the following beneficial effects:
In the disclosed method, system, medium and equipment, front-end analysis data are collected and classified by the cloud analysis server. The confirmed alarm information is then fed back to the terminal device as ground-truth information. On the device side, this alarm information serves as training samples for adjusting the neural network parameters, which improves the model's adaptability to its scene. The invention selects, from the large number of front-end analysis devices in operation, the subset of devices whose models need an iterative upgrade, and so avoids iterative upgrade training of every device. Real-time model fine-tuning largely solves the shortage of model training resources across different scenes. It also allows hazard identification models with high false alarm rates to be updated in time, and it reduces false alarms and missed alarms.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a flowchart of a method for optimizing a bird voice recognition model of a transmission line according to an embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments of the invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise. It is further to be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components and/or combinations thereof.
Embodiment one:
The first embodiment of the invention provides a method for optimizing a bird voice recognition model of a power transmission line, as shown in Fig. 1:
- The device collects background sound over a period of time and passes it to the front end for analysis.
- The front-end results are sent to the cloud for secondary analysis.
- The secondary results are confirmed manually, which reduces the likelihood of false alarms and missed alarms.
- After manual confirmation, the platform sends supplementary samples to the device.
- The device generates new samples online from the supplementary data for device-side fine-tuning of the model.

In this way, the subset of devices whose models need an iterative upgrade can be selected from the large number of front-end analysis devices in operation, which avoids iterative upgrade training of every device.
The method specifically comprises the following steps:
Step 1: obtain the background sound of the current scene over a period of time, and preprocess it.
Step 2: perform classification detection on the background sound, first with the front-end server and then with the cloud analysis server, to obtain a classification detection result, and generate a preliminary sample set according to the result.
Step 3: expand the samples using the cached sound records, the preset samples and the preliminary sample set, and generate a training sample set online.
Step 4: fine-tune the bird voice recognition model running on the terminal device with the expanded sample set.
In step 1, after the sound collection device has operated for a period of time, the background sound it has collected is gathered periodically and preprocessed. Preprocessing comprises:
performing strong-change detection on all collected background sounds. Each sound is examined from three angles (sound pressure, time domain and frequency domain) by computing three feature values: the sound pressure level, the maximum frame energy and the average frequency-domain energy. These values are compared with preset thresholds. The background sound is retained when all three feature values exceed their thresholds; otherwise it is filtered out.
The preprocessed background sound is then transmitted to the front-end server for analysis.
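The three-feature screening in step 1 can be sketched in Python with NumPy as follows. This is a minimal illustration under several assumptions:
- The clip is mono float audio in [-1, 1].
- The "sound pressure level" is approximated by an uncalibrated RMS level in dB relative to full scale. A true SPL needs the microphone's calibration.
- The frame length, hop and `DEFAULT_THRESHOLDS` values are hypothetical placeholders, not values given in the patent.

```python
import numpy as np

# Hypothetical thresholds: (level in dBFS, max frame energy, mean spectral power).
DEFAULT_THRESHOLDS = (-40.0, 1.0, 1.0)


def strong_change_features(x, frame_len=1024, hop=512):
    """Return (level_db, frame_max_energy, mean_spectral_energy) for a mono clip."""
    x = np.asarray(x, dtype=np.float64)
    if len(x) < frame_len:  # pad short clips so at least one full frame exists
        x = np.pad(x, (0, frame_len - len(x)))
    # "Sound" angle: uncalibrated RMS level in dBFS (assumption, not a true SPL).
    rms = np.sqrt(np.mean(x ** 2))
    level_db = 20.0 * np.log10(rms + 1e-12)
    # Time domain: energy of each frame, keep the maximum.
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    frame_max_energy = float(np.max(np.sum(frames ** 2, axis=1)))
    # Frequency domain: average power over all frames and rfft bins.
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    mean_spectral_energy = float(np.mean(spectrum))
    return float(level_db), frame_max_energy, mean_spectral_energy


def keep_clip(x, thresholds=DEFAULT_THRESHOLDS):
    """Retain the clip only if all three features exceed their thresholds."""
    feats = strong_change_features(x)
    return all(f > t for f, t in zip(feats, thresholds))
```

Under these placeholder thresholds, a one-second 3 kHz tone at amplitude 0.5 is retained and a silent clip is filtered out. In practice the thresholds would be tuned per device and scene.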
In step 2, classification detection of the background sound with the front-end server and the cloud analysis server proceeds as follows:
the front-end server compares the preprocessed background sound with the preset alarm sounds to classify it, and judges whether the model optimization criterion is met.
First, the front-end server compares the preprocessed background sound with the preset alarm sounds to determine whether it is a bird sound. Mel spectrum features are extracted from both sounds and their cosine similarity is computed. When the similarity exceeds a set threshold, the two sounds are considered to be of the same type.
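The front-end comparison can be sketched with NumPy as follows, under stated assumptions:
- A triangular Mel filterbank is built by hand.
- Each clip is reduced to one log-compressed (`log1p`) Mel vector by averaging over time.
- The 16 kHz sample rate, FFT size, hop, 40 Mel bands and 0.8 threshold are hypothetical choices, not values from the patent.

The helper names are illustrative only.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def mel_feature(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Time-averaged, log1p-compressed Mel spectrum: one vector per clip."""
    x = np.asarray(x, dtype=np.float64)
    if len(x) < n_fft:
        x = np.pad(x, (0, n_fft - len(x)))
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log1p(mel).mean(axis=0)


def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def is_same_type(background, alarm, threshold=0.8, sr=16000):
    """True if the background clip matches the preset alarm sound (hypothetical threshold)."""
    sim = cosine_similarity(mel_feature(background, sr), mel_feature(alarm, sr))
    return sim > threshold
```

`log1p` is used instead of a plain log so that silent bands contribute values near zero. With a plain log, every clip would share large negative entries in empty bands, which would inflate the similarity between unrelated sounds.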
When the preprocessed background sound meets the model optimization criterion, the cloud analysis server performs secondary classification detection on it to determine its alarm type. The classification detection result is then checked and confirmed manually through the monitoring platform. The terminal monitoring device periodically requests manual confirmation from the platform. The confirmed alarm information for the device's sounds is then used as training samples for fine-tuning the model.
In a specific embodiment, the cloud analysis server has far more computing power than the front-end analysis server. The bird voice recognition model deployed in the cloud can therefore have more parameters and a larger size, and it achieves better recognition accuracy. Using the cloud server for secondary classification of the background sound ensures the quality of the data in the sample set. The cloud classification proceeds as follows:
- The bird song signal undergoes pre-emphasis, uniform sliding-window segmentation and similar processing.
- It is converted into corresponding image feature information by a sound-to-image conversion method.
- This image feature information is fed into a trained bird song recognition model.
- The model predicts the bird species from its song.
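The pre-emphasis and uniform sliding-window segmentation steps can be sketched as follows. Pre-emphasis is the standard first-order filter y[n] = x[n] - alpha * x[n-1]. The alpha of 0.97, the 2-second window and the 50% overlap are assumed values, not taken from the patent. Each resulting segment would then be converted to a spectrogram image for the cloud model.

```python
import numpy as np


def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha assumed)."""
    x = np.asarray(x, dtype=np.float64)
    return np.append(x[0], x[1:] - alpha * x[:-1])


def sliding_segments(x, seg_len, hop):
    """Cut x into equal-length windows; zero-pad the tail so no samples are dropped."""
    x = np.asarray(x, dtype=np.float64)
    n = 1 + int(np.ceil(max(len(x) - seg_len, 0) / hop))
    total = (n - 1) * hop + seg_len
    x = np.pad(x, (0, total - len(x)))
    return np.stack([x[i * hop : i * hop + seg_len] for i in range(n)])


def prepare_for_cloud_model(x, sr=16000, seg_seconds=2.0, overlap=0.5):
    """Pre-emphasize, then segment into fixed-length windows (hypothetical sizes)."""
    seg_len = int(seg_seconds * sr)
    hop = max(1, int(seg_len * (1.0 - overlap)))
    return sliding_segments(pre_emphasis(x), seg_len, hop)
```

For example, a 5-second clip at 16 kHz yields four 2-second windows with a 1-second hop, and the last window is zero-padded at the end.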
Bird sound features are first extracted from the bird sound data and concatenated with regional features to form new features, which are input to the model and used to train it. After the model structure is built, its internal parameters are randomly initialized. They are then updated iteratively during training by forward and backward propagation on the data, so that the model learns parameters that fit the distribution of sound data in transmission scenes. The model structure and the learned parameters together perform bird sound recognition.
The model structure adopts EfficientNet.
The alarm sounds are set according to the bird sounds common in each scene environment. A bird song database and a regional bird song distribution database are built from bird song distribution information for different regions and for the areas around transmission lines and networks. Because bird song distribution is strongly territorial, the current regional environment information is gridded by region and input into the model together with the bird song features for prediction.
The bird sound feature is a Mel spectrogram. Each region is assigned a numeric code, which is one-hot encoded to obtain the regional feature. The regional feature and the bird sound feature are concatenated into a new feature that serves as the model input.
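The regional gridding and one-hot concatenation can be sketched as follows. The patent does not specify how regions are gridded, so this sketch assumes a regular latitude/longitude grid with row-major cell codes. The grid origin, cell size and column count are hypothetical parameters.

```python
import numpy as np


def region_code(lat, lon, lat0, lon0, cell_deg, n_cols):
    """Map a coordinate to the row-major index of its grid cell (assumed regular grid)."""
    row = int((lat - lat0) // cell_deg)
    col = int((lon - lon0) // cell_deg)
    return row * n_cols + col


def one_hot(code, n_regions):
    """One-hot encode a region code."""
    if not 0 <= code < n_regions:
        raise ValueError(f"region code {code} outside 0..{n_regions - 1}")
    v = np.zeros(n_regions, dtype=np.float32)
    v[code] = 1.0
    return v


def fuse_features(mel_feature, code, n_regions):
    """Concatenate the flattened Mel feature with the one-hot region vector."""
    mel = np.ravel(np.asarray(mel_feature, dtype=np.float32))
    return np.concatenate([mel, one_hot(code, n_regions)])
```

For a 10 x 10 grid of 1-degree cells starting at (34.0, 114.0), the point (35.5, 117.2) falls in row 1, column 3, so its code is 13. Fusing an 8 x 8 Mel patch with it gives a 164-element input vector.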
In step 3, expanding the samples using the cached sound records, the preset samples and the preliminary sample set comprises:
mixing the labeled samples with the ambient sound recordings in the cached sound records for augmentation, using a confusion (mixing) method;
randomly inserting labeled target-type fragments from the preset samples into the samples of the preliminary sample set to further augment the sample data. Target-type fragments are first selected in turn from the labeled preset samples. For each fragment, all preliminary samples are traversed, and the fragment is inserted at a random position in each selected preliminary sample, which increases the diversity of the sample data.
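Both augmentation steps can be sketched with NumPy as follows. This rests on several assumptions:
- The "confusion" mixing is interpreted as a mixup-style weighted blend whose weight is drawn from Beta(alpha, alpha) and folded so the labeled clip stays dominant. The labeled clip keeps its label.
- "Inserting" a fragment is interpreted as overlaying it at a random offset, so the sample length stays unchanged.
- `alpha=0.4` and the bird labels in the usage example are hypothetical.

```python
import numpy as np


def mix_with_ambient(labeled, ambient, rng, alpha=0.4):
    """Mixup-style blend of a labeled clip with a cached ambient recording.

    The weight comes from Beta(alpha, alpha) and is folded to >= 0.5 so the
    labeled clip dominates; its label is kept unchanged (assumption).
    """
    labeled = np.asarray(labeled, dtype=np.float64)
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    amb = np.resize(np.asarray(ambient, dtype=np.float64), labeled.shape)  # tile or crop
    return lam * labeled + (1.0 - lam) * amb


def insert_fragment(sample, fragment, rng):
    """Overlay a target fragment at a random offset; the sample length is unchanged."""
    out = np.array(sample, dtype=np.float64)  # work on a copy
    frag = np.asarray(fragment, dtype=np.float64)[: len(out)]
    start = int(rng.integers(0, len(out) - len(frag) + 1))
    out[start : start + len(frag)] += frag
    return out, start


def expand_samples(preliminary, preset, rng):
    """Take each labeled preset fragment in turn and insert it into every preliminary sample."""
    expanded = []
    for fragment, label in preset:
        for sample in preliminary:
            augmented, _ = insert_fragment(sample, fragment, rng)
            expanded.append((augmented, label))
    return expanded
```

With 2 preliminary samples and 3 labeled preset fragments, `expand_samples` returns 6 new labeled clips, one per (fragment, sample) pair.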
The preset samples are a dataset of known, labeled target bird sound fragments. In this embodiment, some of the sound samples to be recognized, together with their labels, are preset when the sound collection device is installed and deployed.
In this embodiment, sample expansion enriches the diversity of the training samples, and data augmentation highlights their characteristic features, which improves the recognition accuracy of the model.
In step 4, this embodiment fine-tunes the model by freezing or activating hidden layers of the neural network according to their parameter weights. Because computing power and memory on the device side are limited, online fine-tuning on the device cannot iteratively update every layer of the neural network model. Only the layer that contributes most to the target detection task is updated iteratively. The fine-tuning process therefore comprises freezing the parameter layers of the bird voice recognition model, initializing a new layer as the fine-tuning layer, and training only the fine-tuning layer on the expanded sample set.
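The freeze-and-train-a-new-layer strategy can be illustrated with a small NumPy sketch. A fixed random projection followed by ReLU stands in for the frozen pre-trained EfficientNet layers; this is an assumption made only to keep the example self-contained. Its weights are marked read-only so that any update would raise an error. Only a newly initialized fully connected layer is trained, with softmax cross-entropy and plain gradient descent. The learning rate, epoch count and data are hypothetical.

```python
import numpy as np


class FrozenBackbone:
    """Stand-in for the pre-trained layers: fixed random projection + ReLU.

    The weights are made read-only, so any accidental update raises an error.
    """

    def __init__(self, in_dim, feat_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, feat_dim))
        self.w.setflags(write=False)

    def __call__(self, x):
        return np.maximum(x @ self.w, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def finetune_new_head(backbone, x, y, n_classes, lr=0.1, epochs=300, seed=1):
    """Train only a newly initialized fully connected layer on frozen features."""
    feats = backbone(x)  # computed once, because the frozen layers never change
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.01, (feats.shape[1], n_classes))  # new fine-tuning layer
    b = np.zeros(n_classes)
    targets = np.eye(n_classes)[y]
    for _ in range(epochs):
        probs = softmax(feats @ w + b)
        grad = (probs - targets) / len(x)  # d(mean cross-entropy)/d(logits)
        w -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(backbone, w, b, x):
    return np.argmax(backbone(x) @ w + b, axis=1)
```

Because the frozen features are computed only once and only the small head is updated, compute and memory stay within what an end-side device can afford. This is the constraint the embodiment cites.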
In one specific embodiment, more specifically the steps include:
(1) And (5) pre-training the model setting of the bird voice recognition model. And taking a bird voice recognition model which is preset in the terminal equipment and is running as a pre-training model, wherein the bird voice recognition model structure adopts an Efficientnet network, the Efficientnet is divided into 9 stages in total, and a convolution layer is followed by a BN layer and a Swish activation function by default. stage 1 is a 3x3 convolutional layer. For stage 2 to stage 8, the structure of MBConv in repeated stacks, MBConv for the main branch, a convolution layer of 1x1 (+bn+swish) is used for the up-scaling, followed by a DW convolution (+bn+swish), the convolution kernel size being 3x3 or 5x5, followed by a SE block, followed by a convolution of 1x1 (+bn) for the down-scaling, and finally by a dropout operation. And finally, directly transmitting the matrix of the input branch to be added with the main branch to obtain the final output. stage 9 consists of three parts, first a 1x1 convolution, then averaging pooling, and finally a fully connected layer. The bird sound obtains feature information through a feature extraction part, the obtained feature input model is subjected to convolution, pooling, batch normalization and other steps, and finally confidence scores of birds corresponding to the full-connection layer data are obtained.
(2) Preprocessing the model for the terminal device. When the pre-trained model is converted, operators used during training, such as BatchNorm (batch normalization) and Dropout (random dropping), are retained.
(3) In the fine-tuning scenario, the model does not need to be built from scratch on the device side: the pre-trained model is simply loaded, the parameters of the earlier layers of the neural network are frozen, and only the last fully connected layer is fine-tuned. The input name of the last layer is looked up with the Netron model visualization tool (or in a model JSON file exported by another tool), and the pre-trained model with its last layer removed is retained.
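The freeze-then-fine-tune idea in steps (1) to (3) can be illustrated with a small NumPy sketch. It is not the on-device implementation: the frozen earlier layers are stood in for by a hypothetical fixed random projection (`W_FROZEN`, `frozen_backbone`), the newly initialized last layer is a softmax fully connected head, and the toy data, sizes, and learning rate are all assumptions. Only the head's weights change during training.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the frozen pre-trained layers (hypothetical): a fixed random
# projection, marked read-only so any accidental in-place update would raise.
W_FROZEN = rng.normal(0.0, 1.0 / np.sqrt(20), (20, 64))
W_FROZEN.setflags(write=False)


def frozen_backbone(x):
    f = np.maximum(x @ W_FROZEN, 0.0)                              # ReLU features
    return f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12)  # L2-normalized


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fine_tune_head(x, y, n_classes, lr=0.5, epochs=200):
    """Train only a newly initialized fully connected head on frozen features."""
    feats = frozen_backbone(x)                  # computed once; backbone is never updated
    w = np.zeros((feats.shape[1], n_classes))   # new layer, zero-initialized
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    losses = []
    for _ in range(epochs):
        p = softmax(feats @ w + b)
        losses.append(-np.mean(np.sum(onehot * np.log(p + 1e-12), axis=1)))
        grad = (p - onehot) / len(x)            # d(mean cross-entropy)/d(logits)
        w -= lr * (feats.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b, losses


# Toy stand-in for the expanded sample set: 3 classes as clusters in input space
centers = rng.normal(0.0, 2.0, (3, 20))
y = rng.integers(0, 3, 300)
x = centers[y] + rng.normal(0.0, 0.5, (300, 20))
w, b, losses = fine_tune_head(x, y, n_classes=3)
acc = np.mean(np.argmax(frozen_backbone(x) @ w + b, axis=1) == y)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, train accuracy {acc:.2f}")
```

L2-normalizing the frozen features keeps the curvature of the head's loss bounded, so plain gradient descent with `lr=0.5` is stable here; with unnormalized features a smaller learning rate would be needed.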
Embodiment two:
The second embodiment of the invention provides an optimization system for the bird voice recognition model of a power transmission line, comprising:
the data acquisition module, configured to acquire the background sound of the current scene over a period of time and preprocess it;
the classification detection module, configured to perform classification detection on the background sound using first the front-end server and then the cloud analysis server, obtain a classification detection result for the background sound, and generate a preliminary sample set from that result;
the sample expansion module, configured to perform sample expansion using the cached sound records, the preset samples, and the preliminary sample set, and generate a training sample set online;
the model fine-tuning module, configured to fine-tune the bird voice recognition model running on the terminal device side using the expanded sample set, wherein the fine-tuning process comprises: freezing the parameter layers of the bird voice recognition model, initializing a new layer as the fine-tuning layer, and training only this fine-tuning layer on the expanded sample set.
In the data acquisition module, once the sound acquisition device has been running for a period of time, the background sound it records is collected periodically and preprocessed by the preprocessing module.
The preprocessing module preprocesses the background sound as follows:
Strong-change detection is performed on all collected background sound. Each recording is assessed from three perspectives (sound level, time domain, and frequency domain) by computing three feature values: the sound pressure level, the maximum frame energy, and the mean frequency-domain energy. These feature values are compared with preset thresholds; the recording is retained when the feature values exceed the thresholds and is otherwise filtered out.
The preprocessed background sound is then transmitted to the front-end server for analysis.
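A minimal sketch of the strong-change filter follows. It assumes the samples are calibrated sound pressure in pascals (so SPL uses the standard 20 µPa reference), that "exceeding the threshold" means all three values must exceed their respective thresholds, and that the frame length, hop, and threshold values are illustrative rather than taken from the source.

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure in air, 20 µPa


def sound_pressure_level(x):
    # Sound pressure level in dB, assuming x holds calibrated pressure in pascals
    rms = np.sqrt(np.mean(x ** 2))
    return 20.0 * np.log10(max(rms, 1e-12) / P_REF)


def max_frame_energy(x, frame_len=1024, hop=512):
    # Time domain: the largest short-time energy (sum of squares) over all frames
    starts = range(0, max(len(x) - frame_len, 0) + 1, hop)
    return max(float(np.sum(x[s:s + frame_len] ** 2)) for s in starts)


def mean_spectral_energy(x):
    # Frequency domain: mean power over the bins of the one-sided FFT
    return float(np.mean(np.abs(np.fft.rfft(x)) ** 2))


def keep_recording(x, thresholds):
    """Retain the clip only if all three feature values exceed their thresholds."""
    feats = (sound_pressure_level(x), max_frame_energy(x), mean_spectral_energy(x))
    return all(f > t for f, t in zip(feats, thresholds))


# Illustrative thresholds: (SPL in dB, max frame energy, mean spectral energy)
THRESHOLDS = (40.0, 1.0, 1.0)
sr = 16000
t = np.arange(sr) / sr
loud = 0.2 * np.sin(2 * np.pi * 2000 * t)    # about 77 dB SPL
quiet = 1e-4 * np.sin(2 * np.pi * 2000 * t)  # about 11 dB SPL
print(keep_recording(loud, THRESHOLDS), keep_recording(quiet, THRESHOLDS))
```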
In the classification detection module, the front-end server and the cloud analysis server perform classification detection on the background sound in sequence, as follows:
the front-end server compares the preprocessed background sound with the preset alarm sounds to classify it, and determines whether it meets the model optimization criterion;
First, the front-end server compares the preprocessed background sound with the preset alarm sounds to determine whether it is a bird sound. Specifically, Mel spectrogram features are extracted from both sounds and the cosine similarity between the two features is computed; when the similarity exceeds a set threshold, the two sounds are considered to be the same type of sound.
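The Mel-feature cosine comparison can be sketched in NumPy as follows. Assumptions not stated in the source: an HTK-style mel filterbank, a 512-point FFT with a Hann window, averaging the mel spectrogram over time and applying `log1p` so that clips of different lengths each yield one comparable vector, and a hypothetical similarity threshold of 0.9. A library routine such as `librosa.feature.melspectrogram` could replace the hand-written filterbank.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centers evenly spaced on the mel scale (HTK formula)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def mel_feature(x, sr, n_fft=512, hop=256, n_mels=40):
    """Time-averaged, log-compressed mel spectrum: one vector per clip."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop              # assumes len(x) >= n_fft
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T    # (n_frames, n_mels)
    return np.log1p(mel.mean(axis=0))                    # (n_mels,)


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def same_sound_type(clip, alarm, sr, threshold=0.9):
    # Hypothetical threshold; the source only says "a set threshold"
    return cosine_similarity(mel_feature(clip, sr), mel_feature(alarm, sr)) > threshold


sr = 16000
t = np.arange(sr) / sr
alarm = np.sin(2 * np.pi * 1000 * t)
other = np.sin(2 * np.pi * 5000 * t)
print(same_sound_type(alarm, alarm, sr), same_sound_type(other, alarm, sr))
```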
When the preprocessed background sound meets the model optimization criterion, the cloud analysis server performs a secondary classification detection on it and determines its alarm type. Once the classification detection result for the background sound is obtained, it is manually reviewed and confirmed through the monitoring platform: the terminal monitoring device periodically and actively requests manual confirmation from the platform, and the confirmed alarm information for the device's sounds is used as training samples for fine-tuning the model.
In a specific embodiment, the cloud analysis server has far more computing power than the front-end analysis server, so the bird voice recognition model deployed in the cloud has more parameters, is larger, and achieves better recognition accuracy. Using the cloud server for the secondary classification of the background sound therefore ensures the quality of the data in the sample set. The bird song signal is pre-emphasized and uniformly segmented with a sliding window, then converted into corresponding image feature information by a sound-to-image conversion method; this image feature information is fed as input to a trained bird song recognition model, which finally predicts the bird species from the song.
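The cloud-side preprocessing chain can be sketched as below. It assumes a pre-emphasis coefficient of 0.97 (a common choice, not given in the source), 1 s segments with 50% overlap, and a log-power STFT rendered as an 8-bit grayscale image as the "sound-to-image conversion". The function names are hypothetical.

```python
import numpy as np


def pre_emphasis(x, alpha=0.97):
    # y[n] = x[n] - alpha * x[n - 1]; the first sample is kept unchanged
    return np.append(x[0], x[1:] - alpha * x[:-1])


def sliding_segments(x, seg_len, hop):
    # Uniform sliding-window segmentation; a trailing partial window is dropped.
    # Assumes len(x) >= seg_len.
    starts = range(0, len(x) - seg_len + 1, hop)
    return np.stack([x[s:s + seg_len] for s in starts])


def spectrogram_image(seg, n_fft=256, hop=128):
    # Sound-to-image conversion: log-power STFT rescaled to an 8-bit grayscale image
    window = np.hanning(n_fft)
    frames = np.stack([seg[i:i + n_fft] * window
                       for i in range(0, len(seg) - n_fft + 1, hop)])
    db = 10.0 * np.log10(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-10)
    db = db.T[::-1]                     # rows: frequency (high at top), columns: time
    lo, hi = db.min(), db.max()
    return np.round((db - lo) / (hi - lo + 1e-12) * 255.0).astype(np.uint8)


sr = 16000
t = np.arange(2 * sr) / sr
noise = 0.01 * np.random.default_rng(0).normal(size=t.size)
x = np.sin(2 * np.pi * 3000 * t) + noise
segments = sliding_segments(pre_emphasis(x), seg_len=sr, hop=sr // 2)
images = [spectrogram_image(s) for s in segments]
print(segments.shape, images[0].shape, images[0].dtype)
```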
First, bird sound features are extracted from the bird sound data, and regional features are concatenated with the sound features to form new features, which are input into the model and used to train it. Specifically, after the model structure is constructed, the internal parameters of the model are randomly initialized and then iteratively updated during training using the data and the forward and backward propagation algorithms, so that the model learns parameters that fit the distribution of sound data in transmission-line scenes. The model structure and the learned parameters together perform bird sound recognition. Model structure: the model adopts an EfficientNet network.
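As a hedged illustration of training from random initialization with forward and backward propagation, here is a small two-layer NumPy network standing in for the EfficientNet used in the source. The toy inputs stand in for the concatenated sound-plus-region features; the network size, learning rate, and data are assumptions.

```python
import numpy as np


def train_mlp(x, y, n_classes, hidden=32, lr=0.05, epochs=300, seed=0):
    """Randomly initialize every parameter, then update all layers by backpropagation."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, np.sqrt(2.0 / x.shape[1]), (x.shape[1], hidden))  # He init
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, np.sqrt(1.0 / hidden), (hidden, n_classes))
    b2 = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    losses = []
    for _ in range(epochs):
        # Forward pass: linear -> ReLU -> linear -> softmax
        z1 = x @ w1 + b1
        h = np.maximum(z1, 0.0)
        logits = h @ w2 + b2
        logits = logits - logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p = p / p.sum(axis=1, keepdims=True)
        losses.append(-np.mean(np.log(np.sum(p * onehot, axis=1) + 1e-12)))
        # Backward pass: softmax cross-entropy, then back through the ReLU
        g_logits = (p - onehot) / len(x)
        g_w2 = h.T @ g_logits
        g_b2 = g_logits.sum(axis=0)
        g_z1 = (g_logits @ w2.T) * (z1 > 0)
        g_w1 = x.T @ g_z1
        g_b1 = g_z1.sum(axis=0)
        # Gradient-descent update of all layers (none are frozen here)
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), losses


rng = np.random.default_rng(7)
centers = rng.normal(0.0, 1.0, (3, 10))
labels = rng.integers(0, 3, 240)
feats = centers[labels] + rng.normal(0.0, 0.3, (240, 10))  # stand-in concatenated features
params, losses = train_mlp(feats, labels, n_classes=3)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Unlike the on-device fine-tuning described in step 4, every layer here receives gradient updates, which matches training the full model in the cloud.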
The alarm sounds are set according to the bird sounds that are common in each scene environment. More specifically, a bird song database and a regional bird species distribution information base are constructed from bird species distribution information for different regions and for the surroundings of power transmission lines/grids. Because bird species distribution is strongly regional, the current regional environment information is input into the model together with the bird song features, in a regional gridding manner, for prediction.
The bird sound features use a Mel spectrogram. Each region is assigned a numeric code, which is one-hot encoded to obtain the regional feature; the regional feature and the sound feature are then concatenated into a new feature that serves as the model input.
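A sketch of the regional feature construction follows. It assumes a hypothetical rectangular latitude/longitude grid for the "regional gridding" (the source does not specify the grid) and assumes the one-hot region vector is appended to every time frame of the mel spectrogram; other concatenation layouts are possible.

```python
import numpy as np


def grid_region_code(lat, lon, lat0, lon0, cell_deg, n_cols):
    # Hypothetical gridding: row-major index of the grid cell containing (lat, lon)
    row = int((lat - lat0) // cell_deg)
    col = int((lon - lon0) // cell_deg)
    return row * n_cols + col


def one_hot(code, n_regions):
    v = np.zeros(n_regions, dtype=np.float32)
    v[code] = 1.0
    return v


def concat_region_feature(mel_spec, region_code, n_regions):
    """Append the one-hot region vector to every frame of a mel spectrogram.

    mel_spec: (n_frames, n_mels) -> returns (n_frames, n_mels + n_regions)
    """
    region = np.tile(one_hot(region_code, n_regions), (mel_spec.shape[0], 1))
    return np.concatenate([mel_spec.astype(np.float32), region], axis=1)


# A 4 x 5 grid of 0.5-degree cells whose south-west corner is at (30.0 N, 115.0 E)
code = grid_region_code(31.2, 116.7, lat0=30.0, lon0=115.0, cell_deg=0.5, n_cols=5)
mel_spec = np.random.default_rng(0).random((100, 40))   # stand-in mel spectrogram
features = concat_region_feature(mel_spec, code, n_regions=20)
print(code, features.shape)   # 13 (100, 60)
```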
In the sample expansion module, sample expansion using the cached sound records, the preset samples, and the preliminary sample set proceeds as follows:
the labeled samples and the ambient sound recordings among the cached sound records are blended together for augmentation using a mixing (mixup-style) method;
target-type fragments labeled in the preset samples are randomly added to samples in the preliminary sample set to further augment the sample data. Specifically, target-type fragments are first selected in turn from the labeled preset samples; the whole preliminary sample set is then traversed and the target fragments are randomly inserted into the selected preliminary samples, which increases the diversity of the sample data.
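The two augmentation steps can be sketched as follows. Assumptions: the mixing method is a mixup-style weighted sum with a weight drawn from Beta(0.4, 0.4) and folded so the labeled clip dominates; the mixed clip keeps the labeled sample's class; and "inserting" a fragment means overlaying it at a random offset without changing the clip length, with the augmented clip taking the fragment's label. All function names and the example labels are hypothetical.

```python
import numpy as np


def mix_with_ambient(labeled, ambient, rng, alpha=0.4):
    """Mixup-style blend of a labeled clip with an ambient recording.

    lam ~ Beta(alpha, alpha) is folded to >= 0.5 so the labeled bird sound stays
    dominant; the mixed clip keeps the labeled sample's class (an assumption).
    """
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    n = min(len(labeled), len(ambient))
    return lam * labeled[:n] + (1.0 - lam) * ambient[:n], lam


def insert_fragment(sample, fragment, rng, gain=1.0):
    """Overlay a labeled target fragment at a random offset (length unchanged)."""
    if len(fragment) > len(sample):
        raise ValueError("fragment is longer than the sample")
    start = int(rng.integers(0, len(sample) - len(fragment) + 1))
    out = sample.astype(float)          # copy, so the original sample is untouched
    out[start:start + len(fragment)] += gain * fragment
    return out, start


def expand_samples(preliminary, fragments, rng):
    # Take target fragments in turn and insert each into every preliminary sample
    augmented = []
    for frag, label in fragments:
        for clip in preliminary:
            new_clip, _ = insert_fragment(clip, frag, rng)
            augmented.append((new_clip, label))
    return augmented


rng = np.random.default_rng(1)
labeled = np.ones(100)            # stand-in labeled bird clip
ambient = np.zeros(80)            # stand-in ambient recording
mixed, lam = mix_with_ambient(labeled, ambient, rng)
fragment = np.arange(1.0, 6.0)    # stand-in labeled target fragment
preliminary = [np.zeros(50), np.zeros(60)]
augmented = expand_samples(preliminary, [(fragment, "species_a"), (2 * fragment, "species_b")], rng)
print(len(mixed), round(lam, 3), len(augmented))
```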
The preset samples are a labeled dataset of known target bird sound fragments that is collected in advance and stored in a database. In this embodiment, some of the sound samples to be recognized, together with their labels, are preset when the sound acquisition device is installed and deployed.
Through sample expansion, this embodiment enriches the diversity of the training samples, while data augmentation highlights their characteristic features, thereby improving the recognition accuracy of the recognition model.
In the model fine-tuning module, this embodiment fine-tunes the model using a freeze-and-activate strategy for the hidden layers of the neural network, based on the parameter weights. Because computing power and memory on the device side are limited, on-device online fine-tuning cannot iteratively update every layer of the neural network model; only the layer that contributes most to the target detection task is iteratively updated. The fine-tuning process therefore comprises: freezing the parameter layers of the bird voice recognition model, initializing a new layer as the fine-tuning layer, and training only this fine-tuning layer on the expanded sample set.
In one specific embodiment, the steps are as follows:
(1) Pre-trained model setup for the bird voice recognition model. The bird voice recognition model that is preset and already running on the terminal device is used as the pre-trained model. Its structure is an EfficientNet network, which is divided into 9 stages; by default, each convolutional layer is followed by a BN layer and a Swish activation function. Stage 1 is a 3x3 convolutional layer. Stages 2 to 8 are repeated stacks of MBConv blocks. In the main branch of an MBConv block, a 1x1 convolution (+BN+Swish) expands the channel dimension, followed by a depthwise (DW) convolution (+BN+Swish) with a 3x3 or 5x5 kernel, then an SE block, then a 1x1 convolution (+BN) that reduces the channel dimension, and finally a dropout operation. The block input is then passed directly through a shortcut branch and added to the main-branch output to give the final output (in EfficientNet this shortcut is applied when the input and output shapes match). Stage 9 consists of three parts: a 1x1 convolution, then average pooling, and finally a fully connected layer. Features extracted from the bird sound are fed into the model, pass through convolution, pooling, batch normalization, and similar operations, and the fully connected layer finally outputs a confidence score for each bird species.
(2) Preprocessing the model for the terminal device. When the pre-trained model is converted, operators used during training, such as BatchNorm and Dropout, are retained.
(3) In the fine-tuning scenario, the model does not need to be built from scratch on the device side: the pre-trained model is simply loaded, the parameters of the earlier layers of the neural network are frozen, and only the last fully connected layer is fine-tuned. The input name of the last layer is looked up with the Netron model visualization tool (or in a model JSON file exported by another tool), and the pre-trained model with its last layer removed is retained.
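The MBConv block described in step (1) can be sketched in NumPy for a single channels-last feature map. This is an illustrative forward pass, not the deployed model: weights are random, batch normalization is simplified to per-map statistics without learned scale and shift, dropout is omitted (inference mode), and the stride is fixed at 1. The helper names are hypothetical.

```python
import numpy as np


def swish(x):
    # Swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


def batch_norm(x, eps=1e-5):
    # Simplified BN: normalize each channel over spatial positions.
    # A trained model would use stored running mean/var plus learned gamma/beta.
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def depthwise_conv(x, k):
    # Depthwise (DW) convolution, stride 1, 'same' zero padding.
    # x: (H, W, C); k: (kh, kw, C), one kernel per channel.
    kh, kw, _ = k.shape
    h, w, _ = x.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2), (0, 0)))
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            out += xp[i:i + h, j:j + w, :] * k[i, j, :]
    return out


def squeeze_excite(x, w_reduce, w_expand):
    # SE block: global average pool -> reduce -> expand -> sigmoid channel gate
    s = x.mean(axis=(0, 1))
    s = swish(s @ w_reduce)
    gate = 1.0 / (1.0 + np.exp(-(s @ w_expand)))
    return x * gate


def mbconv(x, p):
    # Main branch: 1x1 expand (+BN+Swish) -> DW conv (+BN+Swish) -> SE -> 1x1 project (+BN)
    h = swish(batch_norm(x @ p["expand"]))        # a 1x1 conv is a per-pixel matmul
    h = swish(batch_norm(depthwise_conv(h, p["dw"])))
    h = squeeze_excite(h, p["se_reduce"], p["se_expand"])
    h = batch_norm(h @ p["project"])
    # Shortcut: add the block input only when the shapes match
    return h + x if h.shape == x.shape else h


def init_params(rng, c_in, c_out, expand_ratio=6, kernel=3, se_ratio=0.25):
    c_mid = c_in * expand_ratio
    c_se = max(1, int(c_in * se_ratio))
    return {
        "expand": rng.normal(0.0, 0.1, (c_in, c_mid)),
        "dw": rng.normal(0.0, 0.1, (kernel, kernel, c_mid)),
        "se_reduce": rng.normal(0.0, 0.1, (c_mid, c_se)),
        "se_expand": rng.normal(0.0, 0.1, (c_se, c_mid)),
        "project": rng.normal(0.0, 0.1, (c_mid, c_out)),
    }


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))                   # (H, W, C) feature map
y = mbconv(x, init_params(rng, 16, 16, kernel=5))
print(y.shape)
```

When `c_out` differs from `c_in`, the shapes no longer match and the block returns the main-branch output without the shortcut.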
Embodiment three:
The third embodiment of the present invention provides a medium on which a program is stored; when executed by a processor, the program implements the steps of the power transmission line bird voice recognition model optimization method according to the first embodiment of the present invention, as follows:
Step 1: obtain the background sound of the current scene over a period of time and preprocess it.
Step 2: perform classification detection on the background sound using first the front-end server and then the cloud analysis server to obtain a classification detection result, and generate a preliminary sample set from that result.
Step 3: perform sample expansion using the cached sound records, the preset samples, and the preliminary sample set, and generate a training sample set online.
Step 4: fine-tune the bird voice recognition model running on the terminal device side using the expanded sample set.
The detailed steps are the same as those of the power transmission line bird voice recognition model optimization method provided in the first embodiment and are not repeated here.
Embodiment four:
The fourth embodiment of the invention provides a device comprising a memory, a processor, and a program stored in the memory and executable on the processor; when executing the program, the processor implements the steps of the power transmission line bird voice recognition model optimization method according to the first embodiment of the invention, as follows:
Step 1: obtain the background sound of the current scene over a period of time and preprocess it.
Step 2: perform classification detection on the background sound using first the front-end server and then the cloud analysis server to obtain a classification detection result, and generate a preliminary sample set from that result.
Step 3: perform sample expansion using the cached sound records, the preset samples, and the preliminary sample set, and generate a training sample set online.
Step 4: fine-tune the bird voice recognition model running on the terminal device side using the expanded sample set.
The detailed steps are the same as those of the power transmission line bird voice recognition model optimization method provided in the first embodiment and are not repeated here.
The steps involved in the second, third, and fourth embodiments correspond to those of the first embodiment; for details, refer to the relevant description of the first embodiment.
Although the embodiments of the present invention have been described above with reference to the drawings, this description is not intended to limit the scope of the invention; it is intended to cover all modifications or variations within the scope of the invention as defined by the claims.
Claims (9)
1. A method for optimizing a bird voice recognition model of a power transmission line, characterized by comprising the following steps:
acquiring background sound of a current scene within a period of time, and preprocessing the background sound;
sequentially utilizing a front-end server and a cloud analysis server to carry out classification detection on the background sound to obtain a classification detection result of the background sound, and generating a preliminary sample set according to the classification detection result;
performing sample expansion by using the cached sound record, a preset sample and a preliminary sample set, and generating a training sample set on line;
and fine-tuning the bird voice recognition model running on the terminal device side using the expanded sample set, wherein the fine-tuning process comprises: freezing the parameter layers of the bird voice recognition model, newly initializing one layer as a fine-tuning layer, and configuring the fine-tuning layer to be trained only on the expanded sample set;
the specific steps of sequentially utilizing the front-end server and the cloud analysis server to carry out classification detection on the background sound are as follows:
the front-end server is utilized to compare and classify the preprocessed background sound and the preset alarm sound, and whether the model optimization standard is reached is judged;
when the preprocessed background sound reaches the model optimization standard, performing secondary classification detection on the background sound by using a cloud analysis server, and determining the alarm type of the background sound;
the specific steps of carrying out secondary classification detection on the background sound by using the cloud analysis server are as follows:
pre-emphasis and sliding window uniform segmentation processing are carried out on the bird song signal, the bird song signal is converted into corresponding image characteristic information through a sound image conversion method, the obtained image characteristic information is used as input, a trained bird song recognition model is used, and finally, prediction recognition is carried out on bird species through bird song;
the training process of the bird voice recognition model comprises the following steps:
firstly, extracting bird voice characteristics from bird voice data, splicing regional characteristics and voice characteristics to serve as new characteristics, inputting the new characteristics into a model, training the model by using the new characteristics, specifically, after a model structure is constructed, randomly initializing internal parameters of the model, iteratively updating the parameters in continuous training through data and forward and backward propagation algorithms, learning parameters capable of fitting the distribution of transmission scene voice data, and finally, jointly realizing the identification of bird voice by using the model structure and the corresponding parameters;
the alarm sound is set according to different common bird sounds in each scene environment; constructing a bird song database and a regional bird species distribution information base according to bird species distribution information around different regions and power transmission lines/power transmission grids, and simultaneously, combining the characteristic of strong regional bird species distribution, and jointly inputting the current regional environment information and bird song characteristics into a model for prediction in a regional gridding mode;
the bird voice recognition model structure adopts an EfficientNet network; the EfficientNet is divided into 9 stages in total, wherein stage 9 consists of three parts, the last of which is a fully connected layer; for fine-tuning scenarios, the parameters of the earlier layers of the neural network are fixed, and only the last fully connected layer is fine-tuned.
2. The method for optimizing a bird voice recognition model for a power transmission line according to claim 1, wherein the step of preprocessing the background sound comprises:
performing strong-change detection on all collected background sounds, assessing them from three perspectives (sound level, time domain, and frequency domain) by computing three feature values, namely sound pressure level, maximum frame energy, and mean frequency-domain energy, comparing the feature values with preset thresholds, retaining the background sound when the feature values exceed the thresholds, and otherwise filtering it out.
3. The method for optimizing the bird voice recognition model of the power transmission line according to claim 1, wherein the bird sound features use a Mel spectrogram, each region is assigned a numeric code, the code is one-hot encoded to obtain the regional features, and the regional features and the sound features are concatenated into new features that serve as the model input.
4. The method for optimizing bird voice recognition model of power transmission line according to claim 1, wherein after the classification detection result of the background sound is obtained, the classification detection result is manually checked and confirmed by the monitoring platform.
5. The method for optimizing a bird voice recognition model for a power transmission line according to claim 1, wherein the specific step of performing sample expansion using the buffered voice record, the preset sample, and the preliminary sample set comprises:
mixing the labeled samples with the ambient sound recordings in the cached sound records for augmentation using a mixing method;
randomly adding target type fragments marked in a preset sample into the sample of the preliminary sample set to further enhance sample data.
6. The method of optimizing transmission line bird voice recognition models of claim 5, wherein the pre-set samples are known tagged target bird voice clip datasets.
7. An optimization system for a bird voice recognition model of a power transmission line, characterized by comprising:
the data acquisition module is configured to acquire background sound of the current scene within a period of time and preprocess the background sound;
the classification detection module is configured to sequentially utilize the front-end server and the cloud analysis server to carry out classification detection on the background sound to obtain a classification detection result of the background sound, and a preliminary sample set is generated according to the classification detection result;
the sample expansion module is configured to utilize the cached sound record, the preset sample and the preliminary sample set to carry out sample expansion and generate a training sample set on line;
the model fine-tuning module is configured to fine-tune the bird voice recognition model running on the terminal device side using the expanded sample set, wherein the fine-tuning process comprises: freezing the parameter layers of the bird voice recognition model, newly initializing one layer as a fine-tuning layer, and configuring the fine-tuning layer to be trained only on the expanded sample set;
the specific steps of sequentially utilizing the front-end server and the cloud analysis server to carry out classification detection on the background sound are as follows:
the front-end server is utilized to compare and classify the preprocessed background sound and the preset alarm sound, and whether the model optimization standard is reached is judged;
when the preprocessed background sound reaches the model optimization standard, performing secondary classification detection on the background sound by using a cloud analysis server, and determining the alarm type of the background sound;
the specific steps of carrying out secondary classification detection on the background sound by using the cloud analysis server are as follows:
pre-emphasis and sliding window uniform segmentation processing are carried out on the bird song signal, the bird song signal is converted into corresponding image characteristic information through a sound image conversion method, the obtained image characteristic information is used as input, a trained bird song recognition model is used, and finally, prediction recognition is carried out on bird species through bird song;
the training process of the bird voice recognition model comprises the following steps:
firstly, extracting bird voice characteristics from bird voice data, splicing regional characteristics and voice characteristics to serve as new characteristics, inputting the new characteristics into a model, training the model by using the new characteristics, specifically, after a model structure is constructed, randomly initializing internal parameters of the model, iteratively updating the parameters in continuous training through data and forward and backward propagation algorithms, learning parameters capable of fitting the distribution of transmission scene voice data, and finally, jointly realizing the identification of bird voice by using the model structure and the corresponding parameters;
the alarm sound is set according to different common bird sounds in each scene environment; constructing a bird song database and a regional bird species distribution information base according to bird species distribution information around different regions and power transmission lines/power transmission grids, and simultaneously, combining the characteristic of strong regional bird species distribution, and jointly inputting the current regional environment information and bird song characteristics into a model for prediction in a regional gridding mode;
the bird voice recognition model structure adopts an EfficientNet network; the EfficientNet is divided into 9 stages in total, wherein stage 9 consists of three parts, the last of which is a fully connected layer; for fine-tuning scenarios, the parameters of the earlier layers of the neural network are fixed, and only the last fully connected layer is fine-tuned.
8. A computer readable storage medium, characterized in that a plurality of instructions are stored, which instructions are adapted to be loaded by a processor of a terminal device and to perform the transmission line bird voice recognition model optimization method of any one of claims 1-6.
9. A terminal device comprising a processor and a computer readable storage medium, the processor configured to implement instructions; a computer readable storage medium for storing a plurality of instructions adapted to be loaded by a processor and to perform the transmission line bird voice recognition model optimization method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311506466.4A CN117238299B (en) | 2023-11-14 | 2023-11-14 | Method, system, medium and equipment for optimizing bird voice recognition model of power transmission line |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311506466.4A CN117238299B (en) | 2023-11-14 | 2023-11-14 | Method, system, medium and equipment for optimizing bird voice recognition model of power transmission line |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117238299A (en) | 2023-12-15 |
CN117238299B (en) | 2024-01-30 |
Family ID: 89086506
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311506466.4A Active CN117238299B (en) | 2023-11-14 | 2023-11-14 | Method, system, medium and equipment for optimizing bird voice recognition model of power transmission line |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117238299B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113180030A (en) * | 2021-07-01 | 2021-07-30 | Zhongshan Power Supply Bureau of Guangdong Power Grid Co., Ltd. | Embedded bird recognition system |
CN113707158A (en) * | 2021-08-02 | 2021-11-26 | Nanchang University | Song recognition method for harmful bird species on power grids based on a VGGish transfer learning network |
CN114863937A (en) * | 2022-05-17 | 2022-08-05 | Wuhan Institute of Technology | Hybrid birdsong identification method based on deep transfer learning and XGBoost |
WO2022205249A1 (en) * | 2021-03-31 | 2022-10-06 | Huawei Technologies Co., Ltd. | Audio feature compensation method, audio recognition method, and related product |
CN115299428A (en) * | 2022-08-04 | 2022-11-08 | Nantong Power Supply Branch of State Grid Jiangsu Electric Power Co., Ltd. | Intelligent IoT bird-repelling system based on deep learning |
CN116687438A (en) * | 2023-05-30 | 2023-09-05 | Beijing Institute of Petrochemical Technology | Method and device for identifying borborygmus |
Non-Patent Citations (2)
Title |
---|
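Analysis of avian biodiversity in soundscapes based on segmentation and extraction of spectrogram feature information; Jiang Jingang; Shao Xiaoyun; Wan Haibo; Qi Jiaguo; Jing Changwei; Cheng Tianyou; Acta Ecologica Sinica (23); full text * |
Bird species recognition method based on multi-feature fusion; Xie Jiangjian; Yang Jun; Xing Zhaoliang; Zhang Zhuo; Chen Xin; Journal of Applied Acoustics (02); full text * |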
Also Published As
Publication number | Publication date |
---|---|
CN117238299A (en) | 2023-12-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant ||