CN117238299B - Method, system, medium and equipment for optimizing bird voice recognition model of power transmission line


Info

Publication number
CN117238299B
Authority
CN
China
Prior art keywords
bird
model
sound
background sound
voice recognition
Legal status
Active
Application number
CN202311506466.4A
Other languages
Chinese (zh)
Other versions
CN117238299A (en)
Inventor
沈浩
李丹丹
周超
黄振宁
梅红伟
吴雄
刘辉
贾然
李常勇
程磊
张洋
刘嵘
刘传彬
李成
毛永杰
周学坤
周立志
孟海磊
孙晓斌
耿博
Current Assignee
Dongying Power Industry Bureau Of State Grid Shandong Electric Power Co
Wuhan NARI Ltd
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Dongying Power Industry Bureau Of State Grid Shandong Electric Power Co
Wuhan NARI Ltd
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Shenzhen International Graduate School of Tsinghua University
Application filed by Dongying Power Industry Bureau Of State Grid Shandong Electric Power Co, Wuhan NARI Ltd, Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd, Shenzhen International Graduate School of Tsinghua University
Priority to CN202311506466.4A
Publication of CN117238299A
Application granted
Publication of CN117238299B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, system, medium and equipment for optimizing a bird voice recognition model of a power transmission line, and relates to the technical field of power transmission line monitoring. The method comprises: obtaining the background sound of the current scene over a period of time; performing classification detection on the background sound successively with a front-end server and a cloud analysis server to obtain a classification detection result, and generating a preliminary sample set from that result; expanding the samples using cached sound recordings, preset samples and the preliminary sample set, thereby generating a training sample set online; and fine-tuning the bird voice recognition model running on the terminal device with the expanded sample set. The invention addresses the shortage of model training resources across different scenes, allows hazard identification models with a high false alarm rate in a given scene to be updated in time, and reduces both false alarms and missed alarms.

Description

Method, system, medium and equipment for optimizing bird voice recognition model of power transmission line
Technical Field
The invention relates to the technical field of power transmission line monitoring, in particular to a power transmission line bird voice recognition model optimization method, a system, a medium and equipment.
Background
In recent years, power grid companies have vigorously promoted visual monitoring of transmission lines, using visual and intelligent means to improve inspection efficiency. Transmission channel visualization plays an important role in finding hazards and issuing early warnings in time, and has contributed substantially to the safety, reliability and intelligence of transmission lines. Although transmission channel visualization and the related artificial intelligence technologies have advanced in recent years, they are still at an early stage, and much remains to be done before they meet the needs of transmission line operation and maintenance. To make transmission lines safe, reliable and intelligent, research should extend beyond vision to related technologies for hearing, touch and smell.
Bird activity is an important cause of transmission line faults, and more and more bird-damage prevention measures have been adopted, such as strengthened inspection and bird repellent devices. However, the area requiring bird prevention is large, towers are numerous and birds are difficult to identify, so general measures rarely achieve targeted prevention. Accurate bird image recognition, bird sound classification, and image and audio retrieval are important means of preventing and controlling damage from birds, particularly large birds such as storks, in the transmission line environment.
The current workflow of deep-learning-based bird song classification algorithms on terminals is as follows: sample data are accumulated and labeled on a cloud server, a model is trained, and after testing, quantization and compression the model is deployed to the mobile terminal for inference. However, as more monitoring devices are deployed, monitoring sites become increasingly diverse, so a detection model trained in the cloud and deployed to monitoring terminals lacks robustness when recognizing bird sounds against different backgrounds. Given the limited resources of the cloud training server, training a separate model for the installation scene of each monitoring device is practically impossible. Each transmission line monitoring device observes a fixed scene for a long time, and the bird species active in that scene change little over a given period. Consequently, a sound classification algorithm tuned for a fixed scene can perform well, and when the scene changes, the algorithm can be re-tuned and its parameters modified promptly to regain high detection performance. Online tuning of deep-learning recognition models on the end side is therefore of great practical significance.
At present, bird sounds monitored on transmission lines are analyzed mainly on cloud servers. As monitoring devices multiply, monitoring density keeps rising. Continuous sound data flood the cloud analysis server over the wireless network, placing heavy pressure on both the 4G network and the cloud server. The analysis server cannot process the massive audio data in time, which causes large data backlogs and delayed alarms; users cannot be notified of hazards promptly, posing a serious risk to transmission line safety.
Migrating intelligent hazard analysis to the monitoring devices for front-end analysis is an inevitable trend in transmission line hazard identification. However, front-end devices have low computing power and must run at low power consumption, so common deep neural network models are too large for hazard identification at the terminal, and only lightweight deep neural network models can be deployed. Lightweight models represent audio features less well, and their recognition accuracy falls short of that of the cloud analysis server. In addition, transmission line scenes vary widely; using a single cloud-trained model to identify hazards in all transmission scenes leaves the model poorly adapted to each specific monitoring scene, which causes many false positives and lowers recognition precision.
Disclosure of Invention
To overcome these shortcomings of the prior art, the invention provides a method, system, medium and equipment for optimizing a bird voice recognition model of a power transmission line. The invention can screen out, from the large number of front-end analysis devices in operation, the subset of devices whose models need iterative upgrading, avoiding iterative upgrade training on all devices. Through real-time model fine-tuning, it largely resolves the shortage of model training resources across different scenes, allows hazard identification models with a high false alarm rate in a given scene to be updated in time, and reduces false alarms and missed alarms.
In order to achieve the above object, the present invention is realized by the following technical scheme:
The first aspect of the invention provides a method for optimizing a power transmission line bird voice recognition model, comprising the following steps:
acquiring background sound of a current scene within a period of time, and preprocessing the background sound;
sequentially utilizing a front-end server and a cloud analysis server to carry out classification detection on the background sound to obtain a classification detection result of the background sound, and generating a preliminary sample set according to the classification detection result;
performing sample expansion by using the cached sound record, a preset sample and a preliminary sample set, and generating a training sample set on line;
and fine-tuning the bird voice recognition model running on the terminal device with the expanded sample set, wherein the fine-tuning process comprises: freezing the parameter layers of the bird voice recognition model, initializing a new layer as the fine-tuning layer, and training only the fine-tuning layer on the expanded sample set.
Further, the step of preprocessing the background sound includes:
strong-change detection is performed on all collected background sounds. Each sound is judged from three perspectives, namely sound pressure, the time domain and the frequency domain, by computing three feature values (sound pressure level, maximum frame energy and average frequency-domain energy) and comparing them with preset thresholds. The sound is retained when all three feature values exceed their thresholds; otherwise it is filtered out.
Further, the specific steps of sequentially utilizing the front-end server and the cloud analysis server to carry out classification detection on the background sound are as follows:
the front-end server compares the preprocessed background sound with the preset alarm sounds to classify it, and judges whether the model optimization standard is met;
when the preprocessed background sound reaches the model optimization standard, the cloud analysis server is utilized to carry out secondary classification detection on the background sound, and the alarm type of the background sound is determined.
Furthermore, the specific steps of using the cloud analysis server to perform secondary classification detection on the background sound are as follows:
the bird song signal undergoes pre-emphasis and uniform sliding-window segmentation and is converted into corresponding image feature information by a sound-to-image conversion method; this image feature information is input into a trained bird song recognition model, which finally predicts the bird species from the bird song.
Further, the training process of the bird voice recognition model is as follows:
first, bird sound features are extracted from the bird sound data, and regional features are concatenated with the sound features to form new features, which are input into the model and used to train it. Specifically, after the model structure is built, its internal parameters are randomly initialized and then iteratively updated during training through forward and backward propagation on the data, so that the model learns parameters that fit the distribution of sound data in transmission scenes. The model structure and the learned parameters together realize bird sound recognition.
Further, the alarm sounds are set according to the bird sounds common in each scene environment.
Furthermore, a Mel spectrogram is used as the bird sound feature; each region is assigned a numeric code, which is then encoded to obtain the regional feature, and the regional feature and the sound feature are concatenated into a new feature used as the model input.
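A Mel spectrogram pools spectral energy into bands spaced evenly on the Mel scale. As a minimal sketch, assuming the common HTK-style formula m = 2595 log10(1 + f/700) (the patent does not state which Mel variant, frequency range or band count it uses), the band edges of such a filterbank can be computed as follows; `mel_band_edges` and its settings are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert Hz to Mel using the HTK-style formula (assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_bands: int, f_min: float, f_max: float) -> list[float]:
    """Return n_bands + 2 frequencies in Hz, equally spaced on the Mel scale.

    Each consecutive triple (left, centre, right) defines one triangular
    filter of a Mel filterbank.
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 2)]


# Hypothetical settings: 8 bands between 500 Hz and 11 kHz.
edges = mel_band_edges(8, 500.0, 11000.0)
print([round(e) for e in edges])
```

The edges are closer together at low frequencies and spread out at high frequencies, which is what gives the Mel spectrogram its finer low-frequency resolution.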
Further, after the classification detection result of the background sound is obtained, the classification detection result is manually checked and confirmed through the monitoring platform.
Further, the specific steps of performing sample expansion by using the buffered sound record, the preset sample and the preliminary sample set include:
mixing the labeled samples in the cached sound records with the ambient sound recordings using a mixing (mixup-style) method to augment the data;
randomly inserting the labeled target-type fragments from the preset samples into the samples of the preliminary sample set to further augment the sample data.
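The mixing step can be sketched as a weighted sum of a labeled clip and an ambient recording. This assumes the mixing method is mixup-style waveform blending; the weight range `lam_min` and the helper names are hypothetical choices, not values from the patent. The weight is kept at or above 0.6 so that the bird call stays dominant and the mixed clip can keep its original label.

```python
import random


def mix_with_background(labeled: list[float], background: list[float], lam: float) -> list[float]:
    """Blend a labeled clip with an ambient recording: lam*labeled + (1-lam)*background.

    The background is tiled (repeated) when it is shorter than the labeled clip.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if not background:
        raise ValueError("background must not be empty")
    return [
        lam * x + (1.0 - lam) * background[i % len(background)]
        for i, x in enumerate(labeled)
    ]


def random_mix(labeled, background, rng: random.Random, lam_min: float = 0.6):
    """Draw the mixing weight uniformly from [lam_min, 1] (hypothetical range).

    Keeping lam >= lam_min leaves the bird call dominant, so the mixed clip
    keeps the label of the labeled clip.
    """
    lam = rng.uniform(lam_min, 1.0)
    return mix_with_background(labeled, background, lam), lam
```

Unlike classic mixup, which mixes two labeled examples and their labels, here one input is unlabeled ambient sound, so only the label of the bird clip is carried over.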
Further, the preset samples are a known, labeled dataset of target bird sound fragments.
Further, the bird voice recognition model uses an EfficientNet network structure.
The second aspect of the invention provides an optimization system for a bird voice recognition model of a power transmission line, comprising:
the data acquisition module is configured to acquire background sound of the current scene within a period of time and preprocess the background sound;
the classification detection module is configured to sequentially utilize the front-end server and the cloud analysis server to carry out classification detection on the background sound to obtain a classification detection result of the background sound, and a preliminary sample set is generated according to the classification detection result;
the sample expansion module is configured to utilize the cached sound record, the preset sample and the preliminary sample set to carry out sample expansion and generate a training sample set on line;
the model fine-tuning module is configured to fine-tune the bird voice recognition model running on the terminal device with the expanded sample set, wherein the fine-tuning process comprises: freezing the parameter layers of the bird voice recognition model, initializing a new layer as the fine-tuning layer, and training only the fine-tuning layer on the expanded sample set.
A third aspect of the present invention provides a medium having stored thereon a program which when executed by a processor performs the steps of the transmission line bird voice recognition model optimization method according to the first aspect of the present invention.
A fourth aspect of the invention provides an apparatus comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in the method for optimizing a bird voice recognition model for an electric transmission line according to the first aspect of the invention when the program is executed.
One or more of the above technical solutions have the following beneficial effects:
the invention discloses a method, system, medium and equipment for optimizing a bird voice recognition model of a power transmission line. Front-end analysis data are collected by the cloud analysis server and classified, and the confirmed alarm information is fed back to the terminal device as reference labels. On the terminal side, the neural network model parameters are adjusted using this alarm information as training samples, which improves the model's adaptability to its scene. The invention can screen out, from the large number of front-end analysis devices in operation, the subset of devices whose models need iterative upgrading, avoiding iterative upgrade training on all devices. Through real-time model fine-tuning, it largely resolves the shortage of model training resources across different scenes, allows hazard identification models with a high false alarm rate in a given scene to be updated in time, and reduces false alarms and missed alarms.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a flowchart of a method for optimizing a bird voice recognition model of a transmission line according to an embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Embodiment one:
The first embodiment of the invention provides a method for optimizing a bird voice recognition model of a power transmission line. As shown in Fig. 1, the device collects background sound over a period of time; the background sound is analyzed at the front end, the front-end result is sent to the cloud for secondary analysis, and the secondary analysis result is confirmed manually, which reduces the likelihood of false and missed alarms. After manual confirmation, the platform sends supplementary samples to the device, which generates new samples online from the supplementary data for end-side model fine-tuning. In this way, the subset of devices whose models need iterative upgrading can be screened out from the large number of front-end analysis devices in operation, avoiding iterative upgrade training on all devices.
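The device-screening idea can be illustrated with a small sketch. The patent does not define the criterion for selecting devices; this sketch assumes, for illustration only, that a device qualifies when its manually reviewed alarms show a false-alarm rate above a threshold. `DeviceStats`, `select_devices_for_finetuning` and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeviceStats:
    device_id: str
    alarms_raised: int     # alarms reported by the on-device model
    alarms_confirmed: int  # alarms confirmed as correct during manual review

    @property
    def false_alarm_rate(self) -> float:
        if self.alarms_raised == 0:
            return 0.0
        return 1.0 - self.alarms_confirmed / self.alarms_raised


def select_devices_for_finetuning(
    stats: list[DeviceStats],
    max_false_alarm_rate: float = 0.3,  # hypothetical threshold
    min_alarms: int = 20,               # ignore devices with too few reviewed alarms
) -> list[str]:
    """Return the IDs of devices whose on-device model should be fine-tuned."""
    return [
        s.device_id
        for s in stats
        if s.alarms_raised >= min_alarms and s.false_alarm_rate > max_false_alarm_rate
    ]
```

Only the selected devices would then receive supplementary samples, so the remaining devices keep their current models.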
The method specifically comprises the following steps:
Step 1: obtaining the background sound of the current scene over a period of time, and preprocessing the background sound.
Step 2: performing classification detection on the background sound successively with a front-end server and a cloud analysis server to obtain a classification detection result, and generating a preliminary sample set from the classification detection result.
Step 3: performing sample expansion using the cached sound records, the preset samples and the preliminary sample set, and generating a training sample set online.
Step 4: fine-tuning the bird voice recognition model running on the terminal device with the expanded sample set.
In step 1, after the sound collection device has operated for a period of time, the background sound it has collected is gathered periodically and preprocessed. Preprocessing the background sound comprises:
strong-change detection is performed on all collected background sounds. Each sound is judged from three perspectives, namely sound pressure, the time domain and the frequency domain, by computing three feature values (sound pressure level, maximum frame energy and average frequency-domain energy) and comparing them with preset thresholds. The sound is retained when all three feature values exceed their thresholds; otherwise it is filtered out.
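A minimal sketch of this three-feature screening, written in plain Python so that it runs without dependencies. The frame length, hop, reference level and all three thresholds are hypothetical, since the patent gives no values; the sound pressure level is approximated as an RMS level in dB relative to digital full scale, and the frequency-domain energy uses a naive DFT for self-containment (a real implementation would use an FFT library).

```python
import math


def sound_pressure_level(x: list[float], ref: float = 1.0) -> float:
    """RMS level in dB relative to `ref` (digital full scale assumed as reference)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(max(rms, 1e-12) / ref)


def max_frame_energy(x: list[float], frame_len: int = 256, hop: int = 128) -> float:
    """Largest sum-of-squares energy over sliding frames."""
    starts = range(0, max(len(x) - frame_len, 0) + 1, hop)
    return max(sum(v * v for v in x[s:s + frame_len]) for s in starts)


def mean_spectral_energy(x: list[float]) -> float:
    """Mean |X[k]|^2 over DFT bins 1..N/2 (naive O(N^2) DFT, illustration only)."""
    n = len(x)
    bins = range(1, n // 2 + 1)
    total = 0.0
    for k in bins:
        re = sum(v * math.cos(2.0 * math.pi * k * i / n) for i, v in enumerate(x))
        im = -sum(v * math.sin(2.0 * math.pi * k * i / n) for i, v in enumerate(x))
        total += re * re + im * im
    return total / len(bins)


def keep_clip(
    x: list[float],
    spl_db_min: float = -40.0,  # hypothetical thresholds
    frame_energy_min: float = 1.0,
    spectral_energy_min: float = 10.0,
) -> bool:
    """Keep the clip only if all three feature values exceed their thresholds."""
    if len(x) < 2:
        return False
    return (
        sound_pressure_level(x) > spl_db_min
        and max_frame_energy(x) > frame_energy_min
        and mean_spectral_energy(x) > spectral_energy_min
    )
```

The checks are ordered from cheapest to most expensive, so silent clips are rejected before the DFT is computed.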
The preprocessed background sound is then transmitted to the front-end server for analysis.
In step 2, the specific steps of using the front end server and the cloud analysis server to classify and detect the background sound are as follows:
the front-end server compares the preprocessed background sound with the preset alarm sounds to classify it, and judges whether the model optimization standard is met;
first, the front-end server compares the preprocessed background sound with the preset alarm sounds to determine whether it is a bird sound. Specifically, Mel spectrum features are extracted from both sounds and their cosine similarity is computed; when the similarity exceeds a set threshold, the two sounds are regarded as the same type of sound.
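The similarity test can be sketched as follows. The patent does not say how two Mel spectrograms of possibly different lengths are compared; this sketch assumes each is averaged over time into a per-band vector before the cosine similarity is taken. The helper names and the 0.85 threshold are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero feature as dissimilar to everything
    return dot / (norm_a * norm_b)


def mean_mel_vector(mel_spec: list[list[float]]) -> list[float]:
    """Average a (frames x bands) Mel spectrogram over time."""
    n_frames, n_bands = len(mel_spec), len(mel_spec[0])
    return [sum(frame[b] for frame in mel_spec) / n_frames for b in range(n_bands)]


def match_alarm_sound(background_mel, alarm_mels: dict, threshold: float = 0.85):
    """Return the label of the most similar preset alarm sound, or None if none exceeds the threshold."""
    query = mean_mel_vector(background_mel)
    best_label, best_sim = None, threshold
    for label, alarm_mel in alarm_mels.items():
        sim = cosine_similarity(query, mean_mel_vector(alarm_mel))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label
```

Cosine similarity ignores overall loudness, so a distant and a nearby call of the same species can still match.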
When the preprocessed background sound meets the model optimization standard, the cloud analysis server performs secondary classification detection on it to determine its alarm type. The classification detection result is then manually checked and confirmed through the monitoring platform. The terminal monitoring device periodically and actively requests manual confirmation from the platform, and the confirmed sound alarm information for the device is used as training samples for model fine-tuning.
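How confirmed alarms might become training samples can be sketched as a simple filter over review records. `AlarmRecord`, its fields and the optional `negative_label` (turning rejected alarms into background samples) are assumptions for illustration; the patent only states that confirmed alarm information is used as training samples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmRecord:
    clip_id: str
    cloud_label: str           # alarm type assigned by the cloud analysis server
    confirmed: Optional[bool]  # True/False after manual review, None if not yet reviewed


def build_preliminary_samples(
    records: list[AlarmRecord], negative_label: Optional[str] = None
) -> list[tuple[str, str]]:
    """Turn manually reviewed alarms into (clip_id, label) training pairs.

    Confirmed alarms keep the cloud label. Rejected alarms are skipped unless
    `negative_label` is given, in which case they become background samples.
    Unreviewed alarms are always skipped.
    """
    samples = []
    for r in records:
        if r.confirmed is None:
            continue
        if r.confirmed:
            samples.append((r.clip_id, r.cloud_label))
        elif negative_label is not None:
            samples.append((r.clip_id, negative_label))
    return samples
```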
In a specific embodiment, the computing power of the cloud analysis server far exceeds that of the front-end analysis server, so the bird voice recognition model deployed in the cloud can have more parameters and a larger size, giving better recognition accuracy. The cloud server is therefore used for secondary classification of the background sound, which ensures the quality of the data in the sample set. The bird song signal undergoes pre-emphasis, uniform sliding-window segmentation and similar processing, and is converted into corresponding image feature information by a sound-to-image conversion method; this image feature information is input into a trained bird song recognition model, which finally predicts the bird species from the bird song.
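Pre-emphasis and uniform sliding-window segmentation can be sketched as below. The pre-emphasis coefficient 0.97 is a common default, not a value from the patent, and the segment length and hop are left to the caller. The sketch drops trailing samples that do not fill a whole window and zero-pads a clip shorter than one window, which is one of several reasonable policies.

```python
def pre_emphasis(x: list[float], alpha: float = 0.97) -> list[float]:
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1], with y[0] = x[0]."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def sliding_segments(x: list[float], seg_len: int, hop: int) -> list[list[float]]:
    """Cut x into equal-length windows starting every `hop` samples.

    Trailing samples that do not fill a whole window are dropped; a clip
    shorter than one window yields a single zero-padded segment.
    """
    if seg_len <= 0 or hop <= 0:
        raise ValueError("seg_len and hop must be positive")
    segments = []
    for start in range(0, max(len(x) - seg_len, 0) + 1, hop):
        seg = list(x[start:start + seg_len])
        seg += [0.0] * (seg_len - len(seg))  # only pads when len(x) < seg_len
        segments.append(seg)
    return segments
```

Each segment would then be converted into a spectrogram image before being passed to the recognition model.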
First, bird sound features are extracted from the bird sound data, and regional features are concatenated with the sound features to form new features, which are input into the model and used to train it. Specifically, after the model structure is built, its internal parameters are randomly initialized and then iteratively updated during training through forward and backward propagation on the data, so that the model learns parameters that fit the distribution of sound data in transmission scenes. The model structure and the learned parameters together realize bird sound recognition.
The model uses an EfficientNet network structure.
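The patent does not name an EfficientNet variant. Assuming the published EfficientNet-B0 baseline, its nine-stage layout can be written down as data and checked; the kernel, stride and channel values below are those of B0 at a 224x224 input, and an audio front end would typically resize or pad the spectrogram image to the expected input size. The `Stage` class and the helper functions are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    index: int
    operator: str
    kernel: int
    stride: int
    out_channels: int
    repeats: int


# Published EfficientNet-B0 baseline for a 224x224 input. Stage 9's 1280 is the
# width of its 1x1 convolution; the FC layer then maps to the number of classes.
EFFICIENTNET_B0 = [
    Stage(1, "Conv3x3", 3, 2, 32, 1),
    Stage(2, "MBConv1", 3, 1, 16, 1),
    Stage(3, "MBConv6", 3, 2, 24, 2),
    Stage(4, "MBConv6", 5, 2, 40, 2),
    Stage(5, "MBConv6", 3, 2, 80, 3),
    Stage(6, "MBConv6", 5, 1, 112, 3),
    Stage(7, "MBConv6", 5, 2, 192, 4),
    Stage(8, "MBConv6", 3, 1, 320, 1),
    Stage(9, "Conv1x1 + pooling + FC", 1, 1, 1280, 1),
]


def mbconv_block_count(stages: list[Stage]) -> int:
    """Total number of MBConv blocks across stages 2 to 8."""
    return sum(s.repeats for s in stages if s.operator.startswith("MBConv"))


def output_resolution(stages: list[Stage], input_size: int = 224) -> int:
    """Spatial size after all stages; only the first block of a stage uses its stride."""
    size = input_size
    for s in stages:
        size = math.ceil(size / s.stride)  # 'same' padding assumed
    return size


def has_shortcut(in_channels: int, out_channels: int, stride: int) -> bool:
    """An MBConv block adds its input to the main branch only when shapes match."""
    return stride == 1 and in_channels == out_channels
```

With B0, the seven MBConv stages contain 16 blocks in total and reduce a 224x224 input to a 7x7 feature map before stage 9 pools it for the fully connected layer.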
The alarm sounds are set according to the bird sounds common in each scene environment. More specifically, a bird song database and a regional bird song distribution database are built from bird song distribution information for different regions and for the areas around transmission lines and transmission networks. Taking advantage of the strong regionality of bird song distribution, the current regional environment information is gridded by region and input into the model together with the bird song features for prediction.
The bird song feature is a Mel spectrogram. Each region is assigned a numeric code, which is one-hot encoded to obtain the regional feature; the regional feature and the bird song feature are then concatenated into a new feature that serves as the model input.
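Regional gridding, one-hot encoding and feature concatenation can be sketched as follows. The grid origin, cell size and grid dimensions are hypothetical, and the Mel feature is assumed to have already been flattened or pooled into a vector; the patent specifies neither step.

```python
def grid_region_code(lat, lon, lat0, lon0, cell_deg, n_rows, n_cols) -> int:
    """Map a coordinate to a row-major grid-cell index (hypothetical gridding scheme)."""
    row = int((lat - lat0) // cell_deg)
    col = int((lon - lon0) // cell_deg)
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("coordinate lies outside the grid")
    return row * n_cols + col


def one_hot(code: int, n_regions: int) -> list[float]:
    if not 0 <= code < n_regions:
        raise ValueError("region code out of range")
    vec = [0.0] * n_regions
    vec[code] = 1.0
    return vec


def model_input(mel_feature: list[float], region_code: int, n_regions: int) -> list[float]:
    """Concatenate the (flattened or pooled) Mel feature with the one-hot regional feature."""
    return list(mel_feature) + one_hot(region_code, n_regions)
```

One-hot encoding keeps region codes from implying any order or distance between regions, which a raw integer code would.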
In step 3, the specific steps of performing sample expansion by using the buffered sound record, the preset sample and the preliminary sample set include:
mixing the labeled samples in the cached sound records with the ambient sound recordings using a mixing (mixup-style) method to augment the data;
randomly inserting the labeled target-type fragments from the preset samples into the samples of the preliminary sample set to further augment the sample data. Specifically, target-type fragments are first selected in turn from the labeled preset samples; the whole preliminary sample set is then traversed and a target fragment is randomly inserted into each selected preliminary sample, which increases the diversity of the sample data.
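The fragment-insertion step can be sketched as follows. This sketch assumes that "inserting" means overlaying the fragment onto the sample at a random offset while keeping the sample length, and that the augmented sample takes the label of the inserted fragment; both are assumptions, as are the helper names.

```python
import random


def insert_fragment(sample: list[float], fragment: list[float], rng: random.Random, gain: float = 1.0):
    """Overlay `fragment` onto a copy of `sample` at a random offset; length is unchanged."""
    if len(fragment) > len(sample):
        raise ValueError("fragment is longer than the sample")
    offset = rng.randint(0, len(sample) - len(fragment))  # inclusive bounds
    out = list(sample)
    for i, v in enumerate(fragment):
        out[offset + i] += gain * v
    return out, offset


def augment_with_fragments(preliminary, fragments, rng: random.Random):
    """Traverse the preliminary set, inserting labeled fragments in turn.

    preliminary and fragments are lists of (waveform, label) pairs. Fragments are
    taken in order and reused cyclically; each augmented sample is labeled with
    the class of the fragment inserted into it (an assumption).
    """
    augmented = []
    for i, (wave, _old_label) in enumerate(preliminary):
        frag_wave, frag_label = fragments[i % len(fragments)]
        new_wave, _ = insert_fragment(wave, frag_wave, rng)
        augmented.append((new_wave, frag_label))
    return augmented
```

Passing an explicit `random.Random` makes the augmentation reproducible on the device for a given seed.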
The preset samples are a known, labeled dataset of target bird sound fragments. In this embodiment, some of the sounds to be identified, together with their labels, are preset when the sound collection device is installed and deployed.
In this embodiment, sample expansion enriches the diversity of the training samples, while data augmentation highlights their characteristic features, improving the recognition accuracy of the recognition model.
In step 4, this embodiment fine-tunes the model using a strategy, based on parameter weights, of freezing and activating hidden layers of the neural network. Because computing power and memory on the end side are limited, online fine-tuning on the end side cannot iteratively update all layers of the neural network model; only the layer that contributes most to the target detection task is iteratively updated. The fine-tuning process therefore comprises: freezing the parameter layers of the bird voice recognition model, initializing a new layer as the fine-tuning layer, and training only the fine-tuning layer on the expanded sample set.
In one specific embodiment, the steps are as follows:
(1) Pre-trained model setup for the bird voice recognition model. The bird voice recognition model preset and running on the terminal device is used as the pre-trained model. Its structure is an EfficientNet network, which is divided into 9 stages in total; by default, each convolution layer is followed by a BN layer and a Swish activation function. Stage 1 is a 3x3 convolution layer. Stages 2 to 8 are repeated stacks of MBConv blocks. In the main branch of an MBConv block, a 1x1 convolution (+BN+Swish) first expands the channel dimension, followed by a depthwise (DW) convolution (+BN+Swish) with a 3x3 or 5x5 kernel, then an SE block, then a 1x1 convolution (+BN) that reduces the channel dimension, and finally a dropout operation. The input of the block is passed directly through the shortcut branch and added to the main-branch output to give the final output; this shortcut exists only when the input and output of the block have the same shape. Stage 9 consists of three parts: a 1x1 convolution, average pooling and a fully connected layer. Feature information is extracted from the bird sound, the resulting features pass through convolution, pooling, batch normalization and related operations in the model, and the fully connected layer finally outputs the confidence score of each bird class.
(2) Preprocessing of the terminal device model. When the pre-trained model is converted, training-time operators such as BatchNorm (batch normalization) and Dropout (random dropout) are retained.
(3) In the fine-tuning scenario, the model does not need to be built from scratch on the end side. Only the pre-trained model is loaded, the parameters of the earlier layers of the neural network are frozen, and only the last fully connected layer is fine-tuned. The input name of the last layer is looked up with the Netron model visualization tool (or in the model JSON file produced by other tools), and the pre-trained model is retained with its last layer removed.
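A framework-free sketch of this freeze-and-retrain scheme follows: a fixed function stands in for the frozen pre-trained layers, and only a newly initialized fully connected softmax layer is trained with stochastic gradient descent on cross-entropy loss. `frozen_backbone`, `FineTuneHead` and the hyperparameters are hypothetical; on a device, the inference framework's own training support would be used instead.

```python
import math
import random


def frozen_backbone(x) -> list[float]:
    """Stand-in for the frozen pre-trained layers: a fixed, non-trainable feature map.

    On the device this would be the network up to, but excluding, its last FC layer.
    The final constant 1.0 acts as a bias input for the head.
    """
    return [x[0], x[1], x[0] * x[1], 1.0]


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class FineTuneHead:
    """Newly initialized fully connected layer; the only trainable part of the model."""

    def __init__(self, n_features: int, n_classes: int, rng: random.Random):
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(n_features)] for _ in range(n_classes)]

    def predict_proba(self, feats: list[float]) -> list[float]:
        logits = [sum(wj * fj for wj, fj in zip(row, feats)) for row in self.w]
        return softmax(logits)

    def sgd_step(self, feats: list[float], label: int, lr: float) -> None:
        probs = self.predict_proba(feats)
        for c, row in enumerate(self.w):
            grad_logit = probs[c] - (1.0 if c == label else 0.0)  # d(cross-entropy)/d(logit_c)
            for j in range(len(row)):
                row[j] -= lr * grad_logit * feats[j]


def fine_tune(samples, n_classes: int, epochs: int = 200, lr: float = 0.1, seed: int = 0) -> FineTuneHead:
    rng = random.Random(seed)
    # The backbone is evaluated once per sample and never updated.
    data = [(frozen_backbone(x), y) for x, y in samples]
    head = FineTuneHead(len(data[0][0]), n_classes, rng)
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, label in data:
            head.sgd_step(feats, label, lr)
    return head
```

Because only the head is updated, memory and compute per step scale with the size of the last layer rather than the whole network, which is the motivation given for this step.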
Embodiment two:
the second embodiment of the invention provides an optimization system for a bird voice recognition model of a power transmission line, which comprises the following components:
the data acquisition module is configured to acquire background sound of the current scene within a period of time and preprocess the background sound;
the classification detection module is configured to sequentially utilize the front-end server and the cloud analysis server to carry out classification detection on the background sound to obtain a classification detection result of the background sound, and a preliminary sample set is generated according to the classification detection result;
the sample expansion module is configured to utilize the cached sound record, the preset sample and the preliminary sample set to carry out sample expansion and generate a training sample set on line;
the model fine-tuning module is configured to fine-tune the bird voice recognition model running on the terminal device with the expanded sample set, wherein the fine-tuning process comprises: freezing the parameter layers of the bird voice recognition model, initializing a new layer as the fine-tuning layer, and training only the fine-tuning layer on the expanded sample set.
In the data acquisition module, after the sound acquisition device has been operating for a period of time, the background sound it has captured is collected periodically and preprocessed by the preprocessing module.
The step of preprocessing the background sound by the preprocessing module comprises the following steps:
Strong-change detection is performed on all collected background sound. The sound is assessed from three perspectives (sound level, time domain and frequency domain) by calculating three feature values: sound pressure level, maximum frame energy and average frequency-domain energy. These values are compared with preset thresholds; the background sound is retained when all three values exceed their thresholds and is otherwise filtered out.
The preprocessed background sound is then transmitted to the front-end server for analysis.
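The three-feature gate can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the frame length, hop size, the 20 µPa reference (which presumes a signal calibrated in pascals) and the threshold values are all assumptions, and the helper names are hypothetical.

```python
import numpy as np

def sound_pressure_level(x, p_ref=20e-6):
    """SPL in dB; assumes x is a pressure signal in pascals (hypothetical calibration)."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(max(rms, 1e-12) / p_ref)

def frame_max_energy(x, frame_len=1024, hop=512):
    """Largest short-time energy over the frames of x (time-domain feature)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    energies = [np.sum(x[i * hop:i * hop + frame_len] ** 2) for i in range(n_frames)]
    return float(max(energies))

def mean_spectral_energy(x):
    """Average power of the one-sided FFT spectrum (frequency-domain feature)."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    return float(np.mean(spectrum))

def keep_clip(x, thresholds):
    """Retain the clip only if all three features exceed their preset thresholds."""
    feats = (sound_pressure_level(x), frame_max_energy(x), mean_spectral_energy(x))
    return all(f > t for f, t in zip(feats, thresholds))
```

For example, `keep_clip(clip, (60.0, 1.0, 1.0))` keeps a clip only when its SPL exceeds 60 dB and both energy features exceed 1.0; real thresholds would have to be tuned per site.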
In the classification detection module, the specific steps of sequentially utilizing the front-end server and the cloud analysis server to carry out classification detection on the background sound are as follows:
the front-end server compares the preprocessed background sound with the preset alarm sounds to classify it, and judges whether the model optimization criterion is met;
First, the front-end server compares the preprocessed background sound with a preset alarm sound to determine whether the background sound is a bird sound. Specifically, Mel spectrum features are extracted from both sounds and the cosine similarity between the two features is calculated; when the similarity exceeds a set threshold, the two sounds are regarded as the same type of sound.
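The front-end comparison can be illustrated with a NumPy-only sketch. It builds an HTK-style Mel filterbank, averages log1p-compressed Mel energies over time and compares the two resulting vectors by cosine similarity. The time averaging, log1p compression, FFT size, number of Mel bands and the 0.9 threshold are assumptions; the patent specifies only Mel spectrum features, cosine similarity and a set threshold.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters on the HTK Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fb[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[m - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def mel_features(x, sr=16000, n_fft=1024, hop=512, n_mels=40):
    """log1p-compressed Mel energies, shape (n_mels, n_frames); x needs >= n_fft samples."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T       # (n_frames, n_mels)
    return np.log1p(mel).T

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def same_sound_type(clip, alarm, sr=16000, threshold=0.9):
    """Compare time-averaged Mel vectors; the pooling choice is an assumption."""
    va = mel_features(clip, sr).mean(axis=1)
    vb = mel_features(alarm, sr).mean(axis=1)
    return cosine_similarity(va, vb) > threshold
```

Averaging over time makes clips of different lengths comparable; a frame-wise comparison would be stricter but needs aligned recordings.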
When the preprocessed background sound meets the model optimization criterion, the cloud analysis server performs a secondary classification detection on it to determine its alarm type. After the classification detection result is obtained, it is manually checked and confirmed through the monitoring platform. The terminal monitoring device periodically and actively requests manual confirmation from the platform, and the confirmed alarm information for the device's sounds is used as training samples for fine-tuning the model.
In a specific embodiment, the computing power of the cloud analysis server far exceeds that of the front-end analysis server, so the bird voice recognition model deployed in the cloud can have more parameters and a larger size, giving better recognition accuracy. Using the cloud server for the secondary classification of the background sound therefore ensures the quality of the data in the sample set. The bird song signal is pre-emphasized and uniformly segmented with a sliding window, then converted into corresponding image feature information by a sound-to-image conversion method; this image feature information is fed to a trained bird song recognition model, which finally predicts and recognizes the bird species from the song.
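The cloud-side preprocessing chain (pre-emphasis, uniform sliding-window segmentation, sound-to-image conversion) might look like the following sketch. The pre-emphasis coefficient 0.97, the segment and FFT sizes, and the choice of an 8-bit log-power spectrogram as the "image" are assumptions, since the patent does not name the conversion method.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is a common default, assumed here."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def sliding_segments(x, seg_len, hop):
    """Uniform sliding-window segmentation; a trailing partial segment is dropped."""
    if len(x) < seg_len:
        return np.empty((0, seg_len))
    n = 1 + (len(x) - seg_len) // hop
    return np.stack([x[i * hop:i * hop + seg_len] for i in range(n)])

def spectrogram_image(segment, n_fft=512, hop=256):
    """Convert one segment into an 8-bit grayscale log-power spectrogram 'image'."""
    window = np.hanning(n_fft)
    frames = sliding_segments(segment, n_fft, hop) * window
    power_db = 10.0 * np.log10(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-10)
    lo, hi = power_db.min(), power_db.max()
    img = (power_db - lo) / (hi - lo + 1e-12) * 255.0
    return img.T.astype(np.uint8)   # rows = frequency bins, columns = time frames
```

A typical call chain would be `spectrogram_image(seg)` for each `seg` in `sliding_segments(pre_emphasis(signal), seg_len, hop)`, producing one image per segment for the recognition model.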
First, bird sound features are extracted from the bird sound data, and the regional features are concatenated with the sound features to form new features, which are input to the model for training. Specifically, after the model structure is constructed, its internal parameters are randomly initialized and then iteratively updated during training on the data via forward and backward propagation, learning parameters that fit the distribution of the sound data in transmission-line scenes; the model structure and the learned parameters together perform bird sound recognition. Model structure: an EfficientNet network is adopted.
The alarm sounds are set according to the common bird sounds in each scene environment. More specifically, a bird song database and a regional bird species distribution information base are constructed from the bird species distribution information of different regions and around power transmission lines/grids. Because bird species distribution is strongly regional, the current regional environment information and the bird song features are jointly input into the model for prediction in a regional gridding manner.
Bird song features are represented by Mel spectrograms. Each region is assigned a numeric code, which is one-hot encoded to obtain the regional feature; the regional feature is then concatenated with the song feature to form the new model input.
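A sketch of the regional feature follows, assuming that "regional gridding" means a regular latitude/longitude grid and that the Mel spectrogram is flattened before concatenation. The grid origin, cell size and function names are hypothetical; the patent states only that each region receives a numeric code that is one-hot encoded and concatenated with the song feature.

```python
import numpy as np

def grid_code(lat, lon, lat_min, lon_min, cell_deg, n_rows, n_cols):
    """Hypothetical regional gridding: regular lat/lon cells numbered row-major."""
    row = int((lat - lat_min) // cell_deg)
    col = int((lon - lon_min) // cell_deg)
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("coordinate lies outside the gridded area")
    return row * n_cols + col

def one_hot(code, n_regions):
    """One-hot encode an integer region code."""
    vec = np.zeros(n_regions, dtype=np.float32)
    vec[code] = 1.0
    return vec

def fuse_features(mel_spec, region_code, n_regions):
    """Flatten the Mel spectrogram and append the one-hot regional feature."""
    return np.concatenate(
        [mel_spec.astype(np.float32).ravel(), one_hot(region_code, n_regions)]
    )
```

Flattening suits a fully connected input; for a convolutional backbone such as EfficientNet, the one-hot vector could instead be appended after global pooling, which is an equally hypothetical design choice.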
In the sample expansion module, the specific steps of sample expansion using the cached sound records, the preset samples and the preliminary sample set are:
mixing the labeled samples in the cached sound records with the ambient sound recordings for augmentation using a confusion (mixing) method;
randomly adding labeled target-type segments from the preset samples into the samples of the preliminary sample set to further augment the sample data. Specifically, target-type segments are selected one by one from the labeled preset samples; all preliminary sample sets are then traversed and the target segments are randomly inserted into the selected preliminary samples, which increases the diversity of the sample data.
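The two augmentation steps could be sketched as below. Reading the "confusion method" as a mixup-style weighted blend of a labeled clip with an ambient recording, and reading "insertion" as adding the target segment onto a preliminary sample at a random offset, are both assumptions; the Beta parameter and all function names are hypothetical.

```python
import numpy as np

def mix_with_ambient(labeled, ambient, rng, alpha=0.2):
    """Mixup-style blend (assumed interpretation). Returns the clip and the labeled weight."""
    n = min(len(labeled), len(ambient))
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)          # keep the labeled sound dominant (assumption)
    return lam * labeled[:n] + (1.0 - lam) * ambient[:n], lam

def insert_segment(sample, segment, rng):
    """Add a labeled target segment onto a copy of the sample at a random offset."""
    if len(segment) > len(sample):
        raise ValueError("segment is longer than the sample")
    out = sample.copy()
    start = int(rng.integers(0, len(sample) - len(segment) + 1))
    out[start:start + len(segment)] += segment
    return out, start

def expand(preliminary, target_segments, rng):
    """Traverse the preliminary set, taking target segments in turn (cyclically)."""
    augmented = []
    for i, sample in enumerate(preliminary):
        seg = target_segments[i % len(target_segments)]
        augmented.append(insert_segment(sample, seg, rng)[0])
    return augmented
```

Passing an explicit `np.random.default_rng(seed)` keeps the augmentation reproducible, which helps when the expanded set is regenerated online on the device.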
The preset samples form a known, labeled dataset of target bird sound segments, collected in advance and stored in a database. In this embodiment, some of the sound samples to be recognized, together with their labels, are preset when the sound acquisition device is installed and deployed.
In this embodiment, sample expansion enriches the diversity of the training samples, while data augmentation highlights their features, thereby improving the recognition accuracy of the model.
In the model fine-tuning module, this embodiment performs fine-tuning based on a parameter-weight-driven strategy of freezing and activating the hidden layers of the neural network. Because computing power and memory on the device side are limited, on-device online fine-tuning cannot iteratively update all layers of the neural network model; only a layer that contributes heavily to the target detection task is iteratively updated. The fine-tuning process therefore comprises: freezing the parameter layers of the bird voice recognition model, initializing a new layer as the fine-tuning layer, and training only the fine-tuning layer on the expanded sample set.
In one specific embodiment, the steps are as follows:
(1) Setting up the pre-trained bird voice recognition model. The bird voice recognition model that is preset and running on the terminal device is taken as the pre-trained model. Its structure is an EfficientNet network, which is divided into 9 stages; by default, each convolution layer is followed by a BN layer and a Swish activation function. Stage 1 is a 3x3 convolution layer. Stages 2 to 8 are repeated stacks of MBConv blocks. In the main branch of an MBConv block, a 1x1 convolution (+BN+Swish) expands the channel dimension, followed by a depthwise (DW) convolution (+BN+Swish) with a 3x3 or 5x5 kernel, an SE block, a 1x1 convolution (+BN) that reduces the channel dimension, and finally a dropout operation. The block input is then passed directly through the shortcut branch and added to the main branch to give the final output (this addition applies when the input and output shapes match). Stage 9 consists of three parts: a 1x1 convolution, average pooling, and a fully connected layer. Feature information is extracted from the bird sound by the feature extraction part; the features are fed into the model, which applies convolution, pooling, batch normalization and similar steps, and the fully connected layer finally outputs a confidence score for each bird class.
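For reference, the stage layout described above matches the EfficientNet-B0 baseline from the original EfficientNet paper. The patent does not say which EfficientNet variant is used, so B0 is an assumption here. The small helpers read the expansion ratio of the 1x1 expansion convolution, count the MBConv blocks, compute the overall downsampling factor, and encode the rule that the shortcut addition applies only when input and output shapes match.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    operator: str
    kernel: int
    stride: int        # applied by the first block of the stage only
    out_channels: int
    layers: int

# EfficientNet-B0 baseline layout (assumed variant).
EFFICIENTNET_B0 = [
    Stage("Conv3x3", 3, 2, 32, 1),                    # stage 1
    Stage("MBConv1", 3, 1, 16, 1),                    # stage 2
    Stage("MBConv6", 3, 2, 24, 2),
    Stage("MBConv6", 5, 2, 40, 2),
    Stage("MBConv6", 3, 2, 80, 3),
    Stage("MBConv6", 5, 1, 112, 3),
    Stage("MBConv6", 5, 2, 192, 4),
    Stage("MBConv6", 3, 1, 320, 1),                   # stage 8
    Stage("Conv1x1 & Pooling & FC", 1, 1, 1280, 1),   # stage 9
]

def expansion_ratio(stage):
    """Channel multiplier of the 1x1 expansion conv, e.g. MBConv6 -> 6."""
    return int(stage.operator[len("MBConv"):])

def mbconv_blocks(stages):
    return sum(s.layers for s in stages if s.operator.startswith("MBConv"))

def total_stride(stages):
    stride = 1
    for s in stages:
        stride *= s.stride
    return stride

def has_shortcut(in_ch, out_ch, stride):
    """The identity branch is added only when the block keeps the tensor shape."""
    return stride == 1 and in_ch == out_ch
```

With a 224x224 input, the overall stride of 32 leaves a 7x7 map before pooling and the fully connected layer.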
(2) Preprocessing the model for the terminal device. When the pre-trained model is converted, training-time operators such as BatchNorm and dropout are retained.
(3) In the fine-tuning scenario, the model does not need to be built from scratch on the device side. It is sufficient to load the pre-trained model, freeze the parameters of the earlier layers of the neural network, and fine-tune only the last fully connected layer. The input name of the last layer can be inspected with the Netron model visualization tool (or in a model JSON file exported by other tools), and the pre-trained model is kept with its last layer removed.
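The freeze-and-retrain idea can be shown without a deep learning framework. In this sketch, a fixed random projection with ReLU stands in for the frozen EfficientNet layers (a hypothetical placeholder), and only a newly initialized fully connected layer is trained with softmax cross-entropy by gradient descent. The learning rate, epoch count and initialization scale are assumptions.

```python
import numpy as np

def frozen_backbone(x, w_frozen):
    """Hypothetical stand-in for the frozen layers: fixed projection + ReLU, never updated."""
    return np.maximum(x @ w_frozen, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def finetune_head(features, labels, n_classes, lr=0.05, epochs=300, seed=0):
    """Train only a newly initialized fully connected layer on frozen features."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.01, (features.shape[1], n_classes))
    b = np.zeros(n_classes)
    y = np.eye(n_classes)[labels]
    for _ in range(epochs):
        p = softmax(features @ w + b)
        grad = (p - y) / len(features)     # gradient of mean cross-entropy w.r.t. logits
        w -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b
```

Because backbone features are computed once and only `w` and `b` are updated, memory and compute stay small, which is the motivation the text gives for on-device fine-tuning. In a framework such as PyTorch, the same effect is usually obtained by setting `requires_grad = False` on the backbone parameters and replacing the final linear layer before training.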
Embodiment three:
The third embodiment of the present invention provides a medium storing a program which, when executed by a processor, implements the steps of the power transmission line bird voice recognition model optimization method of the first embodiment, namely:
Step 1: acquire the background sound of the current scene over a period of time and preprocess it.
Step 2: use the front-end server and then the cloud analysis server to perform classification detection on the background sound, obtain its classification detection result, and generate a preliminary sample set from that result.
Step 3: perform sample expansion using the cached sound records, the preset samples and the preliminary sample set, generating a training sample set online.
Step 4: fine-tune the bird voice recognition model running on the terminal device side using the expanded sample set.
The detailed steps are the same as those of the power transmission line bird voice recognition model optimization method provided in the first embodiment, and are not repeated here.
Embodiment four:
The fourth embodiment of the invention provides a device comprising a memory, a processor and a program stored in the memory and executable on the processor; when executing the program, the processor implements the steps of the power transmission line bird voice recognition model optimization method of the first embodiment, namely:
Step 1: acquire the background sound of the current scene over a period of time and preprocess it.
Step 2: use the front-end server and then the cloud analysis server to perform classification detection on the background sound, obtain its classification detection result, and generate a preliminary sample set from that result.
Step 3: perform sample expansion using the cached sound records, the preset samples and the preliminary sample set, generating a training sample set online.
Step 4: fine-tune the bird voice recognition model running on the terminal device side using the expanded sample set.
The detailed steps are the same as those of the power transmission line bird voice recognition model optimization method provided in the first embodiment, and are not repeated here.
The steps involved in the second, third and fourth embodiments correspond to those of the first embodiment; for details, refer to the relevant description of the first embodiment.
Although the embodiments of the present invention have been described above with reference to the drawings, they do not limit the scope of the invention; all modifications or variations within the scope of the invention as defined by the claims are intended to be covered.

Claims (9)

1. A method for optimizing a bird voice recognition model of a power transmission line, characterized by comprising the following steps:
acquiring background sound of a current scene within a period of time, and preprocessing the background sound;
sequentially utilizing a front-end server and a cloud analysis server to carry out classification detection on the background sound to obtain a classification detection result of the background sound, and generating a preliminary sample set according to the classification detection result;
performing sample expansion using the cached sound records, the preset samples and the preliminary sample set, and generating a training sample set online;
and fine-tuning the bird voice recognition model running on the terminal device side using the expanded sample set, wherein the fine-tuning process comprises: freezing the parameter layers of the bird voice recognition model, initializing a new layer as the fine-tuning layer, and training only the fine-tuning layer on the expanded sample set;
the specific steps of sequentially utilizing the front-end server and the cloud analysis server to carry out classification detection on the background sound are as follows:
the front-end server compares the preprocessed background sound with the preset alarm sounds to classify it, and judges whether the model optimization criterion is met;
when the preprocessed background sound meets the model optimization criterion, the cloud analysis server performs a secondary classification detection on the background sound and determines its alarm type;
the specific steps of carrying out secondary classification detection on the background sound by using the cloud analysis server are as follows:
pre-emphasis and uniform sliding-window segmentation are applied to the bird song signal, which is converted into corresponding image feature information by a sound-to-image conversion method; the image feature information is input to a trained bird song recognition model, which finally predicts and recognizes the bird species from the bird song;
the training process of the bird voice recognition model comprises the following steps:
first, bird sound features are extracted from the bird sound data, and regional features are concatenated with the sound features to form new features, which are input to the model for training; specifically, after the model structure is constructed, the internal parameters of the model are randomly initialized and iteratively updated during training via forward and backward propagation, learning parameters that fit the distribution of the sound data in transmission-line scenes; the model structure and the learned parameters together realize bird sound recognition;
the alarm sounds are set according to the common bird sounds in each scene environment; a bird song database and a regional bird species distribution information base are constructed from the bird species distribution information of different regions and around power transmission lines/grids, and, because bird species distribution is strongly regional, the current regional environment information and the bird song features are jointly input into the model for prediction in a regional gridding manner;
the bird voice recognition model adopts an EfficientNet network; the EfficientNet is divided into 9 stages, of which stage 9 consists of three parts, the last being a fully connected layer; in the fine-tuning scenario, the parameters of the earlier layers of the neural network are frozen and only the last fully connected layer is fine-tuned.
2. The method for optimizing a bird voice recognition model for a power transmission line according to claim 1, wherein the step of preprocessing the background sound comprises:
strong-change detection is performed on all collected background sound: the sound is assessed from three perspectives (sound level, time domain and frequency domain) by calculating three feature values, namely sound pressure level, maximum frame energy and average frequency-domain energy, which are compared with preset thresholds; the background sound is retained when all three values exceed their thresholds and is otherwise filtered out.
3. The method for optimizing a bird voice recognition model of a power transmission line according to claim 1, wherein the bird sound features are Mel spectrograms, each region is assigned a numeric code, the code is one-hot encoded to obtain the regional features, and the regional features and the sound features are concatenated into new features used as the model input.
4. The method for optimizing a bird voice recognition model of a power transmission line according to claim 1, wherein, after the classification detection result of the background sound is obtained, it is manually checked and confirmed through the monitoring platform.
5. The method for optimizing a bird voice recognition model of a power transmission line according to claim 1, wherein the specific steps of sample expansion using the cached sound records, the preset samples and the preliminary sample set comprise:
mixing the labeled samples in the cached sound records with the ambient sound recordings for augmentation using a confusion (mixing) method;
randomly adding labeled target-type segments from the preset samples into the samples of the preliminary sample set to further augment the sample data.
6. The method for optimizing a bird voice recognition model of a power transmission line according to claim 5, wherein the preset samples are a known, labeled dataset of target bird sound segments.
7. An optimization system for a bird voice recognition model of a power transmission line, characterized by comprising:
the data acquisition module is configured to acquire background sound of the current scene within a period of time and preprocess the background sound;
the classification detection module is configured to sequentially utilize the front-end server and the cloud analysis server to carry out classification detection on the background sound to obtain a classification detection result of the background sound, and a preliminary sample set is generated according to the classification detection result;
the sample expansion module is configured to perform sample expansion using the cached sound records, the preset samples and the preliminary sample set, generating a training sample set online;
the model fine-tuning module is configured to fine-tune the bird voice recognition model running on the terminal device side using the expanded sample set, wherein the fine-tuning process comprises: freezing the parameter layers of the bird voice recognition model, initializing a new layer as the fine-tuning layer, and training only the fine-tuning layer on the expanded sample set;
the specific steps of sequentially utilizing the front-end server and the cloud analysis server to carry out classification detection on the background sound are as follows:
the front-end server compares the preprocessed background sound with the preset alarm sounds to classify it, and judges whether the model optimization criterion is met;
when the preprocessed background sound meets the model optimization criterion, the cloud analysis server performs a secondary classification detection on the background sound and determines its alarm type;
the specific steps of carrying out secondary classification detection on the background sound by using the cloud analysis server are as follows:
pre-emphasis and uniform sliding-window segmentation are applied to the bird song signal, which is converted into corresponding image feature information by a sound-to-image conversion method; the image feature information is input to a trained bird song recognition model, which finally predicts and recognizes the bird species from the bird song;
the training process of the bird voice recognition model comprises the following steps:
first, bird sound features are extracted from the bird sound data, and regional features are concatenated with the sound features to form new features, which are input to the model for training; specifically, after the model structure is constructed, the internal parameters of the model are randomly initialized and iteratively updated during training via forward and backward propagation, learning parameters that fit the distribution of the sound data in transmission-line scenes; the model structure and the learned parameters together realize bird sound recognition;
the alarm sounds are set according to the common bird sounds in each scene environment; a bird song database and a regional bird species distribution information base are constructed from the bird species distribution information of different regions and around power transmission lines/grids, and, because bird species distribution is strongly regional, the current regional environment information and the bird song features are jointly input into the model for prediction in a regional gridding manner;
the bird voice recognition model adopts an EfficientNet network; the EfficientNet is divided into 9 stages, of which stage 9 consists of three parts, the last being a fully connected layer; in the fine-tuning scenario, the parameters of the earlier layers of the neural network are frozen and only the last fully connected layer is fine-tuned.
8. A computer-readable storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor of a terminal device to perform the power transmission line bird voice recognition model optimization method of any one of claims 1-6.
9. A terminal device, comprising a processor and a computer-readable storage medium, the processor being configured to execute instructions and the computer-readable storage medium storing a plurality of instructions adapted to be loaded by the processor to perform the power transmission line bird voice recognition model optimization method of any one of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311506466.4A CN117238299B (en) 2023-11-14 2023-11-14 Method, system, medium and equipment for optimizing bird voice recognition model of power transmission line

Publications (2)

Publication Number Publication Date
CN117238299A CN117238299A (en) 2023-12-15
CN117238299B true CN117238299B (en) 2024-01-30

Family

ID=89086506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311506466.4A Active CN117238299B (en) 2023-11-14 2023-11-14 Method, system, medium and equipment for optimizing bird voice recognition model of power transmission line

Country Status (1)

Country Link
CN (1) CN117238299B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113180030A (en) * 2021-07-01 2021-07-30 广东电网有限责任公司中山供电局 Embedded bird recognition system
CN113707158A (en) * 2021-08-02 2021-11-26 南昌大学 Power grid harmful bird seed singing recognition method based on VGGish migration learning network
CN114863937A (en) * 2022-05-17 2022-08-05 武汉工程大学 Hybrid birdsong identification method based on deep migration learning and XGboost
WO2022205249A1 (en) * 2021-03-31 2022-10-06 华为技术有限公司 Audio feature compensation method, audio recognition method, and related product
CN115299428A (en) * 2022-08-04 2022-11-08 国网江苏省电力有限公司南通供电分公司 Intelligent bird system that drives of thing networking based on degree of depth study
CN116687438A (en) * 2023-05-30 2023-09-05 北京石油化工学院 Method and device for identifying borborygmus

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11914661B2 (en) * 2020-09-02 2024-02-27 Google Llc Integration of web and media snippets into map applications

Non-Patent Citations (2)

Title
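Analysis of bird biodiversity in soundscapes based on segmentation and extraction of spectrogram feature information; 蒋锦刚; 邵小云; 万海波; 齐家国; 荆长伟; 程天佑; 生态学报 (Acta Ecologica Sinica), no. 23; full text *
Bird species recognition method based on multi-feature fusion; 谢将剑; 杨俊; 邢照亮; 张卓; 陈新; 应用声学 (Applied Acoustics), no. 02; full text *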

Also Published As

Publication number Publication date
CN117238299A (en) 2023-12-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant