CN111951812A - Animal emotion recognition method and device and electronic equipment - Google Patents

Animal emotion recognition method and device and electronic equipment

Info

Publication number
CN111951812A
Authority
CN
China
Prior art keywords
audio
frequency
animal
emotion recognition
characteristic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010871315.9A
Other languages
Chinese (zh)
Inventor
储莫华
游道军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Aicai Network Technology Co ltd
Original Assignee
Hangzhou Aicai Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2020-08-26
Filing date: 2020-08-26
Publication date: 2020-11-17
Application filed by Hangzhou Aicai Network Technology Co ltd
Priority to CN202010871315.9A
Publication of CN111951812A


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/04 - Training, enrolment or model building
    • G10L17/26 - Recognition of special voice characteristics, e.g. for use in lie detectors; recognition of animal voices
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03 - characterised by the type of extracted parameters
    • G10L25/18 - the extracted parameters being spectral information of each sub-band
    • G10L25/48 - specially adapted for particular use
    • G10L25/51 - for comparison or discrimination
    • G10L25/63 - for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Psychiatry (AREA)
  • Hospice & Palliative Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Child & Adolescent Psychology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

Embodiments of the invention provide an animal emotion recognition method, an animal emotion recognition device, and electronic equipment, wherein the method comprises the following steps: determining the animal type corresponding to the audio to be recognized; obtaining the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type, the spectral feature information comprising a frequency domain signal sequence corresponding to the frequency sequence; and inputting the spectral feature information of the audio to be recognized into a preset emotion recognition model corresponding to the animal type to obtain an emotion identifier for the audio to be recognized. The emotion recognition model is trained in advance using the spectral feature information of training audio of the animal type as samples and the emotion identifiers corresponding to the training audio as sample labels. The embodiments of the invention accurately identify the current emotion of an animal, facilitating management of, and communication with, the animal by the user.

Description

Animal emotion recognition method and device and electronic equipment
Technical Field
The invention relates to the technical field of voice recognition, in particular to an animal emotion recognition method and device and electronic equipment.
Background
Animals, like humans, experience emotions such as joy, anger, and sadness; for example, a dog wags its tail when happy and whimpers when frightened. Existing methods can only judge an animal's emotional changes by interpreting its sounds through human experience, and cannot accurately determine the animal's actual needs or provide timely help and care.
Disclosure of Invention
The embodiments of the invention aim to provide an animal emotion recognition method and device and electronic equipment, so as to solve the problem that an animal's actual needs cannot be accurately determined and timely help and care cannot be provided.
In order to solve the above technical problem, the embodiment of the present invention is implemented as follows:
in a first aspect, an embodiment of the present invention provides an animal emotion recognition method, including:
determining the animal type corresponding to the audio to be recognized;
obtaining the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type; the spectral feature information comprises a frequency domain signal sequence corresponding to the frequency sequence;
inputting the spectral feature information of the audio to be recognized into a preset emotion recognition model corresponding to the animal type to obtain an emotion identifier for the audio to be recognized; the emotion recognition model is trained in advance using the spectral feature information of training audio of the animal type as samples and the emotion identifiers corresponding to the training audio as sample labels.
In a second aspect, an embodiment of the present invention provides an animal emotion recognition apparatus, including:
the type acquisition unit is used for determining the animal type corresponding to the audio to be recognized;
the feature analysis unit is used for obtaining the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type; the spectral feature information comprises a frequency domain signal sequence corresponding to the frequency sequence;
the emotion recognition unit is used for inputting the spectral feature information of the audio to be recognized into a preset emotion recognition model corresponding to the animal type to obtain an emotion identifier for the audio to be recognized; the emotion recognition model is trained in advance using the spectral feature information of training audio of the animal type as samples and the emotion identifiers corresponding to the training audio as sample labels.
In a third aspect, an embodiment of the present invention provides an electronic device comprising a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the bus; the memory is used for storing a computer program; and the processor is used for executing the program stored in the memory to implement the steps of the animal emotion recognition method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the animal emotion recognition method according to the first aspect.
According to the technical solution provided by the embodiments of the present invention, the animal type corresponding to the audio to be recognized is determined; the spectral feature information of the audio to be recognized is obtained according to the frequency sequence corresponding to the animal type, the spectral feature information comprising a frequency domain signal sequence corresponding to the frequency sequence; and the spectral feature information is input into a preset emotion recognition model corresponding to the animal type to obtain an emotion identifier for the audio to be recognized. The embodiments of the present invention thereby accurately identify the current emotion of the animal, facilitating management of, and communication with, the animal by the user.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a first schematic flowchart of an animal emotion recognition method provided by an embodiment of the present invention;
FIG. 2 is a second schematic flowchart of an animal emotion recognition method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the module composition of an animal emotion recognition device provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides an animal emotion recognition method and device and electronic equipment.
In order to enable those skilled in the art to better understand the technical solution of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
As shown in FIG. 1, an embodiment of the present invention provides an animal emotion recognition method, which may specifically comprise the following steps.
Step S01, determining the animal type corresponding to the audio to be recognized.
Because different animal types have different sound characteristics, the animal type corresponding to the acquired audio to be recognized needs to be determined before emotion recognition. Animal types can be classified according to actual needs, for example by category: cats, dogs, birds, etc. Dogs can be further subdivided by breed, for example into Golden Retriever, Chow Chow, Poodle, etc., and can be subdivided further by growth stage.
In practice, the animal type can be selected by the user on an animal type selection page according to actual needs, or acquired automatically by capturing an image of the animal.
Step S02, obtaining the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type; the spectral feature information comprises a frequency domain signal sequence corresponding to the frequency sequence.
Audio signals differ from one another mainly in frequency: high-frequency sounds are short and sharp, while low-frequency sounds are deep, and each audio signal is composed of different signal intensities at these different frequencies.
Different animal types have different physiological structures, so the audio they produce differs greatly in frequency; therefore, the frequency sequence corresponding to each animal type needs to be acquired in advance.
Further, the frequency sequence is obtained through a preset frequency selection strategy according to the frequency range corresponding to the animal type.
First, the frequency range, or dominant frequency range, of the audio that each animal type may produce is determined; for example, the frequency range of audio produced by dogs is 0-2000 Hz.
Representative frequencies are then selected from this frequency range according to the preset frequency selection strategy to form the frequency sequence corresponding to the animal type.
The frequency selection strategy can be set according to actual needs. Frequencies may be selected at uniform intervals, for example one every 10 Hz, to form the frequency sequence corresponding to the animal type; they may also be selected at non-uniform intervals obtained by a preset calculation method.
The frequency sequence corresponding to each animal type can differ, as can the corresponding frequency selection strategy. Specifically, during the training of the emotion recognition model, the frequency selection strategy with the best test result is selected from a plurality of preset candidate strategies and used to generate the frequency sequence corresponding to the animal type.
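As a minimal sketch, the uniform-interval strategy could be implemented as follows (assuming, for illustration, the dog range of 0-2000 Hz and a 10 Hz step; the function name is hypothetical):

```python
import numpy as np

def build_frequency_sequence(f_min: float, f_max: float, step: float) -> np.ndarray:
    """Uniform-interval frequency selection strategy: pick one
    representative frequency every `step` Hz within [f_min, f_max)."""
    return np.arange(f_min, f_max, step)

# Dogs: 0-2000 Hz at 10 Hz intervals gives a frequency sequence of 200 frequencies.
dog_frequency_sequence = build_frequency_sequence(0.0, 2000.0, 10.0)
assert len(dog_frequency_sequence) == 200
```

A non-uniform strategy would replace np.arange with whatever preset calculation method the animal type calls for; the best candidate strategy is then kept according to test results during model training, as described above.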
A preset spectral feature analysis is performed on the audio to be recognized to obtain the frequency domain signals corresponding to the audio. Then, according to the frequency sequence corresponding to the animal type, the frequency domain signal corresponding to each frequency in the sequence is selected from these frequency domain signals to form a frequency domain signal sequence, which serves as the spectral feature information of the audio to be recognized.
Step S03, inputting the spectral feature information of the audio to be recognized into a preset emotion recognition model corresponding to the animal type to obtain an emotion identifier for the audio to be recognized; the emotion recognition model is trained in advance using the spectral feature information of training audio of the animal type as samples and the emotion identifiers corresponding to the training audio as sample labels.
A large number of training audio clips of the animal type are collected in advance, and each training audio clip is labeled with its corresponding emotion identifier.
According to the frequency sequence corresponding to the animal type, spectral feature analysis is performed on each training audio clip to obtain its spectral feature information. Using the spectral feature information of each training audio clip of the animal type as samples, a pre-constructed neural network model is trained to obtain the emotion recognition model corresponding to the animal type. In the actual training process, the training audio can be divided into a training set and a test set; the model is trained on the training set, and the model after each round of training is evaluated on the test set.
The neural network model can be configured according to actual needs; the embodiment of the present invention takes a three-layer neural network as an example, consisting of an input layer, a hidden layer, and an output layer, where the input layer comprises a first number of input nodes, the hidden layer comprises a second number of hidden nodes, and the output layer comprises a third number of output nodes. The neural network model is trained by back propagation.
Each output node of the output layer can correspond to one emotion identifier.
Further, the number of emotion identifiers can be set according to actual needs; the embodiment of the present invention gives only one example, in which the emotion identifiers include: joy, anger, sadness, panic, and fear. Each emotion identifier can be represented by a number, e.g. joy as 0, anger as 1, sadness as 2, panic as 3, and fear as 4.
The spectral feature information of the audio to be recognized is input into the trained emotion recognition model, which outputs the emotion identifier corresponding to the audio, thereby realizing emotion recognition for the animal that produced the audio to be recognized.
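As an illustrative, non-authoritative sketch of such a three-layer network (the patent names no framework; PyTorch, the 256-node hidden layer, and the optimizer are assumptions, while the 2000 input nodes and five emotion identifiers follow the examples in this description):

```python
import torch
import torch.nn as nn

# Input layer -> hidden layer -> output layer; one output node per emotion identifier.
NUM_INPUT, NUM_HIDDEN, NUM_EMOTIONS = 2000, 256, 5  # 5 = joy/anger/sadness/panic/fear

model = nn.Sequential(
    nn.Linear(NUM_INPUT, NUM_HIDDEN),
    nn.ReLU(),
    nn.Linear(NUM_HIDDEN, NUM_EMOTIONS),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(features: torch.Tensor, labels: torch.Tensor) -> float:
    """One back-propagation step on a batch of spectral feature samples."""
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()  # back propagation, as described above
    optimizer.step()
    return loss.item()

def predict_emotion(features: torch.Tensor) -> int:
    """Return the emotion identifier (0..4) for one spectral feature vector."""
    with torch.no_grad():
        return int(model(features.unsqueeze(0)).argmax(dim=1))
```

The training/test split described above would determine which samples are fed to train_step and which are used to evaluate the model after each training round.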
According to the technical solution provided by the embodiments of the present invention, the animal type corresponding to the audio to be recognized is determined; the spectral feature information of the audio to be recognized is obtained according to the frequency sequence corresponding to the animal type, the spectral feature information comprising a frequency domain signal sequence corresponding to the frequency sequence; and the spectral feature information is input into a preset emotion recognition model corresponding to the animal type to obtain an emotion identifier for the audio to be recognized. The embodiments of the present invention thereby accurately identify the current emotion of the animal, facilitating management of, and communication with, the animal by the user.
Further, as shown in FIG. 2, the specific processing of step S02 can vary; an alternative processing manner is provided below, described in steps S021-S022.
And S021, intercepting an effective audio clip with a preset time length from the audio to be identified.
Since the sounds made by animals are often brief, in the beginning of the above-mentioned spectral feature analysis, a valid audio segment of a preset time length, for example, a valid audio segment of 1 second, may be cut from the audio to be recognized. The method for intercepting the effective audio segment may be set according to actual needs, for example, the audio segment with the maximum average signal intensity may be intercepted according to the signal intensity in the audio to be identified, or the audio segment after the signal intensity is greater than a preset intensity threshold is intercepted.
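A minimal numpy sketch of the maximum-average-intensity variant (the mean-absolute-amplitude intensity measure and the function name are illustrative assumptions):

```python
import numpy as np

def extract_effective_clip(audio: np.ndarray, sample_rate: int,
                           clip_seconds: float = 1.0) -> np.ndarray:
    """Keep the window of `clip_seconds` whose mean absolute amplitude
    (a simple stand-in for average signal intensity) is largest."""
    win = int(sample_rate * clip_seconds)
    if len(audio) <= win:
        return audio
    # Prefix sums give every window's intensity sum in O(n):
    # window_sums[s] = sum(|audio[s : s + win]|).
    csum = np.concatenate(([0.0], np.cumsum(np.abs(audio))))
    window_sums = csum[win:] - csum[:-win]
    start = int(np.argmax(window_sums))
    return audio[start:start + win]
```

The threshold-based variant would instead return the clip beginning at the first sample whose intensity exceeds the preset threshold.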
Step S022, performing a preset fast Fourier transform on the effective audio clip, and obtaining the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type.
Through the fast Fourier transform (FFT), the time domain signal of the effective audio clip can be converted into frequency domain signals, yielding the signal intensity corresponding to each frequency. Then, according to the frequency sequence of the animal type, the frequency domain signal sequence corresponding to the frequency sequence is extracted from the converted frequency list and serves as the spectral feature information of the audio to be recognized.
The specific processing of step S022 can also vary; an alternative processing manner is provided below, described in steps S0221-S0224.
Step S0221, dividing the effective audio clip into a preset number of audio sub-segments.
In the process of performing the preset fast Fourier transform on the effective audio clip, the clip can be segmented according to a preset number of segments to obtain a plurality of audio sub-segments. For example, a 1-second effective audio clip can be divided into ten 100-millisecond audio sub-segments.
Step S0222, performing the preset fast Fourier transform on the time domain signal sequence of each audio sub-segment according to a preset sampling frequency to obtain the frequency domain signal sequence of the audio sub-segment.
The time domain signal sequence of each audio sub-segment is obtained according to the preset sampling frequency. For example, at a sampling frequency of 44,100 Hz with 16-bit floating-point sampling precision, each 100-millisecond audio sub-segment corresponds to a time domain signal sequence of 4410 time domain signals.
Fast Fourier transform is performed on the time domain signal sequence of each audio sub-segment to obtain its frequency domain signal sequence. Depending on the parameters of the fast Fourier transform, the resulting frequency domain signal sequence contains a frequency domain signal for each frequency within a set range, for example for each frequency in the range 0-4095 Hz. The signal intensity of each frequency domain signal can be a floating point number between 0 and 1, representing the sound intensity at the corresponding frequency.
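A hedged numpy sketch of this per-sub-segment transform (peak normalization is one possible way to obtain intensities in [0, 1]; the patent does not fix the normalization). Note that 4410 samples at 44,100 Hz give an FFT bin every 10 Hz, which lines up with the 10 Hz frequency sequence in the running dog example:

```python
import numpy as np

def subsegment_spectrum(samples: np.ndarray, sample_rate: int = 44100):
    """FFT one audio sub-segment's time domain signal sequence into a
    frequency domain signal sequence with intensities scaled to [0, 1].
    Frequency resolution = sample_rate / len(samples); 4410 samples at
    44100 Hz give one bin every 10 Hz."""
    spectrum = np.abs(np.fft.rfft(samples))
    peak = spectrum.max()
    intensities = spectrum / peak if peak > 0 else spectrum
    bin_freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return bin_freqs, intensities
```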
Step S0223, extracting the frequency domain signal sequence corresponding to the frequency sequence from the frequency domain signal sequence of each audio sub-segment according to the frequency sequence corresponding to the animal type.
For example, if the frequency sequence corresponding to dogs is preset to contain 200 frequencies selected at uniform intervals within the range 0-2000 Hz, then the frequency domain signal sequence extracted from each audio sub-segment for that frequency sequence also contains 200 frequency domain signals.
Step S0224, splicing the frequency domain signal sequences of the audio sub-segments corresponding to the frequency sequence to obtain the spectral feature information of the audio to be recognized.
The frequency domain signal sequences corresponding to the frequency sequence are spliced in the order of the audio sub-segments, thereby obtaining the spectral feature information of the audio to be recognized. For example, splicing the frequency domain signal sequences of the ten audio sub-segments in the example above yields spectral feature information containing 2000 frequency domain signals.
The number of frequency domain signals contained in the spectral feature information is the same as the first node number of the input layer of the emotion recognition model.
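Putting steps S0221-S0224 together, a hedged end-to-end sketch (reusing subsegment_spectrum from the sketch above; the segment count, sampling frequency, and frequency sequence follow the running dog example):

```python
import numpy as np

def spectral_feature_info(clip: np.ndarray, freq_sequence: np.ndarray,
                          sample_rate: int = 44100, n_segments: int = 10) -> np.ndarray:
    """S0221-S0224: split the effective clip into sub-segments, FFT each,
    keep the bins matching the animal type's frequency sequence, and
    splice the per-segment sequences into one feature vector."""
    features = []
    for seg in np.array_split(clip, n_segments):                        # S0221
        bin_freqs, intensities = subsegment_spectrum(seg, sample_rate)  # S0222
        idx = np.searchsorted(bin_freqs, freq_sequence)                 # S0223
        features.append(intensities[idx])
    return np.concatenate(features)                                     # S0224

# A 1-second clip at 44100 Hz with the 200-frequency dog sequence yields
# 10 x 200 = 2000 values, matching the emotion recognition model's input size.
```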
According to the technical solution provided by the embodiments of the present invention, an effective audio clip of a preset time length is intercepted from the audio to be recognized; the effective audio clip is divided into a preset number of audio sub-segments; the preset fast Fourier transform is performed on the time domain signal sequence of each audio sub-segment according to the preset sampling frequency to obtain its frequency domain signal sequence; the frequency domain signal sequence corresponding to the frequency sequence is extracted from the frequency domain signal sequence of each audio sub-segment according to the frequency sequence corresponding to the animal type; and the frequency domain signal sequences of the audio sub-segments corresponding to the frequency sequence are spliced to obtain the spectral feature information of the audio to be recognized. The current emotion of the animal is thereby accurately identified, facilitating management of, and communication with, the animal by the user.
Corresponding to the animal emotion recognition method provided by the above embodiments and based on the same technical concept, an embodiment of the present invention further provides an animal emotion recognition device. FIG. 3 is a schematic diagram of the module composition of the animal emotion recognition device, which is used for executing the animal emotion recognition method described with reference to FIGS. 1-2. As shown in FIG. 3, the animal emotion recognition device includes: a type acquisition unit 301, a feature analysis unit 302, and an emotion recognition unit 303.
The type acquisition unit 301 is configured to determine the animal type corresponding to the audio to be recognized. The feature analysis unit 302 is configured to obtain the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type; the spectral feature information comprises a frequency domain signal sequence corresponding to the frequency sequence. The emotion recognition unit 303 is configured to input the spectral feature information of the audio to be recognized into a preset emotion recognition model corresponding to the animal type to obtain an emotion identifier for the audio to be recognized; the emotion recognition model is trained in advance using the spectral feature information of training audio of the animal type as samples and the emotion identifiers corresponding to the training audio as sample labels.
According to the technical solution provided by the embodiments of the present invention, the animal type corresponding to the audio to be recognized is determined; the spectral feature information of the audio to be recognized is obtained according to the frequency sequence corresponding to the animal type, the spectral feature information comprising a frequency domain signal sequence corresponding to the frequency sequence; and the spectral feature information is input into a preset emotion recognition model corresponding to the animal type to obtain an emotion identifier for the audio to be recognized. The embodiments of the present invention thereby accurately identify the current emotion of the animal, facilitating management of, and communication with, the animal by the user.
Further, the feature analysis unit includes an audio interception module and a feature extraction module.
The audio interception module is used for intercepting an effective audio clip of a preset time length from the audio to be recognized;
the feature extraction module is used for performing a preset fast Fourier transform on the effective audio clip and obtaining the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type.
Further, the feature extraction module includes a first extraction module, a second extraction module, a third extraction module, and a fourth extraction module.
The first extraction module is used for dividing the effective audio clip into a preset number of audio sub-segments;
the second extraction module is used for performing the preset fast Fourier transform on the time domain signal sequence of each audio sub-segment according to a preset sampling frequency to obtain the frequency domain signal sequence of the audio sub-segment;
the third extraction module is used for extracting the frequency domain signal sequence corresponding to the frequency sequence from the frequency domain signal sequence of each audio sub-segment according to the frequency sequence corresponding to the animal type;
the fourth extraction module is used for splicing the frequency domain signal sequences of the audio sub-segments corresponding to the frequency sequence to obtain the spectral feature information of the audio to be recognized.
According to the technical solution provided by the embodiments of the present invention, an effective audio clip of a preset time length is intercepted from the audio to be recognized; the effective audio clip is divided into a preset number of audio sub-segments; the preset fast Fourier transform is performed on the time domain signal sequence of each audio sub-segment according to the preset sampling frequency to obtain its frequency domain signal sequence; the frequency domain signal sequence corresponding to the frequency sequence is extracted from the frequency domain signal sequence of each audio sub-segment according to the frequency sequence corresponding to the animal type; and the frequency domain signal sequences of the audio sub-segments corresponding to the frequency sequence are spliced to obtain the spectral feature information of the audio to be recognized. The current emotion of the animal is thereby accurately identified, facilitating management of, and communication with, the animal by the user.
The animal emotion recognition device provided by the embodiment of the present invention can implement each process of the animal emotion recognition method embodiments described above; to avoid repetition, details are not repeated here.
It should be noted that the animal emotion recognition device and the animal emotion recognition method provided by the embodiments of the present invention are based on the same inventive concept; therefore, for the specific implementation of this embodiment, reference may be made to the implementation of the animal emotion recognition method, and repeated details are not described again.
On the basis of the same technical concept, an embodiment of the present invention further provides an electronic device for implementing the animal emotion recognition method; FIG. 4 is a schematic structural diagram of this electronic device. Electronic devices may vary widely in configuration or performance and may include one or more processors 401 and a memory 402, where the memory 402 may store one or more application programs or data. The memory 402 may provide transient or persistent storage. An application program stored in the memory 402 may include one or more modules (not shown), and each module may include a series of computer-executable instructions for the electronic device. Further, the processor 401 may be configured to communicate with the memory 402 and execute the series of computer-executable instructions in the memory 402 on the electronic device. The electronic device may also include one or more power supplies 403, one or more wired or wireless network interfaces 404, one or more input/output interfaces 405, and one or more keyboards 406.
Specifically, in this embodiment, the electronic device includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the bus; the memory is used for storing a computer program; and the processor is used for executing the program stored in the memory to implement the following method steps:
determining the animal type corresponding to the audio to be recognized;
obtaining the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type; the spectral feature information comprises a frequency domain signal sequence corresponding to the frequency sequence;
inputting the spectral feature information of the audio to be recognized into a preset emotion recognition model corresponding to the animal type to obtain an emotion identifier for the audio to be recognized; the emotion recognition model is trained in advance using the spectral feature information of training audio of the animal type as samples and the emotion identifiers corresponding to the training audio as sample labels.
An embodiment of the present application further provides a computer-readable storage medium in which a computer program is stored; when executed by a processor, the computer program implements the following method steps:
determining the animal type corresponding to the audio to be recognized;
obtaining the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type; the spectral feature information comprises a frequency domain signal sequence corresponding to the frequency sequence;
inputting the spectral feature information of the audio to be recognized into a preset emotion recognition model corresponding to the animal type to obtain an emotion identifier for the audio to be recognized; the emotion recognition model is trained in advance using the spectral feature information of training audio of the animal type as samples and the emotion identifiers corresponding to the training audio as sample labels.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, an electronic device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.

Claims (10)

1. An animal emotion recognition method, characterized in that the method comprises:
determining the animal type corresponding to the audio to be recognized;
obtaining the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type, wherein the spectral feature information comprises a frequency domain signal sequence corresponding to the frequency sequence; and
inputting the spectral feature information of the audio to be recognized into a preset emotion recognition model corresponding to the animal type to obtain an emotion identifier for the audio to be recognized, wherein the emotion recognition model is trained in advance using the spectral feature information of training audio of the animal type as samples and the emotion identifiers corresponding to the training audio as sample labels.
2. The animal emotion recognition method of claim 1, wherein the obtaining of the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type comprises:
intercepting an effective audio clip of a preset time length from the audio to be recognized; and
performing a preset fast Fourier transform on the effective audio clip, and obtaining the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type.
3. The animal emotion recognition method of claim 2, wherein the performing of the preset fast Fourier transform on the effective audio clip and the obtaining of the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type comprise:
dividing the effective audio clip into a preset number of audio sub-segments;
performing the preset fast Fourier transform on the time domain signal sequence of each audio sub-segment according to a preset sampling frequency to obtain the frequency domain signal sequence of the audio sub-segment;
extracting the frequency domain signal sequence corresponding to the frequency sequence from the frequency domain signal sequence of each audio sub-segment according to the frequency sequence corresponding to the animal type; and
splicing the frequency domain signal sequences of the audio sub-segments corresponding to the frequency sequence to obtain the spectral feature information of the audio to be recognized.
4. The animal emotion recognition method of claim 3, wherein the frequency sequence is obtained through a preset frequency selection strategy according to the frequency range corresponding to the animal type.
5. The animal emotion recognition method of claim 4, wherein the emotion identifiers include: joy, anger, sadness, panic, and fear.
6. An animal emotion recognition apparatus, characterized in that the apparatus comprises:
a type acquisition unit, used for determining the animal type corresponding to the audio to be recognized;
a feature analysis unit, used for obtaining the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type, wherein the spectral feature information comprises a frequency domain signal sequence corresponding to the frequency sequence; and
an emotion recognition unit, used for inputting the spectral feature information of the audio to be recognized into a preset emotion recognition model corresponding to the animal type to obtain an emotion identifier for the audio to be recognized, wherein the emotion recognition model is trained in advance using the spectral feature information of training audio of the animal type as samples and the emotion identifiers corresponding to the training audio as sample labels.
7. The animal emotion recognition apparatus of claim 6, wherein the feature analysis unit includes:
an audio interception module, used for intercepting an effective audio clip of a preset time length from the audio to be recognized; and
a feature extraction module, used for performing a preset fast Fourier transform on the effective audio clip and obtaining the spectral feature information of the audio to be recognized according to the frequency sequence corresponding to the animal type.
8. The animal emotion recognition apparatus of claim 7, wherein the feature extraction module comprises:
a first extraction module, used for dividing the effective audio clip into a preset number of audio sub-segments;
a second extraction module, used for performing the preset fast Fourier transform on the time domain signal sequence of each audio sub-segment according to a preset sampling frequency to obtain the frequency domain signal sequence of the audio sub-segment;
a third extraction module, used for extracting the frequency domain signal sequence corresponding to the frequency sequence from the frequency domain signal sequence of each audio sub-segment according to the frequency sequence corresponding to the animal type; and
a fourth extraction module, used for splicing the frequency domain signal sequences of the audio sub-segments corresponding to the frequency sequence to obtain the spectral feature information of the audio to be recognized.
9. An electronic device comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the bus; the memory is used for storing a computer program; and the processor is used for executing the program stored in the memory to implement the steps of the animal emotion recognition method according to any one of claims 1-5.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the animal emotion recognition method according to any one of claims 1-5.
CN202010871315.9A 2020-08-26 2020-08-26 Animal emotion recognition method and device and electronic equipment Pending CN111951812A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010871315.9A CN111951812A (en) 2020-08-26 2020-08-26 Animal emotion recognition method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010871315.9A CN111951812A (en) 2020-08-26 2020-08-26 Animal emotion recognition method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111951812A (en) 2020-11-17

Family

ID=73366476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010871315.9A Pending CN111951812A (en) 2020-08-26 2020-08-26 Animal emotion recognition method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111951812A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050963A (en) * 2014-06-23 2014-09-17 东南大学 Continuous speech emotion prediction algorithm based on emotion data field
CN105280178A (en) * 2014-07-04 2016-01-27 玄舟科技有限公司 audio signal processing device and audio signal processing method thereof
CN208173243U (en) * 2017-12-20 2018-11-30 华中农业大学 One boar sound signal collecting system
CN108898164A (en) * 2018-06-11 2018-11-27 南京理工大学 A kind of chirping of birds automatic identifying method based on Fusion Features
CN110826358A (en) * 2018-08-08 2020-02-21 杭州海康威视数字技术股份有限公司 Animal emotion recognition method and device and storage medium
CN109272986A (en) * 2018-08-29 2019-01-25 昆明理工大学 A kind of dog sound sensibility classification method based on artificial neural network
CN109493874A (en) * 2018-11-23 2019-03-19 东北农业大学 A kind of live pig cough sound recognition methods based on convolutional neural networks
CN111477236A (en) * 2020-05-14 2020-07-31 深聆科技(北京)有限公司 Piglet cry recognition method based on neural network, breeding monitoring method and system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634947A (en) * 2020-12-18 2021-04-09 大连东软信息学院 Animal voice and emotion feature set sequencing and identifying method and system
CN112634947B (en) * 2020-12-18 2023-03-14 大连东软信息学院 Animal voice and emotion feature set sequencing and identifying method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination