CN111462766B - Auditory pulse coding method and system based on sparse coding - Google Patents

Auditory pulse coding method and system based on sparse coding Download PDF

Info

Publication number
CN111462766B
CN111462766B, CN202010273268.8A, CN202010273268A
Authority
CN
China
Prior art keywords
coded
signal
coding
sound signal
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010273268.8A
Other languages
Chinese (zh)
Other versions
CN111462766A (en
Inventor
Tang Huajin (唐华锦)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010273268.8A priority Critical patent/CN111462766B/en
Publication of CN111462766A publication Critical patent/CN111462766A/en
Application granted granted Critical
Publication of CN111462766B publication Critical patent/CN111462766B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/10 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
    • G10L19/107 Sparse pulse excitation, e.g. by using algebraic codebook

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to an auditory pulse coding method and system based on sparse coding. The method comprises the following steps: constructing a kernel set capable of expressing basic sound elements; acquiring a sound signal to be coded; preprocessing the sound signal to be coded to obtain a preprocessed sound signal to be coded; adopting a time sequence matching tracking algorithm, according to the preprocessed sound signal to be coded, to obtain sparse codes of the preprocessed sound signal to be coded; and mapping each of the sparse codes to an auditory pulse code. The auditory pulse pattern coding generated by the invention is suitable for spiking neural networks and ensures both high coding efficiency and high coding fidelity.

Description

Auditory pulse coding method and system based on sparse coding
Technical Field
The invention relates to the field of sound processing, in particular to an auditory pulse coding method and system based on sparse coding.
Background
Sound structures in nature are non-stationary and time-dependent, exhibiting transients, temporal relationships between acoustic events, harmonic periodicity, and so on. In sound localization, human listeners can reliably detect interaural time differences of less than 10 μs, which corresponds to a binaural sound source offset of about 1 degree. For comparison, the sampling interval of an audio CD sampled at 44.1 kHz is 22.7 μs. Studies have shown that sound cues such as the onset and offset of sound events, harmonic modulation, and sound source localization all depend on accurate timing information. It is therefore very important to extract sound structure features containing accurate timing information from natural sounds. This is challenging, however, because in a natural acoustic environment with multiple sound sources and background noise, sound events cannot be observed directly and must be inferred from many ambiguous cues.
Most conventional sound feature representations, such as the Discrete Wavelet Transform, Perceptual Linear Prediction, and Mel-Frequency Cepstral Coefficients, are block-based: the signal is segmented into a series of discrete blocks. Transient and non-stationary periods in the signal may be temporally smeared across blocks, so the precise representation of a sound event can shift considerably depending on the arbitrary alignment of the blocks. Conventional sound representations therefore suffer from deviation of sound event information when handling acoustic tasks that require time sensitivity.
Disclosure of Invention
The invention aims to provide an auditory pulse coding method and system based on sparse coding so as to improve the sound coding efficiency and fidelity.
In order to achieve the purpose, the invention provides the following scheme:
a sparse coding based auditory pulse coding method, the method comprising:
constructing a kernel set capable of expressing sound basic elements;
acquiring a sound signal to be coded;
preprocessing the sound signal to be coded to obtain a preprocessed sound signal to be coded;
according to the kernel function group and the preprocessed sound signals to be coded, a time sequence matching tracking algorithm is adopted to obtain sparse codes of the preprocessed sound signals to be coded;
mapping each of the sparse codes to an auditory impulse code.
Optionally, the constructing a set of kernels that can express the sound basic elements specifically includes:
determining a center frequency group according to an equivalent rectangular bandwidth principle; the central frequency group comprises a plurality of central frequencies, and the values of the central frequencies are different;
and constructing a set of gammatone functions with various center frequencies according to the center frequency set.
Optionally, the preprocessing the sound signal to be encoded to obtain a preprocessed sound signal to be encoded specifically includes:
judging whether the sound signal to be coded is a multi-channel signal or not to obtain a first judgment result;
if the first judgment result shows that the sound signal to be coded is a multi-channel signal, averaging signals of all channels in the multi-channel signal to obtain a single-channel signal;
determining the maximum absolute value of the single sound channel signal according to the single sound channel signal;
dividing the single sound channel signal by the maximum absolute value of the single sound channel signal to obtain a preprocessed sound signal to be coded;
if the first judgment result shows that the sound signal to be coded is not a multi-channel signal, acquiring the maximum value of the absolute value of the sound signal to be coded;
and dividing the sound signal to be coded by the maximum absolute value of the sound signal to be coded to obtain the preprocessed sound signal to be coded.
Optionally, the obtaining a plurality of sparse codes of the preprocessed sound signals to be coded by using a time sequence matching tracking algorithm according to the preprocessed sound signals to be coded specifically includes:
obtaining a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signals to be coded at all time positions;
obtaining a maximum value of the plurality of values;
combining the maximum value, the time position corresponding to the maximum value and the kernel function index corresponding to the maximum value into a code; the maximum value is the encoded value of the encoding;
adding the code to a code table;
multiplying the code value of each code in the code table by the kernel function corresponding to the kernel function index of each code to obtain a plurality of coded short signals;
superposing the plurality of coded short signals according to the time position corresponding to each coded short signal to form a reconstructed signal;
subtracting the preprocessed sound signal to be coded and the reconstructed signal to obtain a residual signal;
according to the residual signal, obtaining the quotient of the length of the residual signal and the length of the sound signal to be coded;
judging whether the quotient is smaller than a preset quotient threshold value or not to obtain a second judgment result;
if the second judgment result indicates that the quotient is not less than the preset quotient threshold, taking the residual signal as the preprocessed sound signal to be coded, and returning to the step of obtaining a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signal to be coded at all time positions;
and if the second judgment result shows that the quotient is smaller than the preset quotient threshold value, outputting the coding table.
Optionally, the mapping each sparse code to an auditory pulse code specifically includes:
obtaining the maximum value of all the coding values in the coding table;
obtaining a plurality of equally spaced distribution values within a natural index range from 0 to the maximum value; each distribution value corresponds to an intensity level;
numbering the intensity levels in sequence according to the distribution values;
acquiring the intensity level of each code in the code table; the difference value between the natural index value of the distribution value corresponding to the intensity level and the coded coding value is the minimum difference value between the natural index values of all the distribution values and the coded coding value;
mapping each of said codes to an impulse event; the occurrence time of the pulse event is each coded time position, and the pulse sequence position to which the pulse event belongs is L = (m-1) × n + S;
all pulse events constitute an auditory pulse pattern;
wherein L is a pulse sequence position to which the pulse event belongs, m is a kernel function index of each code, n is a total number of intensity levels, and S is an intensity level of each code.
A sparse coding based auditory pulse coding system, the system comprising:
a kernel function group construction unit for constructing a kernel function group capable of expressing basic elements of sound;
a sound signal to be coded acquisition unit, used for acquiring a sound signal to be coded;
the pre-processed sound signal to be coded acquiring unit is used for pre-processing the sound signal to be coded to acquire a pre-processed sound signal to be coded;
the sparse code acquisition unit is used for acquiring a plurality of sparse codes of the preprocessed sound signal to be coded by adopting a time sequence matching tracking algorithm according to the preprocessed sound signal to be coded;
an auditory pulse code acquisition unit for mapping each of the sparse codes to an auditory pulse code.
Optionally, the kernel function set constructing unit specifically includes:
the central frequency group acquisition subunit is used for determining a central frequency group according to an equivalent rectangular bandwidth principle; the central frequency group comprises a plurality of central frequencies, and the values of the central frequencies are different;
and the gammatone function acquisition subunit is used for constructing a group of gammatone functions with various center frequencies according to the center frequency group.
Optionally, the pre-processed sound signal to be encoded obtaining unit specifically includes:
the sound signal judgment result acquisition subunit is used for judging whether the sound signal to be coded is a multi-channel signal or not to obtain a first judgment result;
a monaural signal obtaining subunit, configured to, if the first determination result indicates that the sound signal to be encoded is a multi-channel signal, average signals of all channels in the multi-channel signal to obtain a monaural signal;
a monaural signal maximum value obtaining subunit, configured to determine an absolute value maximum value of the monaural signal according to the monaural signal;
the preprocessing sound signal to be coded determining subunit is used for dividing the single sound channel signal by the maximum absolute value of the single sound channel signal to obtain a preprocessed sound signal to be coded;
a sound signal to be encoded maximum value obtaining subunit, configured to obtain an absolute value maximum value of the sound signal to be encoded if the first determination result indicates that the sound signal to be encoded is not a multi-channel signal;
and the preprocessed sound signal to be coded acquiring subunit is used for dividing the sound signal to be coded by the maximum absolute value of the sound signal to be coded to obtain the preprocessed sound signal to be coded.
Optionally, the sparse coding acquisition unit specifically includes:
an inner product multi-group value obtaining subunit, configured to obtain a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signal to be encoded at all time positions;
an inner product maximum value obtaining subunit configured to obtain a maximum value of the plurality of values;
the code acquisition subunit is used for forming a code by the maximum value, the time position corresponding to the maximum value and the kernel function index corresponding to the maximum value; the maximum value is the encoded value of the encoding;
an encoding table acquisition subunit, configured to add the encoding to an encoding table;
the coding short signal acquisition subunit is used for multiplying the coding value of each code in the coding table by the kernel function corresponding to the kernel function index of each code to obtain a plurality of coding short signals;
the reconstructed signal determining subunit is configured to superimpose the plurality of encoded short signals according to a time position corresponding to each encoded short signal to form a reconstructed signal;
a residual signal determining subunit, configured to perform a difference between the preprocessed to-be-encoded sound signal and the reconstructed signal to obtain a residual signal;
a quotient obtaining subunit, configured to obtain, according to the residual signal, a quotient of a length of the residual signal and a length of the sound signal to be encoded;
a quotient judgment result obtaining subunit, configured to judge whether the quotient is smaller than a preset quotient threshold value, and obtain a second judgment result;
a to-be-coded sound signal obtaining subunit, configured to, if the second determination result indicates that the quotient is not smaller than the preset quotient threshold, use the residual signal as the preprocessed to-be-coded sound signal, and return to the inner product multi-group value obtaining subunit;
and the coding table output subunit is configured to output the coding table if the second determination result indicates that the quotient is smaller than the preset quotient threshold.
Optionally, the auditory pulse code acquiring unit specifically includes:
the maximum value acquisition subunit of all time positions is used for acquiring the maximum values of all the coding values in the coding table;
an equally-spaced distribution value acquisition subunit operable to acquire a plurality of equally-spaced distribution values within a natural exponent range from 0 to the maximum value; each distribution value corresponds to an intensity level;
an intensity level numbering and determining subunit, configured to number the intensity levels in sequence according to the size of the distribution value;
a coding strength level acquiring subunit, configured to acquire a strength level of each code in the coding table; the difference value between the natural index value of the distribution value corresponding to the intensity level and the coded coding value is the minimum difference value between the natural index values of all the distribution values and the coded coding value;
a pulse event acquiring subunit, configured to map each of the codes into a pulse event; the occurrence time of the pulse event is each coded time position, and the pulse sequence position to which the pulse event belongs is L = (m-1) × n + S;
an auditory pulse pattern acquisition subunit, configured to construct an auditory pulse pattern from all pulse events;
wherein L is a pulse sequence position to which the pulse event belongs, m is a kernel function index of each code, n is a total number of intensity levels, and S is an intensity level of each code.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the method comprises the steps of firstly constructing a kernel function group capable of expressing basic elements of sound, preprocessing sound signals to be coded, adopting a time sequence matching tracking algorithm according to the kernel function group and the preprocessed sound signals to be coded, obtaining sparse codes of a plurality of preprocessed sound signals to be coded, namely decomposing sound into a plurality of combinations of kernel functions with different coefficients and different time points, being capable of maximally retaining information of original signals, minimizing required computing resources and having high coding efficiency; and finally mapping each sparse code into a pulse event, wherein all the pulse events form an auditory pulse mode, and the occurrence time of each pulse event is the time position of each code, so that the extracted sound characteristics have accurate time information, and the coded sound signals have extremely high fidelity.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a sparse coding-based auditory pulse coding method provided by the present invention;
FIG. 2 is a schematic diagram of a sparse coding-based auditory pulse coding method according to the present invention;
FIG. 3 is a schematic diagram of a gammatone function provided by the present invention;
FIG. 4 is a schematic diagram of sparse coding mapping to auditory pulse coding provided by the present invention;
FIG. 5 is a block diagram of a sparse coding based auditory pulse coding system provided by the present invention;
description of the symbols:
the method comprises the steps of 1-kernel function group construction unit, 2-to-be-coded sound signal acquisition unit, 3-preprocessed to-be-coded sound signal acquisition unit, 4-sparse coding acquisition unit and 5-auditory pulse coding acquisition unit.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide an auditory pulse coding method and system based on sparse coding so as to improve the sound coding efficiency and fidelity.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a flowchart of an auditory pulse encoding method based on sparse coding according to the present invention. As shown in fig. 1, a sparse coding-based auditory pulse coding method includes:
s101, constructing a kernel set capable of expressing sound basic elements, specifically comprising:
determining a center frequency group according to an equivalent rectangular bandwidth principle; the center frequency group includes a plurality of center frequencies, and each of the center frequencies has a different value. Each center frequency is in the range of 20Hz to 8000 Hz.
From the set of center frequencies, a set of gammatone functions with various center frequencies is constructed, as shown in FIG. 3. Specifically, each center frequency in the center frequency group is taken as an input to the time-domain expression of the gammatone function, yielding the time-domain expression (a discrete-time signal) of the corresponding gammatone filter; this time-domain expression can be regarded as a one-dimensional time-varying vector.
The kernel set corresponds to the kernel set Φ of FIG. 2, and the kernels with the various center frequencies correspond to the individual kernel functions shown in FIG. 2.
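As an illustrative sketch of this step, the following Python code builds a gammatone kernel set with center frequencies spaced on the equivalent rectangular bandwidth (ERB) scale over 20 Hz to 8000 Hz. The sampling rate, number of kernels, kernel duration, filter order and bandwidth constants (Glasberg-Moore ERB values) are assumptions chosen for illustration and are not fixed by the patent; the function names erb_space, gammatone_kernel and build_kernel_set are likewise hypothetical.

```python
import numpy as np

def erb_space(f_low=20.0, f_high=8000.0, num_kernels=32):
    """Center frequencies equally spaced on the ERB-rate scale between f_low and f_high."""
    q, min_bw = 9.26449, 24.7                       # Glasberg-Moore ERB constants (assumed)
    e_lo = q * np.log(1.0 + f_low / (q * min_bw))   # ERB-rate position of the lower edge
    e_hi = q * np.log(1.0 + f_high / (q * min_bw))  # ERB-rate position of the upper edge
    e = np.linspace(e_lo, e_hi, num_kernels)
    return q * min_bw * (np.exp(e / q) - 1.0)       # back from ERB-rate to Hz

def gammatone_kernel(fc, fs=16000, duration=0.05, order=4):
    """Discrete-time gammatone kernel at center frequency fc (a one-dimensional vector)."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * (24.7 + fc / 9.26449)               # bandwidth derived from ERB(fc)
    g = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * fc * t)
    return g / np.linalg.norm(g)                    # unit norm, convenient for matching pursuit

def build_kernel_set(num_kernels=32, fs=16000):
    """Kernel set: one gammatone kernel per center frequency in the 20 Hz - 8000 Hz range."""
    return [gammatone_kernel(fc, fs) for fc in erb_space(20.0, 8000.0, num_kernels)]
```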
s102, acquiring a sound signal to be coded, such as the sound signal shown in FIG. 2.
S103, preprocessing the sound signal to be encoded to obtain a preprocessed sound signal to be encoded, which specifically includes:
and judging whether the sound signal to be coded is a multi-channel signal or not to obtain a first judgment result.
And if the first judgment result shows that the sound signal to be coded is a multi-channel signal, averaging signals of all channels in the multi-channel signal to obtain a single-channel signal.
The maximum value of the absolute value of the monophonic signal is determined from the monophonic signal.
And dividing the mono signal by the maximum absolute value of the mono signal to obtain the preprocessed sound signal to be coded.
And if the first judgment result shows that the sound signal to be coded is not a multi-channel signal, acquiring the maximum value of the absolute value of the sound signal to be coded.
And dividing the sound signal to be coded by the maximum absolute value of the sound signal to be coded to obtain the preprocessed sound signal to be coded.
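A minimal Python sketch of this preprocessing step is given below. The array shape convention (samples along the first axis, channels along the second) and the guard against an all-zero signal are illustrative assumptions, as is the function name preprocess.

```python
import numpy as np

def preprocess(signal):
    """Average a multi-channel signal to mono and normalize by its maximum absolute value."""
    x = np.asarray(signal, dtype=float)
    if x.ndim > 1:                       # multi-channel: average the signals of all channels
        x = x.mean(axis=1)
    peak = np.max(np.abs(x))             # maximum absolute value of the (mono) signal
    return x / peak if peak > 0 else x   # divide by the peak; leave an all-zero signal unchanged
```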
S104, according to the kernel function group and the preprocessed to-be-coded sound signals, a time sequence matching tracking algorithm is adopted to obtain sparse codes of the preprocessed to-be-coded sound signals, and the method specifically comprises the following steps:
acquiring a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signals to be coded at all time positions;
obtaining a maximum value of the plurality of values;
forming a code by the maximum value, the time position corresponding to the maximum value and the kernel function index corresponding to the maximum value; the maximum value is a coded value of the code;
the code is added to the code table. The coding table corresponds to the coding information table of fig. 2. The code value corresponds to s in the code information table of fig. 2, the time position corresponds to τ in the code information table of fig. 2, and the kernel index corresponds to m in the code information table of fig. 2.
And multiplying the coding value of each code in the coding table by the kernel function corresponding to the kernel function index of each code to obtain a plurality of coded short signals. The encoded short signal is the short signal in fig. 2.
And superposing the plurality of coded short signals according to the corresponding time position of each coded short signal to form a reconstructed signal. The reconstructed signal corresponds to the reconstructed signal shown in FIG. 2.
The reconstructed signal is then subtracted from the preprocessed sound signal to be coded to obtain a residual signal. The residual signal is ε(t) in FIG. 2.
From the residual signal, a quotient of the length of the residual signal and the length of the sound signal to be encoded is obtained.
And judging whether the quotient is smaller than a preset quotient threshold value or not to obtain a second judgment result. The preset quotient threshold is preset according to actual conditions.
And if the second judgment result shows that the quotient is not less than the preset quotient threshold, taking the residual signal as the preprocessed sound signal to be coded, and returning to the step of obtaining a plurality of groups of values of the inner product of the kernel function group and the preprocessed sound signal to be coded at each time position; one time position corresponds to one group of values.
And if the second judgment result shows that the quotient is smaller than the preset quotient threshold value, outputting the coding table.
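The following Python sketch illustrates step S104 under two stated interpretations: the "length" of the residual and of the input signal is read as the Euclidean norm, and the residual is updated incrementally by subtracting the newly selected component, which is mathematically equivalent to rebuilding the full reconstruction from the coding table at every iteration as described above. The stopping threshold, iteration cap and function names are illustrative assumptions.

```python
import numpy as np

def temporal_matching_pursuit(x, kernels, quotient_threshold=0.05, max_iters=2000):
    """Greedy decomposition of x over a set of unit-norm kernels.

    Returns the coding table: a list of (coded value s, time position tau, kernel index m).
    """
    residual = np.array(x, dtype=float)
    x_norm = np.linalg.norm(x) + 1e-12
    coding_table = []
    for _ in range(max_iters):
        best_s, best_tau, best_m = 0.0, 0, 0
        for m, k in enumerate(kernels):
            # Inner products of kernel k with the residual at all time positions
            corr = np.correlate(residual, k, mode="valid")
            tau = int(np.argmax(corr))
            if corr[tau] > best_s:
                best_s, best_tau, best_m = float(corr[tau]), tau, m
        if best_s <= 0.0:                       # no kernel correlates positively any more
            break
        coding_table.append((best_s, best_tau, best_m))
        # Subtract the selected coded short signal from the residual
        k = kernels[best_m]
        residual[best_tau:best_tau + len(k)] -= best_s * k
        # Stop when the quotient of residual norm and input norm falls below the threshold
        if np.linalg.norm(residual) / x_norm < quotient_threshold:
            break
    return coding_table

def reconstruct(coding_table, kernels, length):
    """Superpose the coded short signals at their time positions to form the reconstruction."""
    x_hat = np.zeros(length)
    for s, tau, m in coding_table:
        x_hat[tau:tau + len(kernels[m])] += s * kernels[m]
    return x_hat
```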
S105, mapping each sparse code to an auditory pulse code, as shown in fig. 4, specifically including:
the maximum value of all the encoding values in the encoding table is obtained.
A plurality of equally spaced distribution values in a natural index range from 0 to a maximum value are obtained. Each distribution value corresponds to an intensity level.
And numbering the intensity levels in sequence according to the size of the distribution value.
The intensity level of each code in the code table is obtained. And the difference value between the natural index value of the distribution value corresponding to the strength level and the coded coding value is the minimum difference value between the natural index values of all the distribution values and the coded coding value.
Each code is mapped to a pulse event. The occurrence time of the pulse event is the time position of each code, and the pulse sequence position of the pulse event is L = (m-1) × n + S.
All pulse events constitute an auditory pulse pattern.
Wherein, L is the pulse sequence position to which the pulse event belongs, m is the kernel function index of each code, n is the total number of intensity levels, and S is the intensity level of each code.
The obtained auditory pulse pattern can be directly input into any spiking neural network for processing.
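The mapping of step S105 can be sketched in Python as below. One reading of the text, stated here as an assumption, is that the equally spaced distribution values lie in [0, ln(s_max)], so their natural exponentials form log-spaced intensity bins between 1 and the maximum coded value, and each code is assigned the level whose exponential is closest to its coded value. The kernel index m in the coding table is 0-based here, so m*n + S equals (m-1)*n + S with the 1-based index used in the text; the number of levels and the function name are illustrative.

```python
import numpy as np

def map_to_pulse_events(coding_table, num_levels=16):
    """Map each sparse code (s, tau, m) to a pulse event (occurrence time, pulse train position)."""
    s_max = max(s for s, _, _ in coding_table)            # maximum of all coded values
    dist = np.linspace(0.0, np.log(s_max), num_levels)    # equally spaced distribution values
    exp_dist = np.exp(dist)                               # their natural exponentials
    events = []
    for s, tau, m in coding_table:
        level = int(np.argmin(np.abs(exp_dist - s))) + 1  # intensity level S, numbered from 1
        position = m * num_levels + level                 # L = (m - 1) * n + S with 1-based m
        events.append((tau, position))                    # occurrence time and pulse train position
    return events
```

Under these assumptions an end-to-end run would then be, for example: kernels = build_kernel_set(); x = preprocess(wav); table = temporal_matching_pursuit(x, kernels); events = map_to_pulse_events(table), after which the resulting pulse events can be fed to a spiking neural network.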
The invention provides an auditory nerve coding algorithm based on Temporal Matching Pursuit (TMP). Compared with traditional sound feature representations, the TMP-based auditory nerve coding algorithm has several advantages:
(1) Time sensitivity: unlike conventional time-block-based sound feature extraction techniques, this encoding method can extract sound features with accurate timing information. This property gives the encoded sound signal very high fidelity and makes it well suited to spiking neural networks that require precise spikes as input.
(2) High efficiency: using a nonlinear signal decomposition method, the sound is decomposed into a combination of kernel functions with different coefficients at different time points, so that the information of the original signal is preserved to the greatest extent, the required computing resources are minimized, and energy consumption is reduced.
(3) Robustness: spiketrum exhibits natural information robustness to noise and self-loss of coding. The information robustness comes from the global greedy selection of kernel functions at different moments in the sound decomposition process, namely the maximum projection principle. Thus, when the pulse frequency is reduced, spikerum will look for and preferentially discard the atom with the least amount of information for the reconstructed signal under the global input signal time window.
The invention also provides a sparse coding-based auditory pulse coding system corresponding to the sparse coding-based auditory pulse coding method. As shown in fig. 5, the system comprises: a kernel function group construction unit 1, a sound signal to be coded acquisition unit 2, a preprocessed sound signal to be coded acquisition unit 3, a sparse coding acquisition unit 4 and an auditory pulse coding acquisition unit 5.
A kernel function group constructing unit 1, configured to construct a kernel function group that can express the sound basic elements.
And the sound signal to be coded acquiring unit 2 is used for acquiring the sound signal to be coded.
The preprocessed to-be-coded sound signal obtaining unit 3 is configured to preprocess the to-be-coded sound signal to obtain a preprocessed to-be-coded sound signal.
And the sparse code acquisition unit 4 is configured to acquire a plurality of sparse codes of the preprocessed sound signal to be coded by adopting a time sequence matching tracking algorithm according to the kernel function group and the preprocessed sound signal to be coded.
An auditory pulse code acquisition unit 5 for mapping each sparse code to an auditory pulse code.
The kernel function group construction unit 1 specifically includes: a center frequency group acquisition subunit and a gammatone function acquisition subunit.
The central frequency group acquisition subunit is used for determining a central frequency group according to an equivalent rectangular bandwidth principle; the center frequency group includes a plurality of center frequencies, and each of the center frequencies has a different value.
And the gammatone function acquisition subunit is used for constructing a group of gammatone functions with various center frequencies according to the center frequency group.
The pre-processed sound signal to be encoded obtaining unit 3 specifically includes: the device comprises a sound signal judgment result acquisition subunit, a single sound channel signal maximum value acquisition subunit, a preprocessing sound signal to be coded determination subunit, a sound signal maximum value acquisition subunit to be coded and a preprocessed sound signal to be coded acquisition subunit.
And the sound signal judgment result acquisition subunit is used for judging whether the sound signal to be coded is a multi-channel signal or not to obtain a first judgment result.
And the single-channel signal acquisition subunit is used for averaging the signals of all the channels in the multi-channel signal to obtain a single-channel signal if the first judgment result indicates that the sound signal to be encoded is a multi-channel signal.
And the monaural signal maximum value acquisition subunit is used for determining the maximum value of the absolute value of the monaural signal according to the monaural signal.
And the preprocessing sound signal to be coded determining subunit is used for dividing the monaural signal by the maximum absolute value of the monaural signal to obtain the preprocessed sound signal to be coded.
And the maximum value acquiring subunit is used for acquiring the maximum value of the absolute value of the sound signal to be coded if the first judgment result indicates that the sound signal to be coded is not a multi-channel signal.
And the preprocessed sound signal to be coded acquiring subunit is used for dividing the sound signal to be coded by the maximum absolute value of the sound signal to be coded to obtain the preprocessed sound signal to be coded.
The sparse code acquisition unit 4 specifically includes: the device comprises an inner product multi-group value acquisition subunit, an inner product maximum value acquisition subunit, a code table acquisition subunit, a code short signal acquisition subunit, a reconstructed signal determination subunit, a residual signal determination subunit, a quotient acquisition subunit, a quotient judgment result acquisition subunit, a to-be-coded sound signal acquisition subunit and a code table output subunit.
And the inner product multi-group value acquisition subunit is used for acquiring a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signals to be coded at all time positions.
And the inner product maximum value acquisition subunit is used for acquiring the maximum value of the plurality of values.
And the code acquisition subunit is used for combining the maximum value, the time position corresponding to the maximum value and the kernel function index corresponding to the maximum value into a code. The maximum value is the encoded value of the encoding.
And the coding table acquisition subunit is used for adding the codes into the coding table.
And the coding short signal acquisition subunit is used for multiplying the coding value of each code in the coding table by the kernel function corresponding to the kernel function index of each code to obtain a plurality of coding short signals.
And the reconstructed signal determining subunit is used for superposing the plurality of encoded short signals according to the time position corresponding to each encoded short signal to form a reconstructed signal.
And the residual signal determining subunit is used for subtracting the preprocessed sound signal to be coded and the reconstructed signal to obtain a residual signal.
And the quotient acquisition subunit is used for acquiring the quotient of the length of the residual signal and the length of the sound signal to be coded according to the residual signal.
And the quotient judgment result acquisition subunit is used for judging whether the quotient is smaller than a preset quotient threshold value or not to obtain a second judgment result.
And the sound signal to be coded acquiring subunit is used for taking the residual signal as the sound signal to be coded after the preprocessing if the second judgment result shows that the quotient is not less than the preset quotient threshold value, and returning to the inner product multi-group value acquiring subunit.
And the coding table output subunit is used for outputting the coding table if the second judgment result shows that the quotient is smaller than the preset quotient threshold.
The auditory pulse code acquiring unit 5 specifically includes: the device comprises an all-time-position inner product maximum value acquisition subunit, an equal-spacing distribution value acquisition subunit, an intensity level number determination subunit, a coding intensity level acquisition subunit, a pulse event acquisition subunit and an auditory pulse mode acquisition subunit.
And the maximum value acquisition subunit of all time positions is used for acquiring the maximum value of all the code values in the code table.
An equally-spaced distribution value acquisition subunit operable to acquire a plurality of equally-spaced distribution values within a natural exponent range from 0 to a maximum value. Each distribution value corresponds to an intensity level.
And the strength grade number determining subunit is used for sequentially numbering the strength grades according to the size of the distribution value.
And the coding strength level acquisition subunit is used for acquiring the strength level of each code in the coding table. And the difference value between the natural index value of the distribution value corresponding to the strength level and the coded coding value is the minimum difference value between the natural index values of all the distribution values and the coded coding value.
And the pulse event acquisition subunit is used for mapping each code into a pulse event. The occurrence time of the pulse event is the time position of each code, and the pulse sequence position of the pulse event is L = (m-1) × n + S.
An auditory pulse pattern acquisition subunit, configured to form an auditory pulse pattern from all pulse events.
Wherein, L is the pulse sequence position to which the pulse event belongs, m is the kernel function index of each code, n is the total number of intensity levels, and S is the intensity level of each code.
Firstly, a group of kernel functions capable of expressing basic auditory characteristics is set as a dictionary. Taking a single-channel sound signal of any length and type as input, a matching tracking algorithm continuously searches for the part of the signal most similar to the dictionary, removes that part from the original signal, and records its time position, the corresponding dictionary element index, and the corresponding level of similarity strength. This process is repeated until the sum of squares of the sound signal is less than a certain threshold. The set of sparse codes obtained in this way can be regarded as a binary set of pulse event sequences: the time position of each code represents the occurrence time of its corresponding pulse event, and the intensity level together with the element index determines the position of the pulse train to which it belongs. The auditory pulse code generated by the invention is suitable for spiking neural networks and ensures both high coding efficiency and high coding fidelity.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. A sparse coding based auditory pulse coding method, the method comprising:
constructing a kernel set capable of expressing sound basic elements;
acquiring a sound signal to be coded;
preprocessing the sound signal to be coded to obtain a preprocessed sound signal to be coded;
according to the kernel function group and the preprocessed sound signals to be coded, a time sequence matching tracking algorithm is adopted to obtain sparse codes of the preprocessed sound signals to be coded;
mapping each of the sparse codes to an auditory impulse code;
the method for obtaining the sparse codes of the preprocessed sound signals to be coded by adopting a time sequence matching tracking algorithm according to the kernel function group and the preprocessed sound signals to be coded specifically comprises the following steps:
obtaining a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signals to be coded at all time positions;
obtaining a maximum value of the plurality of values;
combining the maximum value, the time position corresponding to the maximum value and the kernel function index corresponding to the maximum value into a code; the maximum value is the encoded value of the encoding;
adding the code to a code table;
multiplying the code value of each code in the code table by the kernel function corresponding to the kernel function index of each code to obtain a plurality of coded short signals;
superposing the plurality of coded short signals according to the time position corresponding to each coded short signal to form a reconstructed signal;
subtracting the preprocessed sound signal to be coded and the reconstructed signal to obtain a residual signal;
according to the residual signal, obtaining the quotient of the length of the residual signal and the length of the sound signal to be coded;
judging whether the quotient is smaller than a preset quotient threshold value or not to obtain a second judgment result;
if the second judgment result indicates that the quotient is not less than the preset quotient threshold, taking the residual signal as the preprocessed sound signal to be coded, and returning to the step of obtaining a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signal to be coded at all time positions;
and if the second judgment result shows that the quotient is smaller than the preset quotient threshold value, outputting the coding table.
2. An auditory pulse coding method based on sparse coding according to claim 1, wherein the constructing a set of kernels that can express basic elements of sound comprises:
determining a center frequency group according to an equivalent rectangular bandwidth principle; the central frequency group comprises a plurality of central frequencies, and the values of the central frequencies are different;
and constructing a set of gammatone functions with various center frequencies according to the center frequency set.
3. An auditory pulse coding method based on sparse coding according to claim 1, wherein the preprocessing the sound signal to be coded to obtain a preprocessed sound signal to be coded specifically comprises:
judging whether the sound signal to be coded is a multi-channel signal or not to obtain a first judgment result;
if the first judgment result shows that the sound signal to be coded is a multi-channel signal, averaging signals of all channels in the multi-channel signal to obtain a single-channel signal;
determining the maximum absolute value of the single sound channel signal according to the single sound channel signal;
dividing the single sound channel signal by the maximum absolute value of the single sound channel signal to obtain a preprocessed sound signal to be coded;
if the first judgment result shows that the sound signal to be coded is not a multi-channel signal, acquiring the maximum value of the absolute value of the sound signal to be coded;
and dividing the sound signal to be coded by the maximum absolute value of the sound signal to be coded to obtain the preprocessed sound signal to be coded.
4. The sparse-coding-based auditory pulse coding method according to claim 1, wherein the mapping each of the sparse codes to an auditory pulse code specifically comprises:
obtaining the maximum value of all the coding values in the coding table;
obtaining a plurality of equally spaced distribution values within a natural index range from 0 to the maximum value; each distribution value corresponds to an intensity level;
numbering the intensity levels in sequence according to the distribution values;
acquiring the intensity level of each code in the code table; the difference value between the natural index value of the distribution value corresponding to the intensity level and the coded coding value is the minimum difference value between the natural index values of all the distribution values and the coded coding value;
mapping each of said codes to an impulse event; the occurrence time of the pulse event is each coded time position, and the pulse sequence position to which the pulse event belongs is L = (m-1) × n + S;
All pulse events constitute an auditory pulse pattern;
wherein L is the pulse sequence position to which the pulse event belongs, m is the kernel function index of each code, n is the total number of intensity levels, and S is the intensity level of each code.
5. A sparse coding based auditory pulse coding system, the system comprising:
a kernel function group construction unit for constructing a kernel function group capable of expressing basic elements of sound;
a sound signal to be coded acquisition unit, used for acquiring a sound signal to be coded;
the pre-processed sound signal to be coded acquiring unit is used for pre-processing the sound signal to be coded to acquire a pre-processed sound signal to be coded;
the sparse code acquisition unit is used for acquiring a plurality of sparse codes of the preprocessed sound signal to be coded by adopting a time sequence matching tracking algorithm according to the kernel function group and the preprocessed sound signal to be coded;
an auditory pulse code acquisition unit for mapping each of the sparse codes to an auditory pulse code;
the sparse code acquisition unit specifically includes:
an inner product multi-group value obtaining subunit, configured to obtain a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signal to be encoded at all time positions;
an inner product maximum value obtaining subunit configured to obtain a maximum value of the plurality of values;
the code acquisition subunit is used for forming a code by the maximum value, the time position corresponding to the maximum value and the kernel function index corresponding to the maximum value; the maximum value is the encoded value of the encoding;
an encoding table acquisition subunit, configured to add the encoding to an encoding table;
the coding short signal acquisition subunit is used for multiplying the coding value of each code in the coding table by the kernel function corresponding to the kernel function index of each code to obtain a plurality of coding short signals;
the reconstructed signal determining subunit is configured to superimpose the plurality of encoded short signals according to a time position corresponding to each encoded short signal to form a reconstructed signal;
a residual signal determining subunit, configured to perform a difference between the preprocessed to-be-encoded sound signal and the reconstructed signal to obtain a residual signal;
a quotient obtaining subunit, configured to obtain, according to the residual signal, a quotient of a length of the residual signal and a length of the sound signal to be encoded;
a quotient judgment result obtaining subunit, configured to judge whether the quotient is smaller than a preset quotient threshold value, and obtain a second judgment result;
a to-be-coded sound signal obtaining subunit, configured to, if the second determination result indicates that the quotient is not smaller than the preset quotient threshold, use the residual signal as the preprocessed to-be-coded sound signal, and return to the inner product multi-group value obtaining subunit;
and the coding table output subunit is configured to output the coding table if the second determination result indicates that the quotient is smaller than the preset quotient threshold.
6. The sparse coding-based auditory pulse coding system of claim 5, wherein the set of kernels constructing unit specifically comprises:
the central frequency group acquisition subunit is used for determining a central frequency group according to an equivalent rectangular bandwidth principle; the central frequency group comprises a plurality of central frequencies, and the values of the central frequencies are different;
and the gammatone function acquisition subunit is used for constructing a group of gammatone functions with various center frequencies according to the center frequency group.
7. An auditory pulse coding system based on sparse coding according to claim 5, wherein the pre-processed sound signal to be coded acquisition unit specifically comprises:
the sound signal judgment result acquisition subunit is used for judging whether the sound signal to be coded is a multi-channel signal or not to obtain a first judgment result;
a monaural signal obtaining subunit, configured to, if the first determination result indicates that the sound signal to be encoded is a multi-channel signal, average signals of all channels in the multi-channel signal to obtain a monaural signal;
a monaural signal maximum value obtaining subunit, configured to determine an absolute value maximum value of the monaural signal according to the monaural signal;
the preprocessing sound signal to be coded determining subunit is used for dividing the single sound channel signal by the maximum absolute value of the single sound channel signal to obtain a preprocessed sound signal to be coded;
a sound signal to be encoded maximum value obtaining subunit, configured to obtain an absolute value maximum value of the sound signal to be encoded if the first determination result indicates that the sound signal to be encoded is not a multi-channel signal;
and the preprocessed sound signal to be coded acquiring subunit is used for dividing the sound signal to be coded by the maximum absolute value of the sound signal to be coded to obtain the preprocessed sound signal to be coded.
8. An auditory pulse code system based on sparse coding according to claim 5, wherein the auditory pulse code acquisition unit specifically comprises:
the maximum value acquisition subunit of all time positions is used for acquiring the maximum values of all the coding values in the coding table;
an equally-spaced distribution value acquisition subunit operable to acquire a plurality of equally-spaced distribution values within a natural exponent range from 0 to the maximum value; each distribution value corresponds to an intensity level;
an intensity level numbering and determining subunit, configured to number the intensity levels in sequence according to the size of the distribution value;
a coding strength level acquiring subunit, configured to acquire a strength level of each code in the coding table; the difference value between the natural index value of the distribution value corresponding to the intensity level and the coded coding value is the minimum difference value between the natural index values of all the distribution values and the coded coding value;
a pulse event acquiring subunit, configured to map each of the codes into a pulse event; the occurrence time of the pulse event is each coded time position, and the pulse sequence position to which the pulse event belongs is L = (m-1) × n + S;
An auditory pulse pattern acquisition subunit, configured to construct an auditory pulse pattern from all pulse events;
wherein L is the pulse sequence position to which the pulse event belongs, m is the kernel function index of each code, n is the total number of intensity levels, and S is the intensity level of each code.
CN202010273268.8A 2020-04-09 2020-04-09 Auditory pulse coding method and system based on sparse coding Active CN111462766B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010273268.8A CN111462766B (en) 2020-04-09 2020-04-09 Auditory pulse coding method and system based on sparse coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010273268.8A CN111462766B (en) 2020-04-09 2020-04-09 Auditory pulse coding method and system based on sparse coding

Publications (2)

Publication Number Publication Date
CN111462766A CN111462766A (en) 2020-07-28
CN111462766B true CN111462766B (en) 2022-04-26

Family

ID=71683706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010273268.8A Active CN111462766B (en) 2020-04-09 2020-04-09 Auditory pulse coding method and system based on sparse coding

Country Status (1)

Country Link
CN (1) CN111462766B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113049080A (en) * 2021-03-08 2021-06-29 中国电子科技集团公司第三十六研究所 GDWC auditory feature extraction method for ship radiation noise

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200830770A (en) * 2007-01-05 2008-07-16 Univ Nat Chiao Tung A joint channel estimation and data detection method for STBC/OFDM systems
CN103177265A (en) * 2013-03-25 2013-06-26 中山大学 High-definition image classification method based on kernel function and sparse coding

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2560075A1 (en) * 2003-03-28 2004-10-07 Digital Accelerator Corporation Overcomplete basis transform-based motion residual frame coding method and apparatus for video compression
KR101090893B1 (en) * 2010-03-15 2011-12-08 한국과학기술연구원 Sound source localization system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200830770A (en) * 2007-01-05 2008-07-16 Univ Nat Chiao Tung A joint channel estimation and data detection method for STBC/OFDM systems
CN103177265A (en) * 2013-03-25 2013-06-26 中山大学 High-definition image classification method based on kernel function and sparse coding

Also Published As

Publication number Publication date
CN111462766A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
EP1941493B1 (en) Content-based audio comparisons
EP3469584B1 (en) Neural decoding of attentional selection in multi-speaker environments
Liutkus et al. Informed source separation through spectrogram coding and data embedding
CN103730131B (en) The method and apparatus of speech quality evaluation
CN1860526B (en) Encoding audio signals
van de Par et al. A perceptual model for sinusoidal audio coding based on spectral integration
CN106373583B (en) Multi-audio-frequency object coding and decoding method based on ideal soft-threshold mask IRM
JP4538324B2 (en) Audio signal encoding
CN111696580B (en) Voice detection method and device, electronic equipment and storage medium
CN111462766B (en) Auditory pulse coding method and system based on sparse coding
Ideli et al. Visually assisted time-domain speech enhancement
CN105741853B (en) A kind of digital speech perceptual hash method based on formant frequency
JP4496378B2 (en) Restoration method of target speech based on speech segment detection under stationary noise
CN117238311A (en) Speech separation enhancement method and system in multi-sound source and noise environment
Suied et al. Auditory sketches: sparse representations of sounds based on perceptual models
Liu et al. Event-driven processing for hardware-efficient neural spike sorting
Derrien Detection of genuine lossless audio files: Application to the MPEG-AAC codec
Yost Overview: psychoacoustics
Ballesteros L et al. On the ability of adaptation of speech signals and data hiding
CN116110373B (en) Voice data acquisition method and related device of intelligent conference system
Balazs et al. Introducing time-frequency sparsity by removing perceptually irrelevant components using a simple model of simultaneous masking
CN101504835B (en) Measurement method for spacial sensed information content in acoustic field and application thereof
Xu et al. An improved pitch detection of speech combined with speech enhancement
CN118197346A (en) Brain-controlled speaker extraction method and system based on multi-scale voice-brain-electricity fusion
CN116168716A (en) Optical fiber voice enhancement method, system, medium and equipment based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant