CN103971702A - Sound monitoring method, device and system


Info

Publication number
CN103971702A
Authority
CN
China
Prior art keywords
sound
training
detected
event model
voice signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310332073.6A
Other languages
Chinese (zh)
Inventor
何勇军
孙广路
谢怡宁
刘嘉辉
Current Assignee
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Harbin University of Science and Technology
Priority to CN201310332073.6A
Publication of CN103971702A
Legal status: Pending

Landscapes

  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention provides a sound monitoring method, device and system, relating to the technical fields of sound signal processing and sound pattern recognition. The method comprises a sound training stage and a sound detection stage. The training stage includes the steps of S1, acquiring training sound signals and extracting training sound features, and S2, training sound event models on those features. The detection stage includes the steps of S3, extracting the features of the sound to be examined, and S4, judging whether at least one sound event model matches those features; if so, a violent event is judged to be present, and if not, no violent event is present. By extracting sound features from the signal and comparing them with trained sound event models, the method determines by analysis whether a violent event is occurring in an elevator, achieving automatic monitoring of elevator violence with real-time results and effectively guaranteed detection accuracy.

Description

Sound monitoring method, apparatus and system
Technical field
The present invention relates to sound signal processing and pattern recognition technology, and in particular to a sound monitoring method, apparatus and system.
Background technology
With the rapid development of modern cities, elevators have become ubiquitous and are now an indispensable vertical transport tool in high-rise buildings, closely tied to residents' daily work and life. According to statistics from the relevant departments, China's annual demand for elevators has reached one third of the world total. At the same time, because an elevator is a relatively enclosed space, it has become an ideal place for criminals to act, creating numerous safety hazards in daily life. A growing number of offenders commit robbery, homicide or sexual harassment in elevators, seriously threatening the lives and property of elevator users. Reports show that elevator violence has risen rapidly in recent years; in 2012 alone, more than 62,000 elevator crimes were put on record. Effective monitoring of events inside elevators therefore has clear practical value for discovering, preventing and investigating elevator violence.
At present, camera-based video surveillance is widely used to monitor violent events in elevators.
Although this has achieved some success, problems remain: the degree of automation is low, since finding a violent incident depends on control-room staff watching live feeds or leafing through recorded video. Such monitoring consumes considerable manpower and material resources, and after watching video for more than about 20 minutes an operator's attention declines markedly, so accuracy suffers as well.
Summary of the invention
(1) Technical problem to be solved
To address the deficiencies of the prior art, the invention provides a sound monitoring method, apparatus and system that can monitor violent events in an elevator automatically.
(2) Technical solution
To achieve the above objective, the present invention is realized through the following technical solutions:
A sound monitoring method comprises a sound training stage and a sound detection stage.
The sound training stage comprises the steps of:
S1, acquiring a training sound signal and extracting the training sound features of the training sound signal;
S2, training sound event models on the training sound features.
The sound detection stage comprises the steps of:
S3, acquiring the sound signal to be examined and extracting its sound features;
S4, judging whether at least one of the sound event models matches the features of the sound to be examined; if yes, a violent event is judged to be present; if no, no violent event is present.
Preferably, step S1 comprises the steps of:
S11, pre-processing the acquired sound signal;
S12, applying a discrete Fourier transform (DFT) to the pre-processed signal to obtain its power spectrum;
S13, computing the Mel cepstrum coefficients of the power spectrum with a Mel filter bank;
S14, computing the first-order and second-order differences of the Mel cepstrum coefficients and concatenating them with the coefficients themselves to form the sound features.
Preferably, the pre-processing in step S11 comprises framing and windowing;
the window function used for windowing is the Hamming window

$$w(n)=0.54-0.46\cos\!\left(\frac{2\pi n}{L-1}\right),\qquad 0\le n\le L-1$$

in which n is the sample index and L is the window length.
The power spectrum in step S12 is

$$X_a(k)=\left|\sum_{n=0}^{N-1}x(n)\,e^{-j2\pi kn/N}\right|^{2},\qquad 0\le k\le N-1$$

where x(n) is the windowed speech frame, N is the number of DFT points, and j is the imaginary unit.
Preferably, in step S2 the sound violence-event models are trained as Gaussian mixture models (GMMs). The probability density function of an M-component GMM is

$$P(o\mid\lambda)=\sum_{i=1}^{M}c_i\,P(o\mid i,\lambda)$$

where

$$P(o\mid i,\lambda)=N(o;\mu_i,\Sigma_i)=\frac{1}{(2\pi)^{K/2}\,|\Sigma_i|^{1/2}}\exp\!\left\{-\frac{(o-\mu_i)^{T}\Sigma_i^{-1}(o-\mu_i)}{2}\right\}$$

In these formulas λ = {c_i, μ_i, Σ_i; i = 1, …, M}, μ_i is the mean vector and Σ_i the covariance matrix, here taken to be diagonal. The weights and means are estimated as

$$c_i=\frac{1}{T}\sum_{t=1}^{T}P(q_t=i\mid o_t,\lambda)$$

$$\mu_i=\frac{\sum_{t=1}^{T}P(q_t=i\mid o_t,\lambda)\,o_t}{\sum_{t=1}^{T}P(q_t=i\mid o_t,\lambda)}$$
Preferably, step S4 comprises the following steps:
S31, assuming there are N sound event models, each modeled by a Gaussian mixture model, denoted λ_1, λ_2, …, λ_N; in the decision stage, the input observation set of features to be examined is O = {o_1, o_2, …, o_T}, where T is the number of frames of the input sound;
S32, computing the posterior probability that the examined sound belongs to the n-th sound event model, 1 ≤ n ≤ N;
S33, obtaining a preliminary decision from the posterior probabilities;
S34, obtaining the final verdict from the preliminary decision.
Preferably, the posterior probability in step S32 is computed as

$$p(\lambda_n\mid O)=\frac{p(O\mid\lambda_n)\,p(\lambda_n)}{p(O)}=\frac{p(O\mid\lambda_n)\,p(\lambda_n)}{\sum_{m=1}^{N}p(O\mid\lambda_m)\,p(\lambda_m)},\qquad p(\lambda_n)=\frac{1}{N},\ n=1,\dots,N$$

where p(λ_n) is the prior probability of the n-th sound event model, p(O) is the probability of the examined feature set O under all sound event models, and p(O|λ_n) is the conditional probability that the n-th sound event model generated O.
Preferably, the preliminary decision in step S33 is computed as

$$n^{*}=\arg\max_{1\le n\le N}\ln P(\lambda_n\mid O)=\arg\max_{1\le n\le N}\sum_{t=1}^{T}\ln P(\lambda_n\mid o_t)$$

where P(λ_n|o_t) is the probability that frame o_t was generated by λ_n.
Preferably, the final verdict in step S34 accepts the preliminary result n* only if its score clears a preset rejection threshold:

$$\text{final verdict}=\begin{cases}n^{*}, & \sum_{t=1}^{T}\ln P(\lambda_{n^{*}}\mid o_t)>\text{Threshold}\\ \text{reject}, & \text{otherwise}\end{cases}$$

where P(λ_{n*}|o_t) is the probability that o_t was generated by λ_{n*}, and Threshold is the preset rejection threshold; a rejected sound is judged to match none of the trained models.
The present invention also provides a sound monitoring apparatus comprising the following modules:
a training-stage module, which acquires a training sound signal, extracts its training sound features, and trains sound event models on those features;
a detection-stage module, which acquires the sound signal to be examined, extracts its features, and judges whether at least one of the sound event models matches those features; if yes, a violent event is judged to be present; if no, no violent event is present.
The present invention further provides a sound monitoring system, characterized in that it comprises a microphone, a multiplexed signal collector, and a sound monitoring apparatus;
the microphone, installed in the elevator, collects the sound signal and passes it to the multiplexed signal collector;
the multiplexed signal collector receives the sound signal from the microphone and forwards it to the sound monitoring apparatus;
the sound monitoring apparatus processes the sound signal.
(3) Beneficial effects
By providing a sound monitoring method, apparatus and system that extract training sound features to train sound event models, then extract the features of the sound to be examined and compare them with the trained models, the invention determines by analysis whether a violent event is occurring in the elevator. It thus achieves automatic monitoring of elevator violence, delivers results in real time, effectively guarantees detection accuracy, and gives monitoring staff a basis for further action.
Compared with the industrial cameras required for video surveillance, the microphone and its collection equipment are inexpensive and easy to deploy widely.
Compared with an industrial camera, the microphone is small and can be hidden in an inconspicuous corner, protecting it from destruction by offenders and making the monitoring equipment safer.
Compared with an industrial camera, the microphone's signal is unaffected by factors such as lighting, occlusion and disguise, making the monitoring more stable.
Brief description of the drawings
To explain the embodiments of the present invention or the prior-art solutions more clearly, the drawings needed for their description are introduced briefly below. Obviously, the drawings described are only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of a sound monitoring method according to a preferred embodiment of the invention;
Fig. 2 is a flow diagram of a sound monitoring method according to a preferred embodiment of the invention;
Fig. 3 is a structural diagram of a sound monitoring apparatus according to a preferred embodiment of the invention;
Fig. 4 is an architecture diagram of a sound monitoring system according to a preferred embodiment of the invention.
Detailed description of the embodiments
The proposed sound monitoring method, apparatus and system are described in detail below with reference to the drawings and embodiments.
Embodiment 1:
As shown in Fig. 1, a sound monitoring method comprises a sound training stage and a sound detection stage.
The sound training stage comprises the steps of:
S1, acquiring a training sound signal and extracting the training sound features of the training sound signal;
S2, training sound event models on the training sound features.
The sound detection stage comprises the steps of:
S3, acquiring the sound signal to be examined and extracting its sound features;
S4, judging whether at least one of the sound event models matches the features of the sound to be examined; if yes, a violent event is judged to be present; if no, no violent event is present.
By extracting training sound features and training sound event models on them, then extracting the features of the sound to be examined and comparing them with the trained models, this embodiment determines by analysis whether a violent event is occurring in the elevator. It thus achieves automatic monitoring of elevator violence, delivers results in real time, effectively guarantees detection accuracy, and gives monitoring staff a basis for further action.
The embodiment is elaborated in detail below:
A sound monitoring method comprises a sound training stage and a sound detection stage.
The sound training stage comprises the steps of:
S1, acquiring a training sound signal and extracting the training sound features of the training sound signal.
Preferably, step S1 comprises the steps of:
S11, pre-processing the acquired training sound signal;
S12, applying a discrete Fourier transform (DFT) to the pre-processed signal to obtain its power spectrum;
S13, computing the Mel cepstrum coefficients of the power spectrum with a Mel filter bank;
S14, computing the first-order and second-order differences of the Mel cepstrum coefficients and concatenating them with the coefficients themselves to form the sound features.
Preferably, the pre-processing in step S11 comprises framing and windowing, where the window function is the Hamming window

$$w(n)=0.54-0.46\cos\!\left(\frac{2\pi n}{L-1}\right),\qquad 0\le n\le L-1$$

in which n is the sample index and L is the window length.
Preferably, the power spectrum in step S12 is

$$X_a(k)=\left|\sum_{n=0}^{N-1}x(n)\,e^{-j2\pi kn/N}\right|^{2},\qquad 0\le k\le N-1$$

where x(n) is the windowed speech frame, N is the number of DFT points, and j is the imaginary unit.
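As a concrete illustration, the framing, Hamming-windowing and power-spectrum steps can be sketched in Python as below. The 8 kHz sampling rate, 240-sample (30 ms) frames, 80-sample (10 ms) hop and 256-point DFT are assumed values consistent with the text, not parameters fixed by the patent.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def power_spectrum(x, frame_len=240, hop=80, n_fft=256):
    """Hamming-window each frame and return |DFT|^2 per frame."""
    frames = frame_signal(x, frame_len, hop)
    # Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(L-1))
    w = np.hamming(frame_len)
    spec = np.fft.rfft(frames * w, n=n_fft, axis=1)
    return np.abs(spec) ** 2

rng = np.random.default_rng(0)
x = rng.standard_normal(8000)   # 1 s of noise at 8 kHz stands in for elevator audio
P = power_spectrum(x)
print(P.shape)                  # (n_frames, n_fft//2 + 1)
```

Only the one-sided spectrum is kept (via `rfft`), since the signal is real and the negative-frequency bins are redundant.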
S2, training sound event models on the training sound features.
In step S2 the sound violence-event models are trained as Gaussian mixture models; the embodiment builds one GMM for each class of training sound. The probability density function of an M-component GMM is

$$P(o\mid\lambda)=\sum_{i=1}^{M}c_i\,P(o\mid i,\lambda)$$

where λ is the parameter set of the GMM; o is the K-dimensional acoustic feature vector; i is the hidden-state index, that is, the index of the Gaussian component (an M-component GMM has M hidden states); and c_i, the mixture weight of the i-th component, equals the prior probability of hidden state i, so that

$$\sum_{i=1}^{M}c_i=1$$

P(o|i,λ) is the Gaussian component, namely the observation probability density of hidden state i:

$$P(o\mid i,\lambda)=N(o;\mu_i,\Sigma_i)=\frac{1}{(2\pi)^{K/2}\,|\Sigma_i|^{1/2}}\exp\!\left\{-\frac{(o-\mu_i)^{T}\Sigma_i^{-1}(o-\mu_i)}{2}\right\}$$

In these formulas λ = {c_i, μ_i, Σ_i; i = 1, …, M}, μ_i is the mean vector and Σ_i the covariance matrix, here taken to be diagonal. The weights and means are estimated as

$$c_i=\frac{1}{T}\sum_{t=1}^{T}P(q_t=i\mid o_t,\lambda)$$

$$\mu_i=\frac{\sum_{t=1}^{T}P(q_t=i\mid o_t,\lambda)\,o_t}{\sum_{t=1}^{T}P(q_t=i\mid o_t,\lambda)}$$
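The GMM training step can be sketched with scikit-learn's `GaussianMixture`, which fits the weights c_i, means μ_i and diagonal covariances Σ_i by EM. The component count and the use of scikit-learn are illustrative assumptions; the patent does not name a library or fix M.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_event_model(features, n_components=8):
    """Fit a diagonal-covariance GMM to the training features of one sound event.

    features: (n_frames, n_dims) array of per-frame acoustic features.
    """
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",  # the Sigma_i are diagonal, as in the text
                          random_state=0)
    gmm.fit(features)
    return gmm

rng = np.random.default_rng(1)
feats = rng.standard_normal((500, 13))   # toy stand-in for MFCC frames
model = train_event_model(feats)
print(np.isclose(model.weights_.sum(), 1.0))   # mixture weights c_i sum to 1
```

One such model would be trained per sound event class (scream, impact, and so on), giving the set λ_1, …, λ_N used in the decision stage.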
The sound detection stage comprises the steps of:
S3, acquiring the sound signal to be examined and extracting its sound features.
Preferably, step S3 comprises the steps of:
S11', pre-processing the acquired sound signal to be examined.
Preferably, the pre-processing in step S11' comprises framing and windowing.
The purpose of framing is to divide the time signal into mutually overlapping segments, i.e. frames; each frame is typically about 30 ms long, with a frame shift of 10 ms.
The window function used for windowing is the Hamming window

$$w(n)=0.54-0.46\cos\!\left(\frac{2\pi n}{L-1}\right),\qquad 0\le n\le L-1$$

in which n is the sample index and L is the window length.
S12', applying a discrete Fourier transform (DFT) to the pre-processed signal to obtain its power spectrum.
Preferably, the power spectrum in step S12' is

$$X_a(k)=\left|\sum_{n=0}^{N-1}x(n)\,e^{-j2\pi kn/N}\right|^{2},\qquad 0\le k\le N-1$$

where x(n) is the windowed speech frame, N is the number of DFT points, and j is the imaginary unit.
S13', computing the Mel cepstrum coefficients of the power spectrum with a Mel filter bank.
The embodiment defines a bank of M triangular filters (the number of filters is close to the number of critical bands) with center frequencies f(m), m = 0, 1, …, M-1; the embodiment takes M = 28. The filters span equal intervals on the Mel scale, and the frequency response of the m-th triangular filter is

$$H_m(k)=\begin{cases}0, & k<f(m-1)\ \text{or}\ k>f(m+1)\\ \dfrac{2\,(k-f(m-1))}{(f(m+1)-f(m-1))\,(f(m)-f(m-1))}, & f(m-1)\le k\le f(m)\\ \dfrac{2\,(f(m+1)-k)}{(f(m+1)-f(m-1))\,(f(m+1)-f(m))}, & f(m)\le k\le f(m+1)\end{cases}$$

The power spectrum is then passed through the Mel filter bank to obtain log filter-bank energies:

$$S(m)=\ln\!\left(\sum_{k=0}^{N-1}|X_a(k)|^{2}H_m(k)\right),\qquad 0\le m<M$$

and a discrete cosine transform (DCT) yields the Mel cepstrum coefficients:

$$c(n)=\sum_{m=0}^{M-1}S(m)\cos\!\left(\frac{\pi n\,(m+0.5)}{M}\right),\qquad 0\le n<M$$
S14', computing the first-order and second-order differences of the Mel cepstrum coefficients and concatenating them with the coefficients themselves to form the sound features.
Let c_t and c_{t+1} be the cepstrum vectors at times t and t+1. The first-order difference is

$$\Delta c_t=c_{t+1}-c_t$$

the second-order difference is

$$\Delta\Delta c_t=\Delta c_{t+1}-\Delta c_t$$

and the concatenated acoustic feature is

$$[\,c_t\ \ \Delta c_t\ \ \Delta\Delta c_t\,]$$
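The difference-and-concatenate step is direct to implement; the sketch below pads the last frame by repetition, an assumption since the patent does not specify boundary handling.

```python
import numpy as np

def add_deltas(C):
    """Append first- and second-order differences to cepstral frames.

    Delta c_t = c_{t+1} - c_t (last frame padded by repetition), and
    the delta-delta is the same difference applied to the deltas.
    """
    d1 = np.diff(C, axis=0, append=C[-1:])    # delta: c_{t+1} - c_t
    d2 = np.diff(d1, axis=0, append=d1[-1:])  # delta-delta
    return np.hstack([C, d1, d2])             # [c_t, delta c_t, delta-delta c_t]

C = np.arange(12, dtype=float).reshape(4, 3)
F = add_deltas(C)
print(F.shape)   # (4, 9)
```

With 13 cepstra per frame this yields 39-dimensional feature vectors, the common MFCC+delta+delta-delta layout.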
S4, judging whether at least one of the sound event models matches the features of the sound to be examined; if yes, a violent event is judged to be present; if no, no violent event is present.
Preferably, step S4 comprises the following steps:
S31, assuming there are N sound event models, each modeled by a Gaussian mixture model, denoted λ_1, λ_2, …, λ_N; in the decision stage, the input observation set of features to be examined is O = {o_1, o_2, …, o_T}, where T is the number of frames of the input sound;
S32, computing the posterior probability that the examined sound belongs to the n-th sound event model, 1 ≤ n ≤ N;
S33, obtaining a preliminary decision from the posterior probabilities;
S34, obtaining the final verdict from the preliminary decision.
Preferably, the posterior probability in step S32 is computed as

$$p(\lambda_n\mid O)=\frac{p(O\mid\lambda_n)\,p(\lambda_n)}{p(O)}=\frac{p(O\mid\lambda_n)\,p(\lambda_n)}{\sum_{m=1}^{N}p(O\mid\lambda_m)\,p(\lambda_m)},\qquad p(\lambda_n)=\frac{1}{N},\ n=1,\dots,N$$

where p(λ_n) is the prior probability of the n-th sound event model, p(O) is the probability of the examined feature set O under all sound event models, and p(O|λ_n) is the conditional probability that the n-th sound event model generated O.
Preferably, the preliminary decision in step S33 is computed as

$$n^{*}=\arg\max_{1\le n\le N}\ln P(\lambda_n\mid O)=\arg\max_{1\le n\le N}\sum_{t=1}^{T}\ln P(\lambda_n\mid o_t)$$

where P(λ_n|o_t) is the probability that frame o_t was generated by λ_n.
Preferably, the final verdict in step S34 accepts the preliminary result n* only if its score clears a preset rejection threshold:

$$\text{final verdict}=\begin{cases}n^{*}, & \sum_{t=1}^{T}\ln P(\lambda_{n^{*}}\mid o_t)>\text{Threshold}\\ \text{reject}, & \text{otherwise}\end{cases}$$

where P(λ_{n*}|o_t) is the probability that o_t was generated by λ_{n*}, and Threshold is the preset rejection threshold; a rejected sound is judged to match none of the trained models.
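The decision stage S31 through S34 can be sketched as below: pick the model with the highest summed frame score (with equal priors 1/N, the posterior argmax reduces to a likelihood argmax, since p(o_t) is the same for every model), then reject the winner if its average log-likelihood falls below a preset threshold. The models, data and threshold value here are synthetic stand-ins, not values from the patent.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def detect(models, O, threshold=-60.0):
    """Pick the event model with the highest total frame score, then
    reject it if its average log-likelihood falls below `threshold`.

    models: list of fitted GaussianMixture objects (equal priors 1/N assumed).
    O: (T, K) feature matrix of the sound to be examined.
    """
    # Per-model total log-likelihood over all frames.
    ll = np.array([m.score_samples(O).sum() for m in models])
    n_star = int(np.argmax(ll))
    if ll[n_star] / len(O) < threshold:
        return None          # rejected: matches no trained violence-event model
    return n_star

rng = np.random.default_rng(3)
m0 = GaussianMixture(2, covariance_type="diag", random_state=0).fit(rng.normal(0, 1, (200, 4)))
m1 = GaussianMixture(2, covariance_type="diag", random_state=0).fit(rng.normal(5, 1, (200, 4)))
O = rng.normal(5, 1, (50, 4))
print(detect([m0, m1], O))   # → 1 (the sound matches the second model)
```

A return value of `None` corresponds to the rejection branch: the examined sound is judged to contain no violent event.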
Embodiment 2:
As shown in Fig. 3, a sound monitoring apparatus is characterized in that it comprises the following modules:
a training-stage module, which acquires a training sound signal, extracts its training sound features, and trains sound event models on those features;
a detection-stage module, which acquires the sound signal to be examined, extracts its features, and judges whether at least one of the sound event models matches those features; if yes, a violent event is judged to be present; if no, no violent event is present.
Embodiment 3:
As shown in Fig. 4, a sound monitoring system is characterized in that it comprises a microphone, a multiplexed signal collector, and a sound monitoring apparatus as described in Embodiment 2.
The microphone, installed in the elevator, collects the sound signal and passes it to the multiplexed signal collector;
the multiplexed signal collector receives the sound signal from the microphone and forwards it to the sound monitoring apparatus;
the sound monitoring apparatus processes the sound signal.
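The system's data flow (microphone to multiplexed signal collector to monitoring apparatus) can be mimicked with a simple queue-draining loop. Here `collector_queue` and `score_fn` are hypothetical stand-ins for the collector's output buffer and the trained-model scorer; the patent specifies only the flow, not an API.

```python
import queue
import numpy as np

def monitor_loop(collector_queue, score_fn, threshold=-60.0):
    """Drain buffered sound segments from the signal collector and flag violent events.

    collector_queue: queue.Queue of 1-D numpy arrays, one per captured segment
                     (stands in for the multiplexed signal collector's output).
    score_fn:        maps a segment to its best average log-likelihood under
                     the trained event models (hypothetical hook).
    """
    alarms = []
    while True:
        try:
            segment = collector_queue.get_nowait()
        except queue.Empty:
            break                    # no more buffered audio
        if score_fn(segment) >= threshold:
            alarms.append(segment)   # a real device would alert monitoring staff
    return alarms

q = queue.Queue()
for loud in (True, False, True):
    q.put(np.ones(10) * (5.0 if loud else 0.1))
hits = monitor_loop(q, score_fn=lambda s: -10.0 if s.max() > 1 else -100.0)
print(len(hits))   # → 2
```

In a deployed system the loop would run continuously and the scorer would be the feature-extraction plus GMM pipeline of Embodiment 1.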
In summary, by providing a sound monitoring method, apparatus and system that extract training sound features to train sound event models, then extract the features of the sound to be examined and compare them with the trained models, the embodiments of the invention determine by analysis whether a violent event is occurring in the elevator. They thus achieve automatic monitoring of elevator violence, deliver results in real time, effectively guarantee detection accuracy, and give monitoring staff a basis for further action.
Compared with the industrial cameras required for video surveillance, the microphone and its collection equipment used by the embodiments are inexpensive and easy to deploy widely.
Compared with an industrial camera, the microphone is small and can be hidden in an inconspicuous corner, protecting it from destruction by offenders and making the monitoring equipment safer.
Compared with an industrial camera, the microphone's signal is unaffected by factors such as lighting, occlusion and disguise, making the monitoring more stable.
It should be noted that in this document the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, as well as elements inherent to such a process, method, article or device. In the absence of further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises it.
The above embodiments serve only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded there may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (10)

1. A sound monitoring method, characterized in that it comprises a sound training stage and a sound detection stage,
the sound training stage comprising the steps of:
S1, acquiring a training sound signal and extracting the training sound features of the training sound signal;
S2, training sound event models on the training sound features;
the sound detection stage comprising the steps of:
S3, acquiring the sound signal to be examined and extracting its sound features;
S4, judging whether at least one of the sound event models matches the features of the sound to be examined; if yes, judging that a violent event is present; if no, judging that no violent event is present.
2. The sound monitoring method of claim 1, characterized in that step S1 or step S3 comprises the steps of:
S11, pre-processing the acquired sound signal;
S12, applying a discrete Fourier transform (DFT) to the pre-processed signal to obtain its power spectrum;
S13, computing the Mel cepstrum coefficients of the power spectrum with a Mel filter bank;
S14, computing the first-order and second-order differences of the Mel cepstrum coefficients and concatenating them with the coefficients themselves to form the sound features.
3. The sound monitoring method of claim 2, characterized in that
the pre-processing in step S11 comprises framing and windowing;
the window function used for windowing is the Hamming window

$$w(n)=0.54-0.46\cos\!\left(\frac{2\pi n}{L-1}\right),\qquad 0\le n\le L-1$$

in which n is the sample index and L is the window length; and
the power spectrum in step S12 is

$$X_a(k)=\left|\sum_{n=0}^{N-1}x(n)\,e^{-j2\pi kn/N}\right|^{2},\qquad 0\le k\le N-1$$

where x(n) is the windowed speech frame, N is the number of DFT points, and j is the imaginary unit.
4. The sound monitoring method of claim 1, characterized in that in step S2 the sound violence-event models are trained as Gaussian mixture models, the probability density function of an M-component Gaussian mixture model being

$$P(o\mid\lambda)=\sum_{i=1}^{M}c_i\,P(o\mid i,\lambda)$$

where

$$P(o\mid i,\lambda)=N(o;\mu_i,\Sigma_i)=\frac{1}{(2\pi)^{K/2}\,|\Sigma_i|^{1/2}}\exp\!\left\{-\frac{(o-\mu_i)^{T}\Sigma_i^{-1}(o-\mu_i)}{2}\right\}$$

in which λ = {c_i, μ_i, Σ_i; i = 1, …, M}, μ_i is the mean vector and Σ_i the covariance matrix, here taken to be diagonal, with

$$c_i=\frac{1}{T}\sum_{t=1}^{T}P(q_t=i\mid o_t,\lambda),\qquad \mu_i=\frac{\sum_{t=1}^{T}P(q_t=i\mid o_t,\lambda)\,o_t}{\sum_{t=1}^{T}P(q_t=i\mid o_t,\lambda)}$$
5. The sound monitoring method of claim 1, characterized in that step S4 comprises the following steps:
S31, assuming there are N sound event models, each modeled by a Gaussian mixture model, denoted λ_1, λ_2, …, λ_N; in the decision stage, the input observation set of features to be examined is O = {o_1, o_2, …, o_T}, where T is the number of frames of the input sound;
S32, computing the posterior probability that the examined sound belongs to the n-th sound event model, 1 ≤ n ≤ N;
S33, obtaining a preliminary decision from the posterior probabilities;
S34, obtaining the final verdict from the preliminary decision.
6. The sound monitoring method of claim 5, characterized in that
the posterior probability in step S32 is computed as

$$p(\lambda_n\mid O)=\frac{p(O\mid\lambda_n)\,p(\lambda_n)}{p(O)}=\frac{p(O\mid\lambda_n)\,p(\lambda_n)}{\sum_{m=1}^{N}p(O\mid\lambda_m)\,p(\lambda_m)},\qquad p(\lambda_n)=\frac{1}{N},\ n=1,\dots,N$$

where p(λ_n) is the prior probability of the n-th sound event model, p(O) is the probability of the examined feature set O under all sound event models, and p(O|λ_n) is the conditional probability that the n-th sound event model generated O.
7. The sound monitoring method of claim 5, characterized in that
the preliminary decision in step S33 is computed as

$$n^{*}=\arg\max_{1\le n\le N}\ln P(\lambda_n\mid O)=\arg\max_{1\le n\le N}\sum_{t=1}^{T}\ln P(\lambda_n\mid o_t)$$

where P(λ_n|o_t) is the probability that frame o_t was generated by λ_n.
8. The sound monitoring method of claim 5, characterized in that
the final verdict in step S34 accepts the preliminary result n* only if

$$\sum_{t=1}^{T}\ln P(\lambda_{n^{*}}\mid o_t)>\text{Threshold}$$

where P(λ_{n*}|o_t) is the probability that o_t was generated by λ_{n*}, and Threshold is the preset rejection threshold; otherwise the sound is rejected as matching no model.
9. A sound monitoring apparatus, characterized in that it comprises the following modules:
a training-stage module, which acquires a training sound signal, extracts its training sound features, and trains sound event models on those features;
a detection-stage module, which acquires the sound signal to be examined, extracts its features, and judges whether at least one of the sound event models matches those features; if yes, judging that a violent event is present; if no, judging that no violent event is present.
10. A sound monitoring system, characterized by comprising a microphone, a multi-channel signal collector, and the sound monitoring device according to claim 9;
the microphone is installed in an elevator, collects sound signals, and transmits them to the multi-channel signal collector;
the multi-channel signal collector receives the sound signals sent by the microphone and transmits them to the sound monitoring device;
the sound monitoring device processes the sound signals.
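The device of claim 9 pairs a training-stage module with a detection-stage module over a shared set of sound event models. A toy sketch of that structure (the template-matching "models" and the similarity threshold are illustrative stand-ins for the trained sound event models, not the patent's method):

```python
class SoundMonitoringDevice:
    """Sketch of the claim-9 device: a training-stage module and a
    detection-stage module sharing a pool of sound event models.
    Model "training" and "matching" are stubbed with a simple
    similarity threshold over feature vectors, purely for illustration."""

    def __init__(self, match_threshold=0.8):
        self.models = []                   # trained sound event models
        self.match_threshold = match_threshold

    def train(self, training_features):
        # Training-stage module: store one "model" per feature template.
        self.models.extend(training_features)

    def detect(self, features):
        # Detection-stage module: violent event iff at least one model matches.
        def similarity(a, b):
            # toy similarity: 1 / (1 + mean absolute difference)
            diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
            return 1.0 / (1.0 + diff)
        return any(similarity(m, features) > self.match_threshold
                   for m in self.models)
```

A feature vector close to a stored template matches (event reported); a distant one does not.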
CN201310332073.6A 2013-08-01 2013-08-01 Sound monitoring method, device and system Pending CN103971702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310332073.6A CN103971702A (en) 2013-08-01 2013-08-01 Sound monitoring method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310332073.6A CN103971702A (en) 2013-08-01 2013-08-01 Sound monitoring method, device and system

Publications (1)

Publication Number Publication Date
CN103971702A true CN103971702A (en) 2014-08-06

Family

ID=51241116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310332073.6A Pending CN103971702A (en) 2013-08-01 2013-08-01 Sound monitoring method, device and system

Country Status (1)

Country Link
CN (1) CN103971702A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477798A (en) * 2009-02-17 2009-07-08 Beijing University of Posts and Telecommunications Method for analyzing and extracting audio data of a set scene
CN101587710A (en) * 2009-07-02 2009-11-25 Beijing Institute of Technology Multi-codebook coding parameter quantization method based on audio emergency event classification
CN102509545A (en) * 2011-09-21 2012-06-20 Harbin Institute of Technology Real-time acoustic event detection system and method
CN102799899A (en) * 2012-06-29 2012-11-28 Beijing Institute of Technology Special audio event layered and generalized identification method based on SVM (Support Vector Machine) and GMM (Gaussian Mixture Model)
CN103177722A (en) * 2013-03-08 2013-06-26 Beijing Institute of Technology Tone-similarity-based song retrieval method
CN103226948A (en) * 2013-04-22 2013-07-31 Shandong Normal University Audio scene recognition method based on acoustic events


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jiang Gang et al.: "Industrial Robots" (《工业机器人》), 31 January 2011 *
Han Jiqing et al.: "Theory and Technology of Audio Information Retrieval" (《音频信息检索理论与技术》), 31 March 2011 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105679313A (en) * 2016-04-15 2016-06-15 Fujian Xinhengtong Intelligent Technology Co., Ltd. Audio recognition alarm system and method
CN110800053A (en) * 2017-06-13 2020-02-14 Minut Ltd. Method and apparatus for obtaining event indications based on audio data
CN107527617A (en) * 2017-09-30 2017-12-29 Shanghai Institute of Technology Monitoring method, apparatus and system based on voice recognition
CN107910019A (en) * 2017-11-30 2018-04-13 Institute of Microelectronics, Chinese Academy of Sciences Human acoustic signal processing and analysis method
CN111326172A (en) * 2018-12-17 2020-06-23 Beijing Didi Infinity Technology and Development Co., Ltd. Conflict detection method and device, electronic equipment and readable storage medium
WO2020140552A1 (en) * 2018-12-31 2020-07-09 AAC Acoustic Technologies (Shenzhen) Co., Ltd. Haptic feedback method
CN110223715A (en) * 2019-05-07 2019-09-10 South China University of Technology Home activity estimation method for solitary old people based on sound event detection
CN110223715B (en) * 2019-05-07 2021-05-25 South China University of Technology Home activity estimation method for solitary old people based on sound event detection
CN111599379A (en) * 2020-05-09 2020-08-28 Beijing Nanshi Information Technology Co., Ltd. Conflict early warning method, device, equipment, readable storage medium and triage system
CN111599379B (en) * 2020-05-09 2023-09-29 Beijing Nanshi Information Technology Co., Ltd. Conflict early warning method, device, equipment, readable storage medium and triage system
CN113670434A (en) * 2021-06-21 2021-11-19 Shenzhen Power Supply Bureau Co., Ltd. Method and device for identifying abnormal sound of substation equipment, and computer equipment
CN113421544A (en) * 2021-06-30 2021-09-21 Ping An Technology (Shenzhen) Co., Ltd. Singing voice synthesis method and device, computer equipment and storage medium
CN113421544B (en) * 2021-06-30 2024-05-10 Ping An Technology (Shenzhen) Co., Ltd. Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN103971702A (en) Sound monitoring method, device and system
CN102664006B (en) Abnormal voice detecting method based on time-domain and frequency-domain analysis
CN106504754B (en) A kind of real-time method for generating captions according to audio output
CN104732978B (en) The relevant method for distinguishing speek person of text based on combined depth study
CN109616140B (en) Abnormal sound analysis system
CN103971700A (en) Voice monitoring method and device
CN107527617A (en) Monitoring method, apparatus and system based on voice recognition
CN101494049A (en) Method for extracting audio characteristic parameter of audio monitoring system
CN105841797A (en) Window motor abnormal noise detection method and apparatus based on MFCC and SVM
CN103117061A (en) Method and device for identifying animals based on voice
Schröder et al. Classification of human cough signals using spectro-temporal Gabor filterbank features
CN104732972B (en) A kind of HMM Application on Voiceprint Recognition based on classified statistics is registered method and system
CN108520753A (en) Voice lie detection method based on the two-way length of convolution memory network in short-term
CN105608823A (en) Optical fiber security and protection method and system based on principal component analysis
CN110522462A (en) The multi-modal intelligent trial system of one kind and method
CN115102789A (en) Anti-communication network fraud studying, judging, early-warning and intercepting comprehensive platform
CN103021421A (en) Multilevel screening detecting recognizing method for shots
CN115910097A (en) Audible signal identification method and system for latent fault of high-voltage circuit breaker
CN110931024B (en) Audio-based prediction method and system for natural mating result of captive pandas
CN104091104B (en) Multi-format audio perceives the characteristics extraction of Hash certification and authentication method
CN109389994A (en) Identification of sound source method and device for intelligent transportation system
CN115854269A (en) Leakage hole jet flow noise identification method and device, electronic equipment and storage medium
CN108470564A (en) According to the artificial intelligence approach of audio identification personality characteristics
CN111524523A (en) Instrument and equipment state detection system and method based on voiceprint recognition technology
CN106297805A (en) A kind of method for distinguishing speek person based on respiratory characteristic

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140806