WO2017163572A1 - Playback apparatus and playback method

Info

Publication number
WO2017163572A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
signal
unsteady
collected
playback
Prior art date
Application number
PCT/JP2017/001957
Other languages
French (fr)
Japanese (ja)
Inventor
正也 小西
村田 寿子
優美 藤井
敬洋 下条
敬介 小田
Original Assignee
株式会社JVCケンウッド (JVCKENWOOD Corporation)
Priority date
Filing date
Publication date
Application filed by 株式会社JVCケンウッド (JVCKENWOOD Corporation)
Publication of WO2017163572A1

Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
          • H04R 1/00 Details of transducers, loudspeakers or microphones
            • H04R 1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
            • H04R 1/20 Arrangements for obtaining desired frequency or directional characteristics
              • H04R 1/32 Arrangements for obtaining desired directional characteristic only
                • H04R 1/40 Arrangements for obtaining desired directional characteristic only by combining a number of identical transducers
          • H04R 3/00 Circuits for transducers, loudspeakers or microphones
        • H04S STEREOPHONIC SYSTEMS
          • H04S 1/00 Two-channel systems

Definitions

  • the present invention relates to a playback device and a playback method.
  • a portable music player such as a digital audio player allows a user to listen to music while, for example, jogging outdoors.
  • inconvenience may occur if surrounding sounds such as announcement sounds cannot be heard.
  • Patent Document 1 discloses a portable device that delivers ambient sounds to the user's ears while the user is listening to music with headphones.
  • ambient sounds are collected by a microphone.
  • the portable device compares the sound input from the microphone with data stored in a database and determines whether the sound is to be output. When it determines that the input sound is to be output, the device shifts to an alert mode and mixes the input sound with the output sound.
  • Patent Document 2 discloses a sound reproducing device having an earphone and a microphone.
  • the earphone emits a sound signal into the ear.
  • the microphone picks up the sound outside the ear.
  • the sound reproducing device extracts the sound emitted from a moving body using a filter that passes the frequency band of the sound emitted by a moving body such as an automobile or a motorcycle.
  • the sound reproducing device calculates the time gradient of the collected sound signal and determines whether or not the moving body is approaching. When the moving body is approaching, the sound reproducing device reduces the level of the sound signal supplied to the earphone. In this way, the user can know that the moving body is approaching.
  • Patent Document 3 discloses a headphone device that localizes music output from a music player outside the listener's head and localizes the ring tone of an incoming call from a mobile phone.
  • the level of the sound signal is lowered when the moving body is approaching.
  • the surrounding sounds and the music are both heard at the ears. For this reason, the masking effect may make the ambient sounds difficult to hear.
  • when earphones or headphones with high sound insulation are used, the sound emitted from the moving body is blocked. Even if the sound insulation is lowered so that the ambient sound can be heard, the ambient sound becomes difficult to hear if the volume of the music being played is high. Therefore, in the apparatus of Patent Document 2, and similarly in the apparatus of Patent Document 3, it may be difficult for the user to hear surrounding sounds.
  • when earphones with low sound insulation are used, sound leakage may occur when the playback volume is increased.
  • the present embodiment has been made in view of the above points, and an object thereof is to provide a playback device and a playback method that allow a user to appropriately listen to surrounding sounds.
  • a playback apparatus includes a playback unit that outputs a playback signal for playing back playback sound, and an output unit that includes left and right output units that output sound toward the left and right ears of the user, respectively.
  • the reproduction method includes a step of outputting a reproduction signal for reproducing reproduction sound, and a step of collecting ambient sound using one or more microphones to obtain a sound collection signal.
  • FIG. 1 is a block diagram showing the configuration of a playback device according to the first embodiment. FIG. 2 is a diagram schematically showing the structure of the earphone used in the playback device. FIG. 3 is a diagram schematically showing the localization positions of the reproduction signal and the collected sound signal. FIG. 4 is a conceptual diagram showing a configuration for generating the filters used for localization processing. FIG. 5 is a diagram schematically showing the structure of the earphone according to the second embodiment. FIG. 6 is a block diagram showing the configuration of the playback device according to the third embodiment.
  • FIG. 7 is a flowchart showing a process for detecting an unsteady sound in the reproduction method according to the present embodiment. FIG. 8 is a diagram for explaining the sound source direction of an unsteady sound. FIG. 9 is a diagram schematically showing the change of the collected sound signal in the normal state and when an unsteady sound is detected. FIG. 10 is a diagram schematically showing the signals in the normal state and when an unsteady sound is detected. FIG. 11 is a flowchart showing a process for specifying the sound source direction of an unsteady sound. FIG. 12 is a flowchart showing a process for switching filters according to the sound source direction.
  • the playback device includes a playback unit that plays back playback sound and an output unit such as an earphone or headphones.
  • the playback unit is typically a portable music player or portable video player such as a digital audio player, a smartphone, or a tablet terminal, and stores the playback sound in advance in internal memory or in the cloud.
  • the playback sound is the sound played when music or video is reproduced.
  • with the user wearing earphones or headphones, the music player outputs a playback signal to the output unit.
  • a portable playback device By using a portable playback device, a user can listen to music outdoors.
  • FIG. 1 is a block diagram of a control configuration of the playback apparatus 100 according to the first embodiment.
  • FIG. 2 is a diagram schematically showing the configuration of the earphone 105 and the directional microphone 101 used in the playback apparatus 100.
  • the playback apparatus 100 includes a directional microphone 101, an out-of-head localization processing unit 102, a music player 103, a synthesis unit 104, and an earphone 105.
  • the music player 103 is, for example, a digital audio player or a smartphone application, and plays prerecorded music.
  • the music player 103 outputs a reproduction signal based on the reproduced music to the synthesis unit 104.
  • the reproduction signal is a stereo input signal. That is, the reproduction signal includes an Lch stereo input signal input to the left output unit 105L of the earphone 105 and an Rch stereo input signal input to the right output unit 105R.
  • the Lch stereo input signal is output from the left output unit 105L to the user's left ear, and the Rch stereo input signal is output from the right output unit 105R to the user's right ear.
  • in the present embodiment, the playback sound is music, so the music player 103 is used.
  • an application that can play a moving image can be used instead of the music player 103.
  • the directional microphone 101 is a microphone that collects sounds around the user.
  • the directional microphone 101 outputs a collected sound signal based on the ambient sound to the out-of-head localization processing unit 102.
  • the directional microphone 101 collects sound generated behind the user. That is, the directional microphone 101 is a microphone having directivity behind the user.
  • the out-of-head localization processing unit 102 performs out-of-head localization processing on the collected sound signal from the directional microphone 101. Specifically, the out-of-head localization processing unit 102 convolves a filter with the collected sound signal to generate a localized sound collection signal, and outputs the localized sound collection signal to the synthesis unit 104. The out-of-head localization processing is performed by convolving transfer characteristics such as a head-related transfer function (HRTF).
  • an Lch filter corresponding to the transfer characteristic for the left ear and an Rch filter corresponding to the transfer characteristic for the right ear are set.
  • the out-of-head localization processing unit 102 convolves an Lch filter with the collected sound signal from the directional microphone 101 to generate a localized Lch collected sound signal.
  • the out-of-head localization processing unit 102 convolves an Rch filter with the collected sound signal from the directional microphone 101 to generate a localized Rch collected sound signal.
  • the filter used for the out-of-head localization processing unit 102 will be described later.
  • the synthesis unit 104 is a mixer that synthesizes the localized sound collection signal and the reproduction signal. Specifically, the synthesis unit 104 generates a synthesized signal by adjusting the volume balance between the collected sound signal and the reproduction signal and combining them. The synthesis unit 104 outputs the synthesized signal to the earphone 105.
  • the synthesis unit 104 multiplies the sound collection signal and the reproduction signal by a coefficient corresponding to the volume balance. Then, the synthesis unit 104 generates a synthesized signal by adding the collected sound signal multiplied by the coefficient and the reproduction signal.
  • the synthesis unit 104 performs the synthesis processing on each of the Lch and Rch signals. That is, the synthesis unit 104 adds the Lch sound pickup signal and the Lch reproduction signal to generate an Lch synthesized signal. Similarly, the synthesis unit 104 adds the Rch reproduction signal and the Rch sound collection signal to generate an Rch synthesized signal.
  • the synthesis unit 104 outputs the Lch synthesized signal and the Rch synthesized signal to the earphone 105.
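  • the volume-balance mixing described above can be sketched as follows. This is a minimal illustration under assumed conventions (NumPy float signals, a single `balance` coefficient weighting the ambient sound), not the patented implementation.

```python
import numpy as np

def synthesize(pickup: np.ndarray, playback: np.ndarray, balance: float) -> np.ndarray:
    """Mix one channel of the localized pickup (ambient) signal with the
    reproduction (music) signal, weighting each by a volume-balance
    coefficient as the synthesis unit 104 does."""
    mixed = balance * pickup + (1.0 - balance) * playback
    return np.clip(mixed, -1.0, 1.0)  # keep the synthesized signal within full scale

# The Lch and Rch synthesized signals are generated independently, e.g.:
# lch_out = synthesize(lch_pickup, lch_play, balance=0.3)
# rch_out = synthesize(rch_pickup, rch_play, balance=0.3)
```

Setting `balance` to zero corresponds to muting the ambient sound, as mentioned later for indoor use.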
  • the earphone 105 includes a left output unit 105L and a right output unit 105R as shown in FIG.
  • the left output unit 105L and the right output unit 105R are connected by a neckband 110.
  • a directional microphone 101 is attached to the neckband 110.
  • the directional microphone 101 is provided at the left-right center of the neckband 110, and is attached to the neckband 110 so as to face backward when the user wears the earphone 105.
  • the directional microphone 101 collects sound behind the user.
  • the user puts the neckband 110 on the neck and wears the earphone 105.
  • the left output unit 105L outputs the Lch composite signal toward the left ear of the user.
  • the right output unit 105R outputs the Rch composite signal toward the right ear of the user. That is, the earphone 105 reproduces a sound in which ambient sound and music are mixed based on the synthesized signal.
  • the out-of-head localization processing unit 102 performs out-of-head localization processing on at least one of the collected sound signal and the reproduction signal so that, when output from the earphone 105, the localization position of the collected sound signal differs from the localization position of the reproduction signal.
  • FIG. 3 is a diagram schematically showing the localization position A of the reproduction signal (music) and the localization position B of the collected sound signal (ambient sound).
  • the out-of-head localization processing unit 102 convolves an out-of-head localization filter with the collected sound signal. Therefore, the collected sound signal is localized outside the user U's head. Specifically, since the localization position B of the collected sound signal is behind the user U, the convolution process is performed using an out-of-head localization filter that is localized backward.
  • the convolution processing is not applied to the reproduction signal.
  • the music played from the music player 103 is not filtered. Therefore, the music sounds at the ears as usual and is localized inside the head. That is, the localization position A of the reproduction signal is in the vicinity of the user U's ears 9L and 9R.
  • the localization position A of the reproduction signal and the localization position B of the sound pickup signal are different.
  • the music reproduced by the music player 103 is heard at the ears and localized inside the head, while the ambient sound is localized behind the user U's head. The music of the reproduction signal is heard at the ears, whereas the ambient sound of the collected signal is heard from behind. To the user U, the music and the ambient sound therefore appear to originate at different positions. Consequently, even when the volume level of the collected sound signal is lower than that of the reproduction signal, the user U can easily hear the ambient sound. Further, the synthesis unit 104 may change the ambient sound level in conjunction with the music playback level so that the ambient sound is not buried in the music.
  • the localization of the music being played at the ear and the ambient sound is separated. For this reason, it is easy to hear the ambient sound even during music reproduction, and the ambient sound can be detected more accurately.
  • if the music and the ambient sound had the same localization, the ambient sound would become difficult to hear due to the masking effect. Since the localization positions differ, ambient sounds can be heard even when music is being played at a high volume. In other words, even when the volume level of the collected sound signal is small relative to that of the reproduction signal, the user U can easily hear the ambient sound. Therefore, even if the earphone 105 always outputs the sound collection signal, the user U can effectively listen to music. Furthermore, when the ambient sound increases, the ambient sound output from the earphone 105 also increases. Therefore, the user can appropriately listen to both ambient sounds and music.
  • the directional microphone 101 collects ambient sounds from the rear. Therefore, it is possible to alert the user U to what is behind. For example, when a vehicle such as an automobile, a bicycle, or a motorcycle is approaching from behind, the approach sound of the vehicle is output from the left and right output units 105L and 105R of the earphone 105 as an ambient sound. The ambient sound is localized behind the user. Therefore, the approach of the vehicle from the rear can be detected accurately, and the user's attention can be drawn to the rear.
  • the user U can appropriately listen to the ambient sound. Since the earphone 105 with high sound insulation can be used, sound leakage can be prevented. Furthermore, the ambient sound can be suppressed depending on the usage situation: when there is no need to hear ambient sounds, the directional microphone 101 may be turned off, or the synthesis unit 104 may set the volume balance of the ambient sound to zero. For example, when listening to music indoors, the earphone 105 does not output ambient sound, and only the music is reproduced.
  • FIG. 4 is a conceptual diagram showing a configuration for generating a filter used for localization processing.
  • a stereo speaker 5 is arranged behind the listener 1.
  • the stereo speaker 5 has a left speaker 5L and a right speaker 5R.
  • the left speaker 5L is installed diagonally to the left rear of the listener 1
  • the right speaker 5R is installed diagonally to the right rear of the listener 1.
  • the left speaker 5L and the right speaker 5R are disposed symmetrically with respect to the listener 1.
  • the left speaker 5L and the right speaker 5R output an impulse sound or the like for performing impulse response measurement.
  • the microphone 2L is installed in the left ear 9L of the listener 1, and the microphone 2R is installed in the right ear 9R. Specifically, it is preferable to install microphones 2L and 2R at the ear canal entrance or the eardrum position of the left ear 9L and the right ear 9R.
  • the microphones 2L and 2R collect the measurement signal output from the stereo speaker 5 and acquire the collected sound signal.
  • the listener 1 is preferably the user U who uses the playback apparatus 100, but transfer characteristics obtained by measurement with a person other than the user U may be used. Further, the listener 1 may be a person or a dummy head. That is, in this embodiment, the listener 1 is a concept including not only a person but also a dummy head.
  • the impulse response is measured by measuring the impulse sound output from the left and right speakers 5L and 5R with the microphones 2L and 2R.
  • the collected sound signal acquired based on the impulse response measurement is stored in a memory or the like.
  • the transfer characteristic Hls between the left speaker 5L and the left microphone 2L, the transfer characteristic Hlo between the left speaker 5L and the right microphone 2R, the transfer characteristic Hro between the right speaker 5R and the left microphone 2L, and the right speaker A transfer characteristic Hrs between 5R and the right microphone 2R is measured. That is, the transfer characteristic Hls is acquired by the left microphone 2L collecting the measurement signal output from the left speaker 5L.
  • the transfer characteristic Hlo is acquired by the right microphone 2R collecting the measurement signal output from the left speaker 5L.
  • similarly, the transfer characteristic Hro is acquired by the left microphone 2L collecting the measurement signal output from the right speaker 5R.
  • the transfer characteristic Hrs is acquired by the right microphone 2R collecting the measurement signal output from the right speaker 5R.
  • a filter corresponding to the transfer characteristics Hls to Hrs measured in this way is generated. Specifically, the transfer characteristics Hls to Hrs are cut out with a predetermined filter length and generated as a filter used for the convolution calculation of the out-of-head localization processing unit 102.
  • a filter may be generated using a head related transfer function (HRTF).
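  • the step of cutting out the measured transfer characteristics to a predetermined filter length can be sketched as follows. The fade-out at the truncation edge is an assumption added for illustration; the text only specifies cutting to a filter length.

```python
import numpy as np

def make_filters(impulse_responses, filter_len, fade_len=64):
    """Cut each measured transfer characteristic (Hls, Hlo, Hro, Hrs) to a
    predetermined filter length for use in the convolution of the
    out-of-head localization processing unit 102. A short Hann fade-out
    (an illustrative assumption) softens the truncation edge."""
    fade = np.hanning(2 * fade_len)[fade_len:]  # decaying half of a Hann window
    filters = {}
    for name, ir in impulse_responses.items():
        f = np.asarray(ir, dtype=np.float64)[:filter_len].copy()
        if len(f) >= fade_len:
            f[-fade_len:] *= fade
        filters[name] = f
    return filters
```

The dictionary keys mirror the four transfer characteristics named in the text; the actual storage format in the device is not specified.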
  • the out-of-head localization processing unit 102 convolves the sound pickup signal, which is a monaural signal, with filters corresponding to the four transfer characteristics Hls to Hrs.
  • the out-of-head localization processing unit 102 adds the sound collection signal in which the transfer characteristic Hls is convoluted and the sound collection signal in which the transfer characteristic Hro is convoluted to obtain an Lch sound collection signal.
  • the collected sound signal with the transfer characteristic Hlo convoluted and the collected sound signal with the transfer characteristic Hrs convoluted are added to obtain an Rch sound collected signal.
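  • the two convolution-and-addition steps above can be sketched as follows, assuming NumPy arrays and filters passed as a dictionary keyed by characteristic name (both assumptions for illustration).

```python
import numpy as np

def localize_backward(mono_pickup, h):
    """Convolve a monaural collected sound signal with the four measured
    transfer characteristics and sum per ear, as described above:
      Lch = mono * Hls + mono * Hro   (paths arriving at the left ear)
      Rch = mono * Hlo + mono * Hrs   (paths arriving at the right ear)
    """
    lch = np.convolve(mono_pickup, h["Hls"]) + np.convolve(mono_pickup, h["Hro"])
    rch = np.convolve(mono_pickup, h["Hlo"]) + np.convolve(mono_pickup, h["Hrs"])
    return lch, rch
```

A real-time implementation would use block-wise (overlap-add) convolution, but the per-ear summation is the same.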
  • the ambient sound can be localized behind the user U by the impulse response measurement with the stereo speaker 5 installed behind the listener 1.
  • FIG. 5 is a diagram schematically showing the configuration of the earphone 105 used in the playback apparatus according to the second embodiment.
  • in the second embodiment, the directional microphones attached to the earphone 105 differ from those in the first embodiment. Since the configuration other than the directional microphones is the same as in the first embodiment, description thereof will be omitted as appropriate.
  • two directional microphones 101L and 101R are attached to the neckband 110.
  • the directional microphone 101L is provided on the left side of the center of the neckband 110, and the directional microphone 101R is provided on the right side.
  • the directional microphones 101L and 101R are preferably attached symmetrically. Note that the collected sound signals collected by the directional microphones 101L and 101R are stereo signals.
  • the directional microphones 101L and 101R output a sound collection signal to the out-of-head localization processing unit 102.
  • the out-of-head localization processing unit 102 convolves the transfer characteristics Hls and Hlo with the collected sound signal from the directional microphone 101L.
  • the out-of-head localization processing unit 102 convolves the transfer characteristics Hro and Hrs with the sound pickup signal from the directional microphone 101R.
  • the out-of-head localization processing unit 102 adds the sound collection signal in which the transfer characteristic Hls is convoluted and the sound collection signal in which the transfer characteristic Hro is convoluted to obtain an Lch sound collection signal.
  • the collected sound signal with the transfer characteristic Hlo convoluted and the collected sound signal with the transfer characteristic Hrs convoluted are added to obtain an Rch sound collected signal.
  • the same effect as in the first embodiment can be obtained.
  • since directional microphones 101L and 101R are stereo microphones, ambient sounds can be localized backward in a stereo sound field.
  • FIG. 6 is a control block diagram illustrating the configuration of the playback apparatus according to the third embodiment.
  • an unsteady sound detection unit 106 is added. Since the configuration other than the unsteady sound detection unit 106 is the same as that of the first and second embodiments, the description thereof is omitted.
  • the unsteady sound detection unit 106 detects whether or not unsteady sound is generated around the user U. Examples of the unsteady sound include running sounds of vehicles such as automobiles, motorcycles, and bicycles. When an unsteady sound is generated, the unsteady sound detection unit 106 outputs a control signal to the synthesis unit 104.
  • the synthesizing unit 104 controls the volume levels of ambient sounds and music according to the control signal. Specifically, the synthesizing unit 104 increases the volume level of the ambient sound when detecting the unsteady sound.
  • ambient sounds can be output at a high volume level only when unsteady sounds are detected. That is, when an unsteady sound is not detected, an ambient sound can be output at a low volume level.
  • the earphone 105 normally outputs the ambient sound at a low level. However, when a rear unsteady sound is detected, the earphone 105 increases the volume of the ambient sound and outputs it. By doing so, the playback device 100 can emphasize the unsteady sound. Therefore, it is possible to appropriately alert the user to the rear.
  • FIG. 7 is a flowchart showing processing in the unsteady sound detection unit 106.
  • the directional microphone 101 detects a rear sound (S11).
  • the directional microphone 101 may be a monaural microphone as in the first embodiment, or may be a stereo microphone as in the second embodiment.
  • the collected sound signal from the directional microphone 101 is converted into the frequency domain (S12).
  • the collected sound signal can be converted into the frequency domain by performing a discrete Fourier transform.
  • a sound pickup signal before being subjected to the out-of-head localization processing by the out-of-head localization processing unit 102 can be used.
  • the unsteady sound detection unit 106 monitors the time change of the frequency spectrum (S13). Then, it is determined whether or not the spectrum has changed abruptly (S14). The unsteady sound detection unit 106 obtains the frequency spectrum continuously and calculates the change.
  • when the spectrum has changed abruptly (YES in S14), the unsteady sound detection unit 106 determines that an unsteady sound is generated behind the user (S15). For example, the unsteady sound detection unit 106 determines that a vehicle such as a car or a bicycle is approaching from behind. Then, the unsteady sound detection unit 106 outputs a control signal to the synthesis unit 104, and the synthesis unit 104 increases the ambient sound volume (S16).
  • for example, the unsteady sound detection unit 106 obtains the peak value of the continuously acquired frequency spectrum, and determines that an unsteady sound has occurred when the peak fluctuates greatly.
  • when no abrupt change is detected (NO in S14), no unsteady sound has occurred, and the process returns to step S12.
  • the unsteady sound detection unit 106 thus continuously monitors the frequency spectrum and determines whether or not there is a large fluctuation in the spectrum. In this way, the unsteady sound detection unit 106 can detect the presence or absence of an unsteady sound, and when an unsteady sound is generated, the volume of the ambient sound is increased.
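  • steps S12 to S14 can be sketched frame by frame as follows. The peak-jump ratio threshold of 4x is an illustrative assumption; the text does not specify how "a large fluctuation" is quantified.

```python
import numpy as np

def detect_unsteady(frames, ratio_threshold=4.0):
    """Flag frames whose spectral peak jumps sharply relative to the previous
    frame. frames is a 2-D array (n_frames, frame_len) of consecutive
    pickup-signal frames; each is converted to the frequency domain by a
    DFT (S12) and its peak is compared with the previous frame (S13, S14)."""
    flags = []
    prev_peak = None
    for frame in frames:
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        peak = spectrum.max()
        flags.append(prev_peak is not None and peak > ratio_threshold * prev_peak)
        prev_peak = peak
    return flags
```

A flagged frame would trigger the control signal to the synthesis unit in this sketch.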
  • in Embodiment 4, the sound source direction of the unsteady sound is specified using stereo microphones. For this purpose, the directional microphones 101L and 101R shown in FIG. 5 are used. Based on the collected sound signals from the left and right directional microphones 101L and 101R, the unsteady sound detection unit 106 detects the direction of the unsteady sound. For example, the unsteady sound detection unit 106 specifies the sound source direction of the unsteady sound using the volume difference or the arrival time difference between the two collected sound signals. Note that the basic configuration and processing of the playback apparatus 100 are the same as in the above-described embodiments, and thus description thereof is omitted.
  • FIG. 8 shows a direction based on the user U.
  • the user U is the origin, the front of the user U is 0 °, the right direction is 90 °, the rear is 180 °, and the left direction is 270 °.
  • 90°-120° is the R3 direction, 120°-150° is the R2 direction, 150°-180° is the R1 direction, 180°-210° is the L1 direction, 210°-240° is the L2 direction, and 240°-270° is the L3 direction.
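  • the sector boundaries above can be sketched as a lookup. Which sector owns a shared boundary angle (e.g. whether 120° belongs to R3 or R2) is not stated in the text, so the half-open intervals below are an assumption.

```python
def rear_sector(azimuth_deg):
    """Map a rear azimuth (front = 0 deg, right = 90 deg, rear = 180 deg,
    left = 270 deg) to the direction labels of FIG. 8.
    Returns None outside the rear half-plane."""
    bounds = [(90.0, 120.0, "R3"), (120.0, 150.0, "R2"), (150.0, 180.0, "R1"),
              (180.0, 210.0, "L1"), (210.0, 240.0, "L2"), (240.0, 270.0, "L3")]
    for lo, hi, name in bounds:
        if lo <= azimuth_deg < hi:
            return name
    return "L3" if azimuth_deg == 270.0 else None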
  • the unsteady sound detector 106 monitors the change in sound pressure for each direction.
  • for example, suppose the sound pressure of the unsteady sound changes greatly in the R2 direction.
  • the unsteady sound detection unit 106 specifies the sound source direction of the unsteady sound as the R2 direction.
  • in the collected sound signal, the sound pressure in the sound source direction R2 is then increased and emphasized. By doing so, the ambient sound can be heard more appropriately. For example, the running sound of a vehicle is heard from the direction in which the vehicle is approaching, so the user can perceive the approach of the vehicle more effectively.
  • the unsteady sound detection unit 106 switches the filter according to the sound source direction. That is, the out-of-head localization processing unit 102 stores a filter for each of the directions R3 to L3, and may convolve the collected sound signal with the filter that emphasizes the specified sound source direction. By doing so, the ambient sound can be localized in the sound source direction, so that the sound source direction can be emphasized.
  • filters Hls_L3, Hlo_L3, Hro_L3, and Hrs_L3 that emphasize the L3 direction are convolved with the collected sound signal.
  • filters Hls_L2, Hlo_L2, Hro_L2, and Hrs_L2 that emphasize the L2 direction are convolved with the collected sound signal.
  • filters Hls_L1, Hlo_L1, Hro_L1, and Hrs_L1 that emphasize the L1 direction are convolved with the collected sound signal.
  • filters Hls_R1, Hlo_R1, Hro_R1, and Hrs_R1 that emphasize the R1 direction are convolved with the collected sound signal.
  • filters Hls_R2, Hlo_R2, Hro_R2, and Hrs_R2 that emphasize the R2 direction are convolved with the collected sound signal.
  • filters Hls_R3, Hlo_R3, Hro_R3, and Hrs_R3 that emphasize the R3 direction are convolved with the collected sound signal.
  • FIG. 11 is a flowchart showing a process for specifying the sound source direction.
  • the unsteady sound detection unit 106 converts the collected sound signal into the frequency domain at regular time intervals (S21). Note that the unsteady sound detection unit 106 converts the collected sound signals acquired by both directional microphones into the frequency domain. Here, the collected sound signal before the filter is convolved by the out-of-head localization processing unit 102 is converted into the frequency domain.
  • the unsteady sound detection unit 106 determines whether or not a part of the spectrum has changed significantly (S22). If a part of the spectrum has not changed significantly (NO in S22), the process returns to step S21. When a part of the spectrum changes greatly (YES in S22), it is determined as an unsteady sound (S23).
  • the unsteady sound detection unit 106 obtains a difference M between the volume of the left directional microphone 101L and the volume of the right directional microphone 101R (S24).
  • the unsteady sound detection unit 106 obtains a difference M between the spectrum of the collected sound signal of the directional microphone 101L and the spectrum of the collected sound signal of the directional microphone 101R.
  • the frequency band for obtaining the difference is not particularly limited. For example, the difference of only the peak frequency at which the spectrum has changed greatly may be calculated, or the difference of the entire spectrum may be calculated. Alternatively, the unsteady sound detection unit 106 may obtain the difference between the left and right collected sound signals in the time domain.
  • the unsteady sound detection unit 106 specifies the sound source direction according to the value of the difference M. For example, when the difference M is +6 dB or more, the unsteady sound detection unit 106 determines that there is a sound source in the L3 direction (S25). That is, when the volume of the directional microphone 101L is considerably larger than the volume of the directional microphone 101R, the unsteady sound detection unit 106 determines that the unsteady sound source is in the L3 direction.
  • when the difference M is +3 dB or more and less than +6 dB, the unsteady sound detection unit 106 determines that there is a sound source in the L2 direction (S26). When the difference M is 0 dB or more and less than +3 dB, the unsteady sound detection unit 106 determines that there is a sound source in the L1 direction (S27). When the difference M is −3 dB or more and less than 0 dB, the unsteady sound detection unit 106 determines that there is a sound source in the R1 direction (S28).
  • when the difference M is −6 dB or more and less than −3 dB, the unsteady sound detection unit 106 determines that there is a sound source in the R2 direction (S29). When the difference M is less than −6 dB, the unsteady sound detection unit 106 determines that there is a sound source in the R3 direction (S30).
  • the unsteady sound detection unit 106 specifies the sound source direction according to the difference M in the microphone volume. Specifically, as the difference M is larger, the unsteady sound detection unit 106 determines that there is a sound source in the left direction (270 °). Note that the direction of the sound source may be specified using the phase difference between signals collected by the left and right microphones, not limited to the volume difference. The sound source direction may be specified by combining the volume difference (or volume ratio) and the phase difference.
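The threshold logic of steps S24 to S30 can be sketched as follows. This is an illustrative Python rendering of the flowchart, not code from the application; the 3 dB / 6 dB boundaries are the ones given above.

```python
def classify_direction(level_l_db: float, level_r_db: float) -> str:
    """Map the left/right microphone level difference M (in dB) to one
    of the six directions L3..R3, following steps S24-S30 above."""
    m = level_l_db - level_r_db  # difference M; positive means louder on the left
    if m >= 6.0:
        return "L3"   # S25: far left rear
    elif m >= 3.0:
        return "L2"   # S26
    elif m >= 0.0:
        return "L1"   # S27
    elif m >= -3.0:
        return "R1"   # S28
    elif m >= -6.0:
        return "R2"   # S29
    else:
        return "R3"   # S30: far right rear
```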
  • FIG. 12 is a flowchart illustrating processing for switching filters.
  • First, the out-of-head localization processing unit 102 convolves the rear-front filters (Hls, Hlo, Hro, Hrs) with the collected sound signal (S41).
  • The rear-front filters (Hls, Hlo, Hro, Hrs) can be acquired by the measurement described below.
  • Next, the unsteady sound detection unit 106 determines the sound source direction (S43), that is, which of the directions L3 to R3 the sound source is in; step S43 corresponds to the sound-source identification process described above.
  • When the sound source direction is the L3 direction (L3 in S43), the out-of-head localization processing unit 102 convolves the filters Hls_L3, Hlo_L3, Hro_L3, and Hrs_L3 for the L3 direction (S44). When the sound source direction is the L2 direction (L2 in S43), it convolves the filters Hls_L2, Hlo_L2, Hro_L2, and Hrs_L2 for the L2 direction (S45).
  • When the sound source direction is the L1 direction (L1 in S43), the out-of-head localization processing unit 102 convolves the filters Hls_L1, Hlo_L1, Hro_L1, and Hrs_L1 for the L1 direction (S46).
  • When the sound source direction is the R1 direction (R1 in S43), the out-of-head localization processing unit 102 convolves the filters Hls_R1, Hlo_R1, Hro_R1, and Hrs_R1 for the R1 direction (S47). When the sound source direction is the R2 direction (R2 in S43), it convolves the filters Hls_R2, Hlo_R2, Hro_R2, and Hrs_R2 for the R2 direction (S48).
  • When the sound source direction is the R3 direction (R3 in S43), the out-of-head localization processing unit 102 convolves the filters Hls_R3, Hlo_R3, Hro_R3, and Hrs_R3 for the R3 direction (S49).
  • In this way, the unsteady sound detection unit 106 identifies the sound source direction of the unsteady sound, and the out-of-head localization processing unit 102 convolves the filter set for that direction. Since the collected sound signal is thereby localized in the sound source direction, the unsteady sound can be emphasized in that direction.
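A minimal sketch of the direction-dependent filter switching (S41, S44 to S49) follows. The filter-set table, the 256-tap length, and the crossfeed structure (which filter feeds which output channel) are illustrative assumptions; in the device each filter is an impulse response obtained by the measurement of FIG. 13.

```python
import numpy as np

IMPULSE = np.r_[1.0, np.zeros(255)]  # placeholder for a measured impulse response

# Hypothetical per-direction filter sets (Hls, Hlo, Hro, Hrs); in the device
# each entry would hold the response measured for that speaker direction.
FILTER_SETS = {
    d: {name: IMPULSE for name in ("Hls", "Hlo", "Hro", "Hrs")}
    for d in ("rear", "L3", "L2", "L1", "R1", "R2", "R3")
}

def localize_collected(pickup_l, pickup_r, direction="rear"):
    """Convolve the collected stereo signal with the filter set selected
    for the identified direction (S41, S44-S49). An assumed crossfeed
    structure is used here:
      out_l = Hls * pickup_l + Hro * pickup_r
      out_r = Hlo * pickup_l + Hrs * pickup_r
    """
    f = FILTER_SETS[direction]
    out_l = np.convolve(pickup_l, f["Hls"]) + np.convolve(pickup_r, f["Hro"])
    out_r = np.convolve(pickup_l, f["Hlo"]) + np.convolve(pickup_r, f["Hrs"])
    return out_l, out_r
```

Switching directions is then just a dictionary lookup, so the convolution code itself does not change when the unsteady sound detection unit reports a new direction.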
  • FIG. 13 is a diagram schematically illustrating a measurement configuration for generating a filter.
  • FIG. 13 shows a measurement configuration for generating the rear-front filters Hls, Hlo, Hro, and Hrs and the R2-direction filters Hls_R2, Hlo_R2, Hro_R2, and Hrs_R2.
  • The installation position of the stereo speaker 5 is changed according to the filter to be generated.
  • When the rear-front filters Hls, Hlo, Hro, and Hrs are generated, the midpoint between the left speaker 5L and the right speaker 5R is placed in the 180° direction.
  • When the R2-direction filters Hls_R2, Hlo_R2, Hro_R2, and Hrs_R2 are generated, the midpoint between the left speaker 5L and the right speaker 5R is placed in the 135° direction.
  • These angles correspond to the sound source directions described above.
  • By rotating the left speaker 5L and the right speaker 5R about the listener 1 by the angle corresponding to each of the directions L3 to R3, a filter for each direction can be generated. In this way, direction-dependent filters can be generated easily.
  • In the above example, the identified sound source direction is divided into six directions from L3 to R3, but the number of divisions is not limited to six.
  • The sound source direction only needs to be divided into two or more directions, according to the accuracy with which the direction can be estimated. For example, when that accuracy is low, it is sufficient to distinguish only the two directions of diagonally rear-left and diagonally rear-right.
  • The number of identifiable sound source directions may therefore be six or more, or fewer than six.
  • Examples of localization positions: In the above description, the sound pickup signal (ambient sound) is localized behind the user and the reproduction signal (music) sounds at the ear (localized in the head), but the localization positions are not limited to this. Examples of the localization positions A and B are described below with reference to FIGS. 14 to 16, which show the localization positions according to Examples 1 to 3, respectively.
  • In Example 1, shown in FIG. 14, the localization position A of the reproduction signal is in front of the user U, and the localization position B of the sound pickup signal is at the user U's ear (localized in the head).
  • In this case, the out-of-head localization processing unit 102 performs out-of-head localization processing only on the reproduction signal. That is, it convolves the filter only with the reproduction signal and does not convolve any filter with the collected sound signal. In this way, the localization positions A and B shown in FIG. 14 can be realized.
  • In Example 2, shown in FIG. 15, the localization position A of the reproduction signal is in front of the user U, and the localization position B of the collected sound signal is behind the user U.
  • In this case, the out-of-head localization processing unit 102 performs out-of-head localization processing on both the reproduction signal and the collected sound signal. That is, it convolves the reproduction signal with a front localization filter and the collected sound signal with a rear localization filter. In this way, the localization positions A and B shown in FIG. 15 can be realized.
  • The front localization filter is generated based on a measurement in which the stereo speaker 5 is installed in front of the listener 1.
  • In Example 3, shown in FIG. 16, the localization position A of the reproduction signal is in front of the user U, while the localization position B of the collected sound signal changes depending on the presence or absence of an unsteady sound. When no unsteady sound is detected, the localization position B of the collected sound signal is near the ear (localized in the head); when an unsteady sound is detected, the localization position B′ of the collected sound signal is behind the user. That is, the localization position of the collected sound signal is changed by switching the filter according to the unsteady sound. In this way, the localization positions A and B shown in FIG. 16 can be realized.
  • The localization position A of the reproduction signal and the localization position B of the sound pickup signal are not limited to those described above.
  • Each signal can be localized at an arbitrary position. Filter processing may be applied only to the reproduction signal, only to the collected sound signal, or to both.
  • Furthermore, the out-of-head localization processing unit 102 may switch the localization position A of the reproduction signal according to the presence or absence of an unsteady sound, or may switch both the localization position A of the reproduction signal and the localization position B of the collected sound signal.
  • In the above description, the output unit that outputs the synthesized signal is the earphone 105, but the output unit may instead be headphones.
  • Although the sound reproduced by the reproduction signal has been described as music output from the music player 103, it may be sound reproduced together with a moving image.
  • For example, the reproduction signal combined with the collected sound signal by the synthesis unit 104 may be a reproduction signal based on movie sound or the like.
  • The number of microphones may be three or more.
  • The microphone that collects ambient sound is not limited to a directional microphone. For example, when a smartphone application is used as the music player 103, ambient sound may be collected using the smartphone's microphone.
  • The above processing may be executed by a mobile terminal such as a smartphone, or by a DSP (Digital Signal Processor) built into the earphone or headphones.
  • Alternatively, a part of the above processing may be executed by the mobile terminal and the rest by the DSP built into the earphone or headphones.
  • In that case, the directional microphone 101 is connected to the audio input terminal of the mobile terminal.
  • Non-transitory computer readable media include various types of tangible storage media.
  • Examples of non-transitory computer readable media include magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (for example, magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memory (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM (Random Access Memory)).
  • The program may also be supplied to a computer by various types of transitory computer readable media.
  • Examples of transitory computer readable media include electrical signals, optical signals, and electromagnetic waves.
  • A transitory computer readable medium can supply the program to the computer via a wired communication path such as an electric wire or optical fiber, or via a wireless communication path.
  • This application is applicable to a playback device that performs out-of-head localization processing.

Abstract

There are provided a playback apparatus and a playback method with which a user can appropriately hear ambient sound. A playback apparatus according to the present embodiment is provided with: a music player (103) that outputs a playback signal for playback of music; earphones (105) that have right and left output units that respectively output sound toward right and left ears of a user (U); one or more directional microphones (101) that acquire a sound pickup signal by picking up ambient sound; an out-of-head localization processing unit (102) that makes a localization position of the sound pickup signal different from a localization position of the playback signal when the sound pickup signal and the playback signal are being output from the earphones (105), by subjecting at least one of the sound pickup signal and the playback signal to an out-of-head localization process; and a combining unit (104) that combines and outputs, to the output units, the sound pickup signal and the playback signal.

Description

Playback apparatus and playback method
The present invention relates to a playback device and a playback method.
Portable music players such as digital audio players make it possible to listen to music while jogging outdoors and in similar situations. When a user wears earphones or headphones outdoors, inconvenience may arise if surrounding sounds such as announcements cannot be heard.
Patent Document 1 discloses a portable device that uses headphones to deliver ambient sound to the user's ears while the user is listening to music through the headphones. In the portable device of Patent Document 1, ambient sound is collected by a microphone. The portable device compares the sound input from the microphone with data stored in a database and determines whether it is a sound that should be output. When the input sound is determined to be a sound that should be output, the device shifts to an alert mode and mixes the sound into the output audio.
Patent Document 2 discloses a sound reproducing device having an earphone and a microphone. The earphone emits a sound signal into the ear, and the microphone picks up sound outside the ear. The sound reproducing device extracts the sound emitted by a moving body, such as an automobile or motorcycle, using a filter that passes the band of such emitted sounds. The device calculates the time gradient of the collected sound signal and determines whether a moving body is approaching. When a moving body is approaching, the device lowers the level of the sound signal supplied to the earphone, so that the user can know that a moving body is approaching.
Patent Document 3 discloses a headphone device that localizes music output from a music player outside the listener's head and localizes a ringtone from a mobile phone inside the head.
Patent Document 1: JP 2007-243493 A
Patent Document 2: JP 2012-248964 A
Patent Document 3: JP 2004-201195 A
In the device of Patent Document 1, ambient sound is mixed into the output audio when the alert mode is entered. However, the ambient sound and the music both sound as if they are ringing at the user's ears, so the ambient sound may be difficult to hear because of the masking effect.
In the device of Patent Document 2, the level of the sound signal is lowered when a moving body is approaching. However, as in Patent Document 1, the ambient sound and the music both sound at the ears, so the ambient sound may be difficult to hear because of the masking effect. When earphones or headphones with high sound insulation are used, the sound emitted by the moving body is blocked out. Even if the sound insulation is lowered so that ambient sound can be heard, the ambient sound becomes hard to hear when the music is played at a high volume. Therefore, with the device of Patent Document 2, it may be difficult for the user to hear surrounding sounds; the same applies to the device of Patent Document 3. Furthermore, when earphones with low sound insulation are used, sound may leak out when the playback volume is raised.
The present embodiment has been made in view of the above points, and an object thereof is to provide a playback device and a playback method that allow a user to appropriately hear surrounding sounds.
A playback device according to one aspect of the present embodiment includes: a playback unit that outputs a playback signal for playing back sound; an output unit having left and right output units that output sound toward the user's left and right ears, respectively; one or more microphones that collect ambient sound to acquire a collected sound signal; a localization processing unit that performs out-of-head localization processing on at least one of the collected sound signal and the playback signal so that, when they are output from the output unit, the localization position of the collected sound signal differs from that of the playback signal; and a synthesis unit that synthesizes the collected sound signal and the playback signal and outputs the result to the output unit.
A playback method according to one aspect of the present embodiment includes the steps of: outputting a playback signal for playing back sound; collecting ambient sound with one or more microphones to acquire a collected sound signal; performing out-of-head localization processing on at least one of the collected sound signal and the playback signal; synthesizing, after the out-of-head localization processing, the collected sound signal and the playback signal to generate a synthesized signal; and outputting the synthesized signal from an output unit having left and right output units toward the user's left and right ears, respectively.
According to the present embodiment, it is possible to provide a playback device and a playback method that allow a user to appropriately hear surrounding sounds.
The drawings show the following:
  • A block diagram showing the configuration of the playback device according to the first embodiment.
  • A diagram schematically showing the configuration of the earphone used in the playback device.
  • A conceptual diagram showing a configuration for generating a filter that localizes the sound collected by the microphone behind the user.
  • A diagram showing the localization position A of the reproduction signal and the localization position B of the collected sound signal.
  • A diagram schematically showing the configuration of the earphone used in the playback device according to the second embodiment.
  • A block diagram showing the configuration of the playback device according to the third embodiment.
  • A flowchart showing a process for detecting an unsteady sound in the playback method according to the present embodiment.
  • A diagram for explaining the sound source direction of an unsteady sound.
  • A diagram schematically showing changes in the collected sound signal between normal times and when an unsteady sound is detected.
  • A diagram schematically showing the signals in normal times and when an unsteady sound is detected.
  • A flowchart showing a process for identifying the sound source direction of an unsteady sound.
  • A flowchart showing a process for switching filters according to the sound source direction.
  • A conceptual diagram showing a configuration for generating filters according to the sound source direction.
  • A diagram showing Example 1 of the localization positions.
  • A diagram showing Example 2 of the localization positions.
  • A diagram showing Example 3 of the localization positions.
The playback device according to the present embodiment includes a playback unit that plays back sound and an output unit such as earphones or headphones. The playback unit is typically a portable music player or portable video player such as a digital audio player, smartphone, or tablet terminal, and stores the playback sound in advance in internal memory or in the cloud. The playback sound is music or the audio of a video. With the user wearing the earphones or headphones, the music player outputs a playback signal to the output unit. By using a portable playback device, the user can listen to music outdoors and elsewhere.
Embodiment 1
The playback device according to this embodiment will be described with reference to FIGS. 1 and 2. FIG. 1 is a block diagram showing the control configuration of the playback device 100 according to the first embodiment. FIG. 2 is a diagram schematically showing the configuration of the earphone 105 and the directional microphone 101 used in the playback device 100. The playback device 100 includes a directional microphone 101, an out-of-head localization processing unit 102, a music player 103, a synthesis unit 104, and an earphone 105.
The music player 103 is, for example, a digital audio player or a smartphone application, and plays back prerecorded music. The music player 103 outputs a playback signal based on the reproduced music to the synthesis unit 104. The playback signal is a stereo signal: it includes an Lch signal input to the left output unit 105L of the earphone 105 and an Rch signal input to the right output unit 105R. The Lch signal is output from the left output unit 105L toward the user's left ear, and the Rch signal is output from the right output unit 105R toward the user's right ear. Here the playback sound is music, so the music player 103 is used; when the playback sound is the audio of a video, an application capable of video playback can be used instead of the music player 103.
The directional microphone 101 collects sound around the user and outputs a collected sound signal based on the ambient sound to the out-of-head localization processing unit 102. For example, the directional microphone 101 collects sound generated behind the user; that is, it has directivity toward the user's rear.
The out-of-head localization processing unit 102 performs out-of-head localization processing on the collected sound signal from the directional microphone 101. Specifically, it convolves a filter with the collected sound signal, thereby generating a localized collected sound signal, which it outputs to the synthesis unit 104. The out-of-head localization processing unit 102 performs the out-of-head localization by convolving transfer characteristics such as head-related transfer functions (HRTFs).
Specifically, an Lch filter corresponding to the transfer characteristic to the left ear and an Rch filter corresponding to the transfer characteristic to the right ear are set in the out-of-head localization processing unit 102. The unit convolves the Lch filter with the collected sound signal from the directional microphone 101 to generate the localized Lch collected sound signal, and likewise convolves the Rch filter to generate the localized Rch collected sound signal. The filters used by the out-of-head localization processing unit 102 will be described later.
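The per-ear filtering described here can be sketched as a pair of FIR convolutions. The impulse responses below are placeholders for the measured HRTF-based filters; this is an illustration, not the device's implementation.

```python
import numpy as np

def localize_pickup(pickup, h_lch, h_rch):
    """Out-of-head localization (unit 102): convolve the single collected
    sound signal with the Lch filter (transfer characteristic to the left
    ear) and the Rch filter (to the right ear) to obtain the localized
    stereo pair."""
    pickup = np.asarray(pickup, dtype=float)
    out_l = np.convolve(pickup, h_lch)  # localized Lch collected sound signal
    out_r = np.convolve(pickup, h_rch)  # localized Rch collected sound signal
    return out_l, out_r
```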
The synthesis unit 104 is a mixer that combines the localized collected sound signal with the playback signal. Specifically, the synthesis unit 104 adjusts the volume balance between the collected sound signal and the playback signal and combines them to generate a synthesized signal, which it outputs to the earphone 105.
For example, the synthesis unit 104 multiplies the collected sound signal and the playback signal by coefficients corresponding to the volume balance, and adds the coefficient-weighted collected sound signal to the playback signal to generate the synthesized signal. The synthesis unit 104 performs this process for each of the Lch and Rch signals: it adds the Lch collected sound signal to the Lch playback signal to generate the Lch synthesized signal, and adds the Rch collected sound signal to the Rch playback signal to generate the Rch synthesized signal. The synthesis unit 104 outputs the Lch and Rch synthesized signals to the earphone 105.
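The coefficient-weighted mixing performed by the synthesis unit 104 might look like the following sketch; the coefficient values a and b are illustrative, not taken from the application.

```python
import numpy as np

def synthesize(pickup_l, pickup_r, play_l, play_r, a=0.5, b=0.5):
    """Synthesis unit 104: multiply the collected and playback signals by
    volume-balance coefficients a and b, then add them per channel to form
    the Lch/Rch synthesized signals sent to the earphone."""
    out_l = a * np.asarray(pickup_l) + b * np.asarray(play_l)
    out_r = a * np.asarray(pickup_r) + b * np.asarray(play_r)
    return out_l, out_r
```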
As shown in FIG. 2, the earphone 105 includes a left output unit 105L and a right output unit 105R, connected by a neckband 110. The directional microphone 101 is attached to the neckband 110 at its left-right center, oriented so that it faces rearward when the user wears the earphone 105. The directional microphone 101 thus collects sound behind the user.
The user hangs the neckband 110 around the neck and wears the earphone 105. The left output unit 105L outputs the Lch synthesized signal toward the user's left ear, and the right output unit 105R outputs the Rch synthesized signal toward the user's right ear. That is, based on the synthesized signal, the earphone 105 reproduces a sound in which the ambient sound and the music are mixed.
In this way, ambient sound generated around the user can be localized outside the head. The music played by the music player 103, on the other hand, is not localized outside the head, so it sounds at the user's ears and is heard localized inside the head. The localization position of the collected sound signal thus differs from that of the playback signal, separating the localization of the ambient sound from that of the music. The out-of-head localization processing unit 102 achieves this by performing out-of-head localization processing on at least one of the collected sound signal and the playback signal, so that their localization positions differ when they are output from the earphone 105.
The localization position of the collected sound signal and that of the playback signal will now be described with reference to FIG. 3. FIG. 3 schematically shows the localization position A of the playback signal (music) and the localization position B of the collected sound signal (ambient sound).
As described above, the out-of-head localization processing unit 102 convolves an out-of-head localization filter with the collected sound signal, so the collected sound signal is localized outside the user U's head. Specifically, since the localization position B of the collected sound signal is behind the user U, the convolution uses an out-of-head localization filter that localizes the sound to the rear.
The playback signal, on the other hand, is not convolved: no filtering is applied to the music played from the music player 103. The music therefore sounds at the ears as usual and is heard localized inside the head; that is, the localization position A of the playback signal is near the user U's ears 9L and 9R.
In this way, the localization position A of the playback signal differs from the localization position B of the collected sound signal. Because the filter is convolved only with the ambient sound, the music played by the music player 103 sounds at the ears and is localized inside the head, while the ambient sound is localized behind the user U's head. The music of the playback signal is heard at the ears, whereas the ambient sound of the collected signal is heard from behind; to the user U, the music and the ambient sound appear to originate at different positions. Therefore, even when the volume level of the collected sound signal is low relative to that of the playback signal, the user U can easily hear the ambient sound. The synthesis unit 104 may also vary the ambient sound level in conjunction with the music playback level so that the ambient sound is not buried in the music.
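The level linking mentioned here (varying the ambient sound level with the music playback level) could, for example, take the following form. The rule and the min_ratio value are purely illustrative assumptions, not from the application.

```python
def linked_ambient_gain(music_rms: float, ambient_rms: float,
                        min_ratio: float = 0.25) -> float:
    """Return a gain for the collected sound signal so that its level
    tracks the music playback level: after the gain, the ambient RMS is
    at least min_ratio of the music RMS (hypothetical linking rule),
    keeping the ambient sound from being buried in loud music."""
    if ambient_rms == 0.0:
        return 1.0  # silence: nothing to boost
    needed = min_ratio * music_rms / ambient_rms
    return max(1.0, needed)  # never attenuate below unity gain
```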
 By localizing only the ambient sound to the rear, the localization of the music sounding at the ears is separated from that of the ambient sound. The ambient sound is therefore easy to hear even during music reproduction and can be perceived more accurately. If, on the other hand, the music and the ambient sound were localized at the same position, the masking effect would make the ambient sound difficult to hear. Because the localization positions differ, the ambient sound remains audible even when the music is reproduced at high volume. In other words, even when the volume level of the collected sound signal is small relative to that of the reproduction signal, the user U can easily hear the ambient sound. Consequently, even if the earphone 105 constantly outputs the collected sound signal, the user U can listen to music effectively. Furthermore, when the ambient sound becomes louder, the ambient sound output from the earphone 105 also becomes louder, so the user can appropriately distinguish the ambient sound from the music.
 In addition, the directional microphone 101 picks up ambient sound from behind, so the user U can be alerted to what is behind. For example, when a vehicle such as an automobile, bicycle, or motorcycle approaches from behind, the approach sound of the vehicle is output as ambient sound from the left and right output units 105L and 105R of the earphone 105, and the ambient sound is localized to the rear. The user can therefore accurately perceive the approach of the vehicle from behind, and attention can be drawn to the rear.
 Even when an earphone 105 with high sound insulation is used, the user U can appropriately hear the ambient sound. Since an earphone 105 with high sound insulation can be used, sound leakage can be prevented. Furthermore, the ambient sound can be muted according to the usage situation, for example by turning off the directional microphone 101. That is, when there is no need to hear the ambient sound, the microphone 101 may simply be turned off; alternatively, the synthesis unit 104 may set the volume balance of the ambient sound to zero. When listening to music indoors, for example, the earphone 105 may reproduce only the music without outputting the ambient sound.
 The localization processing in the out-of-head localization processing unit 102 will be described with reference to FIG. 4. FIG. 4 is a conceptual diagram showing a configuration for generating the filters used in the localization processing.
 A stereo speaker 5 is arranged behind the listener 1. The stereo speaker 5 has a left speaker 5L and a right speaker 5R. The left speaker 5L is installed diagonally behind and to the left of the listener 1, and the right speaker 5R is installed diagonally behind and to the right. The left speaker 5L and the right speaker 5R are arranged symmetrically with respect to the listener 1, and output an impulse sound or the like for impulse response measurement.
 A microphone 2L is installed at the left ear 9L of the listener 1, and a microphone 2R is installed at the right ear 9R. Specifically, the microphones 2L and 2R are preferably installed at the entrance of the ear canal or at the eardrum position of the left ear 9L and the right ear 9R. The microphones 2L and 2R pick up the measurement signals output from the stereo speaker 5 and acquire collected sound signals. The listener 1 is preferably the user U who uses the playback apparatus 100, but transfer characteristics obtained by measurement on a person other than the user U may also be used. Furthermore, the listener 1 may be a person or a dummy head; that is, in the present embodiment, the concept of the listener 1 includes not only a person but also a dummy head.
 As described above, the impulse response is measured by picking up the impulse sounds output from the left and right speakers 5L and 5R with the microphones 2L and 2R. The collected sound signals acquired by the impulse response measurement are stored in a memory or the like. In this way, the transfer characteristic Hls between the left speaker 5L and the left microphone 2L, the transfer characteristic Hlo between the left speaker 5L and the right microphone 2R, the transfer characteristic Hro between the right speaker 5R and the left microphone 2L, and the transfer characteristic Hrs between the right speaker 5R and the right microphone 2R are measured. That is, the transfer characteristic Hls is acquired when the left microphone 2L picks up the measurement signal output from the left speaker 5L; the transfer characteristic Hlo is acquired when the right microphone 2R picks up the measurement signal output from the left speaker 5L; the transfer characteristic Hro is acquired when the left microphone 2L picks up the measurement signal output from the right speaker 5R; and the transfer characteristic Hrs is acquired when the right microphone 2R picks up the measurement signal output from the right speaker 5R.
 Filters corresponding to the transfer characteristics Hls to Hrs measured in this way are generated. Specifically, the transfer characteristics Hls to Hrs are cut out with a predetermined filter length and used as the filters for the convolution operation of the out-of-head localization processing unit 102. The filters may also be generated using head-related transfer functions (HRTF).
 In the present embodiment, since the directional microphone 101 is a monaural microphone, the out-of-head localization processing unit 102 convolves the collected sound signal, which is a monaural signal, with the filters corresponding to the four transfer characteristics Hls to Hrs. The out-of-head localization processing unit 102 adds the collected sound signal convolved with the transfer characteristic Hls and the collected sound signal convolved with the transfer characteristic Hro to obtain the Lch collected sound signal, and adds the collected sound signal convolved with the transfer characteristic Hlo and the collected sound signal convolved with the transfer characteristic Hrs to obtain the Rch collected sound signal. By performing the impulse response measurement with the stereo speaker 5 installed behind the listener 1 in this way, the ambient sound can be localized behind the user U.
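 The channel synthesis just described can be sketched as follows. This is a minimal illustration, not the apparatus itself, assuming NumPy arrays for the four measured impulse responses; the names `pickup`, `hls`, `hlo`, `hro`, and `hrs` are hypothetical.

```python
import numpy as np

def localize_mono(pickup, hls, hlo, hro, hrs):
    """Convolve a monaural collected sound signal with the four
    transfer characteristics and sum the results into L/R channels,
    as the out-of-head localization processing unit 102 does."""
    lch = np.convolve(pickup, hls) + np.convolve(pickup, hro)
    rch = np.convolve(pickup, hlo) + np.convolve(pickup, hrs)
    return lch, rch

# With a unit impulse as the collected signal, each output channel
# is simply the sum of its two impulse responses.
lch, rch = localize_mono(np.array([1.0]),
                         hls=np.array([0.5]), hlo=np.array([0.1]),
                         hro=np.array([0.2]), hrs=np.array([0.6]))
```

 In a real implementation each `h*` array would be the measured impulse response cut out at the predetermined filter length described above.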
Embodiment 2.

 FIG. 5 is a diagram schematically showing the configuration of the earphone 105 used in the playback apparatus 100 according to the present embodiment. The present embodiment differs from the first embodiment in the directional microphone 101 attached to the earphone 105. Since the configuration other than the directional microphone 101 is the same as in the first embodiment, redundant description is omitted as appropriate.
 In the second embodiment, as shown in FIG. 5, two directional microphones 101L and 101R are attached to a neckband 110. The directional microphone 101L is provided to the left of the center of the neckband 110, and the directional microphone 101R is provided to the right. The directional microphones 101L and 101R are preferably attached symmetrically. The collected sound signals picked up by the directional microphones 101L and 101R form a stereo signal. The directional microphones 101L and 101R output the collected sound signals to the out-of-head localization processing unit 102.
 The out-of-head localization processing unit 102 convolves the collected sound signal from the directional microphone 101L with the transfer characteristics Hls and Hlo, and convolves the collected sound signal from the directional microphone 101R with the transfer characteristics Hro and Hrs. The out-of-head localization processing unit 102 adds the collected sound signal convolved with the transfer characteristic Hls and the collected sound signal convolved with the transfer characteristic Hro to obtain the Lch collected sound signal, and adds the collected sound signal convolved with the transfer characteristic Hlo and the collected sound signal convolved with the transfer characteristic Hrs to obtain the Rch collected sound signal. The present embodiment also provides the same effects as the first embodiment. Moreover, since the directional microphones 101L and 101R form a stereo microphone, the ambient sound can be localized to the rear in a stereo sound field.
Embodiment 3.

 The playback apparatus according to the present embodiment will be described with reference to FIG. 6. FIG. 6 is a control block diagram showing the configuration of the playback apparatus according to the third embodiment. In the present embodiment, an unsteady sound detection unit 106 is added. Since the configuration other than the unsteady sound detection unit 106 is the same as in the first and second embodiments, its description is omitted.
 The unsteady sound detection unit 106 detects whether an unsteady sound is occurring around the user U. Examples of unsteady sounds include the running sounds of vehicles such as automobiles, motorcycles, and bicycles. When an unsteady sound occurs, the unsteady sound detection unit 106 outputs a control signal to the synthesis unit 104. The synthesis unit 104 controls the volume levels of the ambient sound and the music according to the control signal; specifically, the synthesis unit 104 raises the volume level of the ambient sound when an unsteady sound is detected.
 In this way, the ambient sound can be output at a high volume level only while an unsteady sound is detected; when no unsteady sound is detected, the ambient sound can be output at a low volume level. The earphone 105 normally outputs the ambient sound at a low level, but when an unsteady sound from behind is detected, it raises the volume of the ambient sound. The playback apparatus 100 can thereby emphasize the unsteady sound, and attention can be appropriately drawn to the rear.
 The reproduction method according to the present embodiment will be described with reference to FIG. 7. FIG. 7 is a flowchart showing the processing in the unsteady sound detection unit 106.
 First, the directional microphone 101 detects the rear sound (S11). The directional microphone 101 may be a monaural microphone as in the first embodiment or a stereo microphone as in the second embodiment. Next, the collected sound signal from the directional microphone 101 is converted into the frequency domain (S12), for example by applying a discrete Fourier transform to the collected sound signal. The collected sound signal used here can be the signal before the out-of-head localization processing by the out-of-head localization processing unit 102.
 The unsteady sound detection unit 106 monitors the temporal change of the frequency spectrum (S13) and determines whether the spectrum has changed abruptly (S14). The unsteady sound detection unit 106 acquires the frequency spectrum continuously and calculates its change.
 When the spectrum changes abruptly (YES in S14), the unsteady sound detection unit 106 determines that an unsteady sound is occurring behind (S15). For example, the unsteady sound detection unit 106 determines that a vehicle such as a car or a bicycle is approaching from behind. The unsteady sound detection unit 106 then outputs a control signal to the synthesis unit 104, and the synthesis unit 104 raises the volume of the ambient sound (S16).
 For example, when a vehicle approaches the user U, the spectrum changes greatly. The unsteady sound detection unit 106 obtains the peak value of each continuously acquired frequency spectrum, and when the peak value fluctuates greatly, determines that the spectrum has changed abruptly. Whether an abrupt change has occurred is determined by comparing the difference between the peak values of two consecutive spectra with a threshold.
 When the spectrum has not changed abruptly (NO in S14), no unsteady sound is occurring, so the process returns to step S12. The unsteady sound detection unit 106 continuously monitors the frequency spectrum and determines whether a large fluctuation has occurred. In this way, the unsteady sound detection unit 106 can detect the presence or absence of an unsteady sound, and the volume level of the ambient sound can be raised when an unsteady sound is occurring behind. Attention can therefore be appropriately drawn to the rear.
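 As a sketch, the monitoring loop of steps S12 to S14 might compare consecutive spectral peak values against a threshold as follows; the frame length and the threshold value are assumptions for illustration, not values taken from the embodiment.

```python
import numpy as np

PEAK_DIFF_THRESHOLD = 10.0  # assumed threshold for a "sudden" change

def spectrum_peak(frame):
    # S12: convert the collected sound frame to the frequency domain
    # (discrete Fourier transform) and take the spectral peak value.
    return np.max(np.abs(np.fft.rfft(frame)))

def is_unsteady(prev_frame, cur_frame):
    # S13/S14: compare the peak values of two consecutive spectra;
    # a large increase is judged to be an unsteady sound.
    return spectrum_peak(cur_frame) - spectrum_peak(prev_frame) > PEAK_DIFF_THRESHOLD

# A quiet frame followed by a loud tone at the same frequency
# (e.g. an approaching vehicle) is flagged as unsteady.
t = np.arange(256)
quiet = 0.01 * np.sin(2 * np.pi * t / 32)
loud = np.sin(2 * np.pi * t / 32)
```

 In practice the detector would run on successive microphone frames and emit the control signal to the synthesis unit 104 whenever `is_unsteady` returns true.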
Embodiment 4.

 In the present embodiment, the sound source direction of an unsteady sound is identified using a stereo microphone. For this purpose, the directional microphones 101L and 101R shown in FIG. 5 are provided on the neckband 110. Based on the collected sound signals from the left and right directional microphones 101L and 101R, the unsteady sound detection unit 106 detects the direction of the unsteady sound. For example, the unsteady sound detection unit 106 identifies the sound source direction of the unsteady sound using the volume difference or the arrival time difference between the two collected sound signals. Since the basic configuration and processing of the playback apparatus 100 are the same as in the above-described embodiments, their description is omitted.
 FIG. 8 shows directions with the user U as the reference. In FIG. 8, with the user U at the origin, the front of the user U is 0°, the right is 90°, the rear is 180°, and the left is 270°. The range from 90° to 120° is the R3 direction, 120° to 150° the R2 direction, 150° to 180° the R1 direction, 180° to 210° the L1 direction, 210° to 240° the L2 direction, and 240° to 270° the L3 direction.
 The unsteady sound detection unit 106 monitors the change in sound pressure for each direction. In FIG. 9, the unsteady sound changes greatly in the R2 direction. In this case, the unsteady sound detection unit 106 identifies the sound source direction of the unsteady sound as the R2 direction. Then, as shown in FIG. 10, the sound pressure of the collected sound signal in the sound source direction R2 is raised and emphasized. In this way, the ambient sound can be heard more appropriately; for example, the running sound of a vehicle is heard from the direction in which the vehicle is approaching, so the user can perceive the approach of the vehicle more effectively.
 Specifically, the unsteady sound detection unit 106 switches the filter according to the sound source direction. That is, the out-of-head localization processing unit 102 stores a filter set for each of the directions R3 to L3, and convolves the collected sound signal with the filters of the sound source direction to be emphasized. In this way, the ambient sound can be localized in the sound source direction, so the sound source direction can be emphasized.
 Specifically, when the sound source is in the L3 direction, the filters Hls_L3, Hlo_L3, Hro_L3, and Hrs_L3 that emphasize the L3 direction are convolved with the collected sound signal. Similarly, when the sound source is in the L2 direction, the filters Hls_L2, Hlo_L2, Hro_L2, and Hrs_L2 that emphasize the L2 direction are convolved with the collected sound signal; when the sound source is in the L1 direction, the filters Hls_L1, Hlo_L1, Hro_L1, and Hrs_L1; when the sound source is in the R1 direction, the filters Hls_R1, Hlo_R1, Hro_R1, and Hrs_R1; when the sound source is in the R2 direction, the filters Hls_R2, Hlo_R2, Hro_R2, and Hrs_R2; and when the sound source is in the R3 direction, the filters Hls_R3, Hlo_R3, Hro_R3, and Hrs_R3.
 Next, the processing for identifying the sound source direction will be described with reference to FIG. 11. FIG. 11 is a flowchart showing the processing for identifying the sound source direction. At regular intervals, the unsteady sound detection unit 106 converts the collected sound signals from the directional microphones 101 into the frequency domain (S21). The unsteady sound detection unit 106 converts the collected sound signal acquired by each of the plurality of directional microphones 101 into the frequency domain. Here, the collected sound signals before the filters are convolved by the out-of-head localization processing unit 102 are converted into the frequency domain.
 Next, the unsteady sound detection unit 106 determines whether a part of the spectrum has changed greatly (S22). If no part of the spectrum has changed greatly (NO in S22), the process returns to step S21. If a part of the spectrum has changed greatly (YES in S22), the sound is determined to be an unsteady sound (S23).
 Next, the unsteady sound detection unit 106 obtains the difference M between the volume of the left directional microphone 101L and the volume of the right directional microphone 101R (S24). That is, the unsteady sound detection unit 106 obtains the difference M between the spectrum of the collected sound signal of the directional microphone 101L and that of the directional microphone 101R. The frequency band over which the difference is obtained is not particularly limited; for example, the difference may be calculated only at the peak frequency where the spectrum changed greatly, or over the entire spectrum. Alternatively, the unsteady sound detection unit 106 may obtain the difference between the left and right collected sound signals in the time domain.
 The unsteady sound detection unit 106 then identifies the sound source direction according to the value of the difference M. For example, when the difference M is +6 dB or more, the unsteady sound detection unit 106 determines that the sound source is in the L3 direction (S25). That is, when the volume of the directional microphone 101L is considerably larger than that of the directional microphone 101R, the unsteady sound detection unit 106 determines that the source of the unsteady sound is in the L3 direction.
 When the difference M is +3 dB or more and less than +6 dB, the unsteady sound detection unit 106 determines that the sound source is in the L2 direction (S26). When the difference M is 0 dB or more and less than +3 dB, it determines that the sound source is in the L1 direction (S27); when the difference M is −3 dB or more and less than 0 dB, in the R1 direction (S28); when the difference M is −6 dB or more and less than −3 dB, in the R2 direction (S29); and when the difference M is less than −6 dB, in the R3 direction (S30).
 In this way, the unsteady sound detection unit 106 identifies the sound source direction according to the difference M in microphone volume. Specifically, the larger the difference M, the closer to the left (270°) the unsteady sound detection unit 106 judges the sound source to be. The sound source direction may also be identified using the phase difference between the signals picked up by the left and right microphones instead of the volume difference, or by combining the volume difference (or volume ratio) with the phase difference.
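 The mapping from the level difference M to a direction bin in steps S25 to S30 amounts to a simple threshold ladder. A sketch follows; the function name is hypothetical, and M is taken as the left-minus-right level in dB as in the description above.

```python
def source_direction(m_db):
    """Map the L/R microphone volume difference M (dB, left minus
    right) to one of the six direction bins of steps S25 to S30."""
    if m_db >= 6.0:
        return "L3"   # S25: left microphone much louder
    if m_db >= 3.0:
        return "L2"   # S26
    if m_db >= 0.0:
        return "L1"   # S27
    if m_db >= -3.0:
        return "R1"   # S28
    if m_db >= -6.0:
        return "R2"   # S29
    return "R3"       # S30: right microphone much louder
```

 A large positive M (left microphone considerably louder) maps to L3, far to the listener's left rear, while a large negative M maps symmetrically to R3.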
 Next, the processing for switching the filters according to the sound source direction will be described with reference to FIG. 12. FIG. 12 is a flowchart showing the filter switching processing.
 In the normal state, when no unsteady sound is detected, the out-of-head localization processing unit 102 convolves the collected sound signal with the rear-center filters (Hls, Hlo, Hro, Hrs) (S41). The rear-center filters (Hls, Hlo, Hro, Hrs) can be acquired by the measurement shown in FIG. 4. Then, as described above, it is determined whether the unsteady sound detection unit 106 has detected an unsteady sound (S42). If no unsteady sound is detected (NO in S42), the process returns to step S41; that is, the out-of-head localization processing unit 102 continues to convolve the collected sound signal with the rear-center filters (Hls, Hlo, Hro, Hrs).
 When an unsteady sound is detected (YES in S42), the unsteady sound detection unit 106 determines the sound source direction (S43). That is, the unsteady sound detection unit 106 determines which of the directions L3 to R3 the sound source is in by the processing shown in FIG. 11; step S43 corresponds to that processing.
 When the sound source direction is the L3 direction (L3 in S43), the out-of-head localization processing unit 102 convolves the filters Hls_L3, Hlo_L3, Hro_L3, and Hrs_L3 for the L3 direction (S44). When the sound source direction is the L2 direction (L2 in S43), it convolves the filters Hls_L2, Hlo_L2, Hro_L2, and Hrs_L2 for the L2 direction (S45). When the sound source direction is the L1 direction (L1 in S43), it convolves the filters Hls_L1, Hlo_L1, Hro_L1, and Hrs_L1 for the L1 direction (S46).
 When the sound source direction is the R1 direction (R1 in S43), the out-of-head localization processing unit 102 convolves the filters Hls_R1, Hlo_R1, Hro_R1, and Hrs_R1 for the R1 direction (S47). When the sound source direction is the R2 direction (R2 in S43), it convolves the filters Hls_R2, Hlo_R2, Hro_R2, and Hrs_R2 for the R2 direction (S48). When the sound source direction is the R3 direction (R3 in S43), it convolves the filters Hls_R3, Hlo_R3, Hro_R3, and Hrs_R3 for the R3 direction (S49).
 In this way, the unsteady sound detection unit 106 identifies the sound source direction of the unsteady sound, and the out-of-head localization processing unit 102 convolves the filters set for each direction. Since the collected sound signal is thereby localized in the sound source direction, the unsteady sound from that direction can be emphasized.
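 The filter switching of steps S41 to S49 can be sketched as a lookup keyed by the detected direction, falling back to the rear-center filters when no unsteady sound is detected. The string entries below are hypothetical stand-ins for the measured filter coefficients.

```python
# Hypothetical stand-ins for the measured filter sets; in the
# apparatus each entry would hold four impulse responses.
BASE = ("Hls", "Hlo", "Hro", "Hrs")
FILTERS = {"rear_center": BASE}
for direction in ("L3", "L2", "L1", "R1", "R2", "R3"):
    FILTERS[direction] = tuple(f"{h}_{direction}" for h in BASE)

def select_filters(direction=None):
    """S41: use the rear-center filters while no unsteady sound is
    detected; S44-S49: switch to the filter set of the identified
    source direction."""
    if direction is None:
        return FILTERS["rear_center"]
    return FILTERS[direction]
```

 The out-of-head localization processing unit would then convolve the collected sound signal with whichever set `select_filters` returns.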
 A method of generating the filters for each direction will now be described with reference to FIG. 13. FIG. 13 is a diagram schematically showing a measurement configuration for generating the filters; it shows the configuration for generating the rear-center filters Hls, Hlo, Hro, and Hrs and the R2-direction filters Hls_R2, Hlo_R2, Hro_R2, and Hrs_R2.
 As shown in FIG. 13, the installation position of the stereo speaker 5 is changed according to the filters to be generated. For example, when generating the normal-state filters Hls, Hlo, Hro, and Hrs used when no unsteady sound is detected, the midpoint between the left speaker 5L and the right speaker 5R lies in the 180° direction. When generating the R2-direction filters Hls_R2, Hlo_R2, Hro_R2, and Hrs_R2, the midpoint between the left speaker 5L and the right speaker 5R lies in the 135° direction. The angles here correspond to those shown in FIG. 8.
 By performing the measurement while changing the position of the stereo speaker 5, a filter set for each direction can be generated. That is, the left speaker 5L and the right speaker 5R are rotated about the listener 1 by the angles corresponding to the L3 to R3 directions. Direction-specific filters can thereby be generated easily.
 In the above description, the identified sound source direction is divided into six directions, L3 to R3, but the number of divisions is not limited to six. The sound source direction need only be divided into two or more directions, according to the accuracy with which the direction can be identified. For example, when that accuracy is low, it suffices to identify the source as lying in one of two directions: diagonally rear-left or diagonally rear-right. Of course, the number of directions may be six or more, or fewer than six.
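A hypothetical sketch of how an estimated source angle might be snapped to one of the configured directions follows; the sector-center angles used below are illustrative only and are not taken from the document.

```python
def nearest_sector(angle_deg, sector_centers):
    """Pick the sector center closest to an estimated source angle.
    sector_centers could hold six rear directions (L3..R3), or just
    two (rear-left, rear-right) when direction accuracy is low."""
    return min(sector_centers, key=lambda c: abs(c - angle_deg))
```

Whichever filter set was measured for the returned direction would then be convolved with the collected sound signal.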
Examples of localization positions
 In the above description, the collected sound signal (ambient sound) is localized behind the user and the reproduction signal (music) is played at the ear (localized inside the head), but the localization positions are not particularly limited. Examples of the localization positions A and B are described below with reference to FIGS. 14 to 16, which show the localization positions according to Examples 1 to 3, respectively.
 In Example 1, shown in FIG. 14, the localization position A of the reproduction signal is in front of the user U, and the localization position B of the collected sound signal is at the user U's ear (localized inside the head). In this case, the out-of-head localization processing unit 102 performs out-of-head localization processing only on the reproduction signal; that is, it convolves the filters with the reproduction signal only, not with the collected sound signal. In this way, the localization positions A and B shown in FIG. 14 can be realized.
 In Example 2, shown in FIG. 15, the localization position A of the reproduction signal is in front of the user U, and the localization position B of the collected sound signal is behind the user U. In this case, the out-of-head localization processing unit 102 performs out-of-head localization processing on both the reproduction signal and the collected sound signal; that is, it convolves front-localization filters with the reproduction signal and rear-localization filters with the collected sound signal. In this way, the localization positions A and B shown in FIG. 15 can be realized. The front-localization filters are generated from a measurement in which the stereo speaker 5 is placed in front of the listener 1.
 In Example 3, shown in FIG. 16, the localization position A of the reproduction signal is in front of the user U, and the localization position B of the collected sound signal changes depending on whether an unsteady sound is present. When no unsteady sound is detected, the collected sound signal is localized at the ear (inside the head); when an unsteady sound is detected, the collected sound signal is localized at position B′ behind the user. That is, the localization position B of the collected sound signal is changed by switching the filters according to the unsteady sound. In this way, the localization positions A and B shown in FIG. 16 can be realized.
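The per-block filter switching of Example 3 could be sketched as follows. The helper, the filter naming, and the pass-through and attenuating example sets used in the usage note are assumptions for illustration, not the patent's implementation.

```python
import numpy as np

def apply_filter_set(x_l, x_r, filter_set):
    """Four-filter binaural convolution; filter_set is assumed to be
    the tuple (h_ls, h_lo, h_ro, h_rs)."""
    h_ls, h_lo, h_ro, h_rs = filter_set
    n = len(x_l)
    y_l = np.convolve(x_l, h_ls)[:n] + np.convolve(x_r, h_ro)[:n]
    y_r = np.convolve(x_l, h_lo)[:n] + np.convolve(x_r, h_rs)[:n]
    return y_l, y_r

def localize_pickup(x_l, x_r, unsteady_detected, in_head_set, rear_set):
    """Example 3 behaviour: the ambient sound stays at the ear in the
    normal state and moves behind the user (position B') while an
    unsteady sound is detected, by swapping the filter set per block."""
    chosen = rear_set if unsteady_detected else in_head_set
    return apply_filter_set(x_l, x_r, chosen)
```

A pass-through set `(delta, zero, zero, delta)` can stand in for the in-head case, so only the detection flag decides which characteristics the listener hears.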
 Of course, the localization position A of the reproduction signal and the localization position B of the collected sound signal are not limited to those described above. By using appropriate filters, either signal can be localized at an arbitrary position. Filter processing may be performed on the reproduction signal only, on the collected sound signal only, or on both. Further, the out-of-head localization processing unit 102 may switch the localization position A of the reproduction signal according to the presence or absence of an unsteady sound, or may switch both the localization position A of the reproduction signal and the localization position B of the collected sound signal.
 In the above description, the output unit that outputs the synthesized signal is the earphone 105, but the output unit may instead be headphones. The sound reproduced by the reproduction signal has been described as music output from the music player 103, but it may be sound reproduced together with a moving image; for example, the reproduction signal combined with the collected sound signal by the synthesis unit 104 may be based on the audio of a movie. Further, three or more microphones may be provided, and the microphones that collect the ambient sound are not limited to directional microphones. For example, when a smartphone application serves as the music player 103, the smartphone's microphone may be used to collect the ambient sound.
 Further, the above processing may be executed by a mobile terminal such as a smartphone, or by a DSP (Digital Signal Processor) built into the earphones or headphones. Of course, part of the processing may be executed by the mobile terminal and the rest by the built-in DSP. When a mobile terminal such as a smartphone performs the out-of-head localization processing, the directional microphone 101 is connected to the audio input terminal of the terminal.
 Some or all of the above signal processing may be executed by a computer program. Such a program can be stored on various types of non-transitory computer-readable media and supplied to a computer. Non-transitory computer-readable media include various types of tangible storage media, for example magnetic recording media (flexible disks, magnetic tape, hard disk drives), magneto-optical recording media (magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memory (mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory)). The program may also be supplied to a computer by various types of transitory computer-readable media, examples of which include electrical signals, optical signals, and electromagnetic waves. A transitory computer-readable medium can supply the program to the computer via a wired communication path such as an electric wire or optical fiber, or via a wireless communication path.
 The invention made by the present inventors has been described above specifically on the basis of the embodiments, but the present invention is not limited to those embodiments, and it goes without saying that various modifications can be made without departing from the gist of the invention.
 This application claims priority based on Japanese Patent Application No. 2016-059755, filed on March 24, 2016, the entire disclosure of which is incorporated herein.
 The present application is applicable to a playback apparatus that performs out-of-head localization processing.
 U user
 1 listener
 2L left microphone
 2R right microphone
 5L left speaker
 5R right speaker
 9L left ear
 9R right ear
 101 directional microphone
 101L left directional microphone
 101R right directional microphone
 102 out-of-head localization processing unit
 103 music player
 104 synthesis unit
 105 earphone
 105L left output unit
 105R right output unit
 106 unsteady sound detection unit
 110 neckband

Claims (6)

  1.  A playback apparatus comprising:
     a playback unit configured to output a reproduction signal for reproducing a reproduced sound;
     an output unit having left and right output units configured to output sound toward the user's left and right ears, respectively;
     one or more microphones configured to collect ambient sound and acquire a collected sound signal;
     a localization processing unit configured to perform out-of-head localization processing on at least one of the collected sound signal and the reproduction signal so that, when output from the output unit, a localization position of the collected sound signal differs from a localization position of the reproduction signal; and
     a synthesis unit configured to synthesize the collected sound signal and the reproduction signal and output a result to the output unit.
  2.  The playback apparatus according to claim 1, wherein the localization processing unit localizes the collected sound signal behind the user.
  3.  The playback apparatus according to claim 1 or 2, wherein the microphone is a directional microphone that collects sound from behind the user.
  4.  The playback apparatus according to any one of claims 1 to 3, further comprising an unsteady sound detection unit configured to detect whether the collected sound signal contains an unsteady sound,
     wherein, when the unsteady sound detection unit detects the unsteady sound, a relative volume level of the collected sound signal with respect to the reproduction signal is changed.
  5.  The playback apparatus according to any one of claims 1 to 4, wherein two or more of the microphones are provided,
     a sound source direction of an unsteady sound contained in the collected sound signal is identified based on the collected sound signals collected by the two or more microphones, and
     a localization position of the collected sound signal is changed based on the sound source direction of the unsteady sound.
  6.  A playback method comprising:
     outputting a reproduction signal for reproducing a reproduced sound;
     collecting ambient sound using one or more microphones to acquire a collected sound signal;
     performing localization processing on at least one of the collected sound signal and the reproduction signal;
     after performing the localization processing, synthesizing the collected sound signal and the reproduction signal to generate a synthesized signal; and
     outputting the synthesized signal from an output unit having left and right output units toward the user's left and right ears, respectively.
PCT/JP2017/001957 2016-03-24 2017-01-20 Playback apparatus and playback method WO2017163572A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016-059755 2016-03-24
JP2016059755A JP2017175405A (en) 2016-03-24 2016-03-24 Device and method for playback

Publications (1)

Publication Number Publication Date
WO2017163572A1 true WO2017163572A1 (en) 2017-09-28

Family

ID=59901045

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/001957 WO2017163572A1 (en) 2016-03-24 2017-01-20 Playback apparatus and playback method

Country Status (2)

Country Link
JP (1) JP2017175405A (en)
WO (1) WO2017163572A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022009722A1 (en) * 2020-07-09 2022-01-13 ソニーグループ株式会社 Acoustic output device and control method for acoustic output device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111933184B (en) * 2020-09-29 2021-01-08 平安科技(深圳)有限公司 Voice signal processing method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007036610A (en) * 2005-07-26 2007-02-08 Yamaha Corp Sound production device
JP2013162332A (en) * 2012-02-06 2013-08-19 Nippon Sharyo Seizo Kaisha Ltd Headphone
JP2015198297A (en) * 2014-03-31 2015-11-09 株式会社東芝 Acoustic controller, electronic apparatus and acoustic control method



Also Published As

Publication number Publication date
JP2017175405A (en) 2017-09-28

Similar Documents

Publication Publication Date Title
US8199942B2 (en) Targeted sound detection and generation for audio headset
JP3435156B2 (en) Sound image localization device
JP6645437B2 (en) Sound reproduction device
RU2559742C2 (en) Method and apparatus for providing information on audio source through audio device
US6937737B2 (en) Multi-channel audio surround sound from front located loudspeakers
JP3435141B2 (en) SOUND IMAGE LOCALIZATION DEVICE, CONFERENCE DEVICE USING SOUND IMAGE LOCALIZATION DEVICE, MOBILE PHONE, AUDIO REPRODUCTION DEVICE, AUDIO RECORDING DEVICE, INFORMATION TERMINAL DEVICE, GAME MACHINE, COMMUNICATION AND BROADCASTING SYSTEM
JP3657120B2 (en) Processing method for localizing audio signals for left and right ear audio signals
WO2019017036A1 (en) Sound output device
JP6790654B2 (en) Filter generator, filter generator, and program
KR20180021368A (en) Sports headphones with situational awareness
JP2005223713A (en) Apparatus and method for acoustic reproduction
KR20060041735A (en) Sound pickup apparatus, sound pickup method, and recording medium
CN106792365B (en) Audio playing method and device
WO2019049409A1 (en) Audio signal processing device and audio signal processing system
WO2017163572A1 (en) Playback apparatus and playback method
US20150086023A1 (en) Audio control apparatus and method
JP2008228198A (en) Apparatus and method for adjusting playback sound
JP6500664B2 (en) Sound field reproduction apparatus, sound field reproduction method, and program
JP5281695B2 (en) Acoustic transducer
JP2019169835A (en) Out-of-head localization processing apparatus, out-of-head localization processing method, and program
US10212509B1 (en) Headphones with audio cross-connect
KR101526014B1 (en) Multi-channel surround speaker system
JPH06269097A (en) Acoustic equipment
US11228837B2 (en) Processing device, processing method, reproduction method, and program
JP2018139345A (en) Filter generation device, filter generation method, and program

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17769619

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17769619

Country of ref document: EP

Kind code of ref document: A1