EP4024705A1 - Sprachlautreaktionsvorrichtung und Sprachlautreaktionsverfahren (Speech sound response device and speech sound response method)

Info

Publication number: EP4024705A1
Authority: EP; European Patent Office
Prior art keywords: volume; response; sound; environmental; processor
Prior art date: 2021-01-04
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Pending

Application number: EP21211929.1A
Other languages: English (en), French (fr)
Inventor: Naoki Sekine
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.): Toshiba TEC Corp
Original Assignee: Toshiba TEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2021-01-04
Filing date: 2021-12-02
Publication date: 2022-07-06

2021-12-02 Application filed by Toshiba TEC Corp

2022-07-06 Publication of EP4024705A1

Status: Pending

Classifications

- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03F—AMPLIFIERS
- H03F3/00—Amplifiers with only discharge tubes or only semiconductor devices as amplifying elements
- H03F3/181—Low-frequency amplifiers, e.g. audio preamplifiers
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0324—Details of processing therefor
- G10L21/034—Automatic adjustment
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G3/00—Gain control in amplifiers or frequency changers
- H03G3/20—Automatic control
- H03G3/30—Automatic control in amplifiers having semiconductor devices
- H03G3/32—Automatic control in amplifiers having semiconductor devices the control being dependent upon ambient noise level or sound level
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/02—Casings; Cabinets ; Supports therefor; Mountings therein
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
- H04R2430/01—Aspects of volume control, not necessarily automatic, in sound systems

Definitions

the main storage device 12 functions as a memory that stores, for example, information indicating the volume of the environmental sound calculated from the sound collected with the microphone.
the main storage device 12 stores data of the speech sound obtained when the speech sound processing circuit 14 processes the sound collected with the microphone 2. Further, the main storage device 12 may store the calculation result of the volume of the voice uttered by the talker (input speech sound) included in the sound collected with the microphone 2.
the main storage device 12 may store the information indicating the volume of the response speech sound determined in response to the volume of the input speech sound and the volume of the environmental sound.
the processor 11 performs the process of generating the response content (response sentence) (ACTS 15 to 17) and the process of calculating the response volume (ACTS 18 to 19).
once the meaning of the voice uttered by the talker (input sentence) has been analyzed, the processor 11 generates the response content (response sentence) with respect to the input sentence (ACT 17). For example, if the question content included in the input sentence is specified, the processor 11 generates the response sentence in response to the question content. If the request of the talker included in the input sentence is specified, the processor 11 generates the response sentence according to the request of the talker. If a greeting included in the input sentence is specified (that is, if the input sentence is understood to be a greeting from the talker), the processor 11 generates a greeting as the response sentence.
the processor 11 performs the calculating process of the input volume V and the calculating process of the response volume as the process of calculating the response volume.
the processor 11 calculates the volume V of the talker's voice detected from the input sound (input speech sound) (ACT 18). For example, the processor 11 extracts the component of the talker's voice from the sound data of the input sound (input speech sound) and calculates the volume V of the extracted input speech sound (input volume).
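The source does not state how the volume V is computed. As a minimal sketch, assuming the talker's voice component has already been extracted as floating-point samples in the range [-1.0, 1.0], the volume could be expressed as an RMS level in decibels relative to full scale. The function name `volume_db` and the `eps` floor are hypothetical and not taken from the patent.

```python
import math

def volume_db(samples, eps=1e-12):
    """Return the RMS level of `samples` in dB relative to full scale.

    Assumption: `samples` holds the already-extracted voice component as
    floats in [-1.0, 1.0]; the extraction step itself is not shown here.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(x * x for x in samples) / len(samples)
    # eps avoids log10(0) for a completely silent input.
    return 10.0 * math.log10(max(mean_square, eps))

# Example: a square wave at half of full scale.
input_volume_v = volume_db([0.5, -0.5, 0.5, -0.5])
```

The same measurement could, under the same assumption, be applied to the non-voice component to obtain the environmental volume S.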
the processor 11 generates the response waveform to be the response speech sound uttered from the speaker 3 based on the response sentence generated in ACT 17 and the response volume calculated in ACT 19 (ACT 20). For example, the processor 11 generates the response waveform for uttering the response sentence generated in ACT 17 as the response speech sound. The processor 11 adjusts the amplitude of the response waveform for uttering the generated response speech sound in response to the response volume calculated in ACT 19. If the response waveform is generated, the processor 11 outputs the generated response waveform from the speaker 3 (ACT 21).
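The amplitude adjustment in ACT 20 can be illustrated with a short sketch. It assumes the synthesized response waveform is a list of floats in [-1.0, 1.0] and that the response volume is given as an RMS level in dB relative to full scale; neither representation is specified in the source, and the name `scale_to_volume` is hypothetical.

```python
import math

def scale_to_volume(waveform, target_db):
    """Scale `waveform` so that its RMS level equals `target_db` (dB full scale).

    Assumption: samples are floats in [-1.0, 1.0]; results are clipped to that range.
    """
    if not waveform:
        return []
    mean_square = sum(x * x for x in waveform) / len(waveform)
    if mean_square == 0.0:
        # A silent waveform cannot be scaled to a target level.
        return list(waveform)
    current_db = 10.0 * math.log10(mean_square)
    # A level difference in dB corresponds to a linear amplitude gain of 10^(dB/20).
    gain = 10.0 ** ((target_db - current_db) / 20.0)
    return [max(-1.0, min(1.0, x * gain)) for x in waveform]
```

Clipping is only a safeguard in this sketch; a real device would more likely limit the response volume before scaling.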
FIGS. 7 and 8 are flowcharts for explaining the calculating process of the response volume in the speech sound response device 1 according to the embodiment.
the processor 11 acquires the current input volume V calculated in ACT 18 described above (ACT 31).
the processor 11 acquires the environmental volume S stored in the main storage device 12 or the auxiliary storage device 13 (ACT 32).
the processor 11 refers to the table illustrated in FIG. 5 and determines whether the environmental volume S is less than the threshold value Ts (ACT 33). If the environmental volume S is less than the threshold value Ts (S < Ts) (ACT 33, YES), the processor 11 applies the function FA (i.e., if the environmental volume S is low).
the function FA includes five functions FAa, FAb, FAc, FAd, and FAe separated by the threshold values Tva, Tvb, Tvc, and Tvd.
the processor 11 compares the input volume V with the threshold values Tva, Tvb, Tvc, and Tvd and selects one function from the functions FAa, FAb, FAc, FAd, and FAe.
the processor 11 determines whether the input volume V is less than the threshold value Tva (ACT 41). If it is determined that the input volume V is less than the threshold value Tva (ACT 41, YES), the processor 11 specifies that S < Ts and V < Tva. If S < Ts and V < Tva, the processor 11 selects the function FAa (ACT 42).
the processor 11 determines whether the input volume V is less than the threshold value Tvd (ACT 47). If it is determined that the input volume V is less than the threshold value Tvd (ACT 47, YES), the processor 11 specifies that S < Ts and Tvc ≤ V < Tvd. If S < Ts and Tvc ≤ V < Tvd, the processor 11 selects the function FAd (ACT 48).
if it is determined that the input volume V is not less than the threshold value Tvd (ACT 47, NO), the input volume V is the threshold value Tvd or more, so the processor 11 specifies that S < Ts and Tvd ≤ V. If S < Ts and Tvd ≤ V, the processor 11 selects the function FAe (ACT 49).
the processor 11 applies the function FB (i.e., when the environmental volume S is high).
the function FB includes four functions FBa, FBb, FBc, and FBd separated by the threshold values Tvi, Tvj, and Tvk with respect to the input volume V.
the processor 11 compares the input volume V with the threshold values Tvi, Tvj, and Tvk and selects one function from the functions FBa, FBb, FBc, and FBd.
the processor 11 determines whether the input volume V is less than the threshold value Tvk (ACT 55). If it is determined that the input volume V is less than the threshold value Tvk (ACT 55, YES), the processor 11 specifies that S ≥ Ts and Tvj ≤ V < Tvk. If S ≥ Ts and Tvj ≤ V < Tvk, the processor 11 selects the function FBc (ACT 56).
if it is determined that the input volume V is not less than the threshold value Tvk (ACT 55, NO), the input volume V is the threshold value Tvk or more, so the processor 11 specifies that S ≥ Ts and Tvk ≤ V. If S ≥ Ts and Tvk ≤ V, the processor 11 selects the function FBd (ACT 57).
the processor 11 determines the volume of the response speech sound based on the selected function (ACT 60). That is, the processor 11 uses the selected function to calculate the response volume from the input volume V. Accordingly, the processor 11 can take the environmental volume into account when calculating the response volume in response to the input volume.
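The selection logic of ACTs 33 to 60 can be sketched as a two-level lookup: the environmental volume S chooses between the FA and FB families, and the input volume V chooses one function within the family. The sketch below is illustrative only. The threshold values and the shapes of the functions are hypothetical placeholders, because the source names them but gives no values, and the ranges for FAb, FAc, FBa, and FBb are inferred from the statement that the functions are separated by the thresholds.

```python
from bisect import bisect_right

# Hypothetical placeholder values; the source does not give the thresholds.
TS = 50.0                              # Ts: threshold for environmental volume S
TV_LOW_ENV = [45.0, 55.0, 65.0, 75.0]  # Tva, Tvb, Tvc, Tvd (used when S < Ts)
TV_HIGH_ENV = [55.0, 65.0, 75.0]       # Tvi, Tvj, Tvk (used when S >= Ts)

FA_NAMES = ["FAa", "FAb", "FAc", "FAd", "FAe"]
FB_NAMES = ["FBa", "FBb", "FBc", "FBd"]

# Placeholder shapes only: every function here simply returns V unchanged.
# The real functions map the input volume V to a response volume.
FUNCTIONS = {name: (lambda v: v) for name in FA_NAMES + FB_NAMES}

def select_function(s, v):
    """Return the name of the function for environmental volume s and input volume v."""
    if s < TS:
        names, thresholds = FA_NAMES, TV_LOW_ENV
    else:
        names, thresholds = FB_NAMES, TV_HIGH_ENV
    # bisect_right counts the thresholds that are <= v, so v == Tva selects FAb,
    # matching the half-open ranges such as Tvc <= V < Tvd.
    return names[bisect_right(thresholds, v)]

def response_volume(s, v):
    """Apply the selected function to the input volume v."""
    return FUNCTIONS[select_function(s, v)](v)
```

With these placeholder thresholds, `select_function(40.0, 30.0)` picks `FAa` and `select_function(60.0, 75.0)` picks `FBd`. A sorted threshold list with a binary search replaces the chain of comparisons in the flowchart but selects the same ranges.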
the speech sound response device detects the voice uttered by the user from the sound input to the microphone.
the speech sound response device generates the response content (response sentence) to be output as the response speech sound with respect to the voice uttered by the user.
the speech sound response device calculates the response volume in response to the input volume as the volume of the voice uttered by the user and the volume of the environmental sound other than the voice uttered by the user.
the speech sound response device outputs the response speech sound from the speaker at the calculated response volume.
the speech sound response device can take the loudness of the environmental sound into account and output the response speech sound at a response volume that depends on the input volume. Accordingly, it can be expected that the loudness of the voice uttered by the talker (user) is controlled in response to the volume of the response speech sound output by the speech sound response device.
the speech sound response device can guide the loudness of the voice uttered by the user to a volume appropriate for speech sound recognition so that speech sound recognition with high accuracy can be realized.
the speech sound response device holds the plurality of functions selected in response to the loudness of the environmental volume.
the speech sound response device determines the volume of the response speech sound from the input volume based on the first function if the environmental volume is less than the threshold value and determines the volume of the response speech sound from the input volume based on the second function different from the first function if the environmental volume is not less than the threshold value. Accordingly, the speech sound response device according to the embodiment can set the response volume in response to the loudness of the environmental sound. As a result, even in an environment where the environmental volume cannot be predicted in advance, the speech sound response device can guide the loudness of the voice uttered by the user to the volume appropriate for the speech sound recognition.
the speech sound response device stores the plurality of functions to be selected in response to the loudness of the environmental volume and the loudness of the input volume in the storage device.
the speech sound response device determines the volume of the response speech sound from the input volume based on one function selected in response to the environmental volume and the input volume from the plurality of functions. Accordingly, the speech sound response device can select the function in response to the environmental volume and the input volume and can guide the loudness of the voice uttered by the user to the volume appropriate for the speech sound recognition.
the program executed by the processor may be downloaded from the network to the device or may be installed from the storage medium to the device.
the storage medium may be a storage medium that can store a program such as a CD-ROM and can be read by the device. Further, the functions obtained by installation or download in advance may be realized in cooperation with the operating system (OS) or the like inside the device.

Landscapes

Engineering & Computer Science (AREA)
Multimedia (AREA)
Physics & Mathematics (AREA)
Human Computer Interaction (AREA)
Health & Medical Sciences (AREA)
Audiology, Speech & Language Pathology (AREA)
Acoustics & Sound (AREA)
Computational Linguistics (AREA)
Theoretical Computer Science (AREA)
Signal Processing (AREA)
General Health & Medical Sciences (AREA)
General Engineering & Computer Science (AREA)
General Physics & Mathematics (AREA)
Quality & Reliability (AREA)
Power Engineering (AREA)
Telephone Function (AREA)
Circuit For Audible Band Transducer (AREA)

Applications Claiming Priority (1)

Application Number	Priority Date	Filing Date	Title
JP2021000096A JP2022105372A (ja)	2021-01-04	2021-01-04	Speech response device, speech response method, and speech response program

Publications (1)

Publication Number	Publication Date
EP4024705A1 (de)	2022-07-06

Family

ID=78851165

Family Applications (1)

Application Number	Priority Date	Filing Date	Title
EP21211929.1A Pending EP4024705A1 (de)	2021-01-04	2021-12-02	Sprachlautreaktionsvorrichtung und sprachlautreaktionsverfahren

Country Status (4)

Country	Link
US (1)	US20220215854A1 (de)
EP (1)	EP4024705A1 (de)
JP (1)	JP2022105372A (de)
CN (1)	CN114724537A (de)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
US20120271630A1 (en) *	2011-02-04	2012-10-25	Nec Corporation	Speech signal processing system, speech signal processing method and speech signal processing method program
US20200034108A1 (en) *	2018-07-25	2020-01-30	Sensory, Incorporated	Dynamic Volume Adjustment For Virtual Assistants
US20200075036A1 (en) *	2017-05-12	2020-03-05	Naver Corporation	User command processing method and system for adjusting output volume of sound to be output, on basis of input volume of received voice input
US20200388268A1 (en) *	2018-01-10	2020-12-10	Sony Corporation	Information processing apparatus, information processing system, and information processing method, and program

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
US9830924B1 (en) *	2013-12-04	2017-11-28	Amazon Technologies, Inc.	Matching output volume to a command volume
US9508344B2 (en) *	2014-10-15	2016-11-29	Delphi Technologies, Inc.	Automatic volume control based on speech recognition

2021
- 2021-01-04 JP JP2021000096A patent/JP2022105372A/ja active Pending
- 2021-10-08 CN CN202111169732.XA patent/CN114724537A/zh active Pending
- 2021-10-18 US US17/503,837 patent/US20220215854A1/en not_active Abandoned
- 2021-12-02 EP EP21211929.1A patent/EP4024705A1/de active Pending

Also Published As

Publication number	Publication date
CN114724537A (zh)	2022-07-08
JP2022105372A (ja)	2022-07-14
US20220215854A1 (en)	2022-07-07

Legal Events

Date	Code	Title	Description
2022-06-03	PUAI	Public reference made under article 153(3) epc to a published international application that has entered the european phase	Free format text: ORIGINAL CODE: 0009012
2022-06-03	STAA	Information on the status of an ep patent application or granted ep patent	Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED
2022-07-06	AK	Designated contracting states	Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
2023-01-13	STAA	Information on the status of an ep patent application or granted ep patent	Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE
2023-02-15	17P	Request for examination filed	Effective date: 20230109
2023-02-15	RBV	Designated contracting states (corrected)	Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
