WO2010105396A1 - Apparatus and method for recognizing speech emotion change - Google Patents

Apparatus and method for recognizing speech emotion change

Info

Publication number
WO2010105396A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
speech emotion
window
change
speaker
Application number
PCT/CN2009/070801
Other languages
French (fr)
Inventor
Yingliang Lu
Qing Guo
Bin Wang
Original Assignee
Fujitsu Limited
Application filed by Fujitsu Limited
Priority to PCT/CN2009/070801 (WO2010105396A1)
Priority to CN2009801279599A (CN102099853B)
Publication of WO2010105396A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 — Speaker identification or verification techniques
    • G10L17/26 — Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An apparatus and a method for recognizing a speech emotion change of a speaker from speech data of the speaker are provided, wherein the method comprises the following steps: a window dividing step (S110) of dividing the speech data of the speaker into a plurality of windows by a window width; a window speech emotion feature calculating step (S120) of calculating a speech emotion feature for each of the plurality of windows; and a speech emotion change recognizing step (S130) of recognizing the speech emotion change of the speaker for a window set consisting of at least two contiguous windows by comparing the speech emotion features of the window set with each of a plurality of speech emotion feature change templates stored in a speech emotion feature change database to find out a speech emotion feature change template which matches the speech emotion features of the window set.

Description

APPARATUS AND METHOD FOR RECOGNIZING SPEECH EMOTION CHANGE
Field of the Invention
The present invention relates to the field of speech signal processing, and in particular to an apparatus and a method for recognizing a speech emotion change of a speaker from speech data of the speaker.
Background of the Invention
At present, it has become important to analyze speech data of a speaker in order to recognize a speech emotion of the speaker. For example, speech emotion recognition technology may be applied to the field of human-machine interaction, and thus may greatly improve the friendliness and accuracy of human-machine interaction.
Thus, various solutions for recognizing a speech emotion of a speaker from speech data of the speaker have been proposed in the prior art. For example, please see Japanese Patent Application Laid-Open No. 2008-076905 and Chinese Patent Application No. 200610097301.6.
The conventional solutions focus only on recognizing a speech emotion of a speaker by extracting speech emotion features such as pitch, energy and formant from the speech data of the speaker. However, because speech emotion features differ between speakers, and even the speech emotion features of the same speaker differ between time periods, it is difficult for the conventional solutions to accurately recognize speech emotions from personalized speech data.
On the other hand, in many applications, recognizing an emotion change from the speech of a speaker is of greater interest than recognizing the emotion itself. For example, in video advertising, a time point at which the emotion of an actor changes from "calm" to "exciting" in a video is an appropriate time point for inserting an advertisement into the video. In such applications, it is therefore enough to accurately recognize a speech emotion change of a speaker from speech data of the speaker. However, due to the inaccuracy of speech emotion recognition in the conventional solutions, it is difficult to accurately recognize speech emotion changes of personalized speech data from the speech emotion recognition results of the conventional solutions.
Summary of the Invention
Summary of the invention will be given below to provide basic understanding of some aspects of the invention. It shall be appreciated that this summary is neither exhaustively descriptive of the invention nor intended to define essential or important parts or the scope of the invention, but is merely for the purpose of presenting some concepts in a simplified form and hereby acts as a preamble of detailed description which will be discussed later.
In view of the above circumstances in the prior art, an object of the invention is to provide an apparatus and a method for recognizing a speech emotion change of a speaker from speech data of the speaker, which are capable of providing good speech emotion change recognition performance on personalized speech data.
In order to achieve the above object, an embodiment of the invention provides a method of recognizing a speech emotion change of a speaker from speech data of the speaker, which may comprise the following steps: a window dividing step of dividing the speech data of the speaker into a plurality of windows by a window width; a window speech emotion feature calculating step of calculating a speech emotion feature for each of the plurality of windows; and a speech emotion change recognizing step of recognizing the speech emotion change of the speaker for a window set consisting of at least two contiguous windows by comparing the speech emotion features of the window set with each of a plurality of speech emotion feature change templates stored in a speech emotion feature change database to find out a speech emotion feature change template which matches the speech emotion features of the window set.
Furthermore, an embodiment of the invention provides an apparatus for recognizing a speech emotion change of a speaker from speech data of the speaker, which may comprise: a window dividing means for dividing the speech data of the speaker into a plurality of windows by a window width; a window speech emotion feature calculating means for calculating a speech emotion feature for each of the plurality of windows; and a speech emotion change recognizing means for recognizing the speech emotion change of the speaker for a window set consisting of at least two contiguous windows by comparing the speech emotion features of the window set with each of a plurality of speech emotion feature change templates stored in a speech emotion feature change database to find out a speech emotion feature change template which matches the speech emotion features of the window set.
Furthermore, an embodiment of the invention provides a computer-readable storage medium with a computer program stored thereon, wherein said computer program, when being executed, causes a computer to execute the above method of recognizing a speech emotion change of a speaker from speech data of the speaker.
According to the above technical solutions of the invention, a change of speech emotion, such as "happy", "angry", "sad", "joy", "fearsome" and the like, is always accompanied by a substantial change of a speech emotion feature such as speech pitch, speech energy, speech speed or the like. By directly analyzing a speech emotion feature change in the speech data of a speaker, it is therefore possible to accurately recognize a speech emotion change of the speaker from the speech data of the speaker. These and other advantages of the invention will become more apparent from the following detailed description of preferred embodiments of the invention taken in conjunction with the drawings.
Brief Description of the Drawings
The invention can be better understood with reference to the description given below in conjunction with the accompanying drawings, throughout which identical or like components are denoted by identical or like reference signs. The accompanying drawings, together with the following detailed description, are incorporated into and form a part of the specification and serve to further illustrate preferred embodiments of the invention and to explain principles and advantages of the invention. In the drawings:
Figure 1 is a flow chart illustrating a method of recognizing a speech emotion change of a speaker from speech data of the speaker according to an embodiment of the invention;
Figure 2 is a flow chart illustrating an implementing example of the speech emotion change recognizing step S130 of Figure 1;
Figure 3 schematically illustrates waveform graphs of two speech segments of speaker A extracted from dialogue data between speakers A and B;
Figure 4 schematically illustrates pitch change graphs respectively extracted from the two speech segments of Figure 3;
Figure 5 schematically illustrates a pitch change graph of two windows corresponding to the two speech segments of Figure 3, where the window width is the minimum length of the two speech segments and the singularities are removed;
Figure 6 schematically illustrates a pitch change graph of many windows corresponding to the two speech segments of Figure 3, where the window width is 10ms and the singularities are removed;
Figure 7 illustrates an exemplary structure of a speech emotion feature change database employed in the embodiment of the invention;
Figure 8 is a block diagram illustrating a construction of an apparatus for recognizing a speech emotion change of a speaker from speech data of the speaker according to an embodiment of the invention;
Figure 9 is a block diagram illustrating an exemplary construction of the speech emotion change recognizing means 830 of Figure 8; and
Figure 10 is a block diagram illustrating an exemplary construction of a computer in which the invention may be implemented.
Detailed Description of the Invention
Exemplary embodiments of the present invention will be described in conjunction with the accompanying drawings hereinafter. For the sake of clarity and conciseness, not all the features of actual implementations are described in the specification. However, it is to be appreciated that, during developing any of such actual implementations, numerous implementation-specific decisions must be made to achieve the developer's specific goals. It shall further be noted that only device structures and/or processing steps closely relevant to solutions of the invention will be illustrated in the drawings while omitting other details less relevant to the invention so as not to obscure the invention due to those unnecessary details.
Figure 1 is a flow chart illustrating a method of recognizing a speech emotion change of a speaker from speech data of the speaker according to an embodiment of the invention. The speech data of the speaker may be inputted via an external device such as a sound recording device, a phone, a PDA or the like. Further, the speech data of the speaker may be a whole piece of continuous speech data from the speaker, for example, an oral lecture given by a lecturer. Alternatively, the speech data of the speaker may be constituted by one or more continuous speech segments of the speaker extracted from dialogue data of a plurality of speakers comprising the speaker, for example, one or more continuous speech segments of a customer extracted from telephone conversation data between the customer and a call center agent in a call center application. Herein, discrimination between different speakers may be implemented using a tool such as sndpeek or the like.
For example, Figure 3 schematically illustrates waveform graphs of two speech segments (a) and (b) of speaker A extracted from dialogue data between speakers A and B. In this case, the speech data of the speaker is constituted by two speech segments (a) and (b) of the speaker A.
As illustrated in Figure 1, the method may include a window dividing step S110, a window speech emotion feature calculating step S120 and a speech emotion change recognizing step S130. First, in the window dividing step S110, the speech data of the speaker is divided into a plurality of windows by a window width. When the speech data of the speaker is a whole piece of continuous speech data from the speaker, the window width may be a predetermined time width such as 10ms, 100ms, 1s or the like. When the speech data of the speaker is constituted by one or more continuous speech segments of the speaker, the window width may be a predetermined time width such as 10ms, 100ms, 1s or the like, or may be determined as the larger of the minimum length of the one or more continuous speech segments and a predetermined time width such as 10ms, 100ms, 1s or the like.
Generally, when the speech data of the speaker is constituted by one or more continuous speech segments of the speaker, one window covers at most one speech segment, and when a speech segment cannot be evenly divided, the final remainder, whose length is less than the window width, may be omitted.
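By way of illustration, the window dividing step might be sketched in Python roughly as follows, assuming the speech data is already available as one or more arrays of samples; the function and parameter names are illustrative assumptions, not part of the described embodiment.
```python
from typing import List, Sequence

def divide_into_windows(segments: Sequence[Sequence[float]],
                        window_width: int) -> List[List[float]]:
    """Divide one or more continuous speech segments into fixed-width windows.

    Each window covers samples of a single segment only, and a final
    remainder shorter than window_width is omitted, as described above.
    """
    windows: List[List[float]] = []
    for segment in segments:
        for start in range(0, len(segment) - window_width + 1, window_width):
            windows.append(list(segment[start:start + window_width]))
    return windows
```
For a whole piece of continuous speech data, `segments` would simply contain a single entry, and `window_width` would be the predetermined time width converted to a number of samples.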
Next, in the window speech emotion feature calculating step S120, a speech emotion feature is calculated for each of the plurality of windows. Preferably, the speech emotion feature may comprise one or more of speech pitch, speech energy and speech speed. However, those skilled in the art should appreciate that the present invention is not limited thereto, and other speech emotion features, such as formant or the like, are also applicable to the present invention.
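The embodiment does not prescribe how pitch or energy is computed within a feature extraction interval; as a minimal sketch, a simple autocorrelation-based pitch estimate and an RMS energy value per interval could look as follows, where the sampling rate, lag range and silence threshold are all assumptions.
```python
import numpy as np

def interval_features(frame: np.ndarray, sample_rate: int = 8000):
    """Return (pitch in Hz, RMS energy) for one feature extraction interval.

    Pitch is estimated by an autocorrelation peak search between roughly
    60 Hz and 400 Hz; 0.0 is returned for (near-)silent frames.
    """
    energy = float(np.sqrt(np.mean(frame ** 2)))
    if energy < 1e-4:                          # treat near-silence as unvoiced
        return 0.0, energy
    frame = frame - np.mean(frame)
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sample_rate // 400, min(sample_rate // 60, len(corr) - 1)
    lag = lo + int(np.argmax(corr[lo:hi]))
    pitch = sample_rate / lag if corr[lag] > 0 else 0.0
    return pitch, energy
```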
Preferably, in the window speech emotion feature calculating step S120, an average value of the speech emotion features of the respective feature extraction intervals in the window is calculated as the speech emotion feature of the window. Herein, the feature extraction interval may be set to 10ms or another value depending on a specific design. Further, those skilled in the art should appreciate that the speech emotion feature of the window may be calculated in another manner depending on a specific design. Further preferably, in order to calculate the speech emotion feature of the window more accurately, speech emotion feature singularities are removed from the speech emotion features of the respective feature extraction intervals in the window before the above average value calculation. Herein, speech emotion feature singularities refer to feature values that are equal or approximately equal to zero (for example, caused by a silence period or the like), feature values having a large fluctuation compared with their neighboring feature values (for example, caused by noise or the like), and so on.
Further preferably, when the calculated speech emotion feature of a window is equal or approximately equal to zero (for example, the window contains only a silence period), the window may be removed.
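A minimal sketch of the per-window calculation, combining the average value calculation, the singularity removal and the removal of (near-)silent windows described above, could look like the following; the outlier rule used here (a simple deviation-from-median test) merely stands in for the "large fluctuation against neighboring values" criterion, and the thresholds are assumptions.
```python
from typing import Optional, Sequence
import numpy as np

def window_feature(interval_values: Sequence[float],
                   fluctuation_limit: float = 3.0) -> Optional[float]:
    """Average the per-interval feature values of one window.

    Singularities are removed first: values equal or close to zero (silence)
    and values deviating strongly from the rest of the window (noise).
    None is returned for an all-silent window so the caller can drop it.
    """
    values = np.asarray(interval_values, dtype=float)
    values = values[values > 1e-6]                 # drop (near-)zero values
    if values.size == 0:
        return None
    spread = np.std(values)
    keep = np.abs(values - np.median(values)) <= fluctuation_limit * spread + 1e-12
    values = values[keep]
    if values.size == 0:
        return None
    return float(np.mean(values))
```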
For example, assume that speech pitch is adopted as the speech emotion feature and that the speech data of a speaker is constituted by the speech segments (a) and (b) shown in Figure 3. The pitch graphs respectively corresponding to the speech segments (a) and (b) are schematically shown in Figure 4. When the window width is set to the minimum length of the speech segments (a) and (b), the calculated speech emotion features of the light-colored window corresponding to the speech segment (a) and the dark-colored window corresponding to the speech segment (b) are schematically shown in Figure 5. When the window width is set to a predetermined time width of 10ms, the calculated speech emotion features of the respective windows are schematically shown in Figure 6, wherein one point on the time axis represents one window and those windows whose speech emotion features are equal or approximately equal to zero are removed.
Finally, in the speech emotion change recognizing step S130, the speech emotion change of the speaker for a window set consisting of at least two contiguous windows is recognized by comparing the speech emotion features of the window set with each of a plurality of speech emotion feature change templates stored in a speech emotion feature change database, so as to find a speech emotion feature change template which matches the speech emotion features of the window set. Generally, the window set may include a predetermined number of windows, and window sets may be sequentially selected with a moving step whose number of windows is less than the predetermined number. Preferably, when the speech data of the speaker is constituted by at least two continuous speech segments of the speaker, the window set may include all the windows of two successive speech segments, and window sets may be sequentially selected with a moving step of one speech segment. Further, in a particular implementation of the speech emotion feature change database, one type of speech emotion change may have a predetermined number of speech emotion feature change templates, each speech emotion feature change template may associate one or more representative speech emotion feature change curves (e.g., a speech pitch change curve, a speech energy change curve, or the like) with one type of speech emotion change, and the speech emotion feature change templates may be generated in advance through a clustering algorithm by statistical analysis of a large corpus of representative speech data from different speakers.
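As an illustration of how window sets might be selected sequentially with an overlapping moving step, a minimal sketch is given below; the set size and step values are assumptions, and in the two-segment case described above the caller would instead group all windows of two successive segments.
```python
from typing import List, Sequence

def select_window_sets(window_features: Sequence[float],
                       set_size: int,
                       moving_step: int) -> List[List[float]]:
    """Sequentially select window sets of set_size contiguous windows,
    advancing by moving_step windows (moving_step < set_size), so that
    successive window sets overlap."""
    assert 0 < moving_step < set_size
    sets: List[List[float]] = []
    for start in range(0, len(window_features) - set_size + 1, moving_step):
        sets.append(list(window_features[start:start + set_size]))
    return sets
```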
Figure 7 illustrates an exemplary structure of a speech emotion feature change database employed in the embodiment of the invention. As shown in Figure 7, the speech emotion feature change database includes the following two tables: a speech emotion feature change type table (a) and a speech emotion feature template table (b).
The speech emotion feature change type table (a) in Figure 7 has two fields, "Change type ID" and "Change type name", and schematically shows four types of exemplary speech emotion changes: "Calm -> Angry", "Angry -> Calm", "Calm -> Happy", and "Happy -> Calm". The speech emotion feature template table (b) in Figure 7 has three fields, "ID", "Feature value (pitch)" and "Change type ID", and schematically shows one exemplary speech emotion feature curve associated with the speech emotion change of "Calm -> Angry". Those skilled in the art should appreciate that the structure of the speech emotion feature change database in Figure 7 is only exemplary; the present invention is not limited thereto, and another structure may be adopted for the speech emotion feature change database depending on a specific design.
Further, the process in the speech emotion change recognizing step S130 may be implemented by various matching algorithms. For example, Figure 2 is a flow chart illustrating an implementing example of the speech emotion change recognizing step S130 of Figure 1. As shown in Figure 2, at the normalizing step S210, the speech emotion features of the window set are normalized. Next, at the Euclidean distance calculating step S220, a Euclidean distance between the normalized speech emotion features of the window set and each of the plurality of speech emotion feature change templates stored in the speech emotion feature change database is calculated. Then, at the determining step S230, the speech emotion feature change template whose Euclidean distance to the normalized speech emotion features of the window set is the smallest and is less than a predetermined threshold is determined as the matching speech emotion feature change template. For example, the exemplary speech emotion feature change template in the speech emotion feature template table (b) of Figure 7 is determined as the matching speech emotion feature change template of the speech data in Figure 3 through the above matching process, and thus the speech emotion feature change of the speech data in Figure 3 is recognized as "Calm -> Angry".
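A minimal sketch of steps S210 to S230 follows; the zero-mean, unit-variance normalization and the assumption that each template curve already has the same length as the window set are illustrative choices rather than requirements stated above.
```python
from typing import Optional, Sequence, Tuple
import numpy as np

def normalize(features: Sequence[float]) -> np.ndarray:
    """Zero-mean, unit-variance normalization of the window-set features (S210)."""
    x = np.asarray(features, dtype=float)
    std = np.std(x)
    return (x - np.mean(x)) / std if std > 0 else x - np.mean(x)

def match_template(window_set_features: Sequence[float],
                   templates: Sequence[Tuple[Sequence[float], str]],
                   threshold: float) -> Optional[str]:
    """Return the change type name of the template with the smallest Euclidean
    distance to the normalized features (S220), provided that distance is also
    below the predetermined threshold (S230); otherwise return None."""
    x = normalize(window_set_features)
    best_name, best_dist = None, float("inf")
    for curve, name in templates:
        dist = float(np.linalg.norm(x - np.asarray(curve, dtype=float)))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None
```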
Preferably, in order to enhance the matching performance, the speech emotion change recognizing step S130 in Figure 1 may be performed only if at least one speech emotion feature change between neighboring windows in the window set exceeds a predetermined threshold.
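Such a pre-check can be sketched in one short function; the threshold value is an assumption.
```python
def worth_matching(window_set_features, threshold: float) -> bool:
    """Run template matching only if at least one feature change between
    neighboring windows in the set exceeds the threshold."""
    return any(abs(b - a) > threshold
               for a, b in zip(window_set_features, window_set_features[1:]))
```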
Optionally, the method may further comprise a speech emotion recognizing step of recognizing the speech emotions of the respective windows in the window set according to a recognition result of the speech emotion change in the window set. For example, when the speech emotion feature change of the speech data in Figure 3 is recognized as "Calm -> Angry", the speech emotions of the respective windows of the speech segment (a) may be recognized as "Calm" and the speech emotions of the respective windows of the speech segment (b) may be recognized as "Angry".
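A minimal sketch of that optional step for the two-segment case of Figure 3 might look as follows; using the segment boundary as the change point and the change type name format "A -> B" are assumptions drawn from the example above.
```python
from typing import Dict, Sequence

def assign_window_emotions(change_type_name: str,
                           windows_per_segment: Sequence[int]) -> Dict[int, str]:
    """Given a recognized change such as 'Calm -> Angry' over two successive
    speech segments, label the windows of the first segment with the emotion
    before the change and those of the second segment with the emotion after it."""
    before, after = [part.strip() for part in change_type_name.split("->")]
    first = windows_per_segment[0]
    total = sum(windows_per_segment)
    return {i: before if i < first else after for i in range(total)}
```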
The method of recognizing a speech emotion change of a speaker from speech data of the speaker according to an embodiment of the invention has been detailed above with reference to the drawings. An apparatus for recognizing a speech emotion change of a speaker from speech data of the speaker according to an embodiment of the invention will be described below with reference to the drawings.
Figure 8 is a block diagram illustrating a construction of an apparatus for recognizing a speech emotion change of a speaker from speech data of the speaker according to an embodiment of the invention. As shown in Figure 8, the apparatus 800 may include a window dividing means 810, a window speech emotion feature calculating means 820 and a speech emotion change recognizing means 830.
The window dividing means 810 may divide the speech data of the speaker into a plurality of windows by a window width.
The window speech emotion feature calculating means 820 may calculate a speech emotion feature for each of the plurality of windows.
The speech emotion change recognizing means 830 may recognize the speech emotion change of the speaker for a window set consisting of at least two contiguous windows by comparing the speech emotion features of the window set with each of a plurality of speech emotion feature change templates stored in a speech emotion feature change database to find out a speech emotion feature change template which matches the speech emotion features of the window set.
Figure 9 is a block diagram illustrating an exemplary construction of the speech emotion change recognizing means 830 of Figure 8. In this example, the speech emotion change recognizing means 830 may include a normalizing means 910, a Euclidean distance calculating means 920 and a determining means 930. The normalizing means 910 may normalize the speech emotion features of the window set. The Euclidean distance calculating means 920 may calculate a Euclidean distance between the normalized speech emotion features of the window set and each of the plurality of speech emotion feature change templates stored in the speech emotion feature change database. The determining means 930 may determine the speech emotion feature change template whose Euclidean distance to the normalized speech emotion features of the window set is the smallest and is less than a predetermined threshold as the matching speech emotion feature change template.
Optionally, the apparatus 800 may further comprise a speech emotion recognizing means for recognizing speech emotions of respective windows in the window set according to a recognition result of speech emotion change in the window set.
How the functions of the respective components of the apparatus 800 in Figure 8 may be implemented will be apparent from the descriptions of the corresponding processes presented above, and therefore repeated descriptions thereof are omitted here. As is apparent from the above, according to the technical solution of the present invention, it is possible to accurately recognize a speech emotion change of a speaker from speech data of the speaker.
The above apparatus and method for recognizing a speech emotion change of a speaker from speech data of the speaker according to embodiments of the invention may be applied in many applications. For example, when the above apparatus and method are applied in a call center application, a speech emotion change recognition result of a customer may be provided to a call center agent in the form of speech or image during the telephone conversation between the customer and the call center agent, so that the call center agent may respond to the speech emotion change of the customer appropriately and rapidly. Furthermore, when the above apparatus and method are applied in an oral lecture application, desired contents of the lecture can be extracted according to a speech emotion change recognition result of the lecturer. For example, the portions of the lecture which exhibit the speech emotion of "sad" may be filtered out so as to extract the optimistic contents of the lecture.
The above method and apparatus may be implemented by hardware. Such hardware may be a single processing device or a plurality of processing devices. Such a processing device may be a microprocessor, a microcontroller, a digital processor, a microcomputer, a part of a central processing unit, a state machine, a logic circuit and/or any device capable of manipulating a signal.
It should also be noted that the above method and apparatus may be implemented by either software or firmware. In the case where the above method and apparatus are implemented by software, a program that constitutes the software is installed, from a storage medium or a network, into a computer having a dedicated hardware structure, for example the general-purpose personal computer 1000 illustrated in Figure 10, which, when various programs are installed therein, becomes capable of performing various functions.
In Figure 10, a central processing unit (CPU) 1001 performs various processes in accordance with a program stored in a read only memory (ROM) 1002 or a program loaded from a storage section 1008 to a random access memory (RAM) 1003. In the RAM 1003, data required when the CPU 1001 performs the various processes or the like is also stored as required.
The CPU 1001, the ROM 1002 and the RAM 1003 are connected to one another via a bus 1004. An input/output interface 1005 is also connected to the bus 1004. The following components are connected to the input/output interface 1005: an input section 1006 including a keyboard, a mouse, or the like; an output section 1007 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a loudspeaker or the like; the storage section 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs a communication process via a network such as the Internet.
A drive 1010 is also connected to the input/output interface 1005 as required.
A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1010 as required, so that a computer program read therefrom is installed into the storage section 1008 as required.
In the case where the above-described series of processes is implemented by software, the program that constitutes the software is installed from a network such as the internet or from a storage medium such as the removable medium 1011. One skilled in the art should note that this storage medium is not limited to the removable medium 1011 illustrated in Figure 10, which has the program stored therein and is distributed separately from the device in order to provide the program to the user.
Examples of the removable medium 1011 include the magnetic disk (including a floppy disk (registered trademark)), the optical disk (including a compact disk read-only memory (CD-ROM) and a digital versatile disk (DVD)), the magneto-optical disk (including a mini-disk (MD) (registered trademark)), and the semiconductor memory.
Alternatively, the storage medium may be the ROM 1002, the hard disk contained in the storage section 1008, or the like, which has the program stored therein and is delivered to the user together with the device containing it.
It should also be noted that the steps of the above-described series of processes may naturally be performed chronologically in the order described, but need not necessarily be performed chronologically; some steps may be performed in parallel or independently of one another. Although illustrative embodiments have been described herein, it should be understood that various other changes, replacements and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the invention. Furthermore, the present application is not limited to the above-described specific embodiments of processes, devices, manufactures, structures of substances, means, methods and steps. One skilled in the art will understand from the disclosure of the present invention that it is possible to use existing processes, devices, manufactures, structures of substances, means, methods or steps, and those to be developed in the future, which perform substantially the same functions as the above-described embodiments or obtain substantially the same results. Therefore, the appended claims are intended to cover within their scope such processes, devices, manufactures, structures of substances, means, methods or steps.

Claims

1. A method of recognizing a speech emotion change of a speaker from speech data of the speaker, comprising the following steps: a window dividing step of dividing the speech data of the speaker into a plurality of windows by a window width; a window speech emotion feature calculating step of calculating a speech emotion feature for each of the plurality of windows; and a speech emotion change recognizing step of recognizing the speech emotion change of the speaker for a window set consisting of at least two contiguous windows by comparing the speech emotion features of the window set with each of a plurality of speech emotion feature change templates stored in a speech emotion feature change database to find out a speech emotion feature change template which matches the speech emotion features of the window set.
2. The method according to claim 1, wherein the speech data of the speaker is constituted by one or more continuous speech segments of the speaker extracted from dialogue data of a plurality of speakers comprising the speaker.
3. The method according to claim 1 or 2, wherein the window width is a predetermined time width.
4. The method according to claim 2, wherein the window width is determined by a larger one of the minimum length of the one or more continuous speech segments and a predetermined time width.
5. The method according to claim 1, wherein the speech emotion feature comprises one or more of speech pitch, speech energy and speech speed.
6. The method according to claim 1, wherein the window speech emotion feature calculating step comprises an average value calculating step of calculating an average value of speech emotion features of respective feature extraction intervals in the window as the speech emotion feature of the window.
7. The method according to claim 6, wherein the window speech emotion feature calculating step further comprises a singularity removing step of removing speech emotion feature singularities from the speech emotion features of respective feature extraction intervals in the window, before the average value calculating step.
8. The method according to claim 1, wherein the speech emotion change recognizing step further comprises the following steps: a normalizing step of normalizing the speech emotion features of the window set; a Euclidean distance calculating step of calculating a Euclidean distance between the normalized speech emotion features of the window set and each of the plurality of speech emotion feature change templates stored in the speech emotion feature change database; and a determining step of determining a speech emotion feature change template whose Euclidean distance to the normalized speech emotion features of the window set is the smallest and is less than a predetermined threshold as the matching speech emotion feature change template.
9. The method according to claim 1, wherein the speech emotion change recognizing step is performed only if at least one speech emotion feature change between neighboring windows in the window set exceeds a predetermined threshold.
10. The method according to claim 1, further comprising a speech emotion recognizing step of recognizing speech emotions of respective windows in the window set according to a recognition result of speech emotion change in the window set.
11. An apparatus for recognizing a speech emotion change of a speaker from speech data of the speaker, comprising: a window dividing means for dividing the speech data of the speaker into a plurality of windows by a window width; a window speech emotion feature calculating means for calculating a speech emotion feature for each of the plurality of windows; and a speech emotion change recognizing means for recognizing the speech emotion change of the speaker for a window set consisting of at least two contiguous windows by comparing the speech emotion features of the window set with each of a plurality of speech emotion feature change templates stored in a speech emotion feature change database to find out a speech emotion feature change template which matches the speech emotion features of the window set.
12. The apparatus according to claim 11, wherein the speech data of the speaker is constituted by one or more continuous speech segments of the speaker extracted from dialogue data of a plurality of speakers comprising the speaker.
13. The apparatus according to claim 11 or 12, wherein the window width is a predetermined time width.
14. The apparatus according to claim 12, wherein the window width is determined by a larger one of the minimum length of the one or more continuous speech segments and a predetermined time width.
15. The apparatus according to claim 11, wherein the speech emotion feature comprises one or more of speech pitch, speech energy and speech speed.
16. The apparatus according to claim 11, wherein the window speech emotion feature calculating means comprises an average value calculating means for calculating an average value of speech emotion features of respective feature extraction intervals in the window as the speech emotion feature of the window.
17. The apparatus according to claim 16, wherein the window speech emotion feature calculating means further comprises a singularity removing means for removing speech emotion feature singularities from the speech emotion features of respective feature extraction intervals in the window, before the process in the average value calculating means is performed.
18. The apparatus according to claim 11, wherein the speech emotion change recognizing means further comprises: a normalizing means for normalizing the speech emotion features of the window set; a Euclidean distance calculating means for calculating a Euclidean distance between the normalized speech emotion features of the window set and each of the plurality of speech emotion feature change templates stored in the speech emotion feature change database; and a determining means for determining a speech emotion feature change template whose Euclidean distance to the normalized speech emotion features of the window set is the smallest and is less than a predetermined threshold as the matching speech emotion feature change template.
19. The apparatus according to claim 11, wherein the process in the speech emotion change recognizing means is performed only if at least one speech emotion feature change between neighboring windows in the window set exceeds a predetermined threshold.
20. The apparatus according to claim 11, further comprising a speech emotion recognizing means for recognizing speech emotions of respective windows in the window set according to a recognition result of speech emotion change in the window set.
21. A computer-readable storage medium with a computer program stored thereon, wherein said computer program, when being executed, causes a computer to execute the method according to any of claims 1 to 10.
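For orientation, the processing defined in claims 16 to 19 can be read as a short pipeline: per-window speech emotion features obtained by removing singular feature extraction intervals and averaging the remainder, a gate that runs change recognition only when some feature change between neighboring windows exceeds a predetermined threshold, and Euclidean-distance matching of the normalized window-set features against stored speech emotion feature change templates. The Python sketch below is an illustrative reading only, not the patented implementation: the function names, the z-score singularity test, the mean/variance normalization and all threshold values (z_thresh, change_thresh, match_thresh) are assumptions introduced for the example.

```python
import numpy as np

def window_feature(interval_features, z_thresh=2.5):
    # interval_features: (n_intervals, n_features) array of per-interval
    # measurements for one window, e.g. pitch, energy and speech speed.
    x = np.asarray(interval_features, dtype=float)
    mu = x.mean(axis=0)
    sigma = x.std(axis=0) + 1e-8
    # Assumed singularity test: drop intervals lying far from the window mean,
    # then average the remaining intervals to get the window feature.
    keep = np.all(np.abs(x - mu) <= z_thresh * sigma, axis=1)
    return x[keep].mean(axis=0) if keep.any() else mu

def recognize_change(window_set, templates, change_thresh=0.5, match_thresh=1.0):
    # window_set: (n_windows, n_features) features of contiguous windows.
    # templates:  mapping from change label to an (n_windows, n_features) template.
    w = np.asarray(window_set, dtype=float)

    # Gate (claims 9 and 19): only attempt recognition if some feature change
    # between neighboring windows exceeds a predetermined threshold.
    if not (np.abs(np.diff(w, axis=0)) > change_thresh).any():
        return None

    # Normalize the window-set features (claim 18) so the comparison does not
    # depend on the speaker's absolute pitch or energy level.
    norm = (w - w.mean(axis=0)) / (w.std(axis=0) + 1e-8)

    # Euclidean distance to each stored change template; keep the smallest
    # distance and accept it only if it is below the matching threshold.
    best_label, best_dist = None, float("inf")
    for label, tpl in templates.items():
        dist = float(np.linalg.norm(norm - np.asarray(tpl, dtype=float)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist < match_thresh else None
```

A caller would, for instance, build window_set by applying window_feature to each window of a speaker's speech data and pass templates such as {"calm_to_excited": [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]}; recognize_change then returns the label of the best-matching template, or None when the gate is not passed or no template is close enough.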
PCT/CN2009/070801 2009-03-16 2009-03-16 Apparatus and method for recognizing speech emotion change WO2010105396A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2009/070801 WO2010105396A1 (en) 2009-03-16 2009-03-16 Apparatus and method for recognizing speech emotion change
CN2009801279599A CN102099853B (en) 2009-03-16 2009-03-16 Apparatus and method for recognizing speech emotion change

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2009/070801 WO2010105396A1 (en) 2009-03-16 2009-03-16 Apparatus and method for recognizing speech emotion change

Publications (1)

Publication Number Publication Date
WO2010105396A1 (en) 2010-09-23

Family

ID=42739098

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2009/070801 WO2010105396A1 (en) 2009-03-16 2009-03-16 Apparatus and method for recognizing speech emotion change

Country Status (2)

Country Link
CN (1) CN102099853B (en)
WO (1) WO2010105396A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106971711A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive method for recognizing sound-groove and system
CN106971729A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of method and system that Application on Voiceprint Recognition speed is improved based on sound characteristic scope
CN107133567B (en) * 2017-03-31 2020-01-31 北京奇艺世纪科技有限公司 woundplast notice point selection method and device
CN107154257B (en) * 2017-04-18 2021-04-06 苏州工业职业技术学院 Customer service quality evaluation method and system based on customer voice emotion
CN109087670B (en) * 2018-08-30 2021-04-20 西安闻泰电子科技有限公司 Emotion analysis method, system, server and storage medium
CN108986430A (en) * 2018-09-13 2018-12-11 苏州工业职业技术学院 Net based on speech recognition about vehicle safe early warning method and system
CN111048075A (en) * 2018-10-11 2020-04-21 上海智臻智能网络科技股份有限公司 Intelligent customer service system and intelligent customer service robot
CN110619894B (en) * 2019-09-30 2023-06-27 北京淇瑀信息科技有限公司 Emotion recognition method, device and system based on voice waveform diagram

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5812739A (en) * 1994-09-20 1998-09-22 Nec Corporation Speech recognition system and speech recognition method with reduced response time for recognition
WO2007017853A1 (en) * 2005-08-08 2007-02-15 Nice Systems Ltd. Apparatus and methods for the detection of emotions in audio interactions
CN1979491A (en) * 2005-12-10 2007-06-13 三星电子株式会社 Method for music mood classification and system thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAN, WEIJING ET AL.: "Speech emotion recognition with combined short and long term features", J TSINGHUA UNIV (SCI & TECH), vol. 48, no. SL, 2008, pages 709 - 713 *
ZHAO LASHENG ET AL.: "Survey on speech emotion recognition", APPLICATION RESEARCH OF COMPUTERS, vol. 26, no. 2, February 2009 (2009-02-01), pages 428 - 431 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8948893B2 (en) 2011-06-06 2015-02-03 International Business Machines Corporation Audio media mood visualization method and system
US9235918B2 (en) 2011-06-06 2016-01-12 International Business Machines Corporation Audio media mood visualization
US9953451B2 (en) 2011-06-06 2018-04-24 International Business Machines Corporation Audio media mood visualization
US10255710B2 (en) 2011-06-06 2019-04-09 International Business Machines Corporation Audio media mood visualization
WO2019226406A1 (en) * 2018-05-25 2019-11-28 Microsoft Technology Licensing, Llc Dynamic extraction of contextually-coherent text blocks
US11031003B2 (en) 2018-05-25 2021-06-08 Microsoft Technology Licensing, Llc Dynamic extraction of contextually-coherent text blocks
CN116578691A (en) * 2023-07-13 2023-08-11 江西合一云数据科技股份有限公司 Intelligent pension robot dialogue method and dialogue system thereof

Also Published As

Publication number Publication date
CN102099853A (en) 2011-06-15
CN102099853B (en) 2012-10-10

Similar Documents

Publication Publication Date Title
WO2010105396A1 (en) Apparatus and method for recognizing speech emotion change
CN110085251B (en) Human voice extraction method, human voice extraction device and related products
US9396724B2 (en) Method and apparatus for building a language model
CN107305541B (en) Method and device for segmenting speech recognition text
US10068570B2 (en) Method of voice recognition and electronic apparatus
CN111145756B (en) Voice recognition method and device for voice recognition
WO2014190732A1 (en) Method and apparatus for building a language model
WO2021179701A1 (en) Multilingual speech recognition method and apparatus, and electronic device
CN107564543B (en) Voice feature extraction method with high emotion distinguishing degree
CN108305618B (en) Voice acquisition and search method, intelligent pen, search terminal and storage medium
CN106445915B (en) New word discovery method and device
CN108039181B (en) Method and device for analyzing emotion information of sound signal
JP6622681B2 (en) Phoneme Breakdown Detection Model Learning Device, Phoneme Breakdown Interval Detection Device, Phoneme Breakdown Detection Model Learning Method, Phoneme Breakdown Interval Detection Method, Program
CN113823323B (en) Audio processing method and device based on convolutional neural network and related equipment
US10950221B2 (en) Keyword confirmation method and apparatus
CN113450771B (en) Awakening method, model training method and device
CN110827853A (en) Voice feature information extraction method, terminal and readable storage medium
CN113626614B (en) Method, device, equipment and storage medium for constructing information text generation model
WO2024093578A1 (en) Voice recognition method and apparatus, and electronic device, storage medium and computer program product
CN110475139B (en) Video subtitle shielding method and device, storage medium and electronic equipment
Jia et al. A deep learning system for sentiment analysis of service calls
CN115630643A (en) Language model training method and device, electronic equipment and storage medium
CN113823326B (en) Method for using training sample of high-efficiency voice keyword detector
CN114267342A (en) Recognition model training method, recognition method, electronic device and storage medium
CN114120425A (en) Emotion recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase
    Ref document number: 200980127959.9
    Country of ref document: CN
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 09841681
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 09841681
    Country of ref document: EP
    Kind code of ref document: A1