CN113163053A - Electronic device and play control method - Google Patents


Info

Publication number
CN113163053A
Authority
CN
China
Prior art keywords
voice
message
voice message
electronic device
detection unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010074142.8A
Other languages
Chinese (zh)
Other versions
CN113163053B (en)
Inventor
阎慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alpine Electronics Inc
Original Assignee
Alpine Electronics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alpine Electronics Inc filed Critical Alpine Electronics Inc
Priority to CN202010074142.8A priority Critical patent/CN113163053B/en
Publication of CN113163053A publication Critical patent/CN113163053A/en
Application granted granted Critical
Publication of CN113163053B publication Critical patent/CN113163053B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides an electronic device and a play control method. The electronic device has: a detection unit that detects whether or not the voice feature amount of a voice message received by a receiving unit is equal to or less than a predetermined playback permission threshold; a text conversion unit that converts the voice message into a text message when the detection unit detects that the voice feature amount of the voice message is equal to or less than the playback permission threshold; and a voice playback control unit that controls a playing unit to play, by voice, the text message converted by the text conversion unit. A voice message whose content the user is likely to have difficulty grasping in time is thus converted into a text message and played by system voice. This helps the user grasp the content of the voice message in time, reduces the user's operation burden of replaying the voice message or asking the sender to re-record and resend it, and reduces the processing burden and communication burden of the electronic device.

Description

Electronic device and play control method
Technical Field
The present invention relates to an electronic device and a play control method, and more particularly, to an electronic device and a play control method for controlling play of a voice message.
Background
With the spread of social software, electronic devices with a data communication function, such as smartphones, personal computers, and vehicle-mounted devices, exchange messages through such software, and this has become an essential part of daily life. People who keep in touch through social software can send and receive not only text messages but also voice messages.
When voice messages are exchanged, the sender records audio data and sends it to the receiver through data communication, and the receiver receives and plays the audio data. Unlike a text message, however, a voice message is affected by the sender's own condition and surroundings at the time of recording: the content may be disjointed, the speech too fast, the volume too low, or the noise too loud, so that the receiver cannot make out the content when playing the message.
In this case, the receiver cannot grasp the content of the voice message in time, and the convenience of communicating by voice message is impaired.
Furthermore, the receiver is likely to replay the voice message in order to make out its content and, if the content still cannot be made out, can only ask the sender to re-record and resend it. As a result, the operation load on the receiver and/or the sender increases, and the processing load and communication load on the electronic device grow.
As voice messages become more common in social software, and especially as they come into use in settings such as vehicle-mounted devices, the above problems of the prior art urgently need to be solved in order to improve convenience and safety.
Disclosure of Invention
In view of the above-mentioned problems of the prior art, it is an object of the present invention to provide an electronic device and a playback control method that help a user to grasp the content of a voice message in time.
One embodiment of the present invention provides an electronic device including: a receiving unit that receives a voice message; and a playing unit that plays the voice message received by the receiving unit; the electronic device is characterized by further comprising: a detection unit that detects whether or not a sound feature amount of the voice message received by the receiving unit is equal to or less than a predetermined playback permission threshold; a text conversion unit that converts the voice message into a text message when the detection unit detects that the sound feature amount of the voice message is equal to or less than the playback permission threshold; and a voice playback control unit that controls the playing unit to play, by voice, the text message converted by the text conversion unit.
A voice message whose content the user is likely to have difficulty grasping in time is thus converted into a text message and played by system voice. This helps the user grasp the content of the voice message in time, reduces the user's operation burden of replaying the voice message or asking the sender to re-record and resend it, and reduces the processing burden and communication burden of the electronic device.
In the electronic device according to one embodiment of the present invention, the detection unit may calculate the sound feature amount of the voice message based on at least one feature of the voice message, including a speech rate, a sound volume, a pitch, a sound quality, time information, a noise level, and a degree of recognition, and the detection unit may compare the calculated sound feature amount of the voice message with the playback permission threshold preset based on the at least one feature to determine whether or not the sound feature amount of the voice message is equal to or less than the playback permission threshold.
Thus, by comparing the characteristics of the voice message itself with the threshold value set in advance based on the characteristics, it is possible to reliably determine the possibility that the contents of the received voice message are difficult to grasp in time.
In the electronic device according to one embodiment of the present invention, the detection unit may calculate the sound feature amount of the voice message based on the discontinuity characteristic of the voice message, and the detection unit may further compare the calculated sound feature amount of the voice message with the playback permission threshold value preset based on the discontinuity characteristic to determine whether or not the sound feature amount of the voice message is equal to or less than the playback permission threshold value.
Thus, by comparing the discontinuity characteristics of the voice message with the threshold value set in advance based on the discontinuity characteristics, it is possible to reliably determine the possibility that the contents of the received voice message are difficult to grasp in time due to discontinuity. Further, by converting such a voice message into a text message and playing the text message by, for example, a system voice, it is possible to help the user make out the content of the voice message and to save the time the user would otherwise spend waiting through the breaks in the voice message.
In the electronic device according to one embodiment of the present invention, the detection unit may detect whether or not the voice message includes specific information, and the detection unit may detect whether or not a sound feature amount of the voice message is equal to or less than the playback permission threshold when detecting that the voice message includes the specific information.
Thus, when the voice message includes specific information that the user particularly needs to grasp, it is possible to reliably determine the possibility that the received voice message is difficult to grasp in time, and to help the user grasp the content of the voice message including the specific information in time. In addition, because the comparison of the sound feature amount with the threshold value is made only for voice messages that include the specific information, the processing load of the electronic device for detecting the sound feature amount of received voice messages is reduced.
In the electronic device according to one embodiment of the present invention, when the detection unit detects that the voice message includes the specific information, the detection unit may calculate the sound feature amount of the voice message based on at least the speech rate and determine whether or not the sound feature amount is equal to or less than the playback permission threshold preset based on at least the speech rate. When the detection unit determines that the sound feature amount is equal to or less than the playback permission threshold, the text conversion unit may convert the voice message into a text message, and the voice playback control unit may control the playing unit to play the converted text message by voice at a speech rate lower than that of the voice message.
Therefore, when specific information that particularly needs to be grasped by the user is included in a voice message and the speech rate of the voice message is too fast, the voice message is converted into a text message and the text message is played at a low speech rate, which can help the user to reliably grasp the content of the voice message including the specific information.
The electronic device according to one embodiment of the present invention may further include a reception unit that accepts input of a message from a user of the electronic device. In this case, the detection unit detects whether or not a message indicating a specific meaning has been accepted from the user after the receiving unit received the voice message, and, when such a message has been accepted, detects whether or not the sound feature amount of the voice message is equal to or less than the playback permission threshold.
Thus, when a message indicating a specific meaning (for example, one asking the sender to re-record and resend, or one saying that the voice message could not be heard) is accepted from the user, the possibility that the received voice message is difficult to grasp in time can be judged, and the user can be helped to grasp its content in a more targeted way. Further, because the comparison of the sound feature amount with the threshold value is made only when a message indicating a specific meaning is accepted from the user, the processing load of the electronic device for detecting the sound feature amount of received voice messages is reduced.
In the electronic device according to one embodiment of the present invention, when the detection unit detects that the message indicating the specific meaning has been accepted from the user after the voice message was received, the text conversion unit may convert that voice message into a text message and the voice playback control unit may control the playing unit to play the converted text message by voice. For at least one subsequent voice message sent from the same contact after that voice message, the detection unit may detect whether or not the sound feature amount of the subsequent voice message is equal to or less than the playback permission threshold; when it is, the text conversion unit converts the subsequent voice message into a text message, and the voice playback control unit controls the playing unit to play the converted text message by voice.
Thus, when a message indicating a specific meaning (for example, one asking the sender to re-record and resend, or one saying that the voice message could not be heard) is accepted from the user, the voice message received immediately before is converted directly into a text message and played by, for example, a system voice, and each subsequent voice message sent from the same contact is checked to determine whether its sound feature amount is equal to or less than the threshold value.
In the electronic device according to one embodiment of the present invention, when the detection unit detects that the message indicating the specific meaning has been accepted from the user after the voice message was received, the playing unit may play, by voice, the text message converted by the text conversion unit, without the message indicating the specific meaning being transmitted from the electronic device to the contact that sent the voice message.
Thus, the message indicating a specific meaning (for example, one asking the sender to re-record and resend, or one saying that the voice message could not be heard) accepted from the user is not transmitted to the contact; instead, the voice message received immediately before is converted into a text message and played by, for example, a system voice. This improves the efficiency of communication between the user and the contact and reduces unnecessary communication load.
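The behaviour described in the preceding paragraphs can be sketched as a small per-contact state machine. This is a minimal sketch under stated assumptions, not the patented implementation: the class name `FollowUpController`, the action strings, and the injected callables `feature_amount` and `has_specific_meaning` are all hypothetical, and how a "specific meaning" is recognised is left to the caller.

```python
from typing import Callable, Dict, List, Set


class FollowUpController:
    """Hypothetical helper: remembers the last voice message per contact and
    which contacts have had their later voice messages put under checking."""

    def __init__(
        self,
        threshold: float,
        feature_amount: Callable[[bytes], float],
        has_specific_meaning: Callable[[str], bool],
        forward_user_message: bool = False,
    ) -> None:
        self.threshold = threshold                        # playback permission threshold
        self.feature_amount = feature_amount              # assumed scoring function
        self.has_specific_meaning = has_specific_meaning  # assumed text predicate
        self.forward_user_message = forward_user_message  # False: do not send to contact
        self.last_voice: Dict[str, bytes] = {}
        self.checked_contacts: Set[str] = set()

    def on_voice_message(self, contact: str, audio: bytes) -> str:
        """Decide how a newly received voice message is played."""
        self.last_voice[contact] = audio
        if contact in self.checked_contacts and self.feature_amount(audio) <= self.threshold:
            return "speak_as_text"
        return "play_original"

    def on_user_message(self, contact: str, text: str) -> List[str]:
        """Decide what happens to a message typed or spoken by the user."""
        if contact in self.last_voice and self.has_specific_meaning(text):
            # Later voice messages from this contact are compared with the threshold.
            self.checked_contacts.add(contact)
            actions = ["speak_previous_as_text"]
            if self.forward_user_message:
                actions.append("send_user_message")
            return actions
        return ["send_user_message"]
```

With `forward_user_message` left at `False`, the user's "please say that again" style message is handled locally and never sent, which corresponds to the variant that reduces communication load.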
The electronic device according to one embodiment of the present invention may be an in-vehicle device.
As described above, the electronic device of the present embodiment is helpful for a user to timely grasp the content of the voice message, thereby reducing the operation burden of the user in replaying the voice message or requesting a sender to rerecord and send the voice message. When the electronic device is an in-vehicle device, the driving safety can be improved by reducing the operation load of the user.
An embodiment of the present invention further provides a method for controlling playback of a voice message, including: a detection step of detecting, for a received voice message, whether or not a sound feature amount of the voice message is equal to or less than a predetermined playback permission threshold; a text conversion step of converting the voice message into a text message when the detection step detects that the sound feature amount of the voice message is equal to or less than the playback permission threshold; and a voice playback control step of controlling playback, by voice, of the text message converted in the text conversion step.
The above-described various aspects of the electronic device of the present invention can also be applied to the playback control method, the playback control system, the playback control program, and the recording medium on which the playback control program is recorded of the present invention, and corresponding technical effects are obtained.
Drawings
Fig. 1 is a block diagram of an electronic device according to a first embodiment of the present invention.
Fig. 2 is a flowchart of an example of a playback control method according to the first embodiment of the present invention.
Fig. 3 is an explanatory diagram of an audio feature amount calculation table in a specific example of the first embodiment of the present invention.
Fig. 4 is an audio waveform diagram of a specific example of the second embodiment of the present invention.
Fig. 5 is a flowchart of an example of a playback control method according to a third embodiment of the present invention.
Fig. 6 is an audio waveform diagram of a specific example of the third embodiment of the present invention.
Fig. 7 is a flowchart of another example of the playback control method according to the third embodiment of the present invention.
Fig. 8 is a block diagram of an electronic device according to a fourth embodiment of the present invention.
Fig. 9 is a flowchart of an example of a playback control method according to a fourth embodiment of the present invention.
Fig. 10 is a flowchart of another example of the playback control method according to the fourth embodiment of the present invention.
Description of reference numerals:
1, 1A: electronic device; 11: receiving unit; 12: playing unit; 13: detection unit; 14: text conversion unit; 15: voice playback control unit; 16: reception unit.
Detailed Description
The present invention will be described in more detail below with reference to the accompanying drawings, embodiments, and specific examples. The following description is only an example for the convenience of understanding the present invention and is not intended to limit the scope of the present invention. In the embodiments, the components of the apparatus may be changed, deleted or added according to the actual situation, and the steps of the method may be changed, deleted, added or changed in order according to the actual situation.
(first embodiment)
The first embodiment of the present invention will be specifically explained. First, an electronic device 1 according to a first embodiment of the present invention will be described. The electronic device 1 is an electronic apparatus such as a smartphone, a computer, or an in-vehicle device. Fig. 1 is a block diagram of an electronic device 1 according to a first embodiment of the present invention. As shown in fig. 1, the electronic device 1 includes a receiving unit 11, a playing unit 12, a detection unit 13, a text conversion unit 14, and a voice playback control unit 15, which will be described in detail below.
The receiving unit 11 receives a voice message. The receiving unit 11 can receive a voice message from a contact object. The receiving part 11 may have a data communication function itself so as to be able to receive a voice message from a contact object of another electronic device. Alternatively, the receiving unit 11 may be capable of receiving a voice message received by another electronic device by establishing a wired or wireless connection with the other electronic device having a data communication function. The receiving unit 11 is not limited to receiving only a voice message, and may receive a text message, a video message, or the like, and the video message may be regarded as a type of voice message when being played as voice.
The playback unit 12 plays back the voice message received by the reception unit 11, and may play back the voice message through a speaker or the like, or output an audio signal to an external playback device through an interface. In addition, the playing section 12 may play an audio portion of the video message, thereby playing the video message as one of the voice messages.
The electronic device 1 further includes a detection unit 13, a text conversion unit 14, and a voice playback control unit 15 as characteristic components. The detection unit 13, the text conversion unit 14, and the voice playback control unit 15 may be realized by a processor provided in the electronic device 1 executing a program corresponding to the functions of each unit, or may be realized by a dedicated circuit.
The detection unit 13 detects whether or not the sound feature amount of the voice message received by the receiving unit 11 is equal to or less than a predetermined playback permission threshold. For example, the detection unit 13 may analyze the audio data of the voice message received by the receiving unit 11, extract the sound feature amount of the voice message from that audio data, and compare the extracted sound feature amount with the preset playback permission threshold. The playback permission threshold is a threshold for determining whether or not the user can grasp the content of the audio data in time. It may be stored in a memory (not shown) provided in the electronic device 1, or stored in a cloud server or the like and acquired by the electronic device 1 in real time through data communication. The playback permission threshold may be calculated from history data on voice messages previously received by the electronic device 1 that the user had difficulty recognizing, calculated from big data that aggregates such history data from a plurality of electronic devices 1, or determined by machine learning or the like.
The text conversion unit 14 converts the voice message into a text message when the detection unit 13 detects that the sound feature amount of the voice message is equal to or less than the playback permission threshold. For example, the text conversion unit 14 applies a voice recognition technique to the audio data of the voice message received by the receiving unit 11 and extracts the text information in the voice message to construct a text message corresponding to the voice message.
The voice playback control unit 15 controls the playing unit 12 to play, by voice, the text message converted by the text conversion unit 14. Here, the voice playback control unit 15 controls the playing unit 12 to play the text message in the system voice based on, for example, system voice data stored in a memory (not shown) provided in the electronic device 1. Alternatively, the voice playback control unit 15 may control the playing unit 12 to play the text message in an external voice based on, for example, external voice data stored in a cloud server. Alternatively, the electronic device 1 may set in advance, based on a user operation, target voice data representing a voice that the user personally prefers, and the voice playback control unit 15 may control the playing unit 12 to play the text message in that target voice.
According to the embodiment, a voice message whose content the user is likely to have difficulty grasping in time is converted into a text message and played by system voice. This helps the user grasp the content of the voice message in time, reduces the user's operation burden of replaying the voice message or asking the sender to re-record and resend it, and reduces the processing burden and communication burden of the electronic device.
Next, a playback control method executed by the electronic apparatus 1 according to the first embodiment of the present invention will be described. Fig. 2 is a flowchart of an example of a playback control method according to the first embodiment of the present invention. The flow of fig. 2 may be executed, for example, when the electronic device 1 receives a voice message through the receiving unit 11, but is not limited to this, and may be executed when the electronic device 1 prepares to play a voice message through the playing unit 12, or may be executed at any time between when the receiving unit 11 receives a voice message and when the playing unit 12 plays a voice message. The following step S101 corresponds to a detection step, step S102 corresponds to a character conversion step, and steps S103 and S104 correspond to a voice playback control step.
In step S101, the detection unit 13 detects whether or not the voice feature amount of the voice message received by the reception unit 11 is equal to or less than a predetermined playback permission threshold. For example, the detection unit 13 calculates the voice feature amount of the voice message based on at least one feature of the voice message, such as the speed of speech, the volume of sound, the pitch of sound, the sound quality, time information, the noise level, and the degree of recognition. Further, the detection unit 13 compares the calculated voice feature amount of the voice message with a playback permission threshold value set in advance based on the at least one feature, and determines whether or not the voice feature amount of the voice message is equal to or less than the playback permission threshold value.
Among the above features of the voice message, the time information may include the duration of the voice message, the length of time for which the volume of the voice message exceeds or falls below a prescribed volume, the length of time for which the noise level of the voice message exceeds a prescribed level, and the like. The degree of recognition indicates how well the user can recognize the message; it may be calculated by weighted addition of some of the above features or obtained by machine learning. When the playback permission threshold is preset, it may be set based on the at least one feature of critical voice data, that is, voice data that the user can only just recognize.
Fig. 3 is an explanatory diagram of a sound feature amount calculation table in a specific example of the first embodiment of the present invention. As shown in fig. 3, the sound feature amount may be calculated, for example, by weighted addition in which the speech rate feature of the voice message is given a weight w1, the volume feature a weight w2, the pitch feature a weight w3, the sound quality feature a weight w4, the time information feature a weight w5, and the noise level feature a weight w6. Alternatively, the degree of recognition may be calculated by the same weighted addition, and the degree of recognition itself, or a value obtained by performing a predetermined operation on it (for example, taking a predetermined ratio of the degree of recognition), may be used as the sound feature amount. The playback permission threshold may likewise be set in advance by applying the same weighted addition to the features of critical voice data that the user can only just recognize.
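The weighted addition described for fig. 3 can be sketched as follows. This is an illustrative sketch only: the `Features` container, the weight values, and the assumption that every feature has already been normalised to a score in [0, 1] where a higher score means an easier-to-follow message are not taken from the source, which gives neither numeric weights nor a scale for the features.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Features:
    """Per-message feature scores (assumed normalised, higher = easier to follow)."""
    speech_rate: float
    volume: float
    pitch: float
    sound_quality: float
    time_info: float
    noise_level: float


# Hypothetical weights w1..w6; the values are placeholders for illustration.
WEIGHTS = Features(0.3, 0.2, 0.1, 0.1, 0.1, 0.2)


def feature_amount(f: Features, w: Features = WEIGHTS) -> float:
    """Weighted addition w1*speech_rate + ... + w6*noise_level."""
    return sum(wi * fi for wi, fi in zip(astuple(w), astuple(f)))


def needs_conversion(f: Features, critical: Features, w: Features = WEIGHTS) -> bool:
    """True when the message's feature amount is at or below the threshold
    obtained by applying the same weighted addition to critical voice data."""
    return feature_amount(f, w) <= feature_amount(critical, w)
```

Deriving the threshold from the critical voice data with the same function keeps the message score and the threshold on one scale, so a change of weights cannot make the two drift apart.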
Alternatively, in a vector space including a plurality of features among the above features, multidimensional data (vector) including the plurality of features may be used as the sound feature amount. In this case, the multidimensional data (vector) including the plurality of features of the critical voice data may be used as the playback permission threshold, and the sound feature vector of the received voice message may be compared with the playback permission threshold vector to determine whether or not the sound feature amount of the voice message is equal to or less than the playback permission threshold.
If it is determined in step S101 that the sound feature amount of the voice message is not equal to or less than the playback permission threshold (no in step S101), step S104 is executed and the playing unit 12 plays the voice message as it is.
If it is determined in step S101 that the sound feature amount of the voice message is equal to or less than the playback permission threshold (yes in step S101), step S102 is executed. For example, in the above-described specific example, when the sound feature amount of the voice message calculated by the weighted addition is equal to or less than the playback permission threshold set in advance from the critical voice data, step S102 is executed, and the text conversion unit 14 converts the voice message into a text message.
Next, in step S103, the voice playback control unit 15 controls the playing unit 12 to play, by voice, the text message converted in step S102.
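The source does not say how a feature vector is judged to be "equal to or less than" a threshold vector, so the sketch below simply assumes a component-wise rule and exposes the choice as a parameter. The function name and the `require_all` flag are hypothetical.

```python
from typing import Sequence


def at_or_below_threshold(
    features: Sequence[float],
    threshold: Sequence[float],
    require_all: bool = False,
) -> bool:
    """Compare a feature vector with a threshold vector component by component.

    Assumption: with require_all=False, one component at or below its threshold
    is enough to trigger conversion; with require_all=True, every component
    must be at or below its threshold.
    """
    if len(features) != len(threshold):
        raise ValueError("feature and threshold vectors must have the same length")
    flags = [f <= t for f, t in zip(features, threshold)]
    return all(flags) if require_all else any(flags)
```

The "any component" rule is the more cautious choice for the listener, since a message that is fine in volume but far too fast is still converted. A distance-based rule in the vector space would be another possible reading.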
According to this example, by comparing the characteristics of the voice message itself with the threshold value set in advance based on the characteristics, it is possible to reliably determine the possibility that the contents of the received voice message are difficult to grasp in time.
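The flow of steps S101 to S104 can be sketched as a single function. The speech recognition, text-to-speech, and audio playback steps are passed in as callables because the source does not name any particular engine; `play_control` and its parameter names are hypothetical.

```python
from typing import Callable


def play_control(
    voice_message: bytes,
    feature_amount: Callable[[bytes], float],
    threshold: float,
    speech_to_text: Callable[[bytes], str],
    speak_text: Callable[[str], None],
    play_audio: Callable[[bytes], None],
) -> str:
    """Run one pass of the flow and return the label of the playback step taken."""
    # S101: is the sound feature amount at or below the playback permission threshold?
    if feature_amount(voice_message) <= threshold:
        text = speech_to_text(voice_message)  # S102: convert the voice message to text
        speak_text(text)                      # S103: play the text message by voice
        return "S103"
    play_audio(voice_message)                 # S104: play the voice message as it is
    return "S104"
```

Injecting the engines as parameters keeps the decision logic testable with stubs and leaves open whether recognition and synthesis run on the device or on a server.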
(second embodiment)
The second embodiment of the present invention will be specifically explained. This embodiment is an example of the first embodiment in which the sound feature amount is calculated from the discontinuity characteristics of the voice message. The following description focuses on the differences from the first embodiment, and content that is the same as or similar to the first embodiment is omitted.
The structure of the electronic device 1 and the flow of the playback control method executed by the electronic device 1 in this embodiment are the same as the structure of the electronic device 1 and the playback control method executed by the electronic device in the first embodiment, and are not described herein again. A specific example of the present embodiment will be described below.
Fig. 4 is an audio waveform diagram of a specific example of the second embodiment of the present invention. Fig. 4 shows the audio waveform of the voice message "Dalian Xinghai Square". It can be seen that there are significant interruptions in the audio waveform of the voice message between times t1 and t2, between times t3 and t4, and between times t5 and t6.
For such a voice message, the detection unit 13 calculates the sound feature amount of the voice message based on the intermittent features of the voice message. For example, the sound feature amount may be calculated from any one of the intermittent features, such as the number of interruptions, the proportion of interruption time, and the maximum interruption time, or from a weighted addition of a plurality of intermittent features. In the example shown in fig. 4, the number of interruptions is 3, the maximum interruption time is t4-t3, and the proportion of interruption time can be calculated according to the following formula:
proportion of interruption time = total interruption time / total message time
The detection unit 13 may determine that an interruption has occurred when the volume of the voice message is lower than a predetermined threshold, or may first extract the volume of the speech itself by excluding noise, environmental sound, and the like from the voice message and then determine that an interruption has occurred when that volume is lower than the predetermined threshold.
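As an illustration of the volume-based detection described above, the following sketch derives the number of interruptions, the maximum interruption time, and the proportion of interruption time from a sequence of per-frame volume values. The frame representation, the function names, and the sample values are assumptions; noise removal is omitted, and silence at the start or end of the message is counted as an interruption for simplicity.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class InterruptionFeatures:
    count: int            # number of interruptions
    max_duration: float   # longest interruption, in seconds
    proportion: float     # total interruption time / total message time


def find_interruptions(volumes: Sequence[float], frame_seconds: float,
                       volume_threshold: float) -> List[Tuple[float, float]]:
    """Return (start, end) times of runs of frames below the volume threshold."""
    spans: List[Tuple[float, float]] = []
    start = None
    for i, volume in enumerate(volumes):
        if volume < volume_threshold:
            if start is None:
                start = i                      # an interruption begins
        elif start is not None:
            spans.append((start * frame_seconds, i * frame_seconds))
            start = None                       # the interruption ends
    if start is not None:                      # message ends while silent
        spans.append((start * frame_seconds, len(volumes) * frame_seconds))
    return spans


def interruption_features(volumes: Sequence[float], frame_seconds: float,
                          volume_threshold: float) -> InterruptionFeatures:
    spans = find_interruptions(volumes, frame_seconds, volume_threshold)
    durations = [end - start for start, end in spans]
    total = len(volumes) * frame_seconds
    return InterruptionFeatures(
        count=len(spans),
        max_duration=max(durations, default=0.0),
        proportion=sum(durations) / total if total else 0.0,
    )


# Hypothetical 10-second message sampled once per second.
sample = [0.8, 0.0, 0.0, 0.7, 0.0, 0.0, 0.0, 0.9, 0.0, 0.6]
features = interruption_features(sample, frame_seconds=1.0, volume_threshold=0.1)
```

For this sample the three silent runs last 2, 3, and 1 seconds, so the proportion of interruption time is 6 / 10.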
Further, the detection unit 13 compares the calculated voice feature amount of the voice message with a playback permission threshold value set in advance based on the discontinuity characteristic, and determines whether or not the voice feature amount of the voice message is equal to or less than the playback permission threshold value.
For example, if the proportion of interruption time is greater than 30%, the sound feature amount of the voice message is regarded as being equal to or less than the playback permission threshold. In this case, in the example shown in fig. 4, the detection unit 13 determines that the sound feature amount of the voice message is equal to or less than the playback permission threshold, the character conversion unit 14 converts the voice message into the text message "Dalian Xinghai Square", and the voice playback control unit 15 controls the playback so that the text message "Dalian Xinghai Square" is played back without interruption by the system voice.
Thus, by comparing the intermittent features of the voice message with the threshold value set in advance based on those features, it is possible to reliably determine whether the content of the received voice message is likely to be difficult to grasp in time because of interruptions. Further, by converting such a voice message into a text message and playing the text message by, for example, a system voice, it is possible to help the user follow the content of the voice message and to save the time the user would otherwise spend waiting through the interruptions.
(third embodiment)
The third embodiment of the present invention will be specifically explained. This embodiment is an example of the first embodiment or the second embodiment, and detects whether or not the sound feature amount of the voice message is equal to or less than the playback permission threshold on the assumption that the voice message includes specific information. The following description focuses on differences of the present embodiment from the first embodiment or the second embodiment, and the same or similar contents as those of the first embodiment or the second embodiment will not be described in the present embodiment.
The structure of the electronic device of this embodiment is the same as that of the electronic device 1 of the first embodiment, and is not described herein again. The playback control method executed by the electronic device 1 of the present embodiment is described below. Fig. 5 is a flowchart of an example of a playback control method according to a third embodiment of the present invention. The flow of fig. 5 may be executed, for example, when the electronic device 1 receives a voice message through the receiving unit 11, but is not limited to this, and may be executed when the electronic device 1 prepares to play a voice message through the playing unit 12, or may be executed at any time between when the receiving unit 11 receives a voice message and when the playing unit 12 plays a voice message. Steps S201, S202 described below correspond to a detection step, step S203 corresponds to a character conversion step, and steps S204, S205 correspond to a voice playback control step.
In step S201, the detection unit 13 detects whether or not the specific information is included in the voice message received by the reception unit 11. The specific information includes, for example, a telephone number, an address, and the like, and can be detected by means of voice recognition or the like. Fig. 6 is an audio waveform diagram of a specific example of the third embodiment of the present invention. In fig. 6, an audio waveform of a voice message "my phone number is 84757138" is shown. The detection unit 13 recognizes that the voice message includes specific information such as a telephone number, for example, by a voice recognition technique.
If the detection unit 13 detects in step S201 that the voice message includes the specific information (yes in step S201), step S202 is executed. If the detection unit 13 detects in step S201 that the specific information is not included in the voice message (no in step S201), step S205 is executed. The subsequent steps S202, S203, S204, S205 correspond to steps S101, S102, S103, S104 in the first embodiment or the second embodiment, respectively. In the example shown in fig. 6, when the sound feature amount of the voice message "my phone number is 84757138" is determined to be equal to or less than the playback permission threshold, the voice message is converted into a text message, and the converted text message is played by, for example, system voice.
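The flow of fig. 5 can be sketched as below, assuming that a transcript of the voice message is already available from speech recognition. The regular expression for telephone numbers, the function names, and the numeric values are hypothetical; the description only says that specific information such as a telephone number or an address is detected by voice recognition or the like.

```python
import re

# Hypothetical pattern: a run of at least seven digits is treated as a phone number.
PHONE_PATTERN = re.compile(r"\d{7,}")


def contains_specific_information(transcript: str) -> bool:
    """Step S201: does the recognized text contain a phone number?"""
    return PHONE_PATTERN.search(transcript) is not None


def fig5_decision(transcript: str, feature_amount: float, threshold: float) -> str:
    """Return 'tts' (S203, S204) or 'original' (S205)."""
    if not contains_specific_information(transcript):   # S201: no
        return "original"                                # S205
    if feature_amount <= threshold:                      # S202: yes
        return "tts"                                     # S203, S204
    return "original"                                    # S202: no, S205


decision = fig5_decision("my phone number is 84757138", 0.2, 0.5)
```

Because the threshold comparison is skipped whenever step S201 answers no, the sound feature amount only needs to be computed for messages that carry specific information.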
Thus, when the voice message includes specific information that the user particularly needs to grasp, it is possible to reliably determine whether the received voice message is likely to be difficult to grasp in time, and to help the user grasp the content of the voice message including the specific information in time. In addition, because the comparison of the sound feature amount with the threshold is performed only for voice messages that include the specific information, the processing load on the electronic device for detecting the sound feature amount of received voice messages can be reduced.
Another example of the playback control method according to the present embodiment is described below. Fig. 7 is a flowchart of another example of the playback control method according to the third embodiment of the present invention. The flow of fig. 7 may be executed, for example, when the electronic device 1 receives a voice message through the receiving unit 11, but is not limited to this, and may be executed when the electronic device 1 prepares to play a voice message through the playing unit 12, or may be executed at any time between when the receiving unit 11 receives a voice message and when the playing unit 12 plays a voice message. Steps S301, S302 described below correspond to a detection step, step S303 corresponds to a character conversion step, and steps S304, S305, S306 correspond to a voice playback control step.
Step S301 in fig. 7 is equivalent to step S201 in fig. 5, and is not described herein again. Still referring to the specific example shown in fig. 6, the detection unit 13 recognizes that the voice message "my phone number is 84757138" includes specific information such as a phone number, for example, by a voice recognition technique.
If the detection unit 13 detects in step S301 that the voice message includes the specific information (yes in step S301), step S302 is executed. If the detection unit 13 detects in step S301 that the specific information is not included in the voice message (no in step S301), step S306 is executed.
In step S302, the detection unit 13 calculates the voice feature amount of the voice message based on at least the speech rate, and determines whether or not the voice feature amount of the voice message is equal to or less than a playback permission threshold value preset based on at least the speech rate. For example, the faster the speech rate of a voice message is, the smaller the calculated sound feature amount is. In addition, when the sound feature amount is calculated and the play permission threshold is set in advance, the sound feature amount may be calculated by, for example, weighted addition or multidimensional vector in combination with other feature amounts in addition to the speech rate.
If the detection unit 13 determines in step S302 that the sound feature amount of the voice message is equal to or less than the playback permission threshold (yes in step S302), step S303 is executed. If the detection unit 13 determines in step S302 that the sound feature amount of the voice message is not equal to or less than the playback permission threshold (no in step S302), step S306 is executed.
Step S303 is equivalent to step S102 shown in fig. 2 or step S203 shown in fig. 5, and is not repeated herein. Next, in step S304, the voice playback control unit 15 sets the speech rate at which the playback unit 12 is to play the text message converted in step S303 to a speech rate lower than that of the original voice message received by the receiving unit 11. Next, in step S305, the voice playback control unit 15 controls the playback unit 12 to play the text message converted in step S303 by voice at the speech rate set in step S304.
In the example shown in fig. 6, if the sound feature amount of the received original voice message "my phone number is 84757138" is equal to or less than the preset playback permission threshold because the speech rate is too fast, the voice message is converted into a text message, and the converted text message is played at a speech rate lower than that of the original voice message.
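A minimal sketch of steps S301 to S306 follows. It assumes the sound feature amount is simply the reciprocal of the speech rate, so that a faster message yields a smaller value, and that the playback rate is obtained by scaling the original rate by a fixed factor. Both choices, together with the units and all names and numbers, are hypothetical; the description only requires that the playback rate be lower than the original rate.

```python
from typing import Optional


def speech_rate_feature(syllables_per_second: float) -> float:
    """Hypothetical feature: the faster the speech, the smaller the value."""
    if syllables_per_second <= 0:
        raise ValueError("speech rate must be positive")
    return 1.0 / syllables_per_second


def fig7_playback_rate(has_specific_info: bool, original_rate: float,
                       threshold: float, slowdown: float = 0.8) -> Optional[float]:
    """Return the speech rate for voice playback of the text message, or None.

    None means the original voice message is played as it is (step S306).
    """
    if not 0 < slowdown < 1:
        raise ValueError("slowdown must be between 0 and 1")
    if not has_specific_info:                              # S301: no
        return None
    if speech_rate_feature(original_rate) > threshold:     # S302: no
        return None
    # S303 converts the message to text; S304 sets a rate below the original.
    return original_rate * slowdown                        # used in S305


rate = fig7_playback_rate(True, original_rate=8.0, threshold=0.2)
```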
Therefore, when specific information that particularly needs to be grasped by the user is included in a voice message and the speech rate of the voice message is too fast, the voice message is converted into a text message and the text message is played at a low speech rate, which can help the user to reliably grasp the content of the voice message including the specific information.
(fourth embodiment)
The fourth embodiment of the present invention will be specifically explained. This embodiment is an example of the first embodiment or the second embodiment, and detects whether or not the voice feature amount of the voice message is equal to or less than the playback permission threshold value on the premise that a message indicating a specific meaning is received from the user. The following description focuses on differences of the present embodiment from the first embodiment or the second embodiment, and the same or similar contents as those of the first embodiment or the second embodiment will not be described in the present embodiment.
First, the configuration of the electronic device 1A according to the present embodiment will be described. Fig. 8 is a block diagram of an electronic device 1A according to a fourth embodiment of the present invention. As shown in fig. 8, the electronic device 1A of the present embodiment includes an accepting unit 16 in addition to the components of the electronic device 1 of the first or second embodiment. The accepting unit 16 accepts a message input by the user of the electronic device 1A; it may accept a voice input from the user through a microphone or the like, or a text input from the user through a touch panel, a mouse, a keyboard, or the like.
The playback control method executed by the electronic device 1A according to the present embodiment is described below. Fig. 9 is a flowchart of an example of a playback control method according to a fourth embodiment of the present invention. The flow of fig. 9 may be executed in a prescribed cycle or in real time, for example. The following steps S401, S402, S403 correspond to a detection step, step S404 corresponds to a character conversion step, and steps S405, S406 correspond to a voice playback control step.
In step S401, the detection unit 13 detects whether or not the voice message is received by the reception unit 11. If it is detected in step S401 that the voice message is not received (no in step S401), step S401 is repeatedly executed. In the case where it is detected in step S401 that the voice message is received (step S401: YES), step S402 is executed.
In step S402, the detection unit 13 detects whether or not a message indicating a specific meaning is accepted from the user of the electronic device 1A. The message indicating the specific meaning may be a message asking the contact to resend, such as "please resend" or "please speak again", or a message indicating that the content of the voice message sent by the contact is difficult to grasp, such as "I can't hear it clearly" or "please speak more clearly". For example, the detection unit 13 may recognize the content of the message accepted from the user of the electronic device 1A by a voice recognition technique and compare the recognized content with preset template messages indicating the specific meaning, thereby detecting whether or not a message indicating the specific meaning is accepted from the user.
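The template comparison in step S402 might look like the following sketch, which assumes the user's message has already been recognized as text. The template list contains only two of the example phrases from the description, and the normalization rule and function names are hypothetical.

```python
from typing import Iterable

# Hypothetical template messages, modelled on examples given in the description.
TEMPLATES = ("please resend", "please speak again")


def normalize(text: str) -> str:
    """Lower-case, collapse whitespace, and drop surrounding punctuation."""
    return " ".join(text.lower().split()).strip(" .!?")


def indicates_specific_meaning(user_message: str,
                               templates: Iterable[str] = TEMPLATES) -> bool:
    """Step S402: does the user's message match any preset template?"""
    normalized = normalize(user_message)
    return any(template in normalized for template in templates)


matched = indicates_specific_meaning("Please resend!")
```

Substring matching is used here only to keep the sketch short; a real system would likely need a more tolerant comparison.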
If it is detected in step S402 that a message indicating a specific meaning is received from the user of the electronic apparatus 1A (yes in step S402), step S403 is executed. If it is detected in step S402 that the message indicating the specific meaning has not been accepted from the user of the electronic apparatus 1A (no in step S402), step S406 is executed. The subsequent steps S403, S404, S405, S406 correspond to steps S101, S102, S103, S104 in the first embodiment or the second embodiment, respectively.
Thus, when a message indicating a specific meaning (for example, asking the sender to re-record and resend, or indicating that the message could not be heard clearly) is accepted from the user, it is determined whether the received voice message is likely to be difficult to grasp in time, so that the user can be helped to grasp the content of the voice message in time in a more targeted manner. Further, because the comparison of the sound feature amount with the threshold is performed only when a message indicating a specific meaning is accepted from the user, the processing load on the electronic device for detecting the sound feature amount of received voice messages can be reduced.
Another example of the playback control method according to the present embodiment is described below. Fig. 10 is a flowchart of another example of the playback control method according to the fourth embodiment of the present invention. The flow of fig. 10 may be executed in a prescribed cycle or in real time, for example. Steps S501, S502, S505, S506, and S507 described below correspond to a detection step, steps S503 and S508 correspond to a character conversion step, and steps S504, S509, and S510 correspond to a voice playback control step.
Steps S501 and S502 correspond to steps S401 and S402 shown in fig. 9, respectively, and are not described herein again. If it is detected in step S502 that a message indicating a specific meaning is accepted from the user of the electronic apparatus 1A (yes in step S502), step S503 is executed. If it is detected in step S502 that the message indicating the specific meaning has not been accepted from the user of the electronic apparatus 1A (no in step S502), step S510 is executed.
In one specific example, the detection unit 13 detects in step S501 that the voice message "Dalian Xinghai Square" shown in fig. 4 is received from the contact, and detects in step S502 that the message "please resend" is accepted from the user, so step S503 is executed next.
Steps S503 and S504 correspond to steps S404 and S405 shown in fig. 9, respectively. In the above-described specific example, the character conversion unit 14 converts the voice message "Dalian Xinghai Square" into a text message in step S503, and the voice playback control unit 15 controls the playback unit 12 to play back the converted text message by voice in step S504. That is, when a message indicating a specific meaning ("please resend") is accepted from the user, the voice message received from the contact ("Dalian Xinghai Square") is converted directly into a text message and played back by voice, without detecting the sound feature amount of that voice message.
Step S505 is performed next. Step S505 is the same as step S501, and is not described herein. When the detection unit 13 detects in step S505 that a voice message is received, step S506 is executed to determine whether the voice message received in step S505 was sent from the same contact as the voice message received in step S501.
If it is detected in step S506 that the voice message is not a voice message sent from the same contact object (no in step S506), the process returns to step S502. If it is detected in step S506 that the voice message is a voice message sent from the same contact object (yes in step S506), step S507 is executed.
Steps S507, S508, S509, and S510 correspond to steps S403, S404, S405, and S406 in fig. 9, respectively, and are not described herein again. In the above specific example, it is assumed that a voice message "eat at 6 pm" is then received from the same contact. For this voice message, it is detected whether or not the sound feature amount is equal to or less than the playback permission threshold, and when the sound feature amount is equal to or less than the playback permission threshold, the voice message ("eat at 6 pm") is converted into a text message and played by, for example, a system voice.
Thus, when a message indicating a specific meaning (for example, asking the sender to re-record and resend, or indicating that the message could not be heard clearly) is accepted from the user, the voice message received immediately before is directly converted into a text message and played back by, for example, a system voice, and for a subsequent voice message sent from the same contact, it is determined whether or not the sound feature amount is equal to or less than the threshold value.
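The two stages of fig. 10 can be sketched as two small functions: one for the voice message that prompted the user's request, and one for a later voice message. The function names, return labels, and numeric values are hypothetical; the step numbers in the comments follow the description.

```python
def handle_first_message(user_sent_specific_message: bool) -> str:
    """Steps S502 to S504 and S510 for the voice message received in S501."""
    if user_sent_specific_message:      # S502: yes
        return "tts"                    # S503, S504: convert without any check
    return "original"                   # S510: play the voice message as it is


def handle_followup_message(same_contact: bool, feature_amount: float,
                            threshold: float) -> str:
    """Steps S506 to S510 for a voice message received in S505."""
    if not same_contact:                # S506: no
        return "back_to_s502"           # flow returns to step S502
    if feature_amount <= threshold:     # S507: yes
        return "tts"                    # S508, S509
    return "original"                   # S510


first = handle_first_message(True)
followup = handle_followup_message(True, feature_amount=0.3, threshold=0.5)
```

The first message bypasses the threshold check because the user has already signalled difficulty; later messages from the same contact are checked individually.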
In the present embodiment and its specific example, when the detection unit 13 detects that a message indicating a specific meaning is accepted from the user of the electronic device 1A after the reception of a voice message, the playback unit 12 may play the text message converted by the character conversion unit 14 by voice without the message indicating the specific meaning being transmitted from the electronic device 1A to the contact that sent the voice message. For example, in the above-described specific example, the message accepted from the user of the electronic device 1A ("please resend") need not be transmitted to the contact.
Thus, the message indicating a specific meaning accepted from the user (for example, asking the sender to re-record and resend, or indicating that the message could not be heard clearly) is not transmitted to the contact; instead, the voice message received immediately before is converted into a text message and played by, for example, a system voice. This improves the efficiency of communication between the user and the contact and reduces unnecessary communication load.
The embodiments and specific examples of the present invention have been described above with reference to the accompanying drawings. The above-described embodiments and specific examples are merely specific examples of the present invention and are not intended to limit the scope of the present invention. Those skilled in the art can modify the embodiments and specific examples based on the technical idea of the present invention, and various modifications, combinations, and appropriate omissions of the elements can be made, and the embodiments obtained thereby are also included in the scope of the present invention. For example, the above embodiments and specific examples may be combined with each other, and the combined embodiments are also included in the scope of the present invention.
For example, the electronic device 1, 1A according to the embodiment of the present invention may be an in-vehicle device. As described above, the electronic device 1, 1A helps the user to grasp the content of the voice message in time, thereby reducing the operation burden of the user to replay the voice message or to request the sender to re-record and send the voice message. When the electronic devices 1 and 1A are in-vehicle devices, the driving safety can be improved by reducing the operation load on the user.
For example, the electronic apparatus 1, 1A according to the embodiment of the present invention may be configured as a playback control system with another electronic device having a data communication function, and may be configured to receive a voice message received by the other electronic device by establishing a wired or wireless connection with the other electronic device.
Further, the steps included in the playback control method according to each of the above-described embodiments of the present invention may be implemented as each unit (means) included in the playback control system, each step included in the playback control program, or a recording medium on which the playback control program is recorded, and the same technical effects can be obtained.

Claims (10)

1. An electronic device comprising:
a receiving unit that receives a voice message; and
a playback unit that plays the voice message received by the receiving unit;
the electronic device is characterized by further comprising:
a detection unit that detects whether or not a sound feature amount of the voice message received by the receiving unit is equal to or less than a predetermined playback permission threshold;
a text conversion unit that converts the voice message into a text message when the detection unit detects that the sound feature amount of the voice message is equal to or less than the playback permission threshold; and
a voice playback control unit that controls the playback unit to play, by voice, the text message converted by the text conversion unit.
2. The electronic device of claim 1,
the detection section calculates a voice feature quantity of the voice message based on at least one feature among a speech rate, a sound volume, a pitch, a sound quality, time information, a noise level, and a degree of recognition of the voice message,
the detection unit further compares the calculated sound feature amount of the voice message with the playback permission threshold value set in advance based on the at least one feature, and determines whether or not the sound feature amount of the voice message is equal to or less than the playback permission threshold value.
3. The electronic device of claim 1,
the detection unit calculates a voice feature amount of the voice message based on the intermittent feature of the voice message,
the detection unit further compares the calculated voice feature amount of the voice message with the playback permission threshold value set in advance based on the discontinuity characteristic, and determines whether or not the voice feature amount of the voice message is equal to or less than the playback permission threshold value.
4. The electronic device of claim 1,
the detecting section detects whether or not specific information is included in the voice message,
the detection unit detects whether or not the voice feature amount of the voice message is equal to or less than the playback permission threshold when it is detected that the voice message includes specific information.
5. The electronic device of claim 4,
the detection unit calculates a voice feature amount of the voice message based on at least a speech rate when it is detected that the voice message includes specific information, and determines whether or not the voice feature amount of the voice message is equal to or less than the playback permission threshold value preset based on at least the speech rate,
when the detection unit determines that the sound characteristic amount of the voice message is equal to or less than the playback permission threshold, the text conversion unit converts the voice message into a text message, and the voice playback control unit controls the playback unit to play the text message converted by the text conversion unit in voice at a lower speed of speech than the speed of speech of the voice message.
6. The electronic device of claim 1, further comprising:
an accepting unit that accepts a message input by a user of the electronic device,
the detection unit detects whether or not a message indicating a specific meaning is accepted from a user of the electronic apparatus after the voice message is received by the reception unit,
the detection unit detects whether or not the voice feature amount of the voice message is equal to or less than the playback permission threshold when the message indicating the specific meaning is accepted from the user of the electronic device after the reception of the voice message is detected.
7. The electronic device of claim 6,
when the detection unit detects that the message indicating the specific meaning is accepted from the user of the electronic device after the voice message is received,
the text conversion unit converts the voice message into a text message, and the voice playback control unit controls the playback unit to play the text message converted by the text conversion unit by voice,
the detection unit detects whether or not the sound feature amount of a subsequent voice message is equal to or less than the playback permission threshold, the text conversion unit converts the subsequent voice message into a text message when the detection unit detects that the sound feature amount of the subsequent voice message is equal to or less than the playback permission threshold, and the voice playback control unit controls the playback unit to play the text message converted by the text conversion unit by voice.
8. The electronic device of claim 7,
when the detection unit detects that the message indicating the specific meaning is accepted from the user of the electronic device after the voice message is received, the playback unit plays the text message converted by the text conversion unit by voice without transmitting the message indicating the specific meaning from the electronic device to the contact object that has transmitted the voice message.
9. The electronic device of any of claims 1-8,
the electronic device is an in-vehicle device.
10. A method for controlling the playing of a voice message, comprising:
a detection step of detecting, for a received voice message, whether or not a sound characteristic quantity of the voice message is equal to or less than a predetermined play-permitted threshold;
a text conversion step of converting the voice message into a text message when the sound characteristic quantity of the voice message detected by the detection step is less than or equal to the play permission threshold; and
a voice playing control step of controlling to play the text message converted by the text conversion step by voice.
CN202010074142.8A 2020-01-22 2020-01-22 Electronic device and play control method Active CN113163053B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010074142.8A CN113163053B (en) 2020-01-22 2020-01-22 Electronic device and play control method

Publications (2)

Publication Number Publication Date
CN113163053A true CN113163053A (en) 2021-07-23
CN113163053B CN113163053B (en) 2024-05-28

Family

ID=76881529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010074142.8A Active CN113163053B (en) 2020-01-22 2020-01-22 Electronic device and play control method

Country Status (1)

Country Link
CN (1) CN113163053B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030000400A (en) * 2001-06-25 2003-01-06 주식회사 보이스텍 Method and apparatus for real- time modification of audio play speed
KR20120126649A (en) * 2011-05-12 2012-11-21 주식회사 유피아이케이 Method, system and recording medium for prviding conversation contents
CN104285428A (en) * 2012-05-08 2015-01-14 三星电子株式会社 Method and system for operating communication service
US20150317979A1 (en) * 2014-04-30 2015-11-05 Samsung Electronics Co., Ltd. Method for displaying message and electronic device
KR20160008311A (en) * 2014-07-14 2016-01-22 박철 Voice recognition and character expression for voice mesage
CN106210323A (en) * 2016-07-13 2016-12-07 广东欧珀移动通信有限公司 A kind of speech playing method and terminal unit
CN106448665A (en) * 2016-10-28 2017-02-22 努比亚技术有限公司 Voice processing device and method
CN106847256A (en) * 2016-12-27 2017-06-13 苏州帷幄投资管理有限公司 A kind of voice converts chat method
CN108831475A (en) * 2018-05-24 2018-11-16 广州市千钧网络科技有限公司 A kind of text message extracting method and system
CN110033769A (en) * 2019-04-23 2019-07-19 努比亚技术有限公司 A kind of typing method of speech processing, terminal and computer readable storage medium

Also Published As

Publication number Publication date
CN113163053B (en) 2024-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant