WO2022143888A1 - Audio processing method, apparatus and electronic device - Google Patents

Audio processing method, apparatus and electronic device

Info

Publication number
WO2022143888A1
WO2022143888A1 (PCT/CN2021/143036, CN2021143036W)
Authority
WO
WIPO (PCT)
Prior art keywords
audio
segment
clip
spliced
playback
Prior art date
Application number
PCT/CN2021/143036
Other languages
English (en)
French (fr)
Inventor
续路波
Original Assignee
Vivo Mobile Communication Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co., Ltd.
Priority to KR1020237025661A priority Critical patent/KR20230125284A/ko
Priority to EP21914619.8A priority patent/EP4273863A4/en
Publication of WO2022143888A1 publication Critical patent/WO2022143888A1/zh
Priority to US18/343,055 priority patent/US20230343325A1/en

Classifications

    • G10L15/04 — Speech recognition: Segmentation; Word boundary detection
    • G10L15/05 — Speech recognition: Word boundary detection
    • G11B20/10 — Digital recording or reproducing
    • G10L25/78 — Detection of presence or absence of voice signals
    • G10L25/93 — Discriminating between voiced and unvoiced parts of speech signals
    • G11B20/10527 — Audio or video recording; Data buffering arrangements
    • G11B27/031 — Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/10 — Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B2020/10546 — Audio or video recording specifically adapted for audio data
    • G11B2020/10972 — Management of interruptions, e.g. due to editing
    • G11B2020/10981 — Recording or reproducing data when the data rate or the relative speed between record carrier and transducer is variable

Definitions

  • the present application belongs to the field of audio technology, and in particular relates to an audio processing method, device and electronic device.
  • the sending and receiving of audio messages is relatively common, especially when it is inconvenient for users to enter and read text.
  • If playback of an audio message is interrupted before the message finishes playing, a user who wants to continue listening often has to replay the message from beginning to end, which wastes time.
  • the interruption position of audio playback is detected, and the remaining audio is marked based on the interruption position, which is convenient for listening next time.
  • The inventor found at least the following problem in the prior art: when the remaining audio is marked starting from the interruption position of audio playback, the marked remaining audio can easily begin in the middle of a sentence, so its content has poor integrity.
  • the purpose of the embodiments of the present application is to provide an audio processing method, apparatus and electronic device, which can solve the problem in the prior art that the content of the marked remaining audio is poor in integrity due to starting to mark the remaining audio based on the interruption position of audio playback.
  • an embodiment of the present application provides an audio processing method, the method comprising:
  • in the case of monitoring that playback of the first audio is interrupted, determining a first position of the first audio according to the playback interruption position of the first audio, each sentence-segmentation position of the first audio, and the position of each silent segment of the first audio, wherein the first position is a sentence-segmentation position or the end position of a silent segment within a first audio clip of the first audio, and the first audio clip is the audio segment between the start position of the first audio and the playback interruption position of the first audio;
  • dividing the first audio according to the first position to obtain a second audio clip and a third audio clip, wherein the second audio clip is the audio segment between the first position of the first audio and the end position of the first audio, and the third audio clip is the audio segment between the start position of the first audio and the first position of the first audio.
  • an audio processing device comprising:
  • a first determination module, configured to, in the case of monitoring that playback of the first audio is interrupted, determine a first position of the first audio according to the playback interruption position of the first audio, each sentence-segmentation position of the first audio, and the position of each silent segment of the first audio, wherein the first position is a sentence-segmentation position or the end position of a silent segment within a first audio clip of the first audio, and the first audio clip is the audio segment between the start position of the first audio and the playback interruption position of the first audio;
  • a segmentation module, configured to divide the first audio according to the first position to obtain a second audio clip and a third audio clip, wherein the second audio clip is the audio segment between the first position of the first audio and the end position of the first audio, and the third audio clip is the audio segment between the start position of the first audio and the first position of the first audio.
  • Embodiments of the present application provide an electronic device, the electronic device including a processor, a memory, and a program or instruction stored on the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the method according to the first aspect.
  • An embodiment of the present application provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the method according to the first aspect are implemented.
  • An embodiment of the present application provides a chip, the chip including a processor and a communication interface, the communication interface being coupled to the processor, where the processor is configured to run a program or an instruction to implement the method described in the first aspect.
  • a computer program product is provided, the computer program product is stored in a non-volatile storage medium, the computer program product is executed by at least one processor to implement the method of the first aspect.
  • In the embodiments of the present application, in the case of monitoring that playback of the first audio is interrupted, the first position of the first audio is determined according to the playback interruption position of the first audio, each sentence-segmentation position of the first audio, and the position of each silent segment of the first audio, where the first position is a sentence-segmentation position or the end position of a silent segment within the first audio clip, and the first audio clip is the audio segment between the start position of the first audio and the playback interruption position of the first audio; the first audio is divided according to the first position to obtain a second audio clip (the audio segment between the first position and the end position of the first audio) and a third audio clip (the audio segment between the start position and the first position of the first audio).
  • Since the audio is divided at a sentence-segmentation position determined from the playback interruption position or at the end position of a silent segment, the integrity of the divided audio clips can be improved.
  • FIG. 1 is a flowchart of an audio processing method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a segmented sentence position and a position of a silent segment of audio provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram before and after audio splicing provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of an audio processing method provided by another embodiment of the present application;
  • FIG. 5 is a structural diagram of an audio processing apparatus provided by an embodiment of the present application.
  • FIG. 6 is a structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 7 is a structural diagram of an electronic device provided by another embodiment of the present application.
  • The terms "first", "second" and the like in the description and claims of the present application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It is to be understood that the terms so used are interchangeable under appropriate circumstances so that the embodiments of the present application can be practiced in sequences other than those illustrated or described herein. Objects distinguished by "first", "second", etc. are usually of one type, and the number of such objects is not limited; for example, the first object may be one or more than one. "And/or" in the description and claims indicates at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
  • FIG. 1 is a flowchart of an audio processing method provided by an embodiment of the present application. As shown in FIG. 1, the audio processing method includes the following steps:
  • Step 101: In the case of monitoring that playback of the first audio is interrupted, determine the first position of the first audio according to the playback interruption position of the first audio, each sentence-segmentation position of the first audio, and the position of each silent segment of the first audio, where the first position is a sentence-segmentation position or the end position of a silent segment within the first audio clip, and the first audio clip is the audio segment between the start position of the first audio and the playback interruption position of the first audio.
  • the above-mentioned first audio may be any audio, for example, an audio message, an audio file, or an audio part in a video.
  • the above-mentioned playback interruption position can be understood as the first audio playback interruption time point or playback interruption time. For example, if the playback is interrupted when the first audio is played to the 5th second, the above-mentioned playback interruption position is the 5th second of the first audio.
  • the above-mentioned first audio segment may refer to an audio segment between the start position of the first audio and the playback interruption position of the first audio, that is, an audio segment that has been played in the first audio.
  • the above sentence segmentation position may refer to the segmentation position of each sentence in the first audio. It should be noted that, the above sentence segmentation position may be understood as a sentence fragmentation time point or a sentence fragmentation moment of the audio.
  • the above-mentioned silent segment may refer to a silent portion in the first audio.
  • the position of the silent segment may include a start position and an end position of the silent segment.
  • start position of the silence segment may be understood as the start time point or start time of the silence segment
  • end position of the silence segment may be understood as the end time point or end time of the silence segment.
  • Each silent segment in the first audio can be detected by a Voice Activity Detection (VAD) algorithm, where the VAD algorithm classifies each frame of the audio into two categories: silent frames (that is, sil frames) and non-silent frames. The classification algorithm adopted by the VAD algorithm may include a filter-based algorithm or a neural network algorithm.
  • a silent portion of the first audio whose duration exceeds a preset duration may be determined as a silent segment.
  • the above-mentioned preset duration can be set reasonably according to actual requirements, for example, 1 second, 1.5 seconds, or 2 seconds.
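  • As an illustration of the two ideas above (frame-level VAD classification followed by a minimum-duration filter), the following sketch uses a simple energy threshold in place of a real VAD classifier; the frame length, threshold value, and function names are assumptions for illustration, not taken from the patent:

```python
FRAME_MS = 20            # frame length in milliseconds (assumed)
ENERGY_THRESHOLD = 1e-4  # frames below this mean energy count as "sil"

def find_silent_segments(samples, sample_rate=16000, min_silence_s=1.0):
    """Return (start_s, end_s) pairs for silent runs longer than the preset duration."""
    frame_len = int(sample_rate * FRAME_MS / 1000)
    n_frames = len(samples) // frame_len
    # Classify each frame as silent or non-silent, as a VAD algorithm would.
    silent = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        silent.append(energy < ENERGY_THRESHOLD)
    # Merge consecutive silent frames; keep only runs >= min_silence_s.
    segments, run_start = [], None
    for i, is_sil in enumerate(silent + [False]):  # sentinel flushes the last run
        if is_sil and run_start is None:
            run_start = i
        elif not is_sil and run_start is not None:
            start_s = run_start * FRAME_MS / 1000
            end_s = i * FRAME_MS / 1000
            if end_s - start_s >= min_silence_s:
                segments.append((start_s, end_s))
            run_start = None
    return segments
```

A silent portion shorter than `min_silence_s` (the preset duration above) is discarded, mirroring the rule that only sufficiently long silent portions are treated as silent segments.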
  • each segment position and the position of each silent segment of the first audio may be pre-marked, for example, as shown in FIG. 2 , the audio part marked as sil is a silent segment.
  • A sentence-segmentation position or the end position of a silent segment may be determined as the first position, from among the sentence-segmentation positions and silent-segment end positions within the first audio clip, based on the playback interruption position of the first audio. For example, the sentence-segmentation position or silent-segment end position closest to the playback interruption position within the first audio clip can be used as the first position, or a sentence-segmentation position or silent-segment end position whose distance from the playback interruption position is less than a preset distance can be used as the first position, and so on.
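  • A minimal sketch of this selection rule, assuming all positions are expressed in seconds; the helper names and marker data are illustrative assumptions:

```python
def first_position(interrupt_s, segment_positions, silent_segments):
    """segment_positions: sentence boundaries; silent_segments: (start_s, end_s) pairs."""
    # Only boundaries inside the already-played clip (<= interruption) qualify.
    candidates = [p for p in segment_positions if p <= interrupt_s]
    candidates += [end for _, end in silent_segments if end <= interrupt_s]
    if not candidates:
        return 0.0  # no boundary before the interruption: fall back to the start
    return max(candidates)  # the boundary closest to the interruption position

def split_audio(samples, sample_rate, pos_s):
    """Divide into the played (third) clip and the unplayed (second) clip at pos_s."""
    cut = int(pos_s * sample_rate)
    return samples[:cut], samples[cut:]
```

With this rule, the split never lands mid-sentence: it snaps back to the last sentence boundary or silence that ended before playback was interrupted.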
  • Step 102 Divide the first audio according to the first position to obtain a second audio segment and a third audio segment, wherein the second audio segment is from the first position of the first audio to the first audio segment The audio segment is between the end positions of the audio, and the third audio segment is the audio segment between the start position of the first audio and the first position of the first audio.
  • The first audio can be divided at the first position into a third audio clip (the audio segment that has been played) and a second audio clip (the audio segment that has not been played), so that when the user needs to continue listening to the first audio, the second audio clip can be played directly, which saves the user's time.
  • Since the first audio is divided at the sentence-segmentation position or silent-segment end position closest to the playback interruption position, the integrity of the audio clips obtained after division can be improved.
  • With the audio processing method of this embodiment, in the case of monitoring that playback of the first audio is interrupted, the first position of the first audio is determined according to the playback interruption position, each sentence-segmentation position, and the position of each silent segment of the first audio, where the first position is a sentence-segmentation position or silent-segment end position within the first audio clip, and the first audio clip is the audio segment between the start position of the first audio and the playback interruption position; the first audio is then divided at the first position to obtain a second audio clip and a third audio clip. In this way, the integrity of the audio clips obtained after division is improved, so that the user can more easily understand the content when continuing to listen to the second audio clip.
  • Optionally, the first position is the sentence-segmentation position or silent-segment end position that is a first distance from the playback interruption position within the first audio clip, where the first distance is the smallest of the distances between the playback interruption position and each sentence-segmentation position and each silent-segment end position within the first audio clip.
  • That is, the sentence-segmentation position or silent-segment end position closest to the playback interruption position within the first audio clip may be used as the first position. For example, each word in the played audio segment before the playback interruption position can be examined: if the end position of the word preceding the playback interruption position is a sentence-segmentation position, or if that preceding word is a silent word, the end moment of the preceding word can be used as the division position of the audio. For example, as shown in FIG. 2, if playback is interrupted at "My name" and the preceding word is a silent word (i.e. sil), the end position of the silent word can be taken as the first position, that is, the division position of the first audio.
  • Dividing the first audio at the sentence-segmentation position or silent-segment end position closest to the playback interruption position not only improves the integrity of the resulting audio clips, but also improves the accuracy of dividing the first audio into its played part and unplayed part.
  • the method in the case of monitoring that the playback of the first audio is interrupted, before identifying the first position in the first audio segment of the first audio that is closest to the position where the playback of the first audio is interrupted, the method also includes:
  • each sentence fragmentation position of the first audio is determined.
  • the first audio can be converted into text through a speech recognition algorithm, and the audio position corresponding to each word in the text can be marked, for example, the start time point and end time of each word in the text in the first audio can be marked point.
  • A text segmentation algorithm can be used to mark punctuation for the text, for example commas, periods, question marks, exclamation marks, or blanks, where a blank mark means that no sentence break is made at that point; otherwise, a sentence break is required there.
  • The text segmentation algorithm may be a classification algorithm trained on N labeled texts that classifies the end of each word in the text, where the classification categories may include comma, period, question mark, exclamation mark, and blank. The value of N is often large, for example 5,000, 10,000, or 50,000, and can be set reasonably according to actual needs. The text segmentation algorithm may include a Conditional Random Field (CRF) algorithm, a neural network algorithm, or the like, which is not limited in this embodiment.
  • each segment position of the text can be obtained.
  • Each sentence-segmentation position of the first audio can then be obtained by combining the text's segmentation result with the audio position marked for each word in the text. For example, as shown in Figure 2, there is an exclamation mark after the word "hello" in the text, and the end time point of that word in the audio is the 2nd second, so the 2nd second of the audio can be determined as a segmentation position.
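  • The mapping described above, from punctuation-marked words with audio timestamps to audio segmentation positions, can be sketched as follows; the word tuples and labels are illustrative assumptions:

```python
def segmentation_positions(words):
    """words: list of (word, start_s, end_s, punctuation) tuples from ASR + text segmentation."""
    positions = []
    for word, start_s, end_s, punct in words:
        if punct in {",", ".", "?", "!"}:   # a sentence break is marked here
            positions.append(end_s)          # the word's end time becomes a split point
        # a blank label means no sentence break after this word
    return positions

# Illustrative marked output: "hello! my name is." with assumed timestamps.
marked = [
    ("hello", 0.0, 2.0, "!"),   # exclamation mark -> segmentation position at second 2
    ("my", 2.4, 2.8, ""),
    ("name", 2.9, 3.3, ""),
    ("is", 3.4, 3.6, "."),
]
```

This reproduces the FIG. 2 example: the exclamation mark after "hello" turns that word's end time (the 2nd second) into a segmentation position.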
  • In this way, the accuracy of the sentence-segmentation result can be improved. The first audio is converted into text, the audio position corresponding to each word in the text is marked, and the segmentation positions of the audio are determined based on the text's sentence-segmentation result and the audio position corresponding to each word, which is simple and convenient to implement.
  • Optionally, in the case where the first audio is an audio message, the method further includes at least one of the following:
  • performing de-duplication processing on the post-audio and the second audio clip, and splicing the de-duplicated post-audio and second audio clip to obtain a first spliced audio, wherein the post-audio is the audio message following the first audio, and the audio object corresponding to the post-audio is the same as the audio object corresponding to the first audio;
  • performing de-duplication processing on the pre-audio and the third audio clip, and splicing the de-duplicated pre-audio and third audio clip to obtain a second spliced audio, wherein the pre-audio is the audio message preceding the first audio, and the audio object corresponding to the pre-audio is the same as the audio object corresponding to the first audio.
  • the above-mentioned first audio may be an audio message transmitted through an instant messaging application.
  • the above audio object can be understood as the speaking object of the audio.
  • The above-mentioned post-audio can be understood as an audio message whose corresponding audio object is the same as the audio object corresponding to the first audio, and which is located after and adjacent to the first audio. Specifically, if the next message after the first audio is an audio message and the audio object corresponding to that audio message is the same as the audio object of the first audio, it is determined that the first audio has post-audio; otherwise, it is determined that the first audio has no post-audio.
  • For example, if the next message after audio message A is audio message B and both correspond to the same audio object, audio message A has post-audio, namely audio message B; if the next message after audio message A is not an audio message, or is audio message B but the audio object corresponding to audio message B differs from that corresponding to audio message A, it can be determined that audio message A has no post-audio.
  • The above-mentioned pre-audio may be understood as an audio message whose corresponding audio object is the same as the audio object corresponding to the first audio, and which is located before and adjacent to the first audio. Specifically, if the previous message before the first audio is an audio message and the audio object corresponding to that audio message is the same as the audio object of the first audio, it is determined that the first audio has pre-audio; otherwise, it is determined that the first audio has no pre-audio.
  • For example, if the previous message before audio message A is audio message C and both correspond to the same audio object, audio message A has pre-audio, namely audio message C; if the previous message before audio message A is not an audio message, or is audio message C but the audio object corresponding to audio message C differs from that corresponding to audio message A, it can be determined that audio message A has no pre-audio.
  • an audio message is usually short, for example, the longest is 60 seconds, and it is often difficult to fully express the content that the user needs to convey. Therefore, the user often expresses the content that the user needs to convey by sending multiple consecutive audio messages.
  • The second audio clip of the first audio is spliced with the post-audio, and the third audio clip of the first audio is spliced with the pre-audio, so that the user can listen to relatively complete audio content based on the spliced audio, which is convenient for the user to operate.
  • performing de-duplication processing before audio splicing can improve the smoothness of audio splicing.
  • Optionally, the time interval between the transmission time of the post-audio and the transmission time of the first audio is less than a first preset time interval, or the two are transmitted within the same day; similarly, the time interval between the transmission time of the pre-audio and the transmission time of the first audio is less than a second preset time interval, or the two are transmitted within the same day. This can avoid splicing two unrelated audio messages together.
  • the above-mentioned first preset time interval and second preset time interval can be reasonably set according to actual requirements, for example, 10 minutes, 5 minutes, and the like.
  • the above-mentioned transmission time may include sending time and receiving time.
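  • A sketch of the adjacency check described above (the neighbouring message must be an audio message from the same audio object, transmitted within a preset interval or on the same day); the message field names and the 10-minute default are assumptions for illustration:

```python
from datetime import datetime, timedelta

def can_splice(first_msg, neighbour_msg, max_gap=timedelta(minutes=10)):
    """Decide whether neighbour_msg qualifies as pre-/post-audio of first_msg."""
    if neighbour_msg.get("type") != "audio":
        return False                      # not an audio message at all
    if neighbour_msg["sender"] != first_msg["sender"]:
        return False                      # different audio object
    t1, t2 = first_msg["time"], neighbour_msg["time"]
    same_day = t1.date() == t2.date()
    # Within the preset interval, or transmitted on the same day.
    return abs(t2 - t1) < max_gap or same_day
```

The `time` field stands in for the transmission time, which per the passage above may be either the sending time or the receiving time.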
  • the performing de-duplication processing on the post-position audio and the second audio segment may include:
  • the performing de-duplication processing on the pre-audio and the third audio segment includes:
  • The audio clip located before the first sentence-segmentation position or the end position of the first silent segment of the post-audio is the above-mentioned fourth audio clip. For example, as shown in FIG. 2, if the first segmentation position of the post-audio is the end position of "I'm looking for you", then the fourth audio clip is the audio segment corresponding to "I'm looking for you" in the post-audio.
  • If the fourth audio clip duplicates the end of the second audio clip, the fourth audio clip may be deleted from the post-audio, and the second audio clip may be spliced with the post-audio from which the fourth audio clip has been deleted; alternatively, the fifth audio clip, that is, the audio segment corresponding to "I'm looking for you" at the end of the second audio clip, may be deleted from the second audio clip, and the second audio clip from which the fifth audio clip has been deleted may be spliced with the post-audio.
  • the second audio segment and the post-audio may be directly spliced.
  • the audio segment located after the position of the last sentence segment or the position of the last silent segment of the preceding audio is the above-mentioned sixth audio segment.
  • For example, as shown in FIG. 2, if the first segmentation position of the first audio is the end position of "Hello", then the above-mentioned seventh audio clip is the audio segment corresponding to "Hello" in the first audio.
  • The sixth audio clip can be deleted from the pre-audio, and the pre-audio from which the sixth audio clip has been deleted can be spliced with the third audio clip; or the seventh audio clip can be deleted from the third audio clip, and the third audio clip from which the seventh audio clip has been deleted can be spliced with the pre-audio.
  • the third audio clip and the preceding audio can be directly spliced.
  • This embodiment determines the duplicated audio clip between the post-audio and the second audio clip based on the first segmentation position or first silent-segment position of the post-audio and the last segmentation position or last silent-segment position of the second audio clip, and determines the duplicated audio clip between the pre-audio and the third audio clip based on the last segmentation position or last silent-segment position of the pre-audio and the first segmentation position or first silent-segment position of the third audio clip. In this way, duplicated audio clips can be determined quickly and accurately, and the speed and accuracy of the de-duplication processing can be improved.
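  • A simplified sketch of this de-duplication, operating on recognized sentence text rather than raw audio for clarity; the function name and data are illustrative assumptions:

```python
def dedup_splice(second_clip_sentences, post_audio_sentences):
    """Splice two sentence lists, dropping the overlap once if the post-audio's
    first sentence (the 'fourth clip') repeats the second clip's last sentence."""
    if (second_clip_sentences and post_audio_sentences
            and second_clip_sentences[-1] == post_audio_sentences[0]):
        # The head of the post-audio repeats the tail of the second clip: keep one copy.
        return second_clip_sentences + post_audio_sentences[1:]
    return second_clip_sentences + post_audio_sentences
```

The same comparison, mirrored, handles the pre-audio side: the pre-audio's last sentence (the sixth clip) against the third clip's first sentence (the seventh clip).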
  • the method further includes:
  • When the first spliced audio is obtained, the first spliced audio may be displayed in the message display window, and the display of the post-audio and the second audio clip is cancelled, wherein the first spliced audio is marked as unread and a first playback-speed adjustment mark is displayed on the first spliced audio.
  • When the second spliced audio is obtained, the second spliced audio may be displayed in the message display window, and the display of the pre-audio and the third audio clip is cancelled, wherein the second spliced audio is marked as read and a second playback-speed adjustment mark is displayed on the second spliced audio, for example, as shown in Figure 3.
  • The first playback-speed adjustment mark is used to adjust the playback speed of the first spliced audio: when a first input on the first playback-speed adjustment mark is received, the playback speed of the first spliced audio can be adjusted to the playback speed corresponding to the first input.
  • Similarly, the second playback-speed adjustment mark is used to adjust the playback speed of the second spliced audio: when a second input on the second playback-speed adjustment mark is received, the playback speed of the second spliced audio can be adjusted to the playback speed corresponding to the second input.
  • Both the first playback-speed adjustment mark and the second playback-speed adjustment mark may include at least one playback-speed sub-mark, where each playback-speed sub-mark corresponds to a playback speed. For example, each of the first and second playback-speed adjustment marks may include at least one of a sub-mark for 1.5x playback, a sub-mark for 2x playback, and a sub-mark for 3x playback.
  • In the case where the first spliced audio is obtained, the first spliced audio is displayed in the message display window, and the display of the post-audio and the second audio clip is cancelled, wherein the first spliced audio is marked as unread and a first playback-speed adjustment mark is displayed on the first spliced audio;
  • if the first audio has no post-audio, the second audio clip can be directly marked as unread, and a third playback speed adjustment mark is displayed on the second audio clip; the third playback speed adjustment mark can be used to adjust the playback speed of the second audio clip. If the first audio has no pre-audio, the third audio clip can be directly marked as read, and a fourth playback speed adjustment mark is displayed on the third audio clip; the fourth playback speed adjustment mark can be used to adjust the playback speed of the third audio clip. This not only helps the user intuitively distinguish played audio clips from unplayed audio clips, so that a playback selection can be made quickly, but also allows the playback speed of the audio message to be adjusted flexibly, saving the user time when listening to audio messages.
  • when a playback input for a target audio is received, the text corresponding to the target audio can be recognized; semantic understanding can be performed on that text, and target background music can be determined based on the semantic understanding result; the target background music is then played while the target audio is played.
  • the above-mentioned target audio may be any audio or audio segment or spliced audio.
  • the above-mentioned target background music may be music that matches the semantic understanding result of the text corresponding to the target audio. For example, if the semantic understanding result indicates that the target audio is a birthday wish, the target background music may be a song related to birthday wishes; if it indicates that the target audio is a travel plan, the target background music may be a travel-related song.
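The matching of background music to the semantic understanding result can be illustrated with a minimal sketch. The keyword lists, topic names, and file names below are illustrative assumptions, not part of the application; an actual implementation would use a trained semantic understanding model rather than keyword matching.

```python
# Hypothetical topic keywords and music catalogue for illustration only.
KEYWORDS = {
    "birthday": ["birthday", "happy birthday", "cake"],
    "travel":   ["trip", "travel", "itinerary", "flight"],
}

MUSIC = {
    "birthday": "birthday_song.mp3",
    "travel":   "travel_song.mp3",
}

def pick_background_music(text):
    """Return a background-music file matching the text's topic, or None."""
    lowered = text.lower()
    for topic, words in KEYWORDS.items():
        if any(w in lowered for w in words):
            return MUSIC[topic]
    return None  # no matching topic: play the audio without background music
```

The player would call `pick_background_music` on the recognized text before starting playback and, if a track is returned, mix it in during playback.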
  • FIG. 4 is a flowchart of an audio processing method provided by another embodiment of the present application.
  • the audio processing method includes the following steps:
  • Step 201 Play the first audio.
  • Step 202: Determine whether playback of the first audio is paused.
  • if playback is paused, the interrupted playback position of the first audio may be recorded and step 203 performed; otherwise, playback of the first audio continues.
  • Step 203 Detect each silent segment in the first audio through a voice activity detection algorithm.
  • each frame of the audio can be classified into two categories by the VAD algorithm, silent frames (i.e., sil frames) and non-silent frames, and a silent part whose duration exceeds a preset duration is determined to be a silent segment.
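The grouping of frame-level VAD decisions into silent segments can be sketched as follows. The per-frame classification itself (filter-based or neural) is abstracted away as a boolean per frame; the 20 ms frame length and 1-second minimum duration are assumed values for illustration.

```python
def silent_segments(frame_is_sil, frame_ms=20, min_ms=1000):
    """Merge consecutive sil frames into (start_ms, end_ms) segments and
    keep only those whose duration reaches min_ms (the preset duration)."""
    segments, start = [], None
    for i, is_sil in enumerate(frame_is_sil):
        if is_sil and start is None:
            start = i * frame_ms            # a silent run begins
        elif not is_sil and start is not None:
            end = i * frame_ms              # the silent run just ended
            if end - start >= min_ms:
                segments.append((start, end))
            start = None
    if start is not None:                   # audio ends inside a silent run
        end = len(frame_is_sil) * frame_ms
        if end - start >= min_ms:
            segments.append((start, end))
    return segments
```

Short pauses between words are discarded by the `min_ms` check, so only genuine silent segments remain as candidate split positions.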
  • Step 204 Recognize the text corresponding to the first audio through a speech recognition algorithm, and mark the audio positions corresponding to each word in the text.
  • the text corresponding to the first audio is recognized by a speech recognition algorithm, and the audio positions corresponding to each word in the text are marked, that is, the start time and end time of each word in the first audio are marked.
  • Step 205: Perform sentence segmentation on the text through a text segmentation algorithm, and determine each sentence segmentation position of the first audio in combination with the audio positions corresponding to the words in the text.
  • punctuation marks can be added to the text according to the text segmentation algorithm, for example, commas, periods, question marks, exclamation marks, and blanks, where a blank mark means that no sentence break occurs there, and any other mark means that a sentence break is required there. After each sentence segmentation position of the text is obtained, each sentence segmentation position of the first audio can be obtained in combination with the marked audio position of each word in the text.
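Combining the punctuation marks with the word-level audio positions can be sketched as below: each word carries its recognized text, its start and end times in the audio, and the punctuation assigned to it (an empty string standing for the blank mark). The tuple layout is an assumption for illustration.

```python
def sentence_positions(words):
    """words: list of (text, start_ms, end_ms, punct) tuples, where punct is
    '' (blank, no break) or a break mark such as ',', '.', '?', '!'.
    Returns the audio times (ms) at which sentence breaks occur."""
    return [end for _, _, end, punct in words if punct]
```

For instance, a word ending at the 2000 ms mark that was assigned an exclamation mark yields 2000 ms as a sentence segmentation position of the audio.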
  • Step 206: Determine the split position according to the playback interruption position of the first audio, the position of each silent segment in the first audio, and each sentence segmentation position of the first audio.
  • the sentence segmentation position or silent-segment end position closest to the playback interruption position may be searched for within the played audio segment of the first audio. For example, each word before the playback interruption position in the played audio segment can be examined: if the end position of the word immediately before the playback interruption position is a sentence segmentation position, or that word is a silent word, the end time of that word can be used as the split position of the audio.
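The search for the split position can be sketched as choosing, among the sentence segmentation positions and silent-segment end positions that fall inside the already-played portion, the candidate closest to the interruption position. The millisecond-based interface is an assumption for illustration.

```python
def split_position(interrupt_ms, sentence_ends, silence_ends):
    """Pick the candidate position in the already-played part that is
    closest to the playback interruption position (the 'first position')."""
    candidates = [p for p in list(sentence_ends) + list(silence_ends)
                  if p <= interrupt_ms]
    if not candidates:
        return 0  # no suitable break before the interruption: split at the start
    # minimal distance to the interruption == the largest candidate before it
    return min(candidates, key=lambda p: interrupt_ms - p)
```

With sentence breaks at 2000 ms and 4500 ms, a silence ending at 1400 ms, and an interruption at 5000 ms, the split position is 4500 ms, so the unplayed segment starts at a sentence boundary.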
  • Step 207 Divide the first audio according to the division position to obtain a second audio segment and a third audio segment.
  • the second audio segment is the audio segment between the split position of the first audio and the end position of the first audio, and the third audio segment is the audio segment between the start position of the first audio and the split position of the first audio.
  • Step 208 Determine whether the first audio has pre-audio and post-audio.
  • if the first audio has both pre-audio and post-audio, go to step 210; if the first audio has neither pre-audio nor post-audio, go to step 209. If the first audio has post-audio but no pre-audio, the third audio clip can be marked as read, and the second audio clip and the post-audio can be de-duplicated and spliced to obtain the first spliced audio; if the first audio has pre-audio but no post-audio, the second audio clip can be marked as unread, and the third audio clip and the pre-audio can be de-duplicated and spliced to obtain the second spliced audio.
  • Step 209 Mark the second audio clip as unread, and mark the third audio clip as read.
  • Step 210 Perform de-duplication processing and splicing on the second audio segment and the post-audio to obtain the first spliced audio, and perform de-duplication processing and splicing on the third audio segment and the pre-audio to obtain the second spliced audio.
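The de-duplication step compares the recognized text on both sides of the splice boundary and, if it repeats, drops one copy before concatenation. In the sketch below each audio is modelled as a list of clips already cut at sentence or silence boundaries, and `text_of` maps a clip to its recognized text; this representation is an assumption for illustration, not the application's internal format.

```python
def splice_dedup(second_clip, post_audio, text_of):
    """Splice second_clip (tail of the interrupted audio) with post_audio.
    If the last clip of second_clip and the first clip of post_audio carry
    the same recognized text, the duplicated head of post_audio is dropped
    so the splice sounds smooth."""
    if (second_clip and post_audio
            and text_of(second_clip[-1]) == text_of(post_audio[0])):
        post_audio = post_audio[1:]   # remove the repeated leading clip
    return second_clip + post_audio
```

The symmetric case (third audio segment plus pre-audio) would compare the tail of the pre-audio against the head of the third segment in the same way.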
  • Step 211 Mark the first spliced audio as unread, and mark the second spliced audio as unread.
  • the embodiment of the present application can automatically adjust the playback interruption point through the voice activity detection algorithm, the speech recognition algorithm, and the text segmentation algorithm, so that the audio after the interruption point is relatively complete, which makes it convenient to resume listening next time.
  • the embodiment of the present application can remove repeated audio during the splicing process, which increases the smoothness of splicing the two audios and makes listening more convenient.
  • the execution body may be an audio processing apparatus, or a control module in the audio processing apparatus for executing the audio processing method. In the embodiments of the present application, the audio processing apparatus is described by taking the audio processing apparatus executing the audio processing method as an example.
  • FIG. 5 is a structural diagram of an audio processing apparatus provided by an embodiment of the present application. As shown in FIG. 5, the audio processing apparatus 500 includes:
  • the first determination module 501 is configured to, when it is detected that playback of the first audio is interrupted, determine a first position of the first audio according to the playback interruption position of the first audio, each sentence segmentation position of the first audio, and the position of each silent segment of the first audio, where the first position is a sentence segmentation position or silent-segment end position located in a first audio segment of the first audio, and the first audio segment is the audio segment between the start position of the first audio and the playback interruption position of the first audio;
  • the segmentation module 502 is configured to split the first audio according to the first position to obtain a second audio segment and a third audio segment, wherein the second audio segment is the audio segment between the first position of the first audio and the end position of the first audio, and the third audio segment is the audio segment between the start position of the first audio and the first position of the first audio.
  • the first position is the sentence segmentation position or silent-segment end position in the first audio clip whose distance from the playback interruption position is a first distance, and the first distance is the minimum of the distances from each sentence segmentation position and each silent-segment end position of the first audio clip to the playback interruption position.
  • the device further includes:
  • the identification module is configured to, in the case of monitoring that the playback of the first audio is interrupted, according to the playback interruption position of the first audio, the position of each segment of the first audio, and the position of each silent segment of the first audio. position, before determining the first position of the first audio, identify the text corresponding to the first audio;
  • a marking module for marking the audio position corresponding to each word in the text
  • a sentence segmentation module used to perform sentence segmentation processing on the text to obtain a sentence segmentation processing result
  • the second determination module is configured to determine each segmented position of the first audio according to the segmented processing result and the audio position corresponding to each word in the text.
  • the first audio is an audio message
  • the device further includes at least one of the following:
  • the first processing module is configured to, after the first audio is split according to the first position to obtain the second audio segment and the third audio segment, and when the first audio has post-audio, perform de-duplication processing on the post-audio and the second audio segment, and splice the de-duplicated post-audio and second audio segment to obtain the first spliced audio; wherein the post-audio is the audio message immediately after the first audio, and the audio object corresponding to the post-audio is the same as the audio object corresponding to the first audio;
  • the second processing module is configured to, when the first audio has pre-audio, perform de-duplication processing on the pre-audio and the third audio segment, and splice the de-duplicated pre-audio and third audio segment to obtain a second spliced audio; wherein the pre-audio is the audio message immediately before the first audio, and the audio object corresponding to the pre-audio is the same as the audio object corresponding to the first audio.
  • the first processing module is specifically used for:
  • the second processing module is specifically used for:
  • the device further includes:
  • the first display module is used for, after the de-duplicated post-audio and second audio segment are spliced to obtain the first spliced audio, displaying the first spliced audio in the message display window and cancelling the display of the post-audio and the second audio segment, wherein the first spliced audio is marked as unread, and a first playback speed adjustment mark is displayed on the first spliced audio;
  • the second display module is used for, after the de-duplicated pre-audio and third audio segment are spliced to obtain the second spliced audio, displaying the second spliced audio in the message display window and cancelling the display of the pre-audio and the third audio segment, wherein the second spliced audio is marked as read, and a second playback speed adjustment mark is displayed on the second spliced audio.
  • the audio processing apparatus in this embodiment of the present application may be an apparatus, and may also be a component, an integrated circuit, or a chip in a terminal.
  • the apparatus may be a mobile electronic device or a non-mobile electronic device.
  • the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), etc.
  • the non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine, etc., which is not specifically limited in the embodiments of the present application.
  • the audio processing apparatus in this embodiment of the present application may be an apparatus having an operating system.
  • the operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
  • the audio processing apparatus provided in the embodiment of the present application can implement each process in the foregoing method embodiment, which is not repeated here to avoid repetition.
  • the first determination module 501 is configured to, when it is detected that playback of the first audio is interrupted, determine the first position of the first audio according to the playback interruption position of the first audio, each sentence segmentation position of the first audio, and the position of each silent segment of the first audio, where the first position is a sentence segmentation position or silent-segment end position located in the first audio segment of the first audio, and the first audio segment is the audio segment between the start position of the first audio and the playback interruption position of the first audio; the segmentation module 502 is used for splitting the first audio according to the first position to obtain a second audio segment and a third audio segment, wherein the second audio segment is the audio segment between the first position of the first audio and the end position of the first audio, and the third audio segment is the audio segment between the start position of the first audio and the first position of the first audio. Since the audio is split at a sentence segmentation position or silent-segment end position determined according to the playback interruption position, the integrity of the resulting audio segments can be improved.
  • an embodiment of the present application further provides an electronic device 600, including a processor 601, a memory 602, and a program or instruction stored in the memory 602 and executable on the processor 601; when the program or instruction is executed by the processor 601, each process of the above audio processing method embodiment is implemented, and the same technical effect can be achieved. To avoid repetition, details are not described here.
  • the electronic devices in the embodiments of the present application include the aforementioned mobile electronic devices and non-mobile electronic devices.
  • FIG. 7 is a structural diagram of an electronic device provided by another embodiment of the present application.
  • the electronic device 700 includes but is not limited to: a radio frequency unit 701, a network module 702, an audio output unit 703, an input unit 704, a sensor 705, a display unit 706, a user input unit 707, an interface unit 708, a memory 709, a processor 710, and other components.
  • the electronic device 700 may also include a power source (such as a battery) for supplying power to the various components, and the power source may be logically connected to the processor 710 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.
  • the structure of the electronic device shown in FIG. 7 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine some components, or arrange components differently, which will not be repeated here.
  • the processor 710 is configured to, when it is detected that playback of the first audio is interrupted, determine the first position of the first audio according to the playback interruption position of the first audio, each sentence segmentation position of the first audio, and the position of each silent segment of the first audio, where the first position is a sentence segmentation position or silent-segment end position located in the first audio segment of the first audio, and the first audio segment is the audio segment between the start position of the first audio and the playback interruption position of the first audio; and to split the first audio according to the first position to obtain a second audio segment and a third audio segment, wherein the second audio segment is the audio segment between the first position of the first audio and the end position of the first audio, and the third audio segment is the audio segment between the start position of the first audio and the first position of the first audio.
  • the first position is the sentence segmentation position or silent-segment end position in the first audio clip whose distance from the playback interruption position is a first distance, and the first distance is the minimum of the distances from each sentence segmentation position and each silent-segment end position of the first audio clip to the playback interruption position.
  • processor 710 is further configured to:
  • each sentence fragmentation position of the first audio is determined.
  • processor 710 is further configured to:
  • the first audio is an audio message, and after the first audio is divided according to the first position to obtain the second audio segment and the third audio segment, in the case that the first audio has post-audio, Perform de-duplication processing on the post audio and the second audio segment, and splicing the de-duplicated post audio and the second audio segment to obtain the first spliced audio; wherein the The post-audio is the next audio message of the first audio, and the audio object corresponding to the post-audio is the same as the audio object corresponding to the first audio;
  • when the first audio has pre-audio, de-duplication processing is performed on the pre-audio and the third audio segment, and the de-duplicated pre-audio and third audio segment are spliced to obtain a second spliced audio; wherein the pre-audio is the audio message immediately before the first audio, and the audio object corresponding to the pre-audio is the same as the audio object corresponding to the first audio.
  • processor 710 is further configured to:
  • the display unit 706 is used for:
  • the first spliced audio is displayed in the message display window, and the display of the post-audio and the second audio clip is cancelled, wherein the first spliced audio is marked as unread, and a first playback speed adjustment mark is displayed on the first spliced audio;
  • the input unit 704 may include a graphics processing unit (GPU) 7041 and a microphone 7042; the graphics processing unit 7041 processes image data of still pictures or video obtained by an image capture device (such as a camera).
  • the display unit 706 may include a display panel 7061, which may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the user input unit 707 includes a touch panel 7071 and other input devices 7072 .
  • the touch panel 7071 is also called a touch screen.
  • the touch panel 7071 may include two parts, a touch detection device and a touch controller.
  • Other input devices 7072 may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which will not be repeated here.
  • Memory 709 may be used to store software programs as well as various data including, but not limited to, application programs and operating systems.
  • the processor 710 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 710.
  • the embodiments of the present application further provide a readable storage medium on which a program or instruction is stored; when the program or instruction is executed by a processor, each process of the above audio processing method embodiment is implemented, and the same technical effect can be achieved. The processor is the processor in the electronic device described in the foregoing embodiments.
  • the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
  • An embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement the audio processing method embodiments described above.
  • the chip mentioned in the embodiments of the present application may also be referred to as a system-on-chip, a system chip, a chip system, a system-on-a-chip, or the like.
  • the embodiment of the present application further provides an electronic device, which is configured to execute each process of the above-mentioned audio processing method embodiment, and can achieve the same technical effect. To avoid repetition, details are not described here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)
  • Telephone Function (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present application discloses an audio processing method and apparatus, and an electronic device, belonging to the field of audio technology. The method includes: when it is detected that playback of a first audio is interrupted, determining a first position of the first audio according to the playback interruption position of the first audio, each sentence segmentation position of the first audio, and the position of each silent segment of the first audio, where the first position is a sentence segmentation position or a silent-segment end position located in a first audio segment of the first audio, and the first audio segment is the audio segment between the start position of the first audio and the playback interruption position of the first audio; and splitting the first audio according to the split position to obtain a second audio segment and a third audio segment.

Description

Audio processing method and apparatus, and electronic device
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 202011604816.7, filed in China on December 30, 2020, the entire contents of which are incorporated herein by reference.
Technical field
The present application belongs to the field of audio technology, and specifically relates to an audio processing method and apparatus, and an electronic device.
Background
In the course of communicating through social software, sending and receiving audio messages is common, especially when it is inconvenient for the user to enter or read text. At present, if playback of an audio message is interrupted before the message has finished, a user who wants to continue listening often has to replay the message from beginning to end, which wastes time.
To address the above problem, the prior art detects the interruption position of audio playback and marks the remaining audio based on that position to facilitate the next listening. However, in the course of implementing the present application, the inventor found at least the following problem in the prior art: marking the remaining audio starting from the playback interruption position easily results in poor integrity of the content of the marked remaining audio, which may, for example, contain half a sentence.
Summary
The purpose of the embodiments of the present application is to provide an audio processing method and apparatus, and an electronic device, which can solve the prior-art problem that marking the remaining audio from the playback interruption position results in poor integrity of the content of the marked remaining audio.
To solve the above technical problem, the present application is implemented as follows:
In a first aspect, an embodiment of the present application provides an audio processing method, including:
when it is detected that playback of a first audio is interrupted, determining a first position of the first audio according to a playback interruption position of the first audio, each sentence segmentation position of the first audio, and a position of each silent segment of the first audio, where the first position is a sentence segmentation position or a silent-segment end position located in a first audio segment of the first audio, and the first audio segment is the audio segment between a start position of the first audio and the playback interruption position of the first audio;
splitting the first audio according to the first position to obtain a second audio segment and a third audio segment, where the second audio segment is the audio segment between the first position of the first audio and an end position of the first audio, and the third audio segment is the audio segment between the start position of the first audio and the first position of the first audio.
In a second aspect, an embodiment of the present application provides an audio processing apparatus, including:
a first determination module, configured to, when it is detected that playback of a first audio is interrupted, determine a first position of the first audio according to a playback interruption position of the first audio, each sentence segmentation position of the first audio, and a position of each silent segment of the first audio, where the first position is a sentence segmentation position or a silent-segment end position located in a first audio segment of the first audio, and the first audio segment is the audio segment between a start position of the first audio and the playback interruption position of the first audio;
a segmentation module, configured to split the first audio according to the first position to obtain a second audio segment and a third audio segment, where the second audio segment is the audio segment between the first position of the first audio and an end position of the first audio, and the third audio segment is the audio segment between the start position of the first audio and the first position of the first audio.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and a program or instruction stored in the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a readable storage medium storing a program or instruction, where the program or instruction, when executed by a processor, implements the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, including a processor and a communication interface coupled to the processor, where the processor is configured to run a program or instruction to implement the method according to the first aspect.
In a sixth aspect, a computer program product is provided, stored in a non-volatile storage medium and executed by at least one processor to implement the method according to the first aspect.
In the embodiments of the present application, when it is detected that playback of the first audio is interrupted, the first position of the first audio is determined according to the playback interruption position of the first audio, each sentence segmentation position of the first audio, and the position of each silent segment of the first audio, where the first position is a sentence segmentation position or silent-segment end position located in the first audio segment of the first audio, and the first audio segment is the audio segment between the start position of the first audio and the playback interruption position of the first audio; the first audio is then split according to the first position to obtain a second audio segment and a third audio segment, where the second audio segment is the audio segment between the first position and the end position of the first audio, and the third audio segment is the audio segment between the start position and the first position of the first audio. Since the audio is split at a sentence segmentation position determined according to the playback interruption position or at the end position of a silent segment, the integrity of the audio segments obtained after splitting can be improved.
Brief description of the drawings
FIG. 1 is a flowchart of an audio processing method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of the sentence segmentation positions and silent-segment positions of audio provided by an embodiment of the present application;
FIG. 3 is a schematic diagram before and after audio splicing provided by an embodiment of the present application;
FIG. 4 is a flowchart of an audio processing method provided by another embodiment of the present application;
FIG. 5 is a structural diagram of an audio processing apparatus provided by an embodiment of the present application;
FIG. 6 is a structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 7 is a structural diagram of an electronic device provided by another embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are some rather than all of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
The terms "first", "second", and the like in the specification and claims of the present application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described here. The objects distinguished by "first", "second", etc. are usually of one type, and the number of objects is not limited; for example, there may be one first object or multiple first objects. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The audio processing method provided by the embodiments of the present application is described in detail below through specific embodiments and application scenarios with reference to the accompanying drawings.
Referring to FIG. 1, FIG. 1 is a flowchart of an audio processing method provided by an embodiment of the present application. As shown in FIG. 1, the audio processing method includes the following steps:
Step 101: When it is detected that playback of the first audio is interrupted, determine a first position of the first audio according to the playback interruption position of the first audio, each sentence segmentation position of the first audio, and the position of each silent segment of the first audio, where the first position is a sentence segmentation position or a silent-segment end position located in a first audio segment of the first audio, and the first audio segment is the audio segment between the start position of the first audio and the playback interruption position of the first audio.
In the embodiments of the present application, the first audio may be any audio, for example, an audio message, an audio file, or the audio part of a video. The playback interruption position may be understood as the time point or moment at which playback of the first audio is interrupted; for example, if playback is interrupted when the first audio has played to the 5th second, the playback interruption position is the 5th second of the first audio. The first audio segment may refer to the audio segment between the start position of the first audio and the playback interruption position of the first audio, that is, the already-played segment of the first audio.
The sentence segmentation position may refer to the division position of each sentence in the first audio. It should be noted that the sentence segmentation position may be understood as the sentence-break time point or moment of the audio.
The silent segment may refer to a silent part of the first audio. The position of a silent segment may include the start position and end position of the silent segment. It should be noted that the start position of a silent segment may be understood as its start time point or moment, and the end position of a silent segment as its end time point or moment.
Optionally, each silent segment in the first audio may be detected through a Voice Activity Detection (VAD) algorithm, where the VAD algorithm can classify each frame of the audio into two categories, silent frames (i.e., sil frames) and non-silent frames; the classification algorithm used by the VAD algorithm may include a filter algorithm or a neural network algorithm. Optionally, a silent part of the first audio whose duration exceeds a preset duration may be determined to be a silent segment, where the preset duration can be set reasonably according to actual needs, for example, 1 second, 1.5 seconds, or 2 seconds.
Optionally, in this embodiment, each sentence segmentation position and each silent-segment position of the first audio may be marked in advance, for example, as shown in FIG. 2, where the audio parts marked sil are silent segments. This facilitates quickly finding, in the first audio segment of the first audio, the sentence segmentation position or silent-segment end position closest to the playback interruption position of the first audio.
In step 101, a sentence segmentation position or silent-segment end position may be determined as the first position from among the sentence segmentation positions and silent-segment end positions of the first audio segment based on the playback interruption position of the first audio. For example, the sentence segmentation position or silent-segment end position in the first audio segment closest to the playback interruption position may be taken as the first position, or a sentence segmentation position or silent-segment end position in the first audio segment whose distance from the playback interruption position is less than a preset distance may be taken as the first position.
Step 102: Split the first audio according to the first position to obtain a second audio segment and a third audio segment, where the second audio segment is the audio segment between the first position of the first audio and the end position of the first audio, and the third audio segment is the audio segment between the start position of the first audio and the first position of the first audio.
In this step, the first audio can be split according to the first position into the third audio segment (i.e., the played segment) and the second audio segment (i.e., the unplayed segment). In this way, when the user needs to continue listening to the unplayed part of the first audio, the second audio segment can be listened to directly, saving the user's time. In addition, since the first audio is split at the sentence segmentation position or silent-segment end position closest to the playback interruption position, the integrity of the resulting audio segments can be improved.
In the audio processing method of the embodiments of the present application, when it is detected that playback of the first audio is interrupted, the first position of the first audio is determined according to the playback interruption position, each sentence segmentation position, and the position of each silent segment of the first audio, where the first position is a sentence segmentation position or silent-segment end position located in the first audio segment of the first audio, and the first audio segment is the audio segment between the start position of the first audio and the playback interruption position; the first audio is then split according to the first position to obtain the second audio segment and the third audio segment. This improves the integrity of the audio segments obtained after splitting, making it easier for the user to understand the audio content when continuing to listen to the second audio segment.
Optionally, the first position is a sentence segmentation position or silent-segment end position in the first audio segment whose distance from the playback interruption position is a first distance, where the first distance is the minimum of the distances from each sentence segmentation position and each silent-segment end position of the first audio segment to the playback interruption position.
In this embodiment, the sentence segmentation position or silent-segment end position in the first audio segment closest to the playback interruption position may be taken as the first position. For example, each word before the playback interruption position in the played audio segment can be examined; if the end position of the word immediately before the playback interruption position is a sentence segmentation position, or that word is a silent word, the end time of that word can be used as the split position of the audio. For example, as shown in FIG. 2, if playback is interrupted at "我叫" (my name is) and the preceding word is a silent word (i.e., sil), the end position of that silent word may be taken as the first position, that is, the split position of the first audio.
By taking the sentence segmentation position or silent-segment end position in the first audio segment closest to the playback interruption position as the first position for splitting the first audio, the embodiments of the present application can improve not only the integrity of the resulting audio segments but also the accuracy of dividing the first audio into its played and unplayed parts.
Optionally, before identifying the first position in the first audio segment of the first audio closest to the playback interruption position of the first audio when it is detected that playback of the first audio is interrupted, the method further includes:
identifying the text corresponding to the first audio;
marking the audio position corresponding to each word in the text;
performing sentence segmentation on the text to obtain a sentence segmentation result;
determining each sentence segmentation position of the first audio according to the sentence segmentation result and the audio position corresponding to each word in the text.
In this embodiment, the first audio may be converted into text through a speech recognition algorithm, and the audio position corresponding to each word in the text may be marked, for example, the start and end time points of each word in the first audio.
The sentence segmentation of the text may, for example, be performed by a text segmentation algorithm that marks punctuation for the text, such as commas, periods, question marks, exclamation marks, and blanks, where a blank mark means that no sentence break occurs there, and any other mark means that a sentence break is required there.
Optionally, the text segmentation algorithm may be a classification algorithm trained on N annotated texts, classifying the end of each word in the text into categories such as comma, period, question mark, exclamation mark, and blank. The value of N is usually large, for example, 5000, 10000, or 50000, and can be set reasonably according to actual needs. The text segmentation algorithm may include a Conditional Random Field (CRF) algorithm or a neural network algorithm, which is not limited in this embodiment.
By performing sentence segmentation on the text, this embodiment can obtain each sentence segmentation position of the text; combined with the marked audio position of each word in the text, each sentence segmentation position of the first audio can then be obtained. For example, as shown in FIG. 2, if the word "你好" (hello) in the text is followed by an exclamation mark and the end time point of the word in the audio is the 2nd second, the 2nd second of the audio can be determined to be a sentence segmentation position.
By converting audio into text for sentence segmentation, this embodiment can improve the accuracy of the segmentation result; furthermore, determining the audio's sentence segmentation positions based on the marked audio positions of the words and the text's segmentation result is relatively simple and convenient to implement.
Optionally, the first audio is an audio message, and after the first audio is split according to the first position to obtain the second audio segment and the third audio segment, the method further includes at least one of the following:
when the first audio has post-audio, performing de-duplication processing on the post-audio and the second audio segment, and splicing the de-duplicated post-audio and second audio segment to obtain a first spliced audio, where the post-audio is the audio message immediately after the first audio, and the audio object corresponding to the post-audio is the same as the audio object corresponding to the first audio;
when the first audio has pre-audio, performing de-duplication processing on the pre-audio and the third audio segment, and splicing the de-duplicated pre-audio and third audio segment to obtain a second spliced audio, where the pre-audio is the audio message immediately before the first audio, and the audio object corresponding to the pre-audio is the same as the audio object corresponding to the first audio.
In this embodiment, the first audio may be an audio message transmitted through an instant messaging application. The audio object may be understood as the speaker of the audio.
The post-audio may be understood as an audio message whose audio object is the same as that of the first audio and which is located after and adjacent to the first audio. Specifically, if the message following the first audio is an audio message whose audio object is the same as that of the first audio, the first audio is determined to have post-audio; otherwise, it is determined not to have post-audio. For example, if the message following audio message A is audio message B and the audio objects of both audio message B and audio message A are user B, audio message A has post-audio, namely audio message B; if the message following audio message A is not an audio message, or is audio message B but audio message B's audio object differs from audio message A's, audio message A has no post-audio.
The pre-audio may be understood as an audio message whose audio object is the same as that of the first audio and which is located before and adjacent to the first audio. Specifically, if the message preceding the first audio is an audio message whose audio object is the same as that of the first audio, the first audio is determined to have pre-audio; otherwise, it is determined not to have pre-audio. For example, if the message preceding audio message A is audio message C and the audio objects of both audio message C and audio message A are user B, audio message A has pre-audio, namely audio message C; if the message preceding audio message A is not an audio message, or is audio message C but audio message C's audio object differs from audio message A's, audio message A has no pre-audio.
In practice, a single audio message is usually short, for example at most 60 seconds, and often cannot fully express what the user needs to convey; therefore, users often express their content by sending several consecutive audio messages. By splicing the second audio segment of the first audio with the post-audio, and the third audio segment of the first audio with the pre-audio, this embodiment allows the user to listen to relatively complete audio content based on the spliced audio, which is convenient to operate. In addition, performing de-duplication before splicing can improve the smoothness of the splice.
Optionally, the time interval between the transmission time of the post-audio and that of the first audio is less than a first preset time interval, or the two transmission times fall within the same day; and the time interval between the transmission time of the pre-audio and that of the first audio is less than a second preset time interval, or the two transmission times fall within the same day. This can reduce the chance of splicing two unrelated audio messages together.
The first preset time interval and the second preset time interval can both be set reasonably according to actual needs, for example, 10 minutes or 5 minutes. It should be noted that the transmission time may include the sending time and the receiving time.
Optionally, the de-duplication processing of the post-audio and the second audio segment may include:
obtaining a fourth audio clip located before a second position of the post-audio and a fifth audio clip located after a third position of the second audio segment, where the second position includes the first sentence segmentation position or the position of the first silent segment of the post-audio, and the third position includes the last sentence segmentation position or the position of the last silent segment of the second audio segment;
if the text corresponding to the fourth audio clip is the same as the text corresponding to the fifth audio clip, deleting the fourth audio clip from the post-audio, or deleting the fifth audio clip from the second audio segment.
The de-duplication processing of the pre-audio and the third audio segment includes:
obtaining a sixth audio clip located after a fourth position of the pre-audio and a seventh audio clip located before a fifth position of the third audio segment, where the fourth position includes the last sentence segmentation position or the position of the last silent segment of the pre-audio, and the fifth position includes the first sentence segmentation position or the position of the first silent segment of the third audio segment;
if the text corresponding to the sixth audio clip is the same as the text corresponding to the seventh audio clip, deleting the sixth audio clip from the pre-audio, or deleting the seventh audio clip from the third audio segment.
In this embodiment, the audio clip located before the first sentence segmentation position or the position of the first silent segment of the post-audio is the fourth audio clip. For example, as shown in FIG. 2, if the first sentence segmentation position of the post-audio is the end position of "我找你" (I came to see you), the fourth audio clip is the clip of the post-audio corresponding to "我找你". The audio clip located after the last sentence segmentation position or the position of the last silent segment of the second audio segment is the fifth audio clip. For example, as shown in FIG. 2, if the last sentence segmentation position of the first audio is the start position of "我找你", the fifth audio clip is the clip of the first audio corresponding to "我找你".
Specifically, when the text of the fourth audio clip is the same as that of the fifth audio clip, the fourth audio clip may be deleted from the post-audio, for example, as shown in FIG. 2, by deleting the clip of the post-audio corresponding to "我找你" and splicing the post-audio with the fourth audio clip removed to the second audio segment; or the fifth audio clip may be deleted from the second audio segment, for example, as shown in FIG. 2, by deleting the clip of the first audio corresponding to "我找你" and splicing the second audio segment with the fifth audio clip removed to the post-audio. When the text of the fourth audio clip differs from that of the fifth audio clip, the second audio segment and the post-audio may be spliced directly.
Similarly, the audio clip located after the last sentence segmentation position or the position of the last silent segment of the pre-audio is the sixth audio clip, and the audio clip located before the first sentence segmentation position or the position of the first silent segment of the third audio segment is the seventh audio clip. As shown in FIG. 2, if the first sentence segmentation position of the first audio is the end position of "你好" (hello), the seventh audio clip is the clip of the first audio corresponding to "你好".
Specifically, when the text of the sixth audio clip is the same as that of the seventh audio clip, the sixth audio clip may be deleted from the pre-audio and the pre-audio with the sixth audio clip removed spliced to the third audio segment; or the seventh audio clip may be deleted from the third audio segment and the third audio segment with the seventh audio clip removed spliced to the pre-audio. When the text of the sixth audio clip differs from that of the seventh audio clip, the third audio segment and the pre-audio may be spliced directly.
This embodiment determines the duplicated audio clip between the post-audio and the second audio segment based on the first sentence segmentation position or first silent-segment position of the post-audio and the last sentence segmentation position or last silent-segment position of the second audio segment, and determines the duplicated audio clip between the pre-audio and the third audio segment based on the last sentence segmentation position or last silent-segment position of the pre-audio and the first sentence segmentation position or first silent-segment position of the third audio segment. Duplicated audio clips can thus be determined relatively quickly and accurately, improving the speed and accuracy of the de-duplication processing.
可选地,所述将去重处理后的所述后置音频和所述第二音频片段进行拼接,得到第一拼接音频之后,所述方法还包括:
在消息显示窗口显示所述第一拼接音频,并取消显示所述后置音频和所述第二音频片段,其中,所述第一拼接音频被标记为未读状态,且在所述第一拼接音频上显示有第一播放速度调整标识;
所述将去重处理后的所述前置音频和所述第三音频片段进行拼接,得到第二拼接音频之后,所述方法还包括:
在消息显示窗口显示所述第二拼接音频,并取消显示所述前置音频和所述第三音频片段,其中,所述第二拼接音频被标记为已读状态,且在所述第二拼接音频上显示有第二播放速度调整标识。
本实施例中,在得到第一拼接音频的情况下,可以在消息显示窗口显示所述第一拼接音频,并取消显示所述后置音频和所述第二音频片段,其中,所述第一拼接音频被标记为未读状态,且在所述第一拼接音频上显示有第一播放速度调整标识;在得到第二拼接音频的情况下,可以在消息显示窗口显示第二拼接音频,并取消显示前置音频和第三音频片段,其中,所述第二拼接音频被标记为已读状态,且在所述第二拼接音频上显示有第二播放速度调整标识,例如,如图3所示。
上述第一播放速度调整标识用于调整第一拼接音频的播放速度,可以在接收到对第一播放速度调整标识的第一输入的情况下,调整第一拼接音频的播放速度为与上述第一播放速度调整标识对应的播放速度。上述第二播放速度调整标识用于调整第二拼接音频的播放速度,可以在接收到对第二播放速度调整标识的第二输入的情况下,调整第二拼接音频的播放速度为与上述第二播放速度调整标识对应的播放速度。
可选地,上述第一播放速度调整标识和第二播放速度调整标识均可以包括至少一个播放速度子标识,每个播放速度子标识对应一种播放速度,例如,上述第一播放速度调整标识和第二播放速度调整标识均可以包括1.5倍数播放的播放速度子标识、2倍速播放的播放速度子标识和3倍数播放的播放速度子标识中的至少一项。
本申请实施例通过在所述将去重处理后的所述后置音频和所述第二音频片段进行拼接,得到第一拼接音频之后,在消息显示窗口显示所述第一拼接音频,并取消显示所述后置音频和所述第二音频片段,其中,所述第一拼接音频被标记为未读状态,且在所述第一拼接音频上显示有第一播放速度调整标识;
在所述将去重处理后的所述前置音频和所述第三音频片段进行拼接,得到第二拼接音频之后,在消息显示窗口显示所述第二拼接音频,并取消显示所述前置音频和所述第三音频片段,其中,所述第二拼接音频被标记为已读状态,且在所述第二拼接音频上显示有第二播放速度调整标识。这样不仅方便用户直观分辨已播放音频片段和未播放音频片段,进而可以快速的进行播放选择,还可以灵活的调整音频消息的播放速度,节省用户收听音频消息的时间。
可选地,在第一音频不存在后置音频的情况下,可以直接标记第二音频片段为未读状态,并在第二音频片段上显示第三播放速度调整标识,该第三播放速度调整标识可以用于调整第二音频片段的播放速度;在第一音频不存在前置音频的情况下,可以直接标记第三音频片段为已读状态,并在第三音频片段上显示第四播放速度调整标识,该第四播放速度调整标识可以用于调整第三音频片段的播放速度,不仅方便用户直观分辨已播放音频片段和未播放音频片段,进而可以快速的进行播放选择,还可以灵活的调整音频消息的播放速度,节省用户收听音频消息的时间。
可选地,本申请实施例可以在接收到针对目标音频的播放输入的情况下,识别所述目标音频对应的文本;对所述目标音频对应的文本进行语义理解,并基于语义理解结果确定目标背景音乐;在播放所述目标音频的过程中播放所述目标背景音乐。
其中,上述目标音频可以是任意的音频或音频片段或者拼接音频。上述目标背景音乐可以是与目标音频对应的文本的语义理解结果匹配的音乐,例如,若目标音频对应的文本的语义理解结果指示该目标音频为生日祝福,则上述目标背景音乐可以为生日祝福相关歌曲,若目标音频对应的文本的语义理解结果指示该目标音频为旅游计划,则上述目标背景音乐可以为旅游相关歌曲。
本申请实施例通过在播放音频的过程中播放与该音频的内容相匹配的背景音乐,可以提高音频播放的效果和趣味性。
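基于语义理解结果选取目标背景音乐的过程可示意如下(曲库结构与标签名称均为示意性假设,语义标签假定已由语义理解算法给出):

```python
# 基于语义理解结果选取背景音乐的最小示意(曲库内容与标签均为示意性假设)
def pick_background_music(semantic_label, music_library):
    """music_library: 语义标签到候选歌曲列表的映射;
    匹配到标签时返回一首候选歌曲,否则返回 None(即不播放背景音乐)。"""
    songs = music_library.get(semantic_label)
    return songs[0] if songs else None
```

例如,语义理解结果为“生日祝福”时返回生日祝福相关歌曲,在播放目标音频的过程中一并播放。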
参见图4,图4是本申请另一实施例提供的音频处理方法的流程图。
如图4所示,本申请实施例提供的音频处理方法包括以下步骤:
步骤201、播放第一音频。
步骤202、判断是否暂停播放第一音频。
在暂停播放第一音频的情况下,可以记录第一音频的中断播放位置,并执行步骤203,否则继续播放第一音频。
步骤203、通过语音活动检测算法检测第一音频中的各个静音片段。
该步骤中,可以通过VAD算法将音频的每一帧分成两类,一类是静音帧(即sil帧),一类是非静音帧,并将时长超过预设时长的静音部分确定为静音片段。
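静音片段的合并判定可示意如下(帧级静音判定结果假定已由VAD算法给出并直接作为输入,预设时长以帧数计,均为示意性假设):

```python
# 步骤203的最小示意:将连续静音帧合并,帧数达到 min_len 的静音段记为静音片段
def find_silence_segments(frame_is_sil, min_len):
    """frame_is_sil: 每帧是否为静音帧(sil帧)的布尔列表;
    返回 (起始帧, 结束帧) 列表,区间为左闭右开。"""
    segments, start = [], None
    for i, is_sil in enumerate(frame_is_sil + [False]):  # 末尾补一个非静音帧便于收尾
        if is_sil and start is None:
            start = i                      # 静音段开始
        elif not is_sil and start is not None:
            if i - start >= min_len:       # 时长超过预设帧数才算静音片段
                segments.append((start, i))
            start = None
    return segments
```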
步骤204、通过语音识别算法识别第一音频对应的文本,并标记文本中各个词对应的音频位置。
该步骤中,通过语音识别算法识别第一音频对应的文本,并标记文本中各个词对应的音频位置,也即标记各个词在第一音频中的开始时间点和结束时间点。
步骤205、通过文本断句算法对文本进行断句处理,并结合文本中各个词对应的音频位置确定第一音频的各个断句位置。
该步骤中,可以根据文本断句算法为该文本标记标点符号,例如,逗号、句号、问号、叹号、空白等符号,其中,标记为空白表示此处不进行断句,否则表示此处需要断句;在得到文本的各个断句位置之后,结合标记的文本中每个词对应的音频位置,即可得到第一音频的各个断句位置。
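结合断句标点与词级时间戳得到音频断句位置的过程可示意如下(words 与 puncts 的内容为示意性输入,词级时间戳假定已由步骤204的语音识别算法标记):

```python
# 步骤204-205的最小示意:words 为 (词, 开始时间, 结束时间) 列表,
# puncts 为文本断句算法为每个词标记的标点,空串表示该词后不断句
def sentence_break_positions(words, puncts):
    """断句位置取被标记了标点的词在第一音频中的结束时间点。"""
    return [end for (_, _, end), p in zip(words, puncts) if p != ""]
```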
步骤206、根据第一音频的播放中断位置、第一音频中的各个静音片段的位置和第一音频的各个断句位置确定分割位置。
该步骤中,可以查找第一音频的已播放音频片段(即上述第一音频片段)中与播放中断位置距离最近的断句位置或者静音片段的结束位置。例如,可以查看已播放音频片段中位于播放中断位置之前的每个词,如果播放中断位置的前一个词的结束位置为断句位置,或者前一个词是静音词,则可以把前一个词的结束时刻作为音频的分割位置。
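该查找过程可示意如下(位置以秒为单位,候选为空时回退到音频开始位置的策略为示意性假设):

```python
# 步骤206的最小示意:在断句位置与静音片段结束位置中,
# 取位于播放中断位置之前(含)且距其最近者作为分割位置
def find_split_position(interrupt_pos, break_positions, silence_end_positions):
    """候选为空时回退到音频开始位置 0.0(回退策略为示意性假设)。"""
    candidates = [p for p in break_positions + silence_end_positions
                  if p <= interrupt_pos]
    return max(candidates, default=0.0)
```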
步骤207、根据分割位置分割第一音频,得到第二音频片段和第三音频片段。
该步骤中,第二音频片段为所述第一音频的分割位置至所述第一音频的结束位置之间的音频片段,第三音频片段为所述第一音频的开始位置至所述第一音频的分割位置之间的音频片段。
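该分割步骤可示意如下(音频以采样列表表示,分割位置以采样下标表示,均为示意性假设):

```python
# 步骤207的最小示意:按分割位置切分第一音频
def split_audio(audio, split_idx):
    """第三音频片段为开始位置至分割位置之间的音频片段,
    第二音频片段为分割位置至结束位置之间的音频片段。"""
    third_seg, second_seg = audio[:split_idx], audio[split_idx:]
    return second_seg, third_seg
```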
步骤208、判断第一音频是否存在前置音频和后置音频。
该步骤中,在第一音频存在前置音频和后置音频的情况下,执行步骤210,在第一音频不存在前置音频和后置音频的情况下,执行步骤209;在第一音频存在后置音频但不存在前置音频的情况下,可以将第三音频片段标记为已读状态,并将第二音频片段和后置音频进行去重处理并拼接得到第一拼接音频;在第一音频存在前置音频但不存在后置音频的情况下,可以将第二音频片段标记为未读状态,并将第三音频片段与前置音频进行去重处理并拼接得到第二拼接音频。
步骤209、将第二音频片段标记为未读状态,将第三音频片段标记为已读状态。
步骤210、将第二音频片段和后置音频进行去重处理并拼接得到第一拼接音频,将第三音频片段与前置音频进行去重处理并拼接得到第二拼接音频。
该步骤中,将第二音频片段和后置音频进行去重处理以及将第三音频片段与前置音频进行去重处理可以参见前述相关说明,在此不做赘述。
步骤211、将第一拼接音频标记为未读状态,将第二拼接音频标记为已读状态。
本申请实施例通过语音活动检测算法、语音识别算法、文本断句算法,可以自动调整播放中断点,使得中断点后面的音频相对完整,方便下次接着上次的音频来听。此外,本申请实施例可以在音频拼接过程中把重复的音频去除,可以增加两个音频拼接的平滑性,方便收听。
需要说明的是,本申请实施例提供的音频处理方法,执行主体可以为音频处理装置,或者该音频处理装置中的用于执行音频处理方法的控制模块。本申请实施例中以音频处理装置执行音频处理方法为例,说明本申请实施例提供的音频处理装置。
参见图5,图5是本申请一实施例提供的音频处理装置的结构图,如图5所示,该音频处理装置500包括:
第一确定模块501,用于在监测到第一音频播放中断的情况下,根据所述第一音频的播放中断位置、所述第一音频的各个断句位置和所述第一音频的各个静音片段的位置,确定所述第一音频的第一位置,其中,所述第一位置为位于所述第一音频的第一音频片段的断句位置或静音片段的结束位置,所述第一音频片段为所述第一音频的开始位置至所述第一音频的播放中断位置之间的音频片段;
分割模块502,用于根据所述第一位置分割所述第一音频,得到第二音频片段和第三音频片段,其中,所述第二音频片段为所述第一音频的第一位置至所述第一音频的结束位置之间的音频片段,所述第三音频片段为所述第一音频的开始位置至所述第一音频的第一位置之间的音频片段。
可选地,所述第一位置为所述第一音频片段中距离所述播放中断位置为第一距离的断句位置或静音片段的结束位置,所述第一距离为所述第一音频片段的各个断句位置和各个静音片段的结束位置与所述播放中断位置的距离中的最小值。
可选地,所述装置还包括:
识别模块,用于所述在监测到第一音频播放中断的情况下,根据所述第一音频的播放中断位置、所述第一音频的各个断句位置和所述第一音频的各个静音片段的位置,确定所述第一音频的第一位置之前,识别所述第一音频对应的文本;
标记模块,用于标记所述文本中每个词对应的音频位置;
断句模块,用于对所述文本进行断句处理,得到断句处理结果;
第二确定模块,用于根据所述断句处理结果和所述文本中每个词对应的音频位置,确定所述第一音频的各个断句位置。
可选地,所述第一音频为音频消息,所述装置还包括如下至少一项:
第一处理模块,用于所述根据所述第一位置分割所述第一音频,得到第二音频片段和第三音频片段之后,在所述第一音频存在后置音频的情况下,对所述后置音频与所述第二音频片段进行去重处理,并将去重处理后的所述后置音频和所述第二音频片段进行拼接,得到第一拼接音频;其中,所述后置音频为所述第一音频的后一条音频消息,且所述后置音频对应的音频对象与所述第一音频对应的音频对象相同;
第二处理模块,用于在所述第一音频存在前置音频的情况下,对所述前置音频与所述第三音频片段进行去重处理,并将去重处理后的所述前置音频和所述第三音频片段进行拼接,得到第二拼接音频;其中,所述前置音频为所述第一音频的前一条音频消息,且所述前置音频对应的音频对象与所述第一音频对应的音频对象相同。
可选地,所述第一处理模块具体用于:
获取位于所述后置音频的第二位置之前的第四音频片段和位于所述第二音频片段的第三位置之后的第五音频片段,其中,所述第二位置包括所述后置音频的第一个断句位置或者第一个静音片段的位置,所述第三位置包括所述第二音频片段的最后一个断句位置或者最后一个静音片段的位置;
若所述第四音频片段对应的文本与所述第五音频片段对应的文本相同,则从所述后置音频中删除所述第四音频片段,或者从所述第二音频片段中删除所述第五音频片段;
所述第二处理模块具体用于:
获取所述前置音频的第四位置之后的第六音频片段和所述第三音频片段的第五位置之前的第七音频片段,其中,所述第四位置包括所述前置音频的最后一个断句位置或者最后一个静音片段的位置,所述第五位置包括所述第三音频片段的第一个断句位置或者第一个静音片段的位置;
若所述第六音频片段对应的文本与所述第七音频片段对应的文本相同,则从所述前置音频中删除所述第六音频片段,或者从所述第三音频片段中删除所述第七音频片段。
可选地,所述装置还包括:
第一显示模块,用于所述将去重处理后的所述后置音频和所述第二音频片段进行拼接,得到第一拼接音频之后,在消息显示窗口显示所述第一拼接音频,并取消显示所述后置音频和所述第二音频片段,其中,所述第一拼接音频被标记为未读状态,且在所述第一拼接音频上显示有第一播放速度调整标识;
第二显示模块,用于所述将去重处理后的所述前置音频和所述第三音频片段进行拼接,得到第二拼接音频之后,在消息显示窗口显示所述第二拼接音频,并取消显示所述前置音频和所述第三音频片段,其中,所述第二拼接音频被标记为已读状态,且在所述第二拼接音频上显示有第二播放速度调整标识。
本申请实施例中的音频处理装置可以是装置,也可以是终端中的部件、集成电路、或芯片。该装置可以是移动电子设备,也可以为非移动电子设备。示例性的,移动电子设备可以为手机、平板电脑、笔记本电脑、掌上电脑、车载电子设备、可穿戴设备、超级移动个人计算机(Ultra-Mobile Personal Computer,UMPC)、上网本或者个人数字助理(Personal Digital Assistant,PDA)等,非移动电子设备可以为服务器、网络附属存储器(Network Attached Storage,NAS)、个人计算机(Personal Computer,PC)、电视机(Television,TV)、柜员机或者自助机等,本申请实施例不作具体限定。
本申请实施例中的音频处理装置可以为具有操作系统的装置。该操作系统可以为安卓(Android)操作系统,可以为iOS操作系统,还可以为其他可能的操作系统,本申请实施例不作具体限定。
本申请实施例提供的音频处理装置能够实现上述方法实施例中的各个过程,为避免重复,这里不再赘述。
本申请实施例的音频处理装置500中,第一确定模块501,用于在监测到第一音频播放中断的情况下,根据所述第一音频的播放中断位置、所述第一音频的各个断句位置和所述第一音频的各个静音片段的位置,确定所述第一音频的第一位置,其中,所述第一位置为位于所述第一音频的第一音频片段的断句位置或静音片段的结束位置,所述第一音频片段为所述第一音频的开始位置至所述第一音频的播放中断位置之间的音频片段;分割模块502,用于根据所述第一位置分割所述第一音频,得到第二音频片段和第三音频片段,其中,所述第二音频片段为所述第一音频的第一位置至所述第一音频的结束位置之间的音频片段,所述第三音频片段为所述第一音频的开始位置至所述第一音频的第一位置之间的音频片段。由于基于根据播放中断位置确定的断句位置或者静音片段的结束位置分割音频,可以提高分割后得到的音频片段的完整性。
可选地,如图6所示,本申请实施例还提供一种电子设备600,包括处理器601,存储器602,存储在存储器602上并可在所述处理器601上运行的程序或指令,该程序或指令被处理器601执行时实现上述音频处理方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
需要说明的是,本申请实施例中的电子设备包括上述所述的移动电子设备和非移动电子设备。
参见图7,图7是本申请另一实施例提供的电子设备的结构图,如图7所示,该电子设备700包括但不限于:射频单元701、网络模块702、音频输出单元703、输入单元704、传感器705、显示单元706、用户输入单元707、接口单元708、存储器709、以及处理器710等部件。
本领域技术人员可以理解,电子设备700还可以包括给各个部件供电的电源(比如电池),电源可以通过电源管理系统与处理器710逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。图7中示出的电子设备结构并不构成对电子设备的限定,电子设备可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置,在此不再赘述。
其中,处理器710,用于在监测到第一音频播放中断的情况下,根据所述第一音频的播放中断位置、所述第一音频的各个断句位置和所述第一音频的各个静音片段的位置,确定所述第一音频的第一位置,其中,所述第一位置为位于所述第一音频的第一音频片段的断句位置或静音片段的结束位置,所述第一音频片段为所述第一音频的开始位置至所述第一音频的播放中断位置之间的音频片段;根据所述第一位置分割所述第一音频,得到第二音频片段和第三音频片段,其中,所述第二音频片段为所述第一音频的第一位置至所述第一音频的结束位置之间的音频片段,所述第三音频片段为所述第一音频的开始位置至所述第一音频的第一位置之间的音频片段。
可选地,所述第一位置为所述第一音频片段中距离所述播放中断位置为第一距离的断句位置或静音片段的结束位置,所述第一距离为所述第一音频片段的各个断句位置和各个静音片段的结束位置与所述播放中断位置的距离中的最小值。
可选地,所述处理器710还用于:
所述在监测到第一音频播放中断的情况下,根据所述第一音频的播放中断位置、所述第一音频的各个断句位置和所述第一音频的各个静音片段的位置,确定所述第一音频的第一位置之前,识别所述第一音频对应的文本;
标记所述文本中每个词对应的音频位置;
对所述文本进行断句处理,得到断句处理结果;
根据所述断句处理结果和所述文本中每个词对应的音频位置,确定所述第一音频的各个断句位置。
可选地,所述处理器710还用于:
所述第一音频为音频消息,所述根据所述第一位置分割所述第一音频,得到第二音频片段和第三音频片段之后,在所述第一音频存在后置音频的情况下,对所述后置音频与所述第二音频片段进行去重处理,并将去重处理后的所述后置音频和所述第二音频片段进行拼接,得到第一拼接音频;其中,所述后置音频为所述第一音频的后一条音频消息,且所述后置音频对应的音频对象与所述第一音频对应的音频对象相同;
在所述第一音频存在前置音频的情况下,对所述前置音频与所述第三音频片段进行去重处理,并将去重处理后的所述前置音频和所述第三音频片段进行拼接,得到第二拼接音频;其中,所述前置音频为所述第一音频的前一条音频消息,且所述前置音频对应的音频对象与所述第一音频对应的音频对象相同。
可选地,所述处理器710还用于:
获取位于所述后置音频的第二位置之前的第四音频片段和位于所述第二音频片段的第三位置之后的第五音频片段,其中,所述第二位置包括所述后置音频的第一个断句位置或者第一个静音片段的位置,所述第三位置包括所述第二音频片段的最后一个断句位置或者最后一个静音片段的位置;
若所述第四音频片段对应的文本与所述第五音频片段对应的文本相同,则从所述后置音频中删除所述第四音频片段,或者从所述第二音频片段中删除所述第五音频片段;
获取所述前置音频的第四位置之后的第六音频片段和所述第三音频片段的第五位置之前的第七音频片段,其中,所述第四位置包括所述前置音频的最后一个断句位置或者最后一个静音片段的位置,所述第五位置包括所述第三音频片段的第一个断句位置或者第一个静音片段的位置;
若所述第六音频片段对应的文本与所述第七音频片段对应的文本相同,则从所述前置音频中删除所述第六音频片段,或者从所述第三音频片段中删除所述第七音频片段。
可选地,所述显示单元706用于:
所述将去重处理后的所述后置音频和所述第二音频片段进行拼接,得到第一拼接音频之后,在消息显示窗口显示所述第一拼接音频,并取消显示所述后置音频和所述第二音频片段,其中,所述第一拼接音频被标记为未读状态,且在所述第一拼接音频上显示有第一播放速度调整标识;
所述将去重处理后的所述前置音频和所述第三音频片段进行拼接,得到第二拼接音频之后,在消息显示窗口显示所述第二拼接音频,并取消显示所述前置音频和所述第三音频片段,其中,所述第二拼接音频被标记为已读状态,且在所述第二拼接音频上显示有第二播放速度调整标识。
应理解的是,本申请实施例中,输入单元704可以包括图形处理器(Graphics Processing Unit,GPU)7041和麦克风7042,图形处理器7041对在视频捕获模式或图像捕获模式中由图像捕获装置(如摄像头)获得的静态图片或视频的图像数据进行处理。显示单元706可包括显示面板7061,可以采用液晶显示器、有机发光二极管等形式来配置显示面板7061。用户输入单元707包括触控面板7071以及其他输入设备7072。触控面板7071,也称为触摸屏。触控面板7071可包括触摸检测装置和触摸控制器两个部分。其他输入设备7072可以包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆,在此不再赘述。存储器709可用于存储软件程序以及各种数据,包括但不限于应用程序和操作系统。处理器710可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器710中。
本申请实施例还提供一种可读存储介质,所述可读存储介质上存储有程序或指令,该程序或指令被处理器执行时实现上述音频处理方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
其中,所述处理器为上述实施例中所述的电子设备中的处理器。所述可读存储介质,包括计算机可读存储介质,如计算机只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等。
本申请实施例另提供了一种芯片,所述芯片包括处理器和通信接口,所述通信接口和所述处理器耦合,所述处理器用于运行程序或指令,实现上述音频处理方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
应理解,本申请实施例提到的芯片还可以称为系统级芯片、系统芯片、芯片系统或片上系统芯片等。
本申请实施例另提供了一种电子设备,被配置为执行上述音频处理方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。此外,需要指出的是,本申请实施方式中的方法和装置的范围不限按示出或讨论的顺序来执行功能,还可包括根据所涉及的功能按基本同时的方式或按相反的顺序来执行功能,例如,可以按不同于所描述的次序来执行所描述的方法,并且还可以添加、省去、或组合各种步骤。另外,参照某些示例所描述的特征可在其他示例中被组合。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本申请各个实施例所述的方法。
上面结合附图对本申请的实施例进行了描述,但是本申请并不局限于上述的具体实施方式,上述的具体实施方式仅仅是示意性的,而不是限制性的,本领域的普通技术人员在本申请的启示下,在不脱离本申请宗旨和权利要求所保护的范围情况下,还可做出很多形式,均属于本申请的保护之内。

Claims (17)

  1. 一种音频处理方法,包括:
    在监测到第一音频播放中断的情况下,根据所述第一音频的播放中断位置、所述第一音频的各个断句位置和所述第一音频的各个静音片段的位置,确定所述第一音频的第一位置,其中,所述第一位置为位于所述第一音频的第一音频片段的断句位置或静音片段的结束位置,所述第一音频片段为所述第一音频的开始位置至所述第一音频的播放中断位置之间的音频片段;
    根据所述第一位置分割所述第一音频,得到第二音频片段和第三音频片段,其中,所述第二音频片段为所述第一音频的第一位置至所述第一音频的结束位置之间的音频片段,所述第三音频片段为所述第一音频的开始位置至所述第一音频的第一位置之间的音频片段。
  2. 根据权利要求1所述的方法,其中,所述第一位置为所述第一音频片段中距离所述播放中断位置为第一距离的断句位置或静音片段的结束位置,所述第一距离为所述第一音频片段的各个断句位置和各个静音片段的结束位置与所述播放中断位置的距离中的最小值。
  3. 根据权利要求1所述的方法,其中,所述在监测到第一音频播放中断的情况下,根据所述第一音频的播放中断位置、所述第一音频的各个断句位置和所述第一音频的各个静音片段的位置,确定所述第一音频的第一位置之前,所述方法还包括:
    识别所述第一音频对应的文本;
    标记所述文本中每个词对应的音频位置;
    对所述文本进行断句处理,得到断句处理结果;
    根据所述断句处理结果和所述文本中每个词对应的音频位置,确定所述第一音频的各个断句位置。
  4. 根据权利要求1所述的方法,其中,所述第一音频为音频消息,所述根据所述第一位置分割所述第一音频,得到第二音频片段和第三音频片段之后,所述方法还包括:
    在所述第一音频存在后置音频的情况下,对所述后置音频与所述第二音频片段进行去重处理,并将去重处理后的所述后置音频和所述第二音频片段进行拼接,得到第一拼接音频;其中,所述后置音频为所述第一音频的后一条音频消息,且所述后置音频对应的音频对象与所述第一音频对应的音频对象相同;
    在所述第一音频存在前置音频的情况下,对所述前置音频与所述第三音频片段进行去重处理,并将去重处理后的所述前置音频和所述第三音频片段进行拼接,得到第二拼接音频;其中,所述前置音频为所述第一音频的前一条音频消息,且所述前置音频对应的音频对象与所述第一音频对应的音频对象相同。
  5. 根据权利要求4所述的方法,其中,所述对所述后置音频与所述第二音频片段进行去重处理,包括:
    获取位于所述后置音频的第二位置之前的第四音频片段和位于所述第二音频片段的第三位置之后的第五音频片段,其中,所述第二位置包括所述后置音频的第一个断句位置或者第一个静音片段的位置,所述第三位置包括所述第二音频片段的最后一个断句位置或者最后一个静音片段的位置;
    若所述第四音频片段对应的文本与所述第五音频片段对应的文本相同,则从所述后置音频中删除所述第四音频片段,或者从所述第二音频片段中删除所述第五音频片段;
    所述对所述前置音频与所述第三音频片段进行去重处理,包括:
    获取所述前置音频的第四位置之后的第六音频片段和所述第三音频片段的第五位置之前的第七音频片段,其中,所述第四位置包括所述前置音频的最后一个断句位置或者最后一个静音片段的位置,所述第五位置包括所述第三音频片段的第一个断句位置或者第一个静音片段的位置;
    若所述第六音频片段对应的文本与所述第七音频片段对应的文本相同,则从所述前置音频中删除所述第六音频片段,或者从所述第三音频片段中删除所述第七音频片段。
  6. 根据权利要求4所述的方法,其中,所述将去重处理后的所述后置音频和所述第二音频片段进行拼接,得到第一拼接音频之后,所述方法还包括:
    在消息显示窗口显示所述第一拼接音频,并取消显示所述后置音频和所述第二音频片段,其中,所述第一拼接音频被标记为未读状态,且在所述第一拼接音频上显示有第一播放速度调整标识;
    所述将去重处理后的所述前置音频和所述第三音频片段进行拼接,得到第二拼接音频之后,所述方法还包括:
    在消息显示窗口显示所述第二拼接音频,并取消显示所述前置音频和所述第三音频片段,其中,所述第二拼接音频被标记为已读状态,且在所述第二拼接音频上显示有第二播放速度调整标识。
  7. 一种音频处理装置,包括:
    第一确定模块,用于在监测到第一音频播放中断的情况下,根据所述第一音频的播放中断位置、所述第一音频的各个断句位置和所述第一音频的各个静音片段的位置,确定所述第一音频的第一位置,其中,所述第一位置为位于所述第一音频的第一音频片段的断句位置或静音片段的结束位置,所述第一音频片段为所述第一音频的开始位置至所述第一音频的播放中断位置之间的音频片段;
    分割模块,用于根据所述第一位置分割所述第一音频,得到第二音频片段和第三音频片段,其中,所述第二音频片段为所述第一音频的第一位置至所述第一音频的结束位置之间的音频片段,所述第三音频片段为所述第一音频的开始位置至所述第一音频的第一位置之间的音频片段。
  8. 根据权利要求7所述的装置,其中,所述第一位置为所述第一音频片段中距离所述播放中断位置为第一距离的断句位置或静音片段的结束位置,所述第一距离为所述第一音频片段的各个断句位置和各个静音片段的结束位置与所述播放中断位置的距离中的最小值。
  9. 根据权利要求7所述的装置,其中,所述装置还包括:
    识别模块,用于所述在监测到第一音频播放中断的情况下,根据所述第一音频的播放中断位置、所述第一音频的各个断句位置和所述第一音频的各个静音片段的位置,确定所述第一音频的第一位置之前,识别所述第一音频对应的文本;
    标记模块,用于标记所述文本中每个词对应的音频位置;
    断句模块,用于对所述文本进行断句处理,得到断句处理结果;
    第二确定模块,用于根据所述断句处理结果和所述文本中每个词对应的音频位置,确定所述第一音频的各个断句位置。
  10. 根据权利要求7所述的装置,其中,所述第一音频为音频消息,所述装置还包括:
    第一处理模块,用于所述根据所述第一位置分割所述第一音频,得到第二音频片段和第三音频片段之后,在所述第一音频存在后置音频的情况下,对所述后置音频与所述第二音频片段进行去重处理,并将去重处理后的所述后置音频和所述第二音频片段进行拼接,得到第一拼接音频;其中,所述后置音频为所述第一音频的后一条音频消息,且所述后置音频对应的音频对象与所述第一音频对应的音频对象相同;
    第二处理模块,用于在所述第一音频存在前置音频的情况下,对所述前置音频与所述第三音频片段进行去重处理,并将去重处理后的所述前置音频和所述第三音频片段进行拼接,得到第二拼接音频;其中,所述前置音频为所述第一音频的前一条音频消息,且所述前置音频对应的音频对象与所述第一音频对应的音频对象相同。
  11. 根据权利要求10所述的装置,其中,所述第一处理模块具体用于:
    获取位于所述后置音频的第二位置之前的第四音频片段和位于所述第二音频片段的第三位置之后的第五音频片段,其中,所述第二位置包括所述后置音频的第一个断句位置或者第一个静音片段的位置,所述第三位置包括所述第二音频片段的最后一个断句位置或者最后一个静音片段的位置;
    若所述第四音频片段对应的文本与所述第五音频片段对应的文本相同,则从所述后置音频中删除所述第四音频片段,或者从所述第二音频片段中删除所述第五音频片段;
    所述第二处理模块具体用于:
    获取所述前置音频的第四位置之后的第六音频片段和所述第三音频片段的第五位置之前的第七音频片段,其中,所述第四位置包括所述前置音频的最后一个断句位置或者最后一个静音片段的位置,所述第五位置包括所述第三音频片段的第一个断句位置或者第一个静音片段的位置;
    若所述第六音频片段对应的文本与所述第七音频片段对应的文本相同,则从所述前置音频中删除所述第六音频片段,或者从所述第三音频片段中删除所述第七音频片段。
  12. 根据权利要求10所述的装置,其中,所述装置还包括:
    第一显示模块,用于所述将去重处理后的所述后置音频和所述第二音频片段进行拼接,得到第一拼接音频之后,在消息显示窗口显示所述第一拼接音频,并取消显示所述后置音频和所述第二音频片段,其中,所述第一拼接音频被标记为未读状态,且在所述第一拼接音频上显示有第一播放速度调整标识;
    第二显示模块,用于所述将去重处理后的所述前置音频和所述第三音频片段进行拼接,得到第二拼接音频之后,在消息显示窗口显示所述第二拼接音频,并取消显示所述前置音频和所述第三音频片段,其中,所述第二拼接音频被标记为已读状态,且在所述第二拼接音频上显示有第二播放速度调整标识。
  13. 一种电子设备,包括处理器,存储器及存储在所述存储器上并可在所述处理器上运行的程序或指令,其中,所述程序或指令被所述处理器执行时实现如权利要求1-6任一项所述的音频处理方法的步骤。
  14. 一种可读存储介质,所述可读存储介质上存储有程序或指令,其中,所述程序或指令被处理器执行时实现如权利要求1-6任一项所述的音频处理方法的步骤。
  15. 一种芯片,包括处理器和通信接口,其中,所述通信接口和所述处理器耦合,所述处理器用于运行程序或指令,实现如权利要求1-6任一项所述的音频处理方法的步骤。
  16. 一种计算机程序产品,其中,所述计算机程序产品被存储在非易失的存储介质中,所述计算机程序产品被至少一个处理器执行以实现如权利要求1-6任一项所述的音频处理方法的步骤。
  17. 一种电子设备,被配置为执行如权利要求1-6任一项所述的音频处理方法的步骤。
PCT/CN2021/143036 2020-12-30 2021-12-30 音频处理方法、装置及电子设备 WO2022143888A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR1020237025661A KR20230125284A (ko) 2020-12-30 2021-12-30 오디오 처리 방법, 장치 및 전자기기
EP21914619.8A EP4273863A4 (en) 2020-12-30 2021-12-30 AUDIO PROCESSING METHOD AND APPARATUS AND ELECTRONIC DEVICE
US18/343,055 US20230343325A1 (en) 2020-12-30 2023-06-28 Audio processing method and apparatus, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011604816.7A CN112712825B (zh) 2020-12-30 2020-12-30 音频处理方法、装置及电子设备
CN202011604816.7 2020-12-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/343,055 Continuation US20230343325A1 (en) 2020-12-30 2023-06-28 Audio processing method and apparatus, and electronic device

Publications (1)

Publication Number Publication Date
WO2022143888A1 true WO2022143888A1 (zh) 2022-07-07

Family

ID=75547078

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/143036 WO2022143888A1 (zh) 2020-12-30 2021-12-30 音频处理方法、装置及电子设备

Country Status (5)

Country Link
US (1) US20230343325A1 (zh)
EP (1) EP4273863A4 (zh)
KR (1) KR20230125284A (zh)
CN (1) CN112712825B (zh)
WO (1) WO2022143888A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712825B (zh) * 2020-12-30 2022-09-23 维沃移动通信有限公司 音频处理方法、装置及电子设备
CN113674724A (zh) * 2021-08-18 2021-11-19 青岛海信移动通信技术股份有限公司 专辑文件的解析文件生成方法和终端设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104038827A (zh) * 2014-06-06 2014-09-10 小米科技有限责任公司 多媒体播放方法及装置
US9182940B1 (en) * 2013-12-10 2015-11-10 Amazon Technologies, Inc. Systems and methods for determining playback locations in media files
CN110036442A (zh) * 2016-10-25 2019-07-19 乐威指南公司 用于恢复媒体资产的***和方法
CN111641551A (zh) * 2020-05-27 2020-09-08 维沃移动通信有限公司 语音播放方法、语音播放装置和电子设备
CN112712825A (zh) * 2020-12-30 2021-04-27 维沃移动通信有限公司 音频处理方法、装置及电子设备

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102811182A (zh) * 2012-08-10 2012-12-05 上海量明科技发展有限公司 即时通信中播放音频消息的方法、客户端及***
CN103970477A (zh) * 2014-04-30 2014-08-06 华为技术有限公司 控制语音消息的方法和设备
CN104965872B (zh) * 2015-06-11 2019-04-26 联想(北京)有限公司 一种信息处理方法和电子设备
CN105827516B (zh) * 2016-05-09 2019-06-21 腾讯科技(深圳)有限公司 消息处理方法和装置
WO2018037956A1 (ja) * 2016-08-26 2018-03-01 ソニー株式会社 情報処理装置及び情報処理方法
CN107888757A (zh) * 2017-09-25 2018-04-06 努比亚技术有限公司 一种语音消息处理方法、终端和计算机可读存储介质
CN111128254B (zh) * 2019-11-14 2021-09-03 网易(杭州)网络有限公司 音频播放方法、电子设备及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4273863A4 *

Also Published As

Publication number Publication date
EP4273863A1 (en) 2023-11-08
KR20230125284A (ko) 2023-08-29
CN112712825B (zh) 2022-09-23
CN112712825A (zh) 2021-04-27
EP4273863A4 (en) 2024-07-03
US20230343325A1 (en) 2023-10-26

