WO2022143888A1 - Audio processing method, apparatus, and electronic device - Google Patents
- Publication number: WO2022143888A1 (application PCT/CN2021/143036)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio
- segment
- clip
- spliced
- playback
- Prior art date
Classifications
- G10L15/04—Speech recognition; Segmentation; Word boundary detection
- G10L15/05—Word boundary detection
- G11B20/10—Digital recording or reproducing
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/93—Discriminating between voiced and unvoiced parts of speech signals
- G11B20/10527—Audio or video recording; Data buffering arrangements
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B2020/10546—Audio or video recording specifically adapted for audio data
- G11B2020/10972—Management of interruptions, e.g. due to editing
- G11B2020/10981—Recording or reproducing data when the data rate or the relative speed between record carrier and transducer is variable
Definitions
- The present application belongs to the field of audio technology, and in particular relates to an audio processing method, apparatus, and electronic device.
- The sending and receiving of audio messages is relatively common, especially when it is inconvenient for users to enter or read text.
- If playback of an audio message is interrupted before the message finishes, a user who wants to continue listening often has to replay the message from beginning to end, which wastes time.
- In some solutions, the interruption position of audio playback is detected, and the remaining audio is marked based on the interruption position so that it can be listened to next time.
- However, the inventors found at least the following problem in the prior art: marking the remaining audio starting from the playback interruption position easily results in remaining audio whose content has poor integrity, for example audio that begins in the middle of a sentence.
- The purpose of the embodiments of the present application is to provide an audio processing method, apparatus, and electronic device that can solve the prior-art problem that remaining audio marked from the playback interruption position of the audio has poor content integrity.
- In a first aspect, an embodiment of the present application provides an audio processing method, the method comprising:
- when it is detected that playback of a first audio is interrupted, determining a first position of the first audio according to the playback interruption position of the first audio, each sentence-segmentation position of the first audio, and the position of each silent segment of the first audio, wherein the first position is a sentence-segmentation position or the end position of a silent segment within a first audio clip, and the first audio clip is the audio segment between the start position of the first audio and the playback interruption position of the first audio; and
- dividing the first audio at the first position to obtain a second audio clip and a third audio clip.
- In a second aspect, an embodiment of the present application provides an audio processing apparatus, comprising:
- a first determination module configured to, when it is detected that playback of a first audio is interrupted, determine a first position of the first audio according to the playback interruption position of the first audio, each sentence-segmentation position of the first audio, and the position of each silent segment of the first audio, where the first position is a sentence-segmentation position or the end position of a silent segment within a first audio clip, and the first audio clip is the audio segment between the start position of the first audio and the playback interruption position of the first audio; and
- a segmentation module configured to segment the first audio according to the first position to obtain a second audio clip and a third audio clip, where the second audio clip is the audio segment between the first position of the first audio and the end position of the first audio, and the third audio clip is the audio segment between the start position of the first audio and the first position of the first audio.
- In a third aspect, embodiments of the present application provide an electronic device, including a processor, a memory, and a program or instruction stored in the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the method according to the first aspect.
- In a fourth aspect, an embodiment of the present application provides a readable storage medium storing a program or instruction which, when executed by a processor, implements the steps of the method according to the first aspect.
- In a fifth aspect, an embodiment of the present application provides a chip, including a processor and a communication interface coupled to the processor, where the processor is configured to run a program or instruction to implement the method according to the first aspect.
- In a sixth aspect, a computer program product is provided, stored in a non-volatile storage medium and executed by at least one processor to implement the method according to the first aspect.
- In the embodiments of the present application, when it is detected that playback of the first audio is interrupted, the first position of the first audio is determined according to the playback interruption position, each sentence-segmentation position, and each silent segment of the first audio, where the first position is a sentence-segmentation position or the end position of a silent segment within the first audio clip, and the first audio clip is the audio segment between the start position of the first audio and the playback interruption position. The first audio is then divided at the first position to obtain a second audio clip, which is the audio segment between the first position and the end position of the first audio, and a third audio clip, which is the audio segment between the start position of the first audio and the first position.
- Because the audio is divided at a sentence-segmentation position or silent-segment end position determined from the playback interruption position, rather than at the interruption position itself, the integrity of the resulting audio clips can be improved.
- FIG. 1 is a flowchart of an audio processing method provided by an embodiment of the present application.
- FIG. 2 is a schematic diagram of a segmented sentence position and a position of a silent segment of audio provided by an embodiment of the present application;
- FIG. 3 is a schematic diagram before and after audio splicing provided by an embodiment of the present application.
- FIG. 5 is a structural diagram of an audio processing apparatus provided by an embodiment of the present application.
- FIG. 6 is a structural diagram of an electronic device provided by an embodiment of the present application.
- FIG. 7 is a structural diagram of an electronic device provided by another embodiment of the present application.
- The terms "first", "second", and the like in the description and claims of the present application are used to distinguish similar objects, not to describe a specific order or sequence. It is to be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the present application can be practiced in sequences other than those illustrated or described herein.
- Objects distinguished by "first", "second", etc. are usually of one type, and the number of objects is not limited; for example, the first object may be one or more than one.
- "And/or" in the description and claims indicates at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
- FIG. 1 is a flowchart of an audio processing method provided by an embodiment of the present application. As shown in FIG. 1, the audio processing method includes the following steps:
- Step 101: When it is detected that playback of the first audio is interrupted, determine a first position of the first audio according to the playback interruption position of the first audio, each sentence-segmentation position of the first audio, and the position of each silent segment of the first audio, where the first position is a sentence-segmentation position or the end position of a silent segment within the first audio clip, and the first audio clip is the audio segment between the start position of the first audio and the playback interruption position of the first audio.
- the above-mentioned first audio may be any audio, for example, an audio message, an audio file, or an audio part in a video.
- the above-mentioned playback interruption position can be understood as the first audio playback interruption time point or playback interruption time. For example, if the playback is interrupted when the first audio is played to the 5th second, the above-mentioned playback interruption position is the 5th second of the first audio.
- the above-mentioned first audio segment may refer to an audio segment between the start position of the first audio and the playback interruption position of the first audio, that is, an audio segment that has been played in the first audio.
- The sentence-segmentation position may refer to the position at which each sentence in the first audio is divided. It should be noted that the sentence-segmentation position may be understood as a sentence-segmentation time point or moment of the audio.
- the above-mentioned silent segment may refer to a silent portion in the first audio.
- the position of the silent segment may include a start position and an end position of the silent segment.
- start position of the silence segment may be understood as the start time point or start time of the silence segment
- end position of the silence segment may be understood as the end time point or end time of the silence segment.
- Each silent segment in the first audio can be detected by a Voice Activity Detection (VAD) algorithm, where the VAD algorithm classifies each frame of the audio into two categories: silent frames (sil frames) and non-silent frames. The classification algorithm used by the VAD algorithm may include a filter-based algorithm or a neural-network algorithm.
- a silent portion of the first audio whose duration exceeds a preset duration may be determined as a silent segment.
- the above-mentioned preset duration can be set reasonably according to actual requirements, for example, 1 second, 1.5 seconds, or 2 seconds.
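As a rough illustration of the frame classification described above, a minimal energy-threshold VAD can mark low-energy frames as silent and keep only silent runs longer than the preset duration. This is only a sketch; the frame length, energy threshold, and minimum silence duration below are illustrative assumptions, and a real implementation might use the filter-based or neural-network classifiers mentioned in the text.

```python
# Illustrative silent-segment detection via a simple energy threshold.
# frame_dur, threshold, and min_silence are hypothetical parameters.

def frame_is_silent(frame, threshold=0.01):
    """Classify one frame as silent (sil) if its mean energy is low."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy < threshold

def silent_segments(frames, frame_dur=0.02, min_silence=1.0, threshold=0.01):
    """Return (start_time, end_time) in seconds for each run of silent
    frames whose total duration is at least min_silence seconds."""
    segments = []
    run_start = None
    for i, frame in enumerate(frames):
        if frame_is_silent(frame, threshold):
            if run_start is None:
                run_start = i           # a silent run begins here
        elif run_start is not None:
            start, end = run_start * frame_dur, i * frame_dur
            if end - start >= min_silence:
                segments.append((start, end))
            run_start = None
    if run_start is not None:           # audio ends inside a silent run
        start, end = run_start * frame_dur, len(frames) * frame_dur
        if end - start >= min_silence:
            segments.append((start, end))
    return segments
```

With 20 ms frames, a 1.2-second run of low-energy frames yields one silent segment, while shorter pauses are ignored, matching the preset-duration filtering described above.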
- Each sentence-segmentation position and the position of each silent segment of the first audio may be pre-marked; for example, as shown in FIG. 2, the audio parts marked sil are silent segments.
- Based on the playback interruption position of the first audio, a sentence-segmentation position or silent-segment end position may be selected as the first position from the sentence-segmentation positions and silent-segment end positions within the first audio clip. For example, the sentence-segmentation position or silent-segment end position closest to the playback interruption position within the first audio clip may be used as the first position, or a sentence-segmentation position or silent-segment end position whose distance from the playback interruption position is less than a preset distance may be used as the first position.
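The closest-boundary selection just described can be sketched as follows. The function and its inputs are hypothetical, assuming the sentence-segmentation positions and silent-segment end positions (in seconds) have already been computed for the played portion.

```python
# Illustrative choice of the "first position": among all sentence
# boundaries and silent-segment end positions at or before the
# interruption time, pick the one closest to the interruption.

def first_position(interrupt_t, sentence_ends, silence_ends):
    """Return the boundary <= interrupt_t nearest to it, or None if
    no boundary precedes the interruption."""
    candidates = [t for t in sentence_ends + silence_ends if t <= interrupt_t]
    return max(candidates) if candidates else None
```

For instance, an interruption at 5.0 s with sentence ends at 2.0 s and 4.5 s and a silence ending at 3.1 s would select 4.5 s as the division point.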
- Step 102: Divide the first audio at the first position to obtain a second audio clip and a third audio clip, where the second audio clip is the audio segment between the first position of the first audio and the end position of the first audio, and the third audio clip is the audio segment between the start position of the first audio and the first position of the first audio.
- In this way, the first audio can be divided at the first position into the third audio clip (the part that has been played) and the second audio clip (the part that has not been played), so that a user who wants to continue listening to the first audio can directly play the second audio clip, which saves time.
- Moreover, because the first audio is divided at the sentence-segmentation position or silent-segment end position closest to the playback interruption position, rather than at the interruption position itself, the integrity of the resulting audio clips can be improved.
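Assuming the first position has already been chosen, the division of Step 102 reduces to cutting the sample buffer at that time. This is a minimal sketch with hypothetical names, not the patent's actual implementation.

```python
# Illustrative division of the audio at the first position, assuming
# the first audio is a buffer of samples at a known sample rate.

def divide_at(samples, sample_rate, first_position_s):
    """Split samples into (third_clip, second_clip):
    third_clip  = start of audio .. first position (already played),
    second_clip = first position .. end of audio (not yet played)."""
    idx = int(first_position_s * sample_rate)
    return samples[:idx], samples[idx:]
```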
- In the audio processing method of this embodiment, when it is detected that playback of the first audio is interrupted, the first position of the first audio is determined according to the playback interruption position, each sentence-segmentation position, and the position of each silent segment of the first audio, where the first position is a sentence-segmentation position or silent-segment end position within the first audio clip, and the first audio clip is the audio segment between the start position of the first audio and the playback interruption position; the first audio is then divided at the first position to obtain a second audio clip and a third audio clip. In this way, the integrity of the audio clips obtained after division can be improved, so that the user can more easily understand the content when continuing to listen to the second audio clip.
- Optionally, the first position is the sentence-segmentation position or silent-segment end position that is a first distance from the playback interruption position within the first audio clip, where the first distance is the smallest of the distances between the playback interruption position and each sentence-segmentation position and silent-segment end position in the first audio clip.
- In other words, the sentence-segmentation position or silent-segment end position closest to the playback interruption position within the first audio clip may be used as the first position. For example, each word in the played audio segment before the interruption position can be examined: if the end position of the word immediately preceding the interruption position is a sentence-segmentation position, or that word is a silent word, the end moment of that word can be used as the division position of the audio. For example, as shown in FIG. 2, if playback is interrupted at "My name" and the preceding word is a silent word (sil), the end position of that silent word can be used as the first position, that is, the division position of the first audio.
- Using the sentence-segmentation position or silent-segment end position closest to the playback interruption position as the first position not only improves the integrity of the divided audio clips, but also improves the accuracy of dividing the first audio into its played and unplayed parts.
- Optionally, before determining the first position in the first audio clip that is closest to the playback interruption position of the first audio, the method further includes: determining each sentence-segmentation position of the first audio.
- Specifically, the first audio can be converted into text by a speech recognition algorithm, and the audio position corresponding to each word in the text can be marked, for example the start time point and end time point of each word in the first audio.
- A text sentence-segmentation algorithm can then be used to mark punctuation for the text, for example commas, periods, question marks, exclamation marks, and blanks, where a blank mark means that no sentence segmentation is performed at that point, and any other mark means that sentence segmentation is required there.
- The text sentence-segmentation algorithm may be a classification algorithm trained on N labeled texts that classifies the end of each word in the text, where the classification categories may include comma, period, question mark, exclamation mark, and blank. The value of N is typically large, for example 5000, 10000, or 50000, and can be set according to actual requirements.
- The text sentence-segmentation algorithm may include a Conditional Random Field (CRF) algorithm, a neural-network algorithm, or the like, which is not limited in this embodiment.
- In this way, each sentence-segmentation position of the text can be obtained, and each sentence-segmentation position of the first audio can then be obtained by combining the text segmentation result with the audio position marked for each word. For example, as shown in FIG. 2, if there is an exclamation mark after the word "hello" in the text and the end time point of that word in the audio is the 2nd second, the 2nd second of the audio can be determined as a sentence-segmentation position.
- Performing sentence segmentation on the recognized text can improve the accuracy of the segmentation result. Since the audio position corresponding to each word in the text is marked, and the sentence-segmentation positions of the audio are determined from the text segmentation result together with those audio positions, the approach is simple and convenient to implement.
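The mapping from punctuation-marked text back to audio positions can be sketched as follows, assuming the recognizer supplies a per-word end time and the text model (e.g. a CRF) supplies a punctuation label per word, with a blank label meaning no sentence break. All names and sample values here are illustrative.

```python
# Illustrative derivation of audio sentence-segmentation positions
# from punctuation-labeled, time-aligned words.

def segmentation_positions(words):
    """words: list of (word, end_time_s, punctuation_label) tuples.
    Returns the audio times (seconds) at which sentences end."""
    return [end for _, end, punct in words if punct != ""]

# Hypothetical recognizer output: "hello!" ends at 2.0 s, "name." at 3.1 s.
marked = [("hello", 2.0, "!"), ("my", 2.6, ""), ("name", 3.1, ".")]
```

Here the sentence boundaries fall at the end times of "hello" and "name", mirroring the FIG. 2 example where the exclamation mark after "hello" places a segmentation position at the 2nd second.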
- Optionally, the first audio is an audio message, and the method further includes at least one of the following:
- performing de-duplication processing on a post-audio and the second audio clip, and splicing the de-duplicated post-audio and second audio clip to obtain a first spliced audio, where the post-audio is the audio message following the first audio, and the audio object corresponding to the post-audio is the same as the audio object corresponding to the first audio;
- performing de-duplication processing on a pre-audio and the third audio clip, and splicing the de-duplicated pre-audio and third audio clip to obtain a second spliced audio, where the pre-audio is the audio message preceding the first audio, and the audio object corresponding to the pre-audio is the same as the audio object corresponding to the first audio.
- the above-mentioned first audio may be an audio message transmitted through an instant messaging application.
- the above audio object can be understood as the speaking object of the audio.
- The post-audio can be understood as an audio message whose corresponding audio object is the same as that of the first audio and that is located after and adjacent to the first audio. Specifically, if the message following the first audio is an audio message and its audio object is the same as that of the first audio, it is determined that the first audio has a post-audio; otherwise, it is determined that the first audio has no post-audio.
- For example, if the message following audio message A is audio message B and both correspond to the same audio object, audio message A has a post-audio, namely audio message B; if the message following audio message A is not an audio message, or it is audio message B but the audio object corresponding to B differs from that corresponding to A, it can be determined that audio message A has no post-audio.
- The pre-audio can be understood as an audio message whose corresponding audio object is the same as that of the first audio and that is located before and adjacent to the first audio. Specifically, if the message preceding the first audio is an audio message and its audio object is the same as that of the first audio, it is determined that the first audio has a pre-audio; otherwise, it is determined that the first audio has no pre-audio.
- For example, if the message preceding audio message A is audio message C and both correspond to the same audio object, audio message A has a pre-audio, namely audio message C; if the message preceding audio message A is not an audio message, or it is audio message C but the audio object corresponding to C differs from that corresponding to A, it can be determined that audio message A has no pre-audio.
- In practice, an audio message is usually short (for example, at most 60 seconds) and often cannot fully express what the user wants to convey, so users often send multiple consecutive audio messages.
- In this embodiment, the second audio clip of the first audio is spliced with the post-audio, and the third audio clip is spliced with the pre-audio, so that the user can listen to relatively complete audio content based on the spliced audio, which is convenient for the user. Performing de-duplication before splicing can further improve the smoothness of the spliced audio.
- Optionally, the time interval between the transmission time of the post-audio and that of the first audio is less than a first preset time interval, or the two transmission times fall within the same day; likewise, the time interval between the transmission time of the pre-audio and that of the first audio is less than a second preset time interval, or the two transmission times fall within the same day. This prevents two unrelated audio messages from being spliced together.
- the above-mentioned first preset time interval and second preset time interval can be reasonably set according to actual requirements, for example, 10 minutes, 5 minutes, and the like.
- the above-mentioned transmission time may include sending time and receiving time.
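Taken together, the adjacency, same-object, and time conditions above might be checked as in the following sketch. The dict-based message representation and the 10-minute default window are illustrative assumptions, not the patent's implementation.

```python
# Illustrative splicing eligibility check for two adjacent messages:
# both must be audio messages from the same audio object (sender),
# and they must be close in time or sent on the same day.

from datetime import datetime, timedelta

def can_splice(msg_a, msg_b, max_gap=timedelta(minutes=10)):
    """msg_a, msg_b: dicts with 'is_audio', 'sender', 'time' keys."""
    if not (msg_a["is_audio"] and msg_b["is_audio"]):
        return False                      # only audio messages are spliced
    if msg_a["sender"] != msg_b["sender"]:
        return False                      # must be the same audio object
    gap = abs(msg_b["time"] - msg_a["time"])
    same_day = msg_a["time"].date() == msg_b["time"].date()
    return gap < max_gap or same_day
```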
- Optionally, the de-duplication processing on the post-audio and the second audio clip, and on the pre-audio and the third audio clip, may be performed as follows.
- The audio clip of the post-audio located before its first sentence-segmentation position or first silent-segment position is the fourth audio clip. For example, as shown in FIG. 2, if the first sentence-segmentation position of the post-audio is the end position of "I'm looking for you", the fourth audio clip is the audio segment corresponding to "I'm looking for you" in the post-audio.
- If the fourth audio clip repeats the end of the second audio clip, the fourth audio clip can be deleted from the post-audio and the second audio clip spliced with the post-audio with the fourth audio clip deleted; or the repeated fifth audio clip (for example, the audio segment corresponding to "I'm looking for you" at the end of the second audio clip) can be deleted from the second audio clip, and the second audio clip with the fifth audio clip deleted spliced with the post-audio.
- If there is no repetition, the second audio clip and the post-audio can be spliced directly.
- Similarly, the audio clip of the pre-audio located after its last sentence-segmentation position or last silent-segment position is the sixth audio clip. For example, if the first sentence-segmentation position of the first audio is the end position of "Hello", the seventh audio clip is the audio segment corresponding to "Hello" in the first audio.
- If the sixth audio clip and the seventh audio clip repeat, the sixth audio clip can be deleted from the pre-audio and the pre-audio with the sixth audio clip deleted spliced with the third audio clip; or the seventh audio clip can be deleted from the third audio clip and the third audio clip with the seventh audio clip deleted spliced with the pre-audio.
- If there is no repetition, the third audio clip and the pre-audio can be spliced directly.
- This embodiment determines the repeated audio clip between the post-audio and the second audio clip based on the first sentence-segmentation position or first silent-segment position of the post-audio and the last sentence-segmentation position or last silent-segment position of the second audio clip, and determines the repeated audio clip between the pre-audio and the third audio clip based on the last sentence-segmentation position or last silent-segment position of the pre-audio and the first sentence-segmentation position or first silent-segment position of the third audio clip. Repeated audio clips can thus be determined quickly and accurately, improving the speed and accuracy of the de-duplication processing.
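The boundary-based de-duplication just described can be sketched on token lists standing in for audio segments between marked positions. The helper and its index-based boundaries are hypothetical simplifications of matching audio content.

```python
# Illustrative de-duplication for splicing the second audio clip with
# the post-audio: if the head of the post-audio (up to its first
# sentence/silence boundary) equals the tail of the second clip (after
# its last boundary), drop the duplicated head before concatenating.

def splice_with_dedup(second_clip, second_last_boundary,
                      post_audio, post_first_boundary):
    """Boundaries are indices into the respective token lists."""
    tail = second_clip[second_last_boundary:]   # e.g. "I'm looking for you"
    head = post_audio[:post_first_boundary]
    if tail == head:                            # repeated fragment found
        return second_clip + post_audio[post_first_boundary:]
    return second_clip + post_audio             # nothing repeated
```

The pre-audio/third-clip case is symmetric, comparing the segment after the pre-audio's last boundary with the segment before the third clip's first boundary.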
- Optionally, after the first spliced audio or the second spliced audio is obtained, the method further includes the following.
- When the first spliced audio is obtained, the first spliced audio may be displayed in the message display window and the display of the post-audio and the second audio clip cancelled, where the first spliced audio is marked as unread and a first playback-speed adjustment mark is displayed on the first spliced audio.
- When the second spliced audio is obtained, the second spliced audio may be displayed in the message display window and the display of the pre-audio and the third audio clip cancelled, where the second spliced audio is marked as read and a second playback-speed adjustment mark is displayed on the second spliced audio, for example as shown in FIG. 3.
- The first playback-speed adjustment mark is used to adjust the playback speed of the first spliced audio: when a first input on the first playback-speed adjustment mark is received, the playback speed of the first spliced audio can be adjusted to the playback speed corresponding to the first input. Likewise, the second playback-speed adjustment mark is used to adjust the playback speed of the second spliced audio: when a second input on the second playback-speed adjustment mark is received, the playback speed of the second spliced audio can be adjusted to the playback speed corresponding to the second input.
- Both the first playback speed adjustment mark and the second playback speed adjustment mark may include at least one playback speed sub-mark, and each playback speed sub-mark corresponds to a playback speed.
- For example, each of the first playback speed adjustment mark and the second playback speed adjustment mark may include at least one of a playback speed sub-mark for 1.5x playback, a playback speed sub-mark for 2x playback, and a playback speed sub-mark for 3x playback.
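As an illustrative sketch (the class and its rates are hypothetical, not taken from the application), a playback speed adjustment mark can be modeled as a set of sub-marks, each bound to one rate; selecting a sub-mark sets the playback speed:

```python
from dataclasses import dataclass

@dataclass
class SpeedMark:
    """A playback speed adjustment mark holding one sub-mark per rate."""
    rates: tuple = (1.5, 2.0, 3.0)  # the sub-marks mentioned in the text
    current: float = 1.0            # default playback speed

    def select(self, rate: float) -> float:
        # Tapping a sub-mark adjusts playback to its corresponding speed.
        if rate not in self.rates:
            raise ValueError(f"no sub-mark for {rate}x")
        self.current = rate
        return self.current

mark = SpeedMark()
print(mark.select(2.0))  # → 2.0
```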
- The first spliced audio is displayed in the message display window, and the display of the post-audio and the second audio clip is cancelled, wherein the first spliced audio is marked as unread, and a first playback speed adjustment mark is displayed on the first spliced audio;
- If the first audio does not have a post-audio, the second audio clip can be directly marked as unread, and a third playback speed adjustment mark is displayed on the second audio clip; the third playback speed adjustment mark can be used to adjust the playback speed of the second audio clip. If the first audio does not have a pre-audio, the third audio clip can be directly marked as read, and a fourth playback speed adjustment mark is displayed on the third audio clip; the fourth playback speed adjustment mark can be used to adjust the playback speed of the third audio clip. This not only helps the user intuitively distinguish played audio clips from unplayed ones, so that a playback selection can be made quickly, but also allows the playback speed of an audio message to be adjusted flexibly, saving the user's time in listening to audio messages.
- When a playback input for a target audio is received, the text corresponding to the target audio can be recognized; semantic understanding is then performed on that text, and target background music is determined based on the semantic understanding result; the target background music is played while the target audio is being played.
- The above-mentioned target audio may be any audio, audio segment, or spliced audio.
- The above-mentioned target background music may be music that matches the semantic understanding result of the text corresponding to the target audio. For example, if the semantic understanding result indicates that the target audio is a birthday wish, the target background music may be a song related to birthday wishes; if it indicates that the target audio is a travel plan, the target background music may be a travel-related song.
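The topic-to-music lookup might be sketched as follows. Everything here is an illustrative assumption: the keyword heuristic `classify` stands in for a real semantic-understanding model, and the catalog entries are invented file names.

```python
# Hypothetical catalog mapping a semantic label to a background track.
MUSIC_CATALOG = {
    "birthday": "happy_birthday.mp3",
    "travel": "on_the_road.mp3",
}

def classify(text):
    # Stand-in for a real semantic-understanding model.
    lowered = text.lower()
    if "birthday" in lowered:
        return "birthday"
    if any(w in lowered for w in ("trip", "travel", "flight")):
        return "travel"
    return "none"

def pick_background_music(transcript):
    """Return a background track matching the transcript's topic, if any."""
    return MUSIC_CATALOG.get(classify(transcript))

print(pick_background_music("Happy birthday to you!"))  # → happy_birthday.mp3
```

If no topic matches, no background music is selected and the target audio plays alone.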
- FIG. 4 is a flowchart of an audio processing method provided by another embodiment of the present application.
- the audio processing method includes the following steps:
- Step 201: Play the first audio.
- Step 202: Determine whether playback of the first audio is interrupted.
- If playback is interrupted, the playback interruption position of the first audio may be recorded and step 203 is performed; otherwise, playback of the first audio continues.
- Step 203: Detect each silent segment in the first audio through a voice activity detection (VAD) algorithm.
- Specifically, each frame of the audio can be classified into one of two categories by the VAD algorithm: silent frames (i.e., sil frames) and non-silent frames. A silent portion whose duration exceeds a preset duration is determined to be a silent segment.
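The frame-grouping step can be sketched as follows (a simple energy threshold stands in for a real VAD model, and all durations are illustrative assumptions):

```python
def silent_segments(frames, frame_ms=20, min_silence_ms=300, threshold=0.01):
    """Group consecutive sil frames into silent segments.

    frames: per-frame energies; a frame below `threshold` is a sil frame.
    Returns (start_ms, end_ms) for each silent run longer than min_silence_ms.
    """
    segments, run_start = [], None
    for i, energy in enumerate(frames + [float("inf")]):  # sentinel ends the last run
        if energy < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            start, end = run_start * frame_ms, i * frame_ms
            if end - start > min_silence_ms:  # keep only runs exceeding the preset duration
                segments.append((start, end))
            run_start = None
    return segments

# 20 quiet frames = 400 ms of silence between two voiced spans
energies = [0.5] * 10 + [0.0] * 20 + [0.4] * 10
print(silent_segments(energies))  # → [(200, 600)]
```

A short quiet run (below the preset duration) is treated as part of ordinary speech and produces no silent segment.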
- Step 204: Recognize the text corresponding to the first audio through a speech recognition algorithm, and mark the audio position corresponding to each word in the text.
- That is, the start time and end time of each word in the first audio are marked.
- Step 205: Perform sentence segmentation on the text through a text segmentation algorithm, and determine each sentence-break position of the first audio in combination with the audio positions corresponding to the words in the text.
- Specifically, punctuation marks can be assigned to the text by the text segmentation algorithm, for example, commas, periods, question marks, exclamation marks, or blanks, where a blank mark means no sentence break occurs at that point and any other mark means a sentence break is required there. After each sentence-break position of the text is obtained, each sentence-break position of the first audio can be derived by combining it with the audio position marked for each word.
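Combining the per-word punctuation marks with the word timings might look like this (a sketch; the word/timing tuple layout is an assumption, not taken from the application):

```python
def sentence_break_positions(words):
    """Map text sentence breaks onto audio timestamps.

    words: list of (word, start_ms, end_ms, mark) tuples, where `mark` is the
    punctuation assigned after the word ('' means no sentence break there).
    Returns the audio time (ms) of each sentence-break position.
    """
    breaks = []
    for word, start_ms, end_ms, mark in words:
        if mark:  # comma, period, question mark, ... => break after this word
            breaks.append(end_ms)
    return breaks

words = [("hello", 0, 400, ""), ("there", 450, 800, ","),
         ("how", 900, 1100, ""), ("are", 1150, 1300, ""),
         ("you", 1350, 1600, "?")]
print(sentence_break_positions(words))  # → [800, 1600]
```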
- Step 206: Determine the split position according to the playback interruption position of the first audio, the position of each silent segment in the first audio, and each sentence-break position of the first audio.
- Specifically, the sentence-break position or silent-segment end position closest to the playback interruption position within the played portion of the first audio may be searched for. For example, each word before the playback interruption position can be examined: if the end position of the word immediately preceding the interruption is a sentence-break position, or that word is silence, the end time of that word can be used as the split position of the audio.
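The search above amounts to picking, among the sentence-break positions and silent-segment end positions inside the already-played portion, the candidate nearest the interruption point (function and parameter names are illustrative):

```python
def split_position(interrupt_ms, break_positions, silence_ends):
    """Return the candidate position closest to the playback interruption.

    Candidates are sentence-break positions and silent-segment end positions
    that fall within the played portion (<= interrupt_ms).
    """
    candidates = [p for p in break_positions + silence_ends if p <= interrupt_ms]
    if not candidates:
        return 0  # no candidate: split at the start of the audio
    # The chosen candidate minimizes the distance to the interruption position.
    return min(candidates, key=lambda p: interrupt_ms - p)

print(split_position(1500, break_positions=[800, 1300], silence_ends=[600]))
# → 1300
```

This mirrors the "first distance" described later: the minimum of the distances from each sentence-break position and each silent-segment end position to the interruption position.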
- Step 207: Split the first audio at the split position to obtain a second audio segment and a third audio segment.
- The second audio segment is the audio segment between the split position of the first audio and the end position of the first audio, and the third audio segment is the audio segment between the start position of the first audio and the split position of the first audio.
- Step 208: Determine whether the first audio has a pre-audio and a post-audio.
- If the first audio has both a pre-audio and a post-audio, go to step 210; if it has neither, go to step 209. If the first audio has a post-audio but no pre-audio, the third audio clip can be marked as read, and the second audio clip and the post-audio can be deduplicated and spliced to obtain the first spliced audio. If the first audio has a pre-audio but no post-audio, the second audio segment can be marked as unread, and the third audio segment and the pre-audio can be deduplicated and spliced to obtain the second spliced audio.
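The four cases in step 208 can be summarized in one dispatch function (a sketch; the helper callables `dedup_splice` and `mark` are hypothetical stand-ins for the deduplication/splicing and read-status operations described in the text):

```python
def handle_neighbors(second_seg, third_seg, pre_audio, post_audio,
                     dedup_splice, mark):
    """Dispatch on whether the first audio has neighboring audio messages.

    Returns (first_spliced_audio, second_spliced_audio); either may be None.
    """
    if pre_audio and post_audio:                       # step 210: both exist
        first = dedup_splice(second_seg, post_audio)
        second = dedup_splice(pre_audio, third_seg)
        return first, second
    if post_audio:                                     # post-audio only
        mark(third_seg, "read")
        return dedup_splice(second_seg, post_audio), None
    if pre_audio:                                      # pre-audio only
        mark(second_seg, "unread")
        return None, dedup_splice(pre_audio, third_seg)
    mark(second_seg, "unread")                         # step 209: neither
    mark(third_seg, "read")
    return None, None

joins = lambda a, b: a + "+" + b   # toy stand-in for dedup + splice
marks = {}
print(handle_neighbors("S2", "S3", None, "POST", joins, marks.__setitem__))
# → ('S2+POST', None)
```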
- Step 209: Mark the second audio clip as unread, and mark the third audio clip as read.
- Step 210: Perform deduplication and splicing on the second audio segment and the post-audio to obtain the first spliced audio, and on the third audio segment and the pre-audio to obtain the second spliced audio.
- Step 211: Mark the first spliced audio as unread, and mark the second spliced audio as read.
- The embodiment of the present application can automatically adjust the playback interruption point through the voice activity detection algorithm, the speech recognition algorithm, and the text segmentation algorithm, so that the audio after the interruption point is relatively complete and it is convenient to resume listening next time.
- The embodiment of the present application can also remove repeated audio during splicing, which makes the junction between the two audios smoother and easier to listen to.
- The execution body may be an audio processing apparatus, or a control module in the audio processing apparatus for executing the audio processing method.
- The audio processing apparatus provided by the embodiment of the present application is described by taking, as an example, the case where the audio processing apparatus executes the audio processing method.
- FIG. 5 is a structural diagram of an audio processing apparatus provided by an embodiment of the present application. As shown in FIG. 5, the audio processing apparatus 500 includes:
- The first determination module 501 is configured to, in a case that an interruption of playback of the first audio is detected, determine a first position of the first audio according to the playback interruption position of the first audio, each sentence-break position of the first audio, and each silent segment of the first audio, where the first position is a sentence-break position or a silent-segment end position located in a first audio segment of the first audio, and the first audio segment is the audio segment between the start position of the first audio and the playback interruption position of the first audio;
- The segmentation module 502 is configured to segment the first audio according to the first position to obtain a second audio segment and a third audio segment, wherein the second audio segment is the audio segment between the first position of the first audio and the end position of the first audio, and the third audio segment is the audio segment between the start position of the first audio and the first position of the first audio.
- Optionally, the first position is a sentence-break position or a silent-segment end position in the first audio segment at a first distance from the playback interruption position, where the first distance is the minimum among the distances from each sentence-break position and each silent-segment end position of the first audio segment to the playback interruption position.
- the device further includes:
- The identification module is configured to identify the text corresponding to the first audio before the first position of the first audio is determined, in a case that an interruption of playback of the first audio is detected, according to the playback interruption position of the first audio, each sentence-break position of the first audio, and the position of each silent segment of the first audio;
- a marking module, configured to mark the audio position corresponding to each word in the text;
- a sentence segmentation module, configured to perform sentence segmentation on the text to obtain a sentence segmentation result;
- a second determination module, configured to determine each sentence-break position of the first audio according to the sentence segmentation result and the audio position corresponding to each word in the text.
- the first audio is an audio message
- the device further includes at least one of the following:
- The first processing module is configured to, after the first audio is segmented according to the first position to obtain the second audio segment and the third audio segment, in a case that the first audio has a post-audio, deduplicate the post-audio and the second audio segment, and splice the deduplicated post-audio and second audio segment to obtain the first spliced audio, wherein the post-audio is the audio message next to the first audio, and the audio object corresponding to the post-audio is the same as the audio object corresponding to the first audio;
- The second processing module is configured to, in a case that the first audio has a pre-audio, deduplicate the pre-audio and the third audio segment, and splice the deduplicated pre-audio and third audio segment to obtain a second spliced audio, wherein the pre-audio is the audio message previous to the first audio, and the audio object corresponding to the pre-audio is the same as the audio object corresponding to the first audio.
- the first processing module is specifically used for:
- the second processing module is specifically used for:
- the device further includes:
- The first display module is configured to, after the deduplicated post-audio and the second audio segment are spliced to obtain the first spliced audio, display the first spliced audio in the message display window and cancel the display of the post-audio and the second audio segment, wherein the first spliced audio is marked as unread, and a first playback speed adjustment mark is displayed on the first spliced audio;
- The second display module is configured to, after the deduplicated pre-audio and the third audio segment are spliced to obtain the second spliced audio, display the second spliced audio in the message display window and cancel the display of the pre-audio and the third audio clip, wherein the second spliced audio is marked as read, and a second playback speed adjustment mark is displayed on the second spliced audio.
- The audio processing apparatus in this embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal.
- the apparatus may be a mobile electronic device or a non-mobile electronic device.
- For example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA).
- The non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine, or the like, which is not specifically limited in the embodiments of this application.
- the audio processing apparatus in this embodiment of the present application may be an apparatus having an operating system.
- For example, the operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
- The audio processing apparatus provided in the embodiment of the present application can implement each process in the foregoing method embodiment; to avoid repetition, details are not described here.
- In the embodiment of the present application, the first determination module 501 determines, in a case that an interruption of playback of the first audio is detected, the first position of the first audio according to the playback interruption position of the first audio, each sentence-break position of the first audio, and the position of each silent segment of the first audio, where the first position is a sentence-break position of the first audio segment of the first audio or the end position of a silent segment, and the first audio segment is the audio segment between the start position of the first audio and the playback interruption position; the segmentation module 502 segments the first audio according to the first position to obtain a second audio segment and a third audio segment, where the second audio segment is the audio segment between the first position and the end position of the first audio, and the third audio segment is the audio segment between the start position and the first position of the first audio. Since the audio is split at a sentence-break position determined from the playback interruption position or at the end position of a silent segment, the integrity of the resulting audio segments can be improved.
- An embodiment of the present application further provides an electronic device 600, including a processor 601, a memory 602, and a program or instruction stored in the memory 602 and executable on the processor 601. When the program or instruction is executed by the processor 601, each process of the above audio processing method embodiment is implemented, and the same technical effect can be achieved; to avoid repetition, details are not described here.
- the electronic devices in the embodiments of the present application include the aforementioned mobile electronic devices and non-mobile electronic devices.
- FIG. 7 is a structural diagram of an electronic device provided by another embodiment of the present application.
- The electronic device 700 includes, but is not limited to: a radio frequency unit 701, a network module 702, an audio output unit 703, an input unit 704, a sensor 705, a display unit 706, a user input unit 707, an interface unit 708, a memory 709, a processor 710, and other components.
- The electronic device 700 may also include a power source (such as a battery) for supplying power to the various components; the power source may be logically connected to the processor 710 through a power management system, so that functions such as charging, discharging, and power consumption management are handled by the power management system.
- The structure of the electronic device shown in FIG. 7 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine some components, or arrange components differently, which will not be repeated here.
- The processor 710 is configured to, in a case that an interruption of playback of the first audio is detected, determine a first position of the first audio according to the playback interruption position of the first audio, each sentence-break position of the first audio, and each silent segment of the first audio, where the first position is a sentence-break position or a silent-segment end position located in a first audio segment of the first audio, and the first audio segment is the audio segment between the start position of the first audio and the playback interruption position of the first audio; and to segment the first audio according to the first position to obtain a second audio segment and a third audio segment, where the second audio segment is the audio segment between the first position of the first audio and the end position of the first audio, and the third audio segment is the audio segment between the start position of the first audio and the first position of the first audio.
- Optionally, the first position is a sentence-break position or a silent-segment end position in the first audio segment at a first distance from the playback interruption position, where the first distance is the minimum among the distances from each sentence-break position and each silent-segment end position of the first audio segment to the playback interruption position.
- processor 710 is further configured to:
- identify the text corresponding to the first audio, mark the audio position corresponding to each word in the text, perform sentence segmentation on the text to obtain a sentence segmentation result, and determine each sentence-break position of the first audio according to the sentence segmentation result and the audio position corresponding to each word in the text.
- processor 710 is further configured to:
- The first audio is an audio message, and after the first audio is segmented according to the first position to obtain the second audio segment and the third audio segment, in a case that the first audio has a post-audio, deduplication is performed on the post-audio and the second audio segment, and the deduplicated post-audio and second audio segment are spliced to obtain the first spliced audio; the post-audio is the audio message next to the first audio, and the audio object corresponding to the post-audio is the same as the audio object corresponding to the first audio;
- In a case that the first audio has a pre-audio, deduplication is performed on the pre-audio and the third audio segment, and the deduplicated pre-audio and third audio segment are spliced to obtain a second spliced audio; the pre-audio is the audio message previous to the first audio, and the audio object corresponding to the pre-audio is the same as the audio object corresponding to the first audio.
- processor 710 is further configured to:
- the display unit 706 is used for:
- The first spliced audio is displayed in the message display window, and the display of the post-audio and the second audio clip is cancelled, wherein the first spliced audio is marked as unread, and a first playback speed adjustment mark is displayed on the first spliced audio;
- The input unit 704 may include a graphics processing unit (GPU) 7041 and a microphone 7042; the graphics processing unit 7041 processes image data of still pictures or videos obtained by an image capture device (such as a camera) for processing.
- the display unit 706 may include a display panel 7061, which may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
- the user input unit 707 includes a touch panel 7071 and other input devices 7072 .
- the touch panel 7071 is also called a touch screen.
- the touch panel 7071 may include two parts, a touch detection device and a touch controller.
- Other input devices 7072 may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which will not be repeated here.
- Memory 709 may be used to store software programs as well as various data including, but not limited to, application programs and operating systems.
- The processor 710 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 710.
- The embodiments of the present application further provide a readable storage medium on which a program or an instruction is stored; when the program or instruction is executed by a processor, each process of the above audio processing method embodiment is implemented, and the same technical effect can be achieved. The processor is the processor in the electronic device described in the foregoing embodiments.
- the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
- An embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement the audio processing method embodiments described above.
- The chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip, or the like.
- the embodiment of the present application further provides an electronic device, which is configured to execute each process of the above-mentioned audio processing method embodiment, and can achieve the same technical effect. To avoid repetition, details are not described here.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Management Or Editing Of Information On Record Carriers (AREA)
- Telephone Function (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Description
Claims (17)
- An audio processing method, comprising: in a case that an interruption of playback of a first audio is detected, determining a first position of the first audio according to a playback interruption position of the first audio and positions of silent segments of the first audio, wherein the first position is a sentence-break position located in a first audio segment of the first audio or an end position of a silent segment, and the first audio segment is an audio segment between a start position of the first audio and the playback interruption position of the first audio; and segmenting the first audio according to the first position to obtain a second audio segment and a third audio segment, wherein the second audio segment is an audio segment between the first position of the first audio and an end position of the first audio, and the third audio segment is an audio segment between the start position of the first audio and the first position of the first audio.
- The method according to claim 1, wherein the first position is a sentence-break position or a silent-segment end position in the first audio segment whose distance from the playback interruption position is a first distance, and the first distance is a minimum among distances from the sentence-break positions and the silent-segment end positions of the first audio segment to the playback interruption position.
- The method according to claim 1, wherein before the determining, in the case that the interruption of playback of the first audio is detected, the first position of the first audio according to the playback interruption position of the first audio, the sentence-break positions of the first audio, and the positions of the silent segments of the first audio, the method further comprises: recognizing text corresponding to the first audio; marking an audio position corresponding to each word in the text; performing sentence segmentation on the text to obtain a sentence segmentation result; and determining the sentence-break positions of the first audio according to the sentence segmentation result and the audio position corresponding to each word in the text.
- The method according to claim 1, wherein the first audio is an audio message, and after the segmenting the first audio according to the first position to obtain the second audio segment and the third audio segment, the method further comprises: in a case that the first audio has a post-audio, performing deduplication on the post-audio and the second audio segment, and splicing the deduplicated post-audio and second audio segment to obtain a first spliced audio, wherein the post-audio is an audio message next to the first audio, and an audio object corresponding to the post-audio is the same as an audio object corresponding to the first audio; and in a case that the first audio has a pre-audio, performing deduplication on the pre-audio and the third audio segment, and splicing the deduplicated pre-audio and third audio segment to obtain a second spliced audio, wherein the pre-audio is an audio message previous to the first audio, and an audio object corresponding to the pre-audio is the same as the audio object corresponding to the first audio.
- The method according to claim 4, wherein the performing deduplication on the post-audio and the second audio segment comprises: obtaining a fourth audio segment located before a second position of the post-audio and a fifth audio segment located after a third position of the second audio segment, wherein the second position comprises a first sentence-break position or a position of a first silent segment of the post-audio, and the third position comprises a last sentence-break position or a position of a last silent segment of the second audio segment; and if text corresponding to the fourth audio segment is the same as text corresponding to the fifth audio segment, deleting the fourth audio segment from the post-audio, or deleting the fifth audio segment from the second audio segment; and the performing deduplication on the pre-audio and the third audio segment comprises: obtaining a sixth audio segment located after a fourth position of the pre-audio and a seventh audio segment located before a fifth position of the third audio segment, wherein the fourth position comprises a last sentence-break position or a position of a last silent segment of the pre-audio, and the fifth position comprises a first sentence-break position or a position of a first silent segment of the third audio segment; and if text corresponding to the sixth audio segment is the same as text corresponding to the seventh audio segment, deleting the sixth audio segment from the pre-audio, or deleting the seventh audio segment from the third audio segment.
- The method according to claim 4, wherein after the splicing the deduplicated post-audio and second audio segment to obtain the first spliced audio, the method further comprises: displaying the first spliced audio in a message display window, and cancelling display of the post-audio and the second audio segment, wherein the first spliced audio is marked as unread, and a first playback speed adjustment mark is displayed on the first spliced audio; and after the splicing the deduplicated pre-audio and third audio segment to obtain the second spliced audio, the method further comprises: displaying the second spliced audio in the message display window, and cancelling display of the pre-audio and the third audio segment, wherein the second spliced audio is marked as read, and a second playback speed adjustment mark is displayed on the second spliced audio.
- An audio processing apparatus, comprising: a first determination module, configured to, in a case that an interruption of playback of a first audio is detected, determine a first position of the first audio according to a playback interruption position of the first audio, sentence-break positions of the first audio, and positions of silent segments of the first audio, wherein the first position is a sentence-break position located in a first audio segment of the first audio or an end position of a silent segment, and the first audio segment is an audio segment between a start position of the first audio and the playback interruption position of the first audio; and a segmentation module, configured to segment the first audio according to the first position to obtain a second audio segment and a third audio segment, wherein the second audio segment is an audio segment between the first position of the first audio and an end position of the first audio, and the third audio segment is an audio segment between the start position of the first audio and the first position of the first audio.
- The apparatus according to claim 7, wherein the first position is a sentence-break position or a silent-segment end position in the first audio segment whose distance from the playback interruption position is a first distance, and the first distance is a minimum among distances from the sentence-break positions and the silent-segment end positions of the first audio segment to the playback interruption position.
- The apparatus according to claim 7, further comprising: an identification module, configured to, in the case that the interruption of playback of the first audio is detected, identify text corresponding to the first audio before a first position in a first audio segment of the first audio closest to the playback interruption position of the first audio is identified; a marking module, configured to mark an audio position corresponding to each word in the text; a sentence segmentation module, configured to perform sentence segmentation on the text to obtain a sentence segmentation result; and a second determination module, configured to determine the sentence-break positions of the first audio according to the sentence segmentation result and the audio position corresponding to each word in the text.
- The apparatus according to claim 7, wherein the first audio is an audio message, and the apparatus further comprises: a first processing module, configured to, after the first audio is segmented according to the first position to obtain the second audio segment and the third audio segment, in a case that the first audio has a post-audio, perform deduplication on the post-audio and the second audio segment, and splice the deduplicated post-audio and second audio segment to obtain a first spliced audio, wherein the post-audio is an audio message next to the first audio, and an audio object corresponding to the post-audio is the same as an audio object corresponding to the first audio; and a second processing module, configured to, in a case that the first audio has a pre-audio, perform deduplication on the pre-audio and the third audio segment, and splice the deduplicated pre-audio and third audio segment to obtain a second spliced audio, wherein the pre-audio is an audio message previous to the first audio, and an audio object corresponding to the pre-audio is the same as the audio object corresponding to the first audio.
- The apparatus according to claim 10, wherein the first processing module is specifically configured to: obtain a fourth audio segment located before a second position of the post-audio and a fifth audio segment located after a third position of the second audio segment, wherein the second position comprises a first sentence-break position or a position of a first silent segment of the post-audio, and the third position comprises a last sentence-break position or a position of a last silent segment of the second audio segment; and if text corresponding to the fourth audio segment is the same as text corresponding to the fifth audio segment, delete the fourth audio segment from the post-audio, or delete the fifth audio segment from the second audio segment; and the second processing module is specifically configured to: obtain a sixth audio segment located after a fourth position of the pre-audio and a seventh audio segment located before a fifth position of the third audio segment, wherein the fourth position comprises a last sentence-break position or a position of a last silent segment of the pre-audio, and the fifth position comprises a first sentence-break position or a position of a first silent segment of the third audio segment; and if text corresponding to the sixth audio segment is the same as text corresponding to the seventh audio segment, delete the sixth audio segment from the pre-audio, or delete the seventh audio segment from the third audio segment.
- The apparatus according to claim 10, further comprising: a first display module, configured to, after the deduplicated post-audio and second audio segment are spliced to obtain the first spliced audio, display the first spliced audio in a message display window and cancel display of the post-audio and the second audio segment, wherein the first spliced audio is marked as unread, and a first playback speed adjustment mark is displayed on the first spliced audio; and a second display module, configured to, after the deduplicated pre-audio and third audio segment are spliced to obtain the second spliced audio, display the second spliced audio in the message display window and cancel display of the pre-audio and the third audio segment, wherein the second spliced audio is marked as read, and a second playback speed adjustment mark is displayed on the second spliced audio.
- An electronic device, comprising a processor, a memory, and a program or instruction stored in the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the audio processing method according to any one of claims 1 to 6.
- A readable storage medium storing a program or instruction, wherein the program or instruction, when executed by a processor, implements the steps of the audio processing method according to any one of claims 1 to 6.
- A chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is configured to run a program or instruction to implement the steps of the audio processing method according to any one of claims 1 to 6.
- A computer program product, wherein the computer program product is stored in a non-volatile storage medium, and the computer program product is executed by at least one processor to implement the steps of the audio processing method according to any one of claims 1 to 6.
- An electronic device, configured to perform the steps of the audio processing method according to any one of claims 1 to 6.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020237025661A KR20230125284A (ko) | 2020-12-30 | 2021-12-30 | 오디오 처리 방법, 장치 및 전자기기 |
EP21914619.8A EP4273863A4 (en) | 2020-12-30 | 2021-12-30 | AUDIO PROCESSING METHOD AND APPARATUS AND ELECTRONIC DEVICE |
US18/343,055 US20230343325A1 (en) | 2020-12-30 | 2023-06-28 | Audio processing method and apparatus, and electronic device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011604816.7A CN112712825B (zh) | 2020-12-30 | 2020-12-30 | 音频处理方法、装置及电子设备 |
CN202011604816.7 | 2020-12-30 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/343,055 Continuation US20230343325A1 (en) | 2020-12-30 | 2023-06-28 | Audio processing method and apparatus, and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022143888A1 true WO2022143888A1 (zh) | 2022-07-07 |
Family
ID=75547078
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/143036 WO2022143888A1 (zh) | 2020-12-30 | 2021-12-30 | 音频处理方法、装置及电子设备 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20230343325A1 (zh) |
EP (1) | EP4273863A4 (zh) |
KR (1) | KR20230125284A (zh) |
CN (1) | CN112712825B (zh) |
WO (1) | WO2022143888A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112712825B (zh) * | 2020-12-30 | 2022-09-23 | 维沃移动通信有限公司 | 音频处理方法、装置及电子设备 |
CN113674724A (zh) * | 2021-08-18 | 2021-11-19 | 青岛海信移动通信技术股份有限公司 | 专辑文件的解析文件生成方法和终端设备 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104038827A (zh) * | 2014-06-06 | 2014-09-10 | 小米科技有限责任公司 | 多媒体播放方法及装置 |
US9182940B1 (en) * | 2013-12-10 | 2015-11-10 | Amazon Technologies, Inc. | Systems and methods for determining playback locations in media files |
CN110036442A (zh) * | 2016-10-25 | 2019-07-19 | 乐威指南公司 | 用于恢复媒体资产的***和方法 |
CN111641551A (zh) * | 2020-05-27 | 2020-09-08 | 维沃移动通信有限公司 | 语音播放方法、语音播放装置和电子设备 |
CN112712825A (zh) * | 2020-12-30 | 2021-04-27 | 维沃移动通信有限公司 | 音频处理方法、装置及电子设备 |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102811182A (zh) * | 2012-08-10 | 2012-12-05 | 上海量明科技发展有限公司 | 即时通信中播放音频消息的方法、客户端及*** |
CN103970477A (zh) * | 2014-04-30 | 2014-08-06 | 华为技术有限公司 | 控制语音消息的方法和设备 |
CN104965872B (zh) * | 2015-06-11 | 2019-04-26 | 联想(北京)有限公司 | 一种信息处理方法和电子设备 |
CN105827516B (zh) * | 2016-05-09 | 2019-06-21 | 腾讯科技(深圳)有限公司 | 消息处理方法和装置 |
WO2018037956A1 (ja) * | 2016-08-26 | 2018-03-01 | ソニー株式会社 | 情報処理装置及び情報処理方法 |
CN107888757A (zh) * | 2017-09-25 | 2018-04-06 | 努比亚技术有限公司 | 一种语音消息处理方法、终端和计算机可读存储介质 |
CN111128254B (zh) * | 2019-11-14 | 2021-09-03 | 网易(杭州)网络有限公司 | 音频播放方法、电子设备及存储介质 |
-
2020
- 2020-12-30 CN CN202011604816.7A patent/CN112712825B/zh active Active
-
2021
- 2021-12-30 KR KR1020237025661A patent/KR20230125284A/ko not_active Application Discontinuation
- 2021-12-30 EP EP21914619.8A patent/EP4273863A4/en active Pending
- 2021-12-30 WO PCT/CN2021/143036 patent/WO2022143888A1/zh active Application Filing
-
2023
- 2023-06-28 US US18/343,055 patent/US20230343325A1/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9182940B1 (en) * | 2013-12-10 | 2015-11-10 | Amazon Technologies, Inc. | Systems and methods for determining playback locations in media files |
CN104038827A (zh) * | 2014-06-06 | 2014-09-10 | 小米科技有限责任公司 | 多媒体播放方法及装置 |
CN110036442A (zh) * | 2016-10-25 | 2019-07-19 | 乐威指南公司 | 用于恢复媒体资产的***和方法 |
CN111641551A (zh) * | 2020-05-27 | 2020-09-08 | 维沃移动通信有限公司 | 语音播放方法、语音播放装置和电子设备 |
CN112712825A (zh) * | 2020-12-30 | 2021-04-27 | 维沃移动通信有限公司 | 音频处理方法、装置及电子设备 |
Non-Patent Citations (1)
Title |
---|
See also references of EP4273863A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP4273863A1 (en) | 2023-11-08 |
KR20230125284A (ko) | 2023-08-29 |
CN112712825B (zh) | 2022-09-23 |
CN112712825A (zh) | 2021-04-27 |
EP4273863A4 (en) | 2024-07-03 |
US20230343325A1 (en) | 2023-10-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022143888A1 (zh) | 音频处理方法、装置及电子设备 | |
US9576580B2 (en) | Identifying corresponding positions in different representations of a textual work | |
KR102527281B1 (ko) | 멀티미디어 재생 방법 및 그 디바이스 | |
KR101977072B1 (ko) | 음성 파일과 관련된 텍스트의 표시 방법 및 이를 구현한 전자기기 | |
CN111368155B (zh) | 环境感知对话策略和响应生成 | |
KR101897774B1 (ko) | 녹음된 음성의 탐색을 용이하게 하는 방법 및 이를 구현한 전자기기 | |
WO2022156709A1 (zh) | 音频信号处理方法、装置、电子设备和可读存储介质 | |
KR20130129747A (ko) | 음성 녹음 중에 탐색을 용이하게 하는 방법 및 이를 구현한 전자기기 | |
WO2016023317A1 (zh) | 一种语音信息的处理方法及终端 | |
WO2016165346A1 (zh) | 存储和播放音频文件的方法和装置 | |
US11238858B2 (en) | Speech interactive method and device | |
US9141588B2 (en) | Communication using handwritten input | |
US9613641B2 (en) | Identifying corresponding positions in different representations of a textual work | |
WO2022161273A1 (zh) | 文件保存方法、装置、设备及存储介质 | |
KR20160004914A (ko) | 멀티미디어 재생 방법 및 그 디바이스 | |
JP2022003415A (ja) | 音声制御方法及び音声制御装置、電子機器並びに記憶媒体 | |
WO2021093333A1 (zh) | 音频播放方法、电子设备及存储介质 | |
US9368115B2 (en) | Identifying corresponding positions in different representations of a textual work | |
WO2018005268A1 (en) | Assistive technology notifications for relevant metadata changes in a document | |
CN113055529A (zh) | 录音控制方法和录音控制装置 | |
KR20140014510A (ko) | 음성 인식에 의하여 형성된 문자의 편집 방법 및 그 단말 | |
WO2023213313A1 (zh) | 用于音频编辑的方法、装置、设备和存储介质 | |
EP3108381B1 (en) | Encoded associations with external content items | |
WO2022213986A1 (zh) | 语音识别的方法、装置、电子设备和可读存储介质 | |
CN115547337A (zh) | 语音识别方法及相关产品 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21914619 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202327047796 Country of ref document: IN |
|
ENP | Entry into the national phase |
Ref document number: 20237025661 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2021914619 Country of ref document: EP Effective date: 20230731 |