CN113204668A - Audio clipping method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN113204668A
CN113204668A
Authority
CN
China
Prior art keywords
audio
cut
candidate
target
cutting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110560304.3A
Other languages
Chinese (zh)
Inventor
黄永杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Boguan Information Technology Co Ltd
Guangzhou Boguan Telecommunication Technology Ltd
Original Assignee
Guangzhou Boguan Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Boguan Information Technology Co Ltd
Priority to CN202110560304.3A
Publication of CN113204668A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/686: Retrieval characterised by using metadata using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11B: INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00: Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/02: Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B 27/031: Electronic editing of digitised analogue information signals, e.g. audio or video signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)

Abstract

The disclosure provides an audio clipping method and device, a storage medium and electronic equipment, and relates to the technical field of audio processing. The audio cropping method comprises the following steps: acquiring one or more search keywords of the audio to be cut; matching the search keywords with the subtitle information base of the audio to be cut to determine one or more candidate audio segments; determining a target audio clip from the candidate audio clips according to a user selection result; and cutting the audio to be cut according to the position information of the target audio clip in the audio to be cut. According to the method and the device, the audio cutting position is determined through keyword matching, and convenience and efficiency of audio cutting are improved.

Description

Audio clipping method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of audio processing technologies, and in particular, to an audio clipping method, an audio clipping device, a computer-readable storage medium, and an electronic device.
Background
In the audio production process, the audio needs to be cut to extract the required audio segment.
At present, when clipping audio, the content of an audio clip cannot be observed visually, and the current playback position can be judged only by listening to it. As a result, accurate clipping consumes a great deal of time in repeatedly listening to audio clips to find a suitable clipping position; the operation is inconvenient for the user and inefficient.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure provides an audio clipping method, an audio clipping device, a computer-readable storage medium, and an electronic device, thereby at least to some extent solving the problems of inconvenient audio clipping and low efficiency in the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided an audio cropping method, comprising: acquiring one or more search keywords of the audio to be cut; matching the search keywords with the subtitle information base of the audio to be cut to determine one or more candidate audio segments; determining a target audio clip from the candidate audio clips according to a user selection result; and cutting the audio to be cut according to the position information of the target audio clip in the audio to be cut.
In an exemplary embodiment of the present disclosure, before obtaining the one or more search keywords of the audio to be clipped, the method further includes: dividing the audio to be cut into a plurality of audio segments; recording the position information of each audio clip in the audio to be cut; and identifying the audio to be cut into text information and storing the text information in the caption information base, wherein the text information carries the position information corresponding to the audio clip.
In an exemplary embodiment of the disclosure, the position information of the audio segment in the audio to be clipped includes any one or more of the following time points: a start time point of the audio segment, and an end time point of the audio segment.
In an exemplary embodiment of the present disclosure, the matching the search keyword with the subtitle information base of the audio to be clipped to determine one or more candidate audio segments includes: matching the search keywords with the subtitle information base of the audio to be cut to determine one or more matched keywords; and determining the candidate audio clips according to the matched keywords.
In an exemplary embodiment of the disclosure, the matching keyword carries position information corresponding to the candidate audio segment.
In an exemplary embodiment of the present disclosure, the clipping the audio to be clipped according to the position information of the target audio segment in the audio to be clipped includes: positioning candidate cutting positions according to the position information of the candidate audio clips in the audio to be cut; determining a target clipping position from the candidate clipping positions according to the position information of the target audio clip in the audio to be clipped; and at the target cutting position, cutting the audio to be cut.
In an exemplary embodiment of the present disclosure, the clipping the audio to be clipped at the target clipping position includes: receiving an adjusting instruction of the target cutting position on the audio track; and cutting the audio to be cut according to the adjusting instruction.
According to a second aspect of the present disclosure, there is provided an audio cropping device comprising: the keyword acquisition module is used for acquiring one or more search keywords of the audio to be cut; the candidate audio clip determining module is used for matching the search keyword with the subtitle information base of the audio to be cut so as to determine one or more candidate audio clips; the target audio clip determining module is used for determining a target audio clip from the candidate audio clips according to a user selection result; and the audio cutting module is used for cutting the audio to be cut according to the position information of the target audio clip in the audio to be cut.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described audio cropping method.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the above-described audio cropping method via execution of the executable instructions.
The technical scheme of the disclosure has the following beneficial effects:
in the audio clipping process, one or more search keywords of the audio to be clipped are obtained; the search keywords are matched against a subtitle information base of the audio to be clipped to determine one or more candidate audio segments; a target audio segment selected from the candidate audio segments is obtained; and the audio to be clipped is clipped according to the position information of the target audio segment in the audio to be clipped. Determining the clipping position based on keyword matching not only pinpoints the position to be clipped, but also avoids the time cost of repeated listening, which helps improve the convenience and efficiency of audio clipping and thus the user's operating experience.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is apparent that the drawings in the following description are only some embodiments of the present disclosure, and that other drawings can be obtained from those drawings without inventive effort for a person skilled in the art.
FIG. 1 illustrates a flow chart of a method of audio cropping in the present exemplary embodiment;
FIG. 2 is a diagram illustrating an example of a user interface for audio to be clipped in the present exemplary embodiment;
FIG. 3 illustrates a flow diagram of audio segmentation in the present exemplary embodiment;
fig. 4 is a diagram showing an example of inputting a search keyword in the present exemplary embodiment;
FIG. 5 is a diagram illustrating an example of selecting matching keywords in the exemplary embodiment;
FIG. 6 illustrates an example diagram of audio cropping in the present exemplary embodiment;
FIG. 7 illustrates a flow diagram for clipping audio to be clipped in the present exemplary embodiment;
FIG. 8 illustrates a flow chart for adjusting cropping position to crop audio in the present exemplary embodiment;
fig. 9 is a block diagram showing the configuration of an audio cropping means in the present exemplary embodiment;
fig. 10 shows an electronic device for implementing the above method in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the related art, a clipping control is usually dragged manually on an audio track to adjust the clipping position, and the control's current position is verified by playing the audio clip at that position until a suitable clipping position is found. This process requires repeatedly listening to audio clips and continually adjusting the clipping control, which consumes a great deal of time and forces the user to replay the corresponding audio clips frequently; the operation is cumbersome and the user experience is poor.
In view of one or more of the above problems, exemplary embodiments of the present disclosure provide an audio clipping method. It may be applied to audio clip synthesis scenes, for example matching a section of video with background audio, extracting a conference audio clip, or making a ringtone, and may be deployed on a smart device such as a mobile phone, tablet computer, or desktop computer, where a background program implements the audio clipping function and interacts with the user interface.
It should be noted that the above audio clipping method can clip not only pure audio but can also be applied to video clipping: the audio clipping technique described above is applied to the background audio of the video to determine the audio position to be clipped, and the corresponding video position is then clipped based on that position, thereby achieving a clipping function for video.
Fig. 1 shows a schematic flow of an audio clipping method in the present exemplary embodiment, including the following steps S110 to S140:
step S110, acquiring one or more search keywords of the audio to be cut;
step S120, matching the search keywords with a subtitle information base of the audio to be cut to determine one or more candidate audio segments;
step S130, determining a target audio clip from the candidate audio clips according to the user selection result;
and step S140, cutting the audio to be cut according to the position information of the target audio clip in the audio to be cut.
In the above audio clipping process, the clipping position is determined based on keyword matching, which not only pinpoints the position to be clipped but also avoids the time cost of repeatedly listening to the audio, helping to improve the convenience and efficiency of audio clipping and thus the user's operating experience.
Each step in fig. 1 will be described in detail below.
Step S110, one or more search keywords of the audio to be cut are obtained.
The audio to be cut refers to audio containing text information, and can be audio of songs, conference recording, movie and television dubbing and the like. The search keyword may be a keyword included in the audio segment where the position to be cut is located, and is used to locate the cutting position. The search keyword input by the user can be obtained from the search box of the user interface of the audio to be cut.
As shown in the user interface 200 of fig. 2, a user may enter a search keyword in text or speech form in a user interface search box 202, where 201 denotes an audio track representing a waveform of an audio signal.
In an alternative embodiment, the following steps S310 to S330 may be performed for audio segmentation before acquiring one or more search keywords of the audio to be clipped:
in step S310, the audio to be cut is divided into a plurality of audio segments.
Each audio clip is a segment of continuous audio in the audio to be clipped. The audio to be cut is automatically divided into a plurality of audio segments, and the duration of each audio segment can be the same or different.
In an alternative embodiment, the audio may be automatically segmented according to the silence gaps to divide the audio to be cut into a plurality of audio segments.
The mute interval here may be an interval in which no continuous text information appears. For example, when the audio to be cropped is a song, the audio may be segmented at the start position or the end position of each sentence of lyrics.
The silence gap may be identified by: when a part of the audio to be clipped in which no text information appears continuously exceeds a preset duration, the part can be recognized as a silent gap. For example, the threshold value of the silence gap may be set to 500 milliseconds, and a part of the audio to be clipped, in which no text information appears for 500 milliseconds continuously, is identified as the silence gap.
Segmenting the audio at silence gaps matches users' typical clipping habits and makes clipping positions easier to locate.
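The silence-gap segmentation above can be sketched roughly as follows. The patent does not specify a detection mechanism, so the amplitude threshold and the per-sample scan below are assumptions; only the 500-millisecond gap threshold comes from the text.

```python
def split_on_silence(samples, sample_rate, amp_threshold=0.02, min_silence_ms=500):
    """Split audio into segments wherever a silent run exceeds min_silence_ms.

    Returns a list of (start_seconds, end_seconds) pairs, i.e. the position
    information of each segment relative to the whole audio to be clipped.
    """
    min_silence = int(sample_rate * min_silence_ms / 1000)
    segments, seg_start, silent_run = [], None, 0
    for i, s in enumerate(samples):
        if abs(s) >= amp_threshold:           # treated as speech
            if seg_start is None:
                seg_start = i
            silent_run = 0
        elif seg_start is not None:           # silence inside a segment
            silent_run += 1
            if silent_run >= min_silence:     # gap long enough: close the segment
                segments.append((seg_start, i - silent_run + 1))
                seg_start, silent_run = None, 0
    if seg_start is not None:                 # audio ended mid-segment
        segments.append((seg_start, len(samples)))
    return [(a / sample_rate, b / sample_rate) for a, b in segments]
```

With one sample per millisecond, 300 ms of speech, a 600 ms gap, and another 300 ms of speech yield two segments, whose boundary time points can then serve as candidate clipping positions.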
In step S320, the position information of each audio clip in the audio to be cut is recorded.
The position information here may contain the time points of the audio segments, the position information being relative to the entire audio to be cropped.
In an alternative embodiment, the position information of the audio segment in the audio to be cut includes any one or more of the following time points: a start time point of the audio segment and an end time point of the audio segment.
Step S330, the audio to be cut is recognized as character information and stored in a caption information base, and the character information carries the position information corresponding to the audio clip.
The text information may be obtained by speech recognition techniques, which convert audio information into text information. For example, when the audio to be clipped is a song, the text information may be lyric information; when the audio to be cut is conference recording, the text information can be conference speaking content; when the audio to be cut is the movie dubbing, the character information can be movie lines. The subtitle information library is used to store the identified text information. The text information carries position information corresponding to the audio segment, such as a start time point or an end time point of the audio segment.
It should be noted that the above mentioned time points are relative to the whole time axis of the audio to be clipped, and the time points are adopted to facilitate the fast and accurate positioning of the audio clips.
In the steps shown in fig. 3, the audio to be clipped is converted into text information carrying the position information of the corresponding audio segments, so that search keywords can subsequently be compared against it to determine the clipping position within the audio to be clipped.
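A minimal sketch of the subtitle information base described in steps S310 to S330. The speech recognition step is abstracted away: `recognized_segments` stands in for the output of any speech-to-text stage that yields a line of text plus its start and end time points (the function and field names here are illustrative assumptions, not from the patent).

```python
def build_subtitle_base(recognized_segments):
    """Store each recognized text line together with its position information
    (start/end time points relative to the whole audio to be clipped)."""
    return [
        {"text": text, "start": start, "end": end}
        for text, start, end in recognized_segments
    ]

# Example: a song recognized into three lyric lines with their time points.
base = build_subtitle_base([
    ("happy new year to you",         0.0, 2.8),
    ("good luck in the year ahead",   3.3, 6.1),
    ("may all your wishes come true", 6.6, 9.4),
])
```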
And step S120, matching the search keywords with a subtitle information base of the audio to be cut so as to determine one or more candidate audio segments.
And comparing the search keywords with the text information stored in the caption information base of the audio to be cut to determine one or more candidate audio segments. Candidate audio segments refer to audio segments for the user to select a clipping location. The candidate audio segments may be fed back to the user interface for selection by the user.
In an alternative embodiment, matching the search keyword with the subtitle information base of the audio to be clipped to determine one or more candidate audio segments may be implemented by: matching the search keywords with a subtitle information base of the audio to be cut to determine one or more matched keywords; and determining the candidate audio clips according to the matched keywords.
A matching keyword is a piece of text information in the subtitle information base that includes some surrounding context in addition to the search keyword. The audio segments to which the matching keywords belong can be determined as candidate audio segments, which helps the user judge the position of the target audio segment.
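The matching step can be sketched as a substring search over the subtitle information base; each hit returns the full subtitle line (the matching keyword, i.e. the search keyword with its surrounding context) together with the position information of the candidate segment. Function and field names are illustrative assumptions.

```python
def match_keywords(subtitle_base, search_keyword):
    """Return every subtitle entry containing the search keyword.

    Each result carries the matching keyword (the whole line) and the
    candidate audio segment's position information for display to the user.
    """
    return [
        {"matched": entry["text"], "start": entry["start"], "end": entry["end"]}
        for entry in subtitle_base
        if search_keyword in entry["text"]
    ]

subtitle_base = [
    {"text": "happy new year to you",       "start": 0.0, "end": 2.8},
    {"text": "good luck in the year ahead", "start": 3.3, "end": 6.1},
]
candidates = match_keywords(subtitle_base, "good luck")
# Each candidate can now be shown in the display box alongside its time point.
```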
As shown in the user interface 400 for inputting search keywords in fig. 4, when the keyword entered in the search box 402 is "good luck in the Year of the Ox", three related matching keywords containing that phrase (for example, "wishing you good luck in the Year of the Ox") are matched from the subtitle information base and displayed in the display box 403, and the user can determine the position of the target audio segment from the displayed matching keywords.
In an alternative embodiment, the matching keyword carries position information corresponding to the candidate audio clip.
The position information corresponding to the candidate audio segment refers to the position information of the candidate audio segment in the whole audio to be clipped, and can be represented by the starting time point or the ending time point of the candidate audio segment in the whole audio to be clipped. The matching keywords carry the position information corresponding to the candidate audio clips, so that the user can identify and select the target audio clip from the candidate audio clips.
As shown in fig. 4, the time point of the occurrence of the matching keyword is also displayed in the display box 403 for the user to refer to, and the time point of the occurrence of the matching keyword may be the time point of the occurrence of the matching keyword in the audio to be clipped, or the time point corresponding to the position information of the candidate audio segment to which the matching keyword belongs.
Step S130, according to the user selection result, determining the target audio frequency segment from the candidate audio frequency segments.
The user selection result refers to a selection result made by the user for the candidate audio segment, and the candidate audio segment selected by the user is taken as the target audio segment. The target audio segment refers to an audio segment associated with the audio position to be clipped, and is selected by the user.
In a specific implementation, the audio segment to which a matching keyword belongs can be used as a candidate audio segment, and the user's selection of a matching keyword determines the target audio segment, thereby realizing the selection of the target audio segment from the candidate audio segments.
For example, when the user selects the matching keyword "good luck in the Year of the Ox" in fig. 4, the user interface changes: as shown in fig. 5, the row of the selected matching keyword, the corresponding audio clipping bar, and the clipping button above it are displayed in bold.
It should be noted that, in an actual implementation, the row of the selected matching keyword, the audio clipping bar, and the clipping button above it may also be highlighted in other ways, such as bolding, enlarging, or changing color; the specific display effect can be set by the interface developer and is not limited here.
And step S140, cutting the audio to be cut according to the position information of the target audio clip in the audio to be cut.
When a clipping instruction sent by a user is received, the audio to be clipped can be clipped according to the position information of the target audio segment in the audio to be clipped, for example, the starting time point or the ending time point of the target audio segment.
The user may send a clipping instruction by clicking the clipping button in the row of the selected matching keyword in the user interface of fig. 5, or the clipping button above the corresponding audio clipping bar. The audio to be clipped is then cut, yielding the user interface shown in fig. 6, with the entire audio divided in two.
It should be noted that, in practical application, the target audio segment may also be cut out directly by the clipping operation, that is, one cut is made at the start time point and one at the end time point of the target audio segment within the audio to be clipped.
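Cutting at the target segment's time points amounts to index arithmetic on the sample array: one cut at the start time point and one at the end time point, as the note above describes. This is a simplified sketch over raw samples; real audio files would need decoding first.

```python
def cut_out_segment(samples, sample_rate, start_s, end_s):
    """Cut the audio twice, at the target segment's start and end time points,
    returning (before, target_segment, after)."""
    a = int(start_s * sample_rate)
    b = int(end_s * sample_rate)
    return samples[:a], samples[a:b], samples[b:]

# 10 seconds of audio at 1 sample/second; extract the segment from 2 s to 5 s.
before, target, after = cut_out_segment(list(range(10)), 1, 2.0, 5.0)
```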
In an optional embodiment, the audio to be clipped is clipped according to the position information of the target audio segment in the audio to be clipped, and the following steps S710 to S730 shown in fig. 7 may be further implemented:
step S710, according to the position information of the candidate audio frequency segment in the audio frequency to be cut, the candidate cutting position is positioned.
The candidate clipping position here refers to a position where the candidate audio piece is located in the audio to be clipped. As shown in fig. 4, the position information of the audio clip to which the matching keyword belongs may be displayed in the form of audio crop bars at corresponding positions of the audio track 401, each audio crop bar representing one candidate crop position.
The user may select the position to be cropped as the target cropping position directly from the candidate positions, for example, directly from the cropping bar on the audio track of fig. 4, and the user interface shown in fig. 5 may also be obtained.
Further, the user can also determine the target clipping position by performing the following step S720.
Step S720, according to the position information of the target audio frequency segment in the audio frequency to be cut, the target cutting position is determined from the candidate cutting positions.
The target clipping position is one of the candidate clipping positions, and is automatically located from the candidate clipping positions according to the clipping position corresponding to the target audio segment selected by the user. The audio clip bar that is bolded on the audio track shown in fig. 5 can be considered as a target clip position.
And step S730, cutting the audio to be cut at the target cutting position.
The user can cut the audio to be cut by clicking the cutting button above the thickened audio cutting bar in the user interface of fig. 5, and the audio to be cut is divided into two parts.
In the steps shown in fig. 7, determining the target clipping position from the candidate clipping positions associates each audio segment with a clipping position, which helps present the candidate audio segments and the position of the target audio segment within the whole audio to the user at a glance, so that the user can clip.
In an alternative embodiment, at the target clipping position, the audio to be clipped is clipped, and the audio can also be clipped by adjusting the clipping position as shown in fig. 8, which specifically includes the following steps S810 to S820:
step S810, receiving an adjustment instruction of a target clipping position on the audio track.
The user can adjust the target clipping position by sliding the audio clipping bar as shown in fig. 5, where the adjustment instruction may be a sliding operation performed by the user.
And step S820, cutting the audio to be cut according to the adjusting instruction.
After the target clipping position is determined, it can be fine-tuned in the steps shown in fig. 8, meeting the user's need for more accurate clipping and making the operation more flexible and convenient.
Exemplary embodiments of the present disclosure also provide an audio cropping apparatus, as shown in fig. 9, the audio cropping apparatus 900 may include:
a keyword obtaining module 910, configured to obtain one or more search keywords of an audio to be clipped;
a candidate audio segment determining module 920, configured to match the search keyword with a subtitle information base of the audio to be clipped, so as to determine one or more candidate audio segments;
a target audio segment determining module 930, configured to determine a target audio segment from the candidate audio segments according to a user selection result;
and the audio clipping module 940 is configured to clip the audio to be clipped according to the position information of the target audio segment in the audio to be clipped.
In an alternative embodiment, the audio cropping device 900 further includes: the audio segment dividing module is used for dividing the audio to be cut into a plurality of audio segments; the position information recording module is used for recording the position information of each audio clip in the audio to be cut; and the audio identification module is used for identifying the audio to be cut into character information and storing the character information in a caption information base, wherein the character information carries the position information corresponding to the audio clip.
In an alternative embodiment, in the audio cropping device 900, the position information of the audio segment in the audio to be cropped includes any one or more of the following time points: a start time point of the audio segment, and an end time point of the audio segment.
In an alternative embodiment, the candidate audio piece determination module 920 is configured to: matching the search keywords with a subtitle information base of the audio to be cut to determine one or more matched keywords; and determining the candidate audio clips according to the matched keywords.
In an optional implementation manner, in the candidate segment determining module, the matching keyword carries position information corresponding to the candidate audio segment.
In an alternative embodiment, the audio cropping module 940 includes: the candidate cutting position positioning module is used for positioning candidate cutting positions according to the position information of the candidate audio frequency fragments in the audio frequency to be cut; the target cutting position determining module is used for determining a target cutting position from the candidate cutting positions according to the position information of the target audio clip in the audio to be cut; and the audio cutting submodule is used for cutting the audio to be cut at the target cutting position.
In an alternative embodiment, the audio clipping sub-module is configured to: receive an adjustment instruction for the target clipping position on the audio track; and clip the audio to be clipped according to the adjustment instruction.
The specific details of each module in the audio clipping device 900 have already been described in detail in the method embodiments; for details not disclosed here, refer to the method embodiments, which are not repeated.
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium on which is stored a program product capable of implementing the audio clipping method described above in this specification. In some possible embodiments, aspects of the disclosure may also be implemented as a program product comprising program code which, when the program product is run on an electronic device, causes the electronic device to perform the steps according to the various exemplary embodiments of the disclosure described in the "exemplary methods" section above. The program product may employ a portable compact disc read-only memory (CD-ROM) containing the program code, and may be run on an electronic device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Exemplary embodiments of the present disclosure also provide an electronic device capable of implementing the above audio clipping method. An electronic device 1000 according to such an exemplary embodiment of the present disclosure is described below with reference to fig. 10. The electronic device 1000 shown in fig. 10 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in fig. 10, the electronic device 1000 may be embodied in the form of a general purpose computing device. The components of the electronic device 1000 may include, but are not limited to: at least one processing unit 1010, at least one memory unit 1020, a bus 1030 that couples various system components including the memory unit 1020 and the processing unit 1010, and a display unit 1040.
The memory unit 1020 stores program code that may be executed by the processing unit 1010 to cause the processing unit 1010 to perform steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above in this specification. For example, processing unit 1010 may perform any one or more of the method steps of fig. 1, 3, 7, and 8.
The memory unit 1020 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 1021 and/or a cache memory unit 1022, and may further include a read-only memory unit (ROM) 1023.
Storage unit 1020 may also include a program/utility 1024 having a set (at least one) of program modules 1025, such program modules 1025 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1030 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
The electronic device 1000 may also communicate with one or more external devices 1100 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 1000 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 1050. Also, the electronic device 1000 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to execute the method according to the exemplary embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, according to exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method, or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system." Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the following claims.

Claims (10)

1. An audio clipping method, comprising:
acquiring one or more search keywords of the audio to be cut;
matching the search keywords with the subtitle information base of the audio to be cut to determine one or more candidate audio segments;
determining a target audio clip from the candidate audio clips according to a user selection result;
and cutting the audio to be cut according to the position information of the target audio clip in the audio to be cut.
2. The method of claim 1, wherein, prior to acquiring one or more search keywords of the audio to be clipped, the method further comprises:
dividing the audio to be cut into a plurality of audio segments;
recording the position information of each audio clip in the audio to be cut;
and identifying the audio to be cut into text information and storing the text information in the caption information base, wherein the text information carries the position information corresponding to the audio clip.
3. The method according to claim 2, wherein the position information of the audio segment in the audio to be clipped comprises any one or more of the following time points:
a start time point of the audio segment, and an end time point of the audio segment.
4. The method of claim 1, wherein the matching the search keyword with a caption information base of the audio to be clipped to determine one or more candidate audio segments comprises:
matching the search keywords with the subtitle information base of the audio to be cut to determine one or more matched keywords;
and determining the candidate audio clips according to the matched keywords.
5. The method of claim 4, wherein the matching keyword carries position information corresponding to the candidate audio segment.
6. The method according to claim 1, wherein the clipping the audio to be clipped according to the position information of the target audio segment in the audio to be clipped comprises:
positioning candidate cutting positions according to the position information of the candidate audio clips in the audio to be cut;
determining a target clipping position from the candidate clipping positions according to the position information of the target audio clip in the audio to be clipped;
and at the target cutting position, cutting the audio to be cut.
7. The method according to claim 6, wherein the clipping the audio to be clipped at the target clipping position comprises:
receiving an adjusting instruction of the target cutting position on the audio track;
and cutting the audio to be cut according to the adjusting instruction.
8. An audio clipping device, comprising:
the keyword acquisition module is used for acquiring one or more search keywords of the audio to be cut;
the candidate audio clip determining module is used for matching the search keyword with the subtitle information base of the audio to be cut so as to determine one or more candidate audio clips;
the target audio clip determining module is used for determining a target audio clip from the candidate audio clips according to a user selection result;
and the audio cutting module is used for cutting the audio to be cut according to the position information of the target audio clip in the audio to be cut.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1 to 7 via execution of the executable instructions.
CN202110560304.3A 2021-05-21 2021-05-21 Audio clipping method and device, storage medium and electronic equipment Pending CN113204668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110560304.3A CN113204668A (en) 2021-05-21 2021-05-21 Audio clipping method and device, storage medium and electronic equipment


Publications (1)

Publication Number Publication Date
CN113204668A true CN113204668A (en) 2021-08-03

Family

ID=77023025

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110560304.3A Pending CN113204668A (en) 2021-05-21 2021-05-21 Audio clipping method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113204668A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107027060A (en) * 2017-04-18 2017-08-08 腾讯科技(深圳)有限公司 The determination method and apparatus of video segment
CN108363765A (en) * 2018-02-06 2018-08-03 深圳市鹰硕技术有限公司 The recognition methods of audio paragraph and device
CN109101558A (en) * 2018-07-12 2018-12-28 北京猫眼文化传媒有限公司 A kind of video retrieval method and device
CN110895575A (en) * 2018-08-24 2020-03-20 阿里巴巴集团控股有限公司 Audio processing method and device
CN111984823A (en) * 2020-09-01 2020-11-24 咪咕文化科技有限公司 Video searching method, electronic device and storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114666637A (en) * 2022-03-10 2022-06-24 阿里巴巴(中国)有限公司 Video editing method, audio editing method and electronic equipment
CN114666637B (en) * 2022-03-10 2024-02-02 阿里巴巴(中国)有限公司 Video editing method, audio editing method and electronic equipment

Similar Documents

Publication Publication Date Title
US9799375B2 (en) Method and device for adjusting playback progress of video file
US20240155092A1 (en) Interactive information processing method, device and medium
CN111526242B (en) Audio processing method and device and electronic equipment
CN107369462B (en) Electronic book voice playing method and device and terminal equipment
US8924853B2 (en) Apparatus, and associated method, for cognitively translating media to facilitate understanding
US20180286459A1 (en) Audio processing
JP2021009701A (en) Interface intelligent interaction control method, apparatus, system, and program
CN108012173B (en) Content identification method, device, equipment and computer storage medium
CN110401878A (en) A kind of video clipping method, system and storage medium
US20070027844A1 (en) Navigating recorded multimedia content using keywords or phrases
US11295069B2 (en) Speech to text enhanced media editing
CN111050201B (en) Data processing method and device, electronic equipment and storage medium
KR20140091236A (en) Electronic Device And Method Of Controlling The Same
CN103491450A (en) Setting method of playback fragment of media stream and terminal
JP2011102862A (en) Speech recognition result control apparatus and speech recognition result display method
CN109782997B (en) Data processing method, device and storage medium
CN114023301A (en) Audio editing method, electronic device and storage medium
CN108491178B (en) Information browsing method, browser and server
CN113204668A (en) Audio clipping method and device, storage medium and electronic equipment
CN113901186A (en) Telephone recording marking method, device, equipment and storage medium
JP2018005011A (en) Presentation support device, presentation support system, presentation support method and presentation support program
CN113591491B (en) Speech translation text correction system, method, device and equipment
CN114915836A (en) Method, apparatus, device and storage medium for editing audio
CN114880997A (en) Method, apparatus, device and storage medium for audio editing
KR102353797B1 (en) Method and system for suppoting content editing based on real time generation of synthesized sound for video content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination