CN111048094A - Audio information adjusting method, device, equipment and medium - Google Patents

Audio information adjusting method, device, equipment and medium

Info

Publication number
CN111048094A
Authority
CN
China
Prior art keywords
audio information
audio
adjusted
sentence
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911174875.2A
Other languages
Chinese (zh)
Inventor
杨扬
张轶
马颖江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gree Electric Appliances Inc of Zhuhai
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai
Priority to CN201911174875.2A
Publication of CN111048094A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses an audio information adjusting method, device, equipment and medium, which address the low efficiency of adjusting audio information in the prior art. Target standard audio information corresponding to the audio information to be adjusted is determined according to the semantics of the collected audio information to be adjusted and the semantics of each piece of standard audio information stored in advance. For each sentence in the audio information to be adjusted, a first feature vector containing the attribute features of the audio of that sentence is determined. For each sentence, if the first feature vector contains a target attribute feature that is dissimilar to the corresponding attribute feature in the second feature vector of the audio of that sentence in the target standard audio information, the parameter corresponding to the target attribute feature of the audio of the sentence in the audio information to be adjusted is adjusted. Because the audio information to be adjusted is adjusted automatically, the efficiency of audio information adjustment is improved.

Description

Audio information adjusting method, device, equipment and medium
Technical Field
The present invention relates to the field of sound processing technologies, and in particular, to a method, an apparatus, a device, and a medium for adjusting audio information.
Background
With advances in science and technology, social and economic development, the rapid spread of the Internet and the arrival of the 5G era, more and more users like to present themselves by recording audio and video. Users can take part in online entertainment such as karaoke singing and recording short videos on Douyin, but many users have limited skills and are reluctant to present themselves on the Internet because they sing off-key or speak haltingly.
In addition, in dubbing for the film and television industry, even a small error means the dubbing must be recorded again. To reduce the number of re-recordings, the audio information of the dubbed audio can be adjusted so that it meets the dubbing requirements. Likewise, when a singer records a song, the audio information of the recording is adjusted to make the song sound more professional.
However, in the prior art, the audio information to be adjusted is mainly adjusted manually by a professional sound engineer, so the efficiency of audio information adjustment is low.
Disclosure of Invention
Embodiments of the invention provide an audio information adjusting method, device, equipment and medium, which address the low efficiency of adjusting audio information to be adjusted in the prior art.
The embodiment of the invention provides an audio information adjusting method, which comprises the following steps:
determining target standard audio information corresponding to the audio information to be adjusted according to the semantics of the collected audio information to be adjusted and the semantics of each piece of standard audio information stored in advance;
determining, for each sentence in the audio information to be adjusted, a first feature vector containing the attribute features of the audio of the sentence;
for each sentence, if the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the corresponding attribute feature in the second feature vector of the audio of the sentence in the target standard audio information, adjusting a parameter corresponding to the target attribute feature of the audio of the sentence in the audio information to be adjusted, so that the audio of the sentence in the audio information to be adjusted is similar to the audio of the sentence in the target standard audio information.
Further, before determining the target standard audio information corresponding to the audio information to be adjusted according to the collected semantics of the audio information to be adjusted and the semantics of each pre-stored standard audio information, the method further includes:
and filtering the audio information to be adjusted.
Further, the attribute characteristics of the audio comprise the average volume of the audio, the duration of the audio, the average frequency of each word in the audio and the beat of the audio.
Further, each sentence in the audio information to be adjusted is determined by:
converting the audio information to be adjusted into a text;
and determining a character string between any two adjacent punctuations in the text as a sentence, wherein the punctuations comprise commas and periods.
Further, determining that the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the corresponding attribute feature in the second feature vector of the audio of the sentence in the target standard audio information includes:
determining the similarity between each attribute feature of the audio of the sentence in the audio information to be adjusted and the corresponding attribute feature of the audio of the sentence in the target standard audio information;
if any attribute feature has a similarity smaller than its corresponding threshold, determining that attribute feature as a target attribute feature, and determining that the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information.
The embodiment of the invention provides an audio information adjusting device, which comprises:
a determining module, configured to determine target standard audio information corresponding to the audio information to be adjusted according to the semantics of the collected audio information to be adjusted and the semantics of each piece of standard audio information stored in advance, and to determine, for each sentence in the audio information to be adjusted, a first feature vector containing the attribute features of the audio of the sentence;
and an adjusting module, configured to, for each sentence, if a target attribute feature that is dissimilar to a second feature vector of the audio of the sentence in the target standard audio information exists in a first feature vector of the audio of the sentence in the audio information to be adjusted, adjust a parameter corresponding to the target attribute feature of the audio of the sentence in the audio information to be adjusted, so that the audio of the sentence in the audio information to be adjusted is similar to the audio of the sentence in the target standard audio information.
Further, the apparatus further comprises:
and the filtering module is used for filtering the audio information to be adjusted.
Further, the determining module is specifically configured to convert the audio information to be adjusted into a text; and determining a character string between any two adjacent punctuations in the text as a sentence, wherein the punctuations comprise commas and periods.
Further, the adjusting module is specifically configured to determine the similarity between each attribute feature of the audio of the sentence in the audio information to be adjusted and the corresponding attribute feature of the audio of the sentence in the target standard audio information; and, if any attribute feature has a similarity smaller than its corresponding threshold, to determine that attribute feature as a target attribute feature and determine that the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information.
An embodiment of the present invention provides an electronic device, where the electronic device includes a processor and a memory, where the memory is used to store program instructions, and the processor is used to implement the steps of any one of the above audio information adjusting methods when executing a computer program stored in the memory.
An embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of any one of the above audio information adjusting methods.
Embodiments of the invention provide an audio information adjusting method, device, equipment and medium. In the method, target standard audio information corresponding to the audio information to be adjusted is determined according to the semantics of the collected audio information to be adjusted and the semantics of each piece of standard audio information stored in advance; a first feature vector containing the attribute features of the audio of each sentence in the audio information to be adjusted is determined; and, for each sentence, if the first feature vector contains a target attribute feature that is dissimilar to the corresponding attribute feature in the second feature vector of the audio of the sentence in the target standard audio information, the parameter corresponding to that target attribute feature is adjusted so that the audio of the sentence in the audio information to be adjusted is similar to the audio of the sentence in the target standard audio information. Because the audio information to be adjusted is adjusted automatically, the efficiency of audio information adjustment is improved.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic process diagram of an audio information adjusting method according to an embodiment of the present invention;
Fig. 2 is a schematic process diagram of another audio information adjusting method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an audio information adjusting apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to improve the efficiency of audio information adjustment, embodiments of the present invention provide an audio information adjustment method, apparatus, device, and medium.
Example 1:
Fig. 1 is a schematic diagram of an audio information adjustment process provided in an embodiment of the present invention. The process includes the following steps:
S101: Determine the target standard audio information corresponding to the audio information to be adjusted according to the semantics of the collected audio information to be adjusted and the semantics of each piece of standard audio information stored in advance.
The audio information adjusting method is applied to a smart device such as a smartphone or a PC. The audio information to be adjusted is audio information input by the user and collected by the smart device.
The input audio information is generally an imitation of some piece of standard audio information. Each piece of standard audio information is acquired and stored in advance for use in adjustment; it may be dubbing audio recorded in advance by a professional voice actor, song audio recorded in advance by a professional singer, or the audio of the original song.
The target standard audio information corresponding to the audio information to be adjusted is determined mainly by semantic matching: the semantics of the audio information to be adjusted are obtained first, and the target standard audio information whose semantics match them is then selected from all the pre-stored standard audio information.
For example, when the audio information to be adjusted is song audio, its semantics, that is, its lyrics, are obtained, and the original audio of the song whose lyrics match best is selected from the pre-stored standard audio information by semantic matching.
The process of determining the matched audio information by the semantic matching method belongs to the prior art, and is not described in detail in the embodiment of the present invention.
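As a rough illustration of the selection step only, the sketch below picks the stored entry whose text best matches the recognized text. It is not the semantic matching method the patent relies on: it substitutes plain character-level string similarity from Python's standard `difflib` as a stand-in, and the function name `pick_target_standard` and the dictionary layout are assumptions made for this example.

```python
from difflib import SequenceMatcher


def pick_target_standard(recognized_text, standard_library):
    """Return (key, score) of the stored standard entry whose text best matches.

    standard_library maps a hypothetical identifier to the reference text
    (for a song, its lyrics). String similarity is used here only as a
    stand-in for a real semantic matcher.
    """
    best_key, best_score = None, -1.0
    for key, reference_text in standard_library.items():
        score = SequenceMatcher(None, recognized_text, reference_text).ratio()
        if score > best_score:
            best_key, best_score = key, score
    return best_key, best_score


library = {
    "song_a": "twinkle twinkle little star",
    "song_b": "row row row your boat",
}
best_key, _ = pick_target_standard("twinkle twinkle litle star", library)
```

Here the recognized text contains a transcription slip ("litle"), yet the closest stored lyrics are still those of `song_a`.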
S102: Determine, for each sentence in the audio information to be adjusted, a first feature vector containing the attribute features of the audio of the sentence.
The audio of each sentence in the audio information to be adjusted is a part of that audio information, that is, an audio segment. The audio segment corresponding to each sentence in the target standard audio information is known, and the sentences of the target standard audio information correspond to the sentences of the audio information to be adjusted, so the audio segment for each sentence in the audio information to be adjusted can be determined from the audio segment for the corresponding sentence in the target standard audio information.
The attribute characteristics of the audio are sound characteristics affecting the sense of the user, and include sound characteristic information such as volume, tone quality, duration, intensity, pitch, accent, rhythm and the like. Therefore, when determining the attribute characteristics of the audio of each sentence, at least one of the above attribute characteristics can be selected as the attribute characteristic describing the audio, such as volume, beat, and the like.
After determining which sound feature information is included in the compared attribute features in the embodiment of the invention, the first feature vector of the audio of each sentence can be determined according to the determined attribute features.
The first feature vector of the audio of each sentence in the audio information to be adjusted is determined from the attribute features of that audio. The process of determining the attribute features of audio belongs to the prior art.
S103: For each sentence, if the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the corresponding attribute feature in the second feature vector of the audio of the sentence in the target standard audio information, adjust a parameter corresponding to the target attribute feature of the audio of the sentence in the audio information to be adjusted, so that the audio of the sentence in the audio information to be adjusted is similar to the audio of the sentence in the target standard audio information.
To make the audio of each sentence of the audio information to be adjusted similar to the audio of the corresponding sentence of the target standard audio information, it is determined whether the two are similar, and if not, the audio information to be adjusted is adjusted. Because the audio of each sentence has multiple attribute features, that is, multi-dimensional attribute features, the determination can be made separately for each dimension of the first feature vector of the audio of each sentence.
For each sentence of the audio information to be adjusted, each dimension of the attribute features of its audio is examined to judge whether it is similar to the corresponding attribute feature of the audio of the sentence in the target standard audio information. If it is not, the parameter corresponding to the dissimilar attribute feature of the audio of the sentence in the audio information to be adjusted is adjusted.
Corresponding sentences are paired by their order. Because the first feature vector of the audio of each sentence of the audio information to be adjusted has already been determined, and the second feature vector of the audio of each sentence of the target standard audio information is predetermined, the corresponding attribute features in each pair of first and second feature vectors can be compared in turn, following the order of the sentences in the audio information to be adjusted and in the target standard audio information.
At adjustment time, the dissimilar target attribute features and the audio of the corresponding sentence have already been determined, so the audio of that sentence in the audio information to be adjusted can be adjusted. Which parameters to adjust in order to change a given target attribute feature of the audio belongs to the prior art.
There may be one or more dissimilar target attribute features. For example, if the dissimilar target attribute feature is the audio duration and the duration of the audio of the sentence in the audio information to be adjusted is longer than that of the sentence in the target standard audio information, the duration is adjusted to be consistent with the duration of the audio of the sentence in the target standard audio information.
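The duration example can be made concrete with a small sketch. It pairs sentences by order and computes, for each pair, the factor by which the sentence's duration would have to be scaled to match the standard. The function names are hypothetical, and actually applying the factor to the audio (a time-stretch operation) is left out, since the text treats that as prior art.

```python
def duration_stretch_factor(current_seconds, target_seconds):
    """Factor by which a sentence's duration must be scaled to reach the target."""
    if current_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    return target_seconds / current_seconds


def plan_duration_adjustments(adjust_durations, standard_durations):
    """Pair sentences by their order and return one stretch factor per sentence."""
    return [
        duration_stretch_factor(current, target)
        for current, target in zip(adjust_durations, standard_durations)
    ]


# A 4.0 s sentence against a 2.0 s standard must be halved; the second matches.
factors = plan_duration_adjustments([4.0, 2.0], [2.0, 2.0])
```

A factor below 1 means the sentence is too long and must be shortened, which is the case described in the example above; a factor of 1 means no duration change is needed.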
In this embodiment of the invention, target standard audio information corresponding to the audio information to be adjusted is determined according to the semantics of the collected audio information to be adjusted and the semantics of each piece of standard audio information stored in advance; a first feature vector containing the attribute features of the audio of each sentence in the audio information to be adjusted is determined; and, for each sentence, if the first feature vector contains a target attribute feature that is dissimilar to the corresponding attribute feature in the second feature vector of the audio of the sentence in the target standard audio information, the parameter corresponding to that target attribute feature is adjusted so that the audio of the sentence in the audio information to be adjusted is similar to the audio of the sentence in the target standard audio information. Because the audio information to be adjusted is adjusted automatically, the efficiency of audio information adjustment is improved.
Example 2:
In order to improve the accuracy of audio information adjustment, on the basis of the above embodiment, in an embodiment of the present invention, before the target standard audio information corresponding to the audio information to be adjusted is determined according to the semantics of the collected audio information to be adjusted and the semantics of each piece of pre-stored standard audio information, the method further includes:
and filtering the audio information to be adjusted.
In order to improve the accuracy of audio information adjustment, before the target standard audio information corresponding to the audio information to be adjusted is determined, the audio information to be adjusted can be filtered, so that the audio information to be adjusted only has audio information with biological voiceprint characteristics.
The filtering processing means deleting the audio information without the biological voiceprint feature contained in the audio information to be adjusted from the audio information to be adjusted, so that only the audio information with the biological voiceprint feature exists in the audio information to be adjusted.
Specifically, the audio information to be adjusted may be filtered by using audio filtering software in the prior art.
Example 3:
In order to adjust the audio information accurately, on the basis of the above embodiments, in an embodiment of the present invention the attribute features of the audio include the average volume of the audio, the audio duration, the average frequency of each word in the audio, and the beat of the audio.
In order to improve the efficiency of adjusting the audio information, when the audio information to be adjusted is adjusted, the attribute characteristics of the audio which has a large influence on the sense of the user are mainly adjusted, wherein the attribute characteristics of the audio comprise the average volume of the audio, the duration of the audio, the average frequency of each word in the audio and the beat of the audio.
The average volume of the audio is the mean of all volume values over the duration of the audio. The audio duration is the length of time from the beginning to the end of the audio. The average frequency of each word in the audio is, for each word, the mean of all frequency values from the beginning to the end of that word within the word's duration. The beat of the audio is the pattern of strong and weak beats to which the audio belongs; common beats include 1/4, 2/4, 3/4 and 4/4.
The method for determining the average volume of the audio, the method for determining the average frequency of each word, and the method for determining the beat all belong to the prior art, and the process is not described in detail in the embodiment of the present invention.
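For illustration, two of these attribute features can be computed from raw samples as sketched below. The patent does not say how a "volume value" is measured, so taking it as the absolute sample amplitude is an assumption of this example, as are the function names and the dictionary used as the feature vector. Average word frequency and beat are omitted because they need pitch tracking and beat detection.

```python
def average_volume(samples):
    """Mean of the volume values, taken here as absolute sample amplitudes (assumption)."""
    if not samples:
        raise ValueError("audio segment is empty")
    return sum(abs(s) for s in samples) / len(samples)


def duration_seconds(samples, sample_rate):
    """Length of the segment from its beginning to its end, in seconds."""
    return len(samples) / sample_rate


def first_feature_vector(samples, sample_rate):
    """Hypothetical feature vector holding two of the four attribute features."""
    return {
        "average_volume": average_volume(samples),
        "duration": duration_seconds(samples, sample_rate),
    }


# Four samples at 4 samples per second: one second of audio.
vector = first_feature_vector([0.5, -0.5, 1.0, -1.0], 4)
```

The same function would be applied to the audio segment of each sentence in both the audio information to be adjusted and the target standard audio information, giving the first and second feature vectors respectively.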
Example 4:
In order to adjust the audio information accurately, on the basis of the above embodiments, in an embodiment of the present invention each sentence in the audio information to be adjusted is determined as follows:
converting the audio information to be adjusted into a text;
and determining a character string between any two adjacent punctuations in the text as a sentence, wherein the punctuations comprise commas and periods.
And when each sentence in the audio information to be adjusted is determined, converting the audio information to be adjusted into a text, and determining each sentence in the audio information to be adjusted through the text.
The method for converting the audio information to be adjusted into the text belongs to the prior art.
Specifically, the character string between any two adjacent punctuation marks in the text is determined as a sentence. The punctuation marks include periods and commas, as well as other marks that delimit sentences, such as exclamation marks and question marks. Each sentence in the text corresponds to one audio segment.
The method for mapping each sentence in the text converted from the audio information to its audio segment in the audio information belongs to the prior art.
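The splitting rule can be sketched with a regular expression. The set of punctuation marks below (ASCII and full-width Chinese forms of the comma, period, exclamation mark and question mark) and the function name are assumptions for this example; the start and end of the text are also treated as sentence boundaries, which the description does not state explicitly.

```python
import re

# Commas, periods, exclamation marks and question marks,
# in ASCII and full-width Chinese forms.
_PUNCTUATION = r"[,.!?，。！？]"


def split_sentences(text):
    """Split transcribed text into sentences at punctuation marks."""
    parts = re.split(_PUNCTUATION, text)
    # Drop empty pieces left by consecutive or trailing punctuation.
    return [part.strip() for part in parts if part.strip()]


sentences = split_sentences(
    "Twinkle twinkle little star, how I wonder what you are. Up above!"
)
```

Each returned string would then be mapped to its audio segment, a step the description leaves to existing techniques.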
Example 5:
In order to adjust the audio information accurately, on the basis of the foregoing embodiments, in an embodiment of the present invention, determining that the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the corresponding attribute feature in the second feature vector of the audio of the sentence in the target standard audio information includes:
determining the similarity between each attribute feature of the audio of the sentence in the audio information to be adjusted and the corresponding attribute feature of the audio of the sentence in the target standard audio information;
if any attribute feature has a similarity smaller than its corresponding threshold, determining that attribute feature as a target attribute feature, and determining that the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information.
The thresholds are preset. Each dimension of the attribute features has its own threshold, and the thresholds for different dimensions may be the same or different and may be set to different values for different scenarios. To increase the similarity between the audio information to be adjusted and the standard audio information, the threshold for each dimension may be set larger; to improve the efficiency of adjusting the audio information to be adjusted, the threshold for each dimension may be set smaller.
For example, if the attribute features are the average volume of the audio and the audio duration, both may be given the same threshold, such as 0.9 or 0.91, or they may be given different thresholds according to different requirements. To make the audio information to be adjusted more similar in duration, the duration threshold may be set larger, for example 0.97 or 0.98; to make adjustment of the average volume more efficient, the average volume threshold may be set smaller, for example 0.8 or 0.75.
For the audio of each sentence in the audio information to be adjusted, determining whether the first feature vector contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information requires determining the similarity between each dimension of the attribute features of the audio of the sentence in the audio information to be adjusted and the corresponding attribute feature of the audio of the sentence in the target standard audio information, and judging whether that similarity is smaller than the threshold corresponding to the attribute feature.
If the similarity is larger than the threshold corresponding to the attribute feature, that dimension of the attribute features of the audio of the sentence in the audio information to be adjusted is similar to the corresponding attribute feature of the audio of the sentence in the target standard audio information; if the similarity is smaller than the threshold, the two are not similar.
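The per-dimension comparison can be sketched as follows, assuming the similarities have already been computed and are held in dictionaries keyed by attribute name (the names and layout are hypothetical). The text does not say what happens when a similarity exactly equals its threshold; this sketch treats that case as similar.

```python
def find_target_attributes(similarities, thresholds):
    """Return the attribute features whose similarity is below their own threshold."""
    return [
        name
        for name, similarity in similarities.items()
        if similarity < thresholds[name]
    ]


# Thresholds taken from the example values in the text: a stricter
# duration threshold and a looser average-volume threshold.
thresholds = {"average_volume": 0.8, "duration": 0.97}
similarities = {"average_volume": 0.85, "duration": 0.95}
targets = find_target_attributes(similarities, thresholds)
```

With these values only the duration falls below its threshold, so only the duration of this sentence would be adjusted.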
Specifically, the similarity between each dimension's attribute feature of the audio of the sentence in the audio information to be adjusted and the corresponding attribute feature of the audio of the sentence in the target standard audio information is determined mainly from the parameter of each dimension's attribute feature of the audio of the sentence in the audio information to be adjusted and the parameter of the corresponding attribute feature of the audio of the sentence in the target standard audio information.
The method for determining the similarity is described below for each dimension of attribute feature:
1. Determining the similarity of the average volume of the audio: obtain the average volume value of the audio of the sentence in the audio information to be adjusted; compute the difference between that value and the average volume value of the audio of the sentence in the target standard audio information; determine the ratio of the difference to the average volume value of the audio of the sentence in the target standard audio information. The similarity of the average volume of the audio of the sentence in the audio information to be adjusted is 1 minus this ratio.
2. Determining the similarity of the duration of the audio: obtain the duration value of the audio of the sentence in the audio information to be adjusted; compute the difference between that value and the duration value of the audio of the sentence in the target standard audio information; determine the ratio of the difference to the duration value of the audio of the sentence in the target standard audio information. The similarity of the audio duration of the sentence in the audio information to be adjusted is 1 minus this ratio.
3. Determining the similarity of the average frequency of each word in the audio: obtain the average frequency value of each word in the audio of the sentence in the audio information to be adjusted; compute the difference between that value and the average frequency value of the corresponding word in the audio of the sentence in the target standard audio information; determine the ratio of the difference to the average frequency value of that word in the audio of the sentence in the target standard audio information. The similarity of the average frequency of each word in the audio of the sentence in the audio information to be adjusted is 1 minus this ratio.
4. Determining the similarity of the beat of the audio: obtain the beat of the audio of the sentence in the audio information to be adjusted and determine whether it and the beat of the audio of the sentence in the target standard audio information belong to the same beat. If so, the similarity of the beat is 1, that is, completely similar; if not, the similarity of the beat is 0, that is, completely dissimilar.
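The four similarity rules above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function names are hypothetical, the absolute value of the difference is an assumption (the text only says "difference"), and representing a beat as a time-signature string such as "4/4" is also an assumption.

```python
def ratio_similarity(value: float, standard: float) -> float:
    """Similarity for volume, duration, and per-word average frequency.

    Computes 1 - difference / standard value, as described in items 1 to 3.
    Assumption: the difference is taken as an absolute value so that values
    above and below the standard are penalised equally.
    """
    if standard == 0:
        raise ValueError("the standard value must be non-zero")
    return 1.0 - abs(value - standard) / abs(standard)


def beat_similarity(beat: str, standard_beat: str) -> float:
    """Similarity for the beat (item 4): 1 if the beats match, otherwise 0.

    Assumption: a beat is identified by a label such as "4/4".
    """
    return 1.0 if beat == standard_beat else 0.0


if __name__ == "__main__":
    # A sentence recorded at an average volume of 54 against a standard of 60.
    print(ratio_similarity(54.0, 60.0))
    print(beat_similarity("4/4", "3/4"))
```

Note that with this formula the similarity becomes negative when the value differs from the standard by more than the standard itself; the text does not say whether the result should be clamped to 0.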
Example 6:
The audio information adjusting method provided by the present invention is described below through a specific embodiment. Taking recorded audio information A as the audio information to be adjusted, fig. 2 is a schematic diagram of an audio information adjustment process provided by an embodiment of the present invention. The process includes the following steps:
S201: Filter the recorded audio information A.
S202: Determine the target standard audio information B corresponding to the audio information A according to the semantics of the filtered audio information A and the semantics of each piece of pre-stored standard audio information.
S203: Determine, for each sentence in the audio information A, a first feature vector containing the attribute features of the audio of the sentence.
S204: For each sentence, if the first feature vector of the audio of the sentence in the audio information A contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information B, adjust the parameter corresponding to the target attribute feature of the audio of the sentence in the audio information A, so that the audio of the sentence in the audio information A is similar to the audio of the sentence in the target standard audio information B.
S205: Obtain the adjusted audio information A.
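Steps S203 to S205 can be sketched on feature values alone, leaving out filtering (S201), semantic matching (S202), and any real signal processing. Everything named here is hypothetical: each sentence is represented as a dictionary of numeric attribute features, and "adjusting the parameter" is simplified to copying the standard's value, which is one possible reading and not something the text specifies.

```python
from typing import Dict, List

Features = Dict[str, float]  # hypothetical: feature name -> parameter value


def similarity(value: float, standard: float) -> float:
    """1 - |difference| / standard value (absolute value is an assumption)."""
    return 1.0 - abs(value - standard) / abs(standard)


def adjust_sentences(
    audio_a: List[Features],
    standard_b: List[Features],
    thresholds: Dict[str, float],
) -> List[Features]:
    """Return adjusted feature vectors for each sentence of audio A (S204, S205).

    Sentences are paired by position; a feature whose similarity falls below
    its threshold is replaced by the standard's value (simplifying assumption).
    """
    adjusted: List[Features] = []
    for feats_a, feats_b in zip(audio_a, standard_b):
        new_feats = dict(feats_a)  # copy so the input is not modified
        for name, std_value in feats_b.items():
            if similarity(feats_a[name], std_value) < thresholds[name]:
                new_feats[name] = std_value
        adjusted.append(new_feats)
    return adjusted


if __name__ == "__main__":
    a = [{"average_volume": 40.0, "duration": 2.0}]
    b = [{"average_volume": 60.0, "duration": 2.02}]
    print(adjust_sentences(a, b, {"average_volume": 0.8, "duration": 0.97}))
```

In this example only the average volume is changed: its similarity is about 0.67, below the 0.8 threshold, while the duration similarity of about 0.99 clears the 0.97 threshold.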
Example 7:
Fig. 3 is a schematic structural diagram of an audio information adjusting apparatus according to an embodiment of the present invention. On the basis of the foregoing embodiments, an embodiment of the present invention further provides an audio information adjusting apparatus, which includes:
a determining module 301, configured to determine target standard audio information corresponding to the audio information to be adjusted according to the semantics of the collected audio information to be adjusted and the semantics of each piece of pre-stored standard audio information, and to determine, for each sentence in the audio information to be adjusted, a first feature vector containing the attribute features of the audio of the sentence;
an adjusting module 302, configured to, for each sentence, if the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information, adjust the parameter corresponding to the target attribute feature of the audio of the sentence in the audio information to be adjusted, so that the audio of the sentence in the audio information to be adjusted is similar to the audio of the sentence in the target standard audio information.
The apparatus further includes:
a filtering module 303, configured to filter the audio information to be adjusted.
The determining module 301 is specifically configured to convert the audio information to be adjusted into text, and to determine the character string between any two adjacent punctuation marks in the text as a sentence, where the punctuation marks include commas and periods.
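The sentence-splitting rule can be sketched with a regular expression. This is a minimal sketch, not the module's actual implementation: the function name `split_sentences` is hypothetical, the transcribed text is assumed to be already available, and accepting both ASCII and full-width (Chinese) commas and periods is an assumption. The sketch also treats the text before the first punctuation mark as a sentence, which the description does not spell out.

```python
import re
from typing import List

# Commas and periods, in ASCII and full-width forms (the latter is an assumption).
PUNCTUATION_PATTERN = re.compile(r"[,.，。]")


def split_sentences(text: str) -> List[str]:
    """Split transcribed text into sentences at commas and periods.

    Empty fragments (for example after a trailing period) are discarded.
    """
    parts = PUNCTUATION_PATTERN.split(text)
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    print(split_sentences("Turn on the air conditioner, set it to 26 degrees."))
```

A plain split like this would also break decimal numbers such as "26.5" at the period, so a real transcript may need extra handling for numerals.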
The adjusting module 302 is specifically configured to determine the similarity between each attribute feature of the audio of the sentence in the audio information to be adjusted and the corresponding attribute feature of the audio of the sentence in the target standard audio information; and, if any attribute feature has a similarity smaller than its corresponding threshold, to determine that attribute feature as a target attribute feature and to determine that the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information.
Example 8:
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. On the basis of the foregoing embodiments, an embodiment of the present invention further provides an electronic device that includes a processor 401 and a memory 402, where the processor 401 is configured to implement the steps of the above audio information adjusting method when executing a computer program stored in the memory 402.
Optionally, the processor 401 may be a CPU (Central Processing Unit), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a CPLD (Complex Programmable Logic Device).
The processor 401 is configured to perform the following steps when executing the computer program stored in the memory 402:
determining target standard audio information corresponding to the audio information to be adjusted according to the semantics of the collected audio information to be adjusted and the semantics of each piece of pre-stored standard audio information;
determining, for each sentence in the audio information to be adjusted, a first feature vector containing the attribute features of the audio of the sentence;
for each sentence, if the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information, adjusting the parameter corresponding to the target attribute feature of the audio of the sentence in the audio information to be adjusted, so that the audio of the sentence in the audio information to be adjusted is similar to the audio of the sentence in the target standard audio information.
Before the target standard audio information corresponding to the audio information to be adjusted is determined according to the semantics of the collected audio information to be adjusted and the semantics of each piece of pre-stored standard audio information, the steps further include:
filtering the audio information to be adjusted.
The attribute features of the audio include the average volume of the audio, the duration of the audio, the average frequency of each word in the audio, and the beat of the audio.
Each sentence in the audio information to be adjusted is determined by:
converting the audio information to be adjusted into text; and
determining the character string between any two adjacent punctuation marks in the text as a sentence, where the punctuation marks include commas and periods.
Determining that the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information includes:
determining the similarity between each attribute feature of the audio of the sentence in the audio information to be adjusted and the corresponding attribute feature of the audio of the sentence in the target standard audio information; and
if any attribute feature has a similarity smaller than its corresponding threshold, determining that attribute feature as a target attribute feature, and determining that the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information.
Example 9:
On the basis of the foregoing embodiments, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the following steps:
determining target standard audio information corresponding to the audio information to be adjusted according to the semantics of the collected audio information to be adjusted and the semantics of each piece of pre-stored standard audio information;
determining, for each sentence in the audio information to be adjusted, a first feature vector containing the attribute features of the audio of the sentence;
for each sentence, if the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information, adjusting the parameter corresponding to the target attribute feature of the audio of the sentence in the audio information to be adjusted, so that the audio of the sentence in the audio information to be adjusted is similar to the audio of the sentence in the target standard audio information.
Before the target standard audio information corresponding to the audio information to be adjusted is determined according to the semantics of the collected audio information to be adjusted and the semantics of each piece of pre-stored standard audio information, the steps further include:
filtering the audio information to be adjusted.
The attribute features of the audio include the average volume of the audio, the duration of the audio, the average frequency of each word in the audio, and the beat of the audio.
Each sentence in the audio information to be adjusted is determined by:
converting the audio information to be adjusted into text; and
determining the character string between any two adjacent punctuation marks in the text as a sentence, where the punctuation marks include commas and periods.
Determining that the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information includes:
determining the similarity between each attribute feature of the audio of the sentence in the audio information to be adjusted and the corresponding attribute feature of the audio of the sentence in the target standard audio information; and
if any attribute feature has a similarity smaller than its corresponding threshold, determining that attribute feature as a target attribute feature, and determining that the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (11)

1. An audio information adjusting method, the method comprising:
determining target standard audio information corresponding to the audio information to be adjusted according to the semantics of the collected audio information to be adjusted and the semantics of each piece of pre-stored standard audio information;
determining, for each sentence in the audio information to be adjusted, a first feature vector containing the attribute features of the audio of the sentence;
for each sentence, if the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information, adjusting the parameter corresponding to the target attribute feature of the audio of the sentence in the audio information to be adjusted, so that the audio of the sentence in the audio information to be adjusted is similar to the audio of the sentence in the target standard audio information.
2. The method according to claim 1, wherein before determining the target standard audio information corresponding to the audio information to be adjusted according to the semantics of the collected audio information to be adjusted and the semantics of each piece of pre-stored standard audio information, the method further comprises:
filtering the audio information to be adjusted.
3. The method of claim 1, wherein the attribute features of the audio comprise an average volume of the audio, a duration of the audio, an average frequency of each word in the audio, and a beat of the audio.
4. The method of claim 1, wherein each sentence in the audio information to be adjusted is determined by:
converting the audio information to be adjusted into text; and
determining the character string between any two adjacent punctuation marks in the text as a sentence, wherein the punctuation marks comprise commas and periods.
5. The method of claim 1, wherein determining that the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information comprises:
determining the similarity between each attribute feature of the audio of the sentence in the audio information to be adjusted and the corresponding attribute feature of the audio of the sentence in the target standard audio information; and
if any attribute feature has a similarity smaller than its corresponding threshold, determining that attribute feature as a target attribute feature, and determining that the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information.
6. An audio information adjusting apparatus, comprising:
a determining module, configured to determine target standard audio information corresponding to the audio information to be adjusted according to the semantics of the collected audio information to be adjusted and the semantics of each piece of pre-stored standard audio information, and to determine, for each sentence in the audio information to be adjusted, a first feature vector containing the attribute features of the audio of the sentence; and
an adjusting module, configured to, for each sentence, if the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information, adjust the parameter corresponding to the target attribute feature of the audio of the sentence in the audio information to be adjusted, so that the audio of the sentence in the audio information to be adjusted is similar to the audio of the sentence in the target standard audio information.
7. The apparatus of claim 6, further comprising:
a filtering module, configured to filter the audio information to be adjusted.
8. The apparatus according to claim 6, wherein the determining module is specifically configured to convert the audio information to be adjusted into text, and to determine the character string between any two adjacent punctuation marks in the text as a sentence, wherein the punctuation marks comprise commas and periods.
9. The apparatus according to claim 6, wherein the adjusting module is specifically configured to determine the similarity between each attribute feature of the audio of the sentence in the audio information to be adjusted and the corresponding attribute feature of the audio of the sentence in the target standard audio information; and, if any attribute feature has a similarity smaller than its corresponding threshold, to determine that attribute feature as a target attribute feature and to determine that the first feature vector of the audio of the sentence in the audio information to be adjusted contains a target attribute feature that is dissimilar to the second feature vector of the audio of the sentence in the target standard audio information.
10. An electronic device, characterized in that the electronic device comprises a processor and a memory, the processor being configured to carry out the steps of the audio information adjusting method according to any one of claims 1-5 when executing a computer program stored in the memory.
11. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, carries out the steps of the audio information adjusting method according to any one of claims 1-5.
CN201911174875.2A 2019-11-26 2019-11-26 Audio information adjusting method, device, equipment and medium Pending CN111048094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911174875.2A CN111048094A (en) 2019-11-26 2019-11-26 Audio information adjusting method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911174875.2A CN111048094A (en) 2019-11-26 2019-11-26 Audio information adjusting method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN111048094A true CN111048094A (en) 2020-04-21

Family

ID=70233431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911174875.2A Pending CN111048094A (en) 2019-11-26 2019-11-26 Audio information adjusting method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN111048094A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105810211A (en) * 2015-07-13 2016-07-27 维沃移动通信有限公司 Audio frequency data processing method and terminal
CN106611603A (en) * 2015-10-26 2017-05-03 腾讯科技(深圳)有限公司 Audio processing method and audio processing device
CN108337558A (en) * 2017-12-26 2018-07-27 努比亚技术有限公司 Audio and video clipping method and terminal
CN108665881A (en) * 2018-03-30 2018-10-16 北京小唱科技有限公司 Repair sound controlling method and device
CN108766452A (en) * 2018-04-03 2018-11-06 北京小唱科技有限公司 Repair sound method and device
CN110148427A (en) * 2018-08-22 2019-08-20 腾讯数码(天津)有限公司 Audio-frequency processing method, device, system, storage medium, terminal and server

Similar Documents

Publication Publication Date Title
CN109657213B (en) Text similarity detection method and device and electronic equipment
WO2020024690A1 (en) Speech labeling method and apparatus, and device
CN107464555B (en) Method, computing device and medium for enhancing audio data including speech
KR102128926B1 (en) Method and device for processing audio information
CN106960051B (en) Audio playing method and device based on electronic book and terminal equipment
US10665218B2 (en) Audio data processing method and device
CN105161116B (en) The determination method and device of multimedia file climax segment
CN110491383A (en) A kind of voice interactive method, device, system, storage medium and processor
CN109671416B (en) Music melody generation method and device based on reinforcement learning and user terminal
CN108766451B (en) Audio file processing method and device and storage medium
CN106302987A (en) A kind of audio frequency recommends method and apparatus
CN105718486B (en) Online humming retrieval method and system
CN109326270A (en) Generation method, terminal device and the medium of audio file
CN112489676A (en) Model training method, device, equipment and storage medium
CN111108557A (en) Method of modifying a style of an audio object, and corresponding electronic device, computer-readable program product and computer-readable storage medium
CN109190879B (en) Method and device for training adaptation level evaluation model and evaluating adaptation level
CN105244041A (en) Song audition evaluation method and device
CN110942765B (en) Method, device, server and storage medium for constructing corpus
CN106550268B (en) Video processing method and video processing device
CN110781275B (en) Question answering distinguishing method based on multiple characteristics and computer storage medium
CN110312161B (en) Video dubbing method and device and terminal equipment
CN110708619B (en) Word vector training method and device for intelligent equipment
KR20160056104A (en) Analyzing Device and Method for User's Voice Tone
CN106663110B (en) Derivation of probability scores for audio sequence alignment
WO2016110156A1 (en) Voice search method and apparatus, terminal and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200421)