CN114863898A - Vehicle karaoke audio processing method and system and storage medium - Google Patents
- Publication number
- CN114863898A (application number CN202110153880.6A)
- Authority
- CN
- China
- Prior art keywords
- mouth
- parameter
- vehicle
- audio signal
- content
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/36—Accompaniment arrangements
- G10H1/361—Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Reverberation, Karaoke And Other Acoustics (AREA)
Abstract
The invention relates to a vehicle karaoke audio processing method and system, and a storage medium. The method comprises: acquiring voiceprint parameters of a singer collected by a vehicle-mounted sound collection device; acquiring consecutive frame images of the singer's mouth shape collected by a vehicle-mounted camera device, and recognizing the consecutive mouth-shape frame images to obtain mouth-shape acoustic parameters; obtaining corresponding singing content parameters from the mouth-shape acoustic parameters; generating a first audio signal from the voiceprint parameters and the singing content parameters; acquiring a second audio signal corresponding to the accompaniment of the current song; and mixing the first audio signal and the second audio signal to obtain a third audio signal, and sending the third audio signal to a vehicle-mounted audio playback device so that the vehicle-mounted audio playback device plays the third audio signal. The invention allows the singer to still sing the song properly even when the singer forgets the words or sings them wrongly, thereby improving the user experience of in-vehicle karaoke.
Description
Technical Field
The invention relates to the technical field of audio processing, in particular to a vehicle karaoke audio processing method and system and a computer readable storage medium.
Background
Currently, in-vehicle karaoke mainly mixes the vocal signal input by the singer with the accompaniment audio signal, and then plays the mixed audio signal. In practice, however, the singer may sing too quietly, forget the words, or sing them wrongly, in which case the in-vehicle karaoke user experience is poor.
Disclosure of Invention
The invention aims to provide a vehicle karaoke audio processing method and system and a computer-readable storage medium, so that a singer can still sing a song properly even when singing too quietly, forgetting the words, or singing them wrongly, thereby improving the user experience of in-vehicle karaoke.
A first aspect of the invention provides a vehicle karaoke audio processing method, comprising the following steps:
acquiring voiceprint parameters of a singer collected by a vehicle-mounted sound collection device;
acquiring consecutive frame images of the singer's mouth shape collected by a vehicle-mounted camera device, and recognizing the consecutive mouth-shape frame images using a pre-trained deep learning network model to obtain mouth-shape acoustic parameters;
obtaining corresponding singing content parameters from the mouth-shape acoustic parameters;
generating a first audio signal from the voiceprint parameters and the singing content parameters;
acquiring a second audio signal corresponding to the accompaniment of the current song;
and mixing the first audio signal and the second audio signal to obtain a third audio signal, and sending the third audio signal to a vehicle-mounted audio playback device so that the vehicle-mounted audio playback device plays the third audio signal.
Optionally, the mouth-shape acoustic parameters include a mouth-shape reliability parameter and a matching-degree parameter between the mouth shape and the lyrics of the current song;
the mouth-shape acoustic parameters of each mouth-shape action comprise a mouth-shape reliability parameter and a matching-degree parameter between the mouth shape and the lyrics of the current song.
Optionally, obtaining the corresponding singing content parameters from the mouth-shape acoustic parameters comprises:
determining, according to the mouth-shape reliability parameter and the matching-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct the lyric content corresponding to each mouth-shape action, wherein the correction comprises replacing the lyric content with the correct lyrics of the current song, or adjusting the lyric content so that its similarity to the correct lyrics of the current song is greater than a preset threshold.
Optionally, obtaining the corresponding singing content parameters from the mouth-shape acoustic parameters comprises:
determining, according to the mouth-shape reliability parameter and the matching-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct a line of lyric content corresponding to a plurality of mouth-shape actions, wherein the correction comprises replacing the lyric content with the correct lyric line of the current song, or adjusting the lyric content so that its similarity to the correct lyric line of the current song is greater than a preset threshold.
Optionally, the consecutive mouth-shape frame images are captured at the same time as the voiceprint parameters.
Optionally, the voiceprint parameters include a fundamental frequency parameter, a formant parameter, a harmonic amplitude parameter, and a harmonic-to-noise ratio parameter.
A second aspect of the present invention provides a vehicle karaoke audio processing system, comprising:
a voiceprint acquisition unit, configured to acquire voiceprint parameters of a singer collected by a vehicle-mounted sound collection device;
an acoustic parameter acquisition unit, configured to acquire consecutive frame images of the singer's mouth shape collected by a vehicle-mounted camera device, and to recognize the consecutive mouth-shape frame images using a pre-trained deep learning network model to obtain mouth-shape acoustic parameters;
a singing content acquisition unit, configured to obtain corresponding singing content parameters from the mouth-shape acoustic parameters;
a first audio acquisition unit, configured to generate a first audio signal from the voiceprint parameters and the singing content parameters;
a second audio acquisition unit, configured to acquire a second audio signal corresponding to the accompaniment of the current song; and
a third audio acquisition unit, configured to mix the first audio signal and the second audio signal to obtain a third audio signal, and to send the third audio signal to a vehicle-mounted audio playback device so that the vehicle-mounted audio playback device plays the third audio signal.
Optionally, the mouth-shape acoustic parameters include a mouth-shape reliability parameter and a matching-degree parameter between the mouth shape and the lyrics of the current song;
the mouth-shape acoustic parameters of each mouth-shape action comprise a mouth-shape reliability parameter and a matching-degree parameter between the mouth shape and the lyrics of the current song.
Optionally, the acoustic parameter acquisition unit is specifically configured to:
determine, according to the mouth-shape reliability parameter and the matching-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct the lyric content corresponding to each mouth-shape action, wherein the correction comprises replacing the lyric content with the correct lyrics of the current song, or adjusting the lyric content so that its similarity to the correct lyrics of the current song is greater than a preset threshold;
or determine, according to the mouth-shape reliability parameter and the matching-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct a line of lyric content corresponding to a plurality of mouth-shape actions, wherein the correction comprises replacing the lyric content with the correct lyric line of the current song, or adjusting the lyric content so that its similarity to the correct lyric line of the current song is greater than a preset threshold.
A third aspect of the present invention proposes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the vehicle karaoke audio processing method of the first aspect.
The above aspects of the present invention respectively provide a vehicle karaoke audio processing method and system and a computer-readable storage medium which, when implemented, have at least the following advantages:
the singing content to be played is obtained through intelligent recognition of the consecutive frame images of the singer's mouth shape, and this content may be a corrected or adjusted version of what the singer actually sang; combining it with the singer's unique voiceprint characteristics yields the singing content as the singer would sing it in an ideal state; finally, this singing content is mixed with the accompaniment and output for playback, so that the singer can still sing the song properly even when singing too quietly, forgetting the words, or singing them wrongly, which improves the user experience of in-vehicle karaoke.
Additional features and advantages of the invention will be set forth in the description which follows.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating a method for processing karaoke audio of a vehicle according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a car karaoke audio processing system according to another embodiment of the present invention.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In addition, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present invention. It will be understood by those skilled in the art that the present invention may be practiced without some of these specific details. In some instances, well known means have not been described in detail so as not to obscure the present invention.
Referring to fig. 1, an embodiment of the present invention provides a car karaoke audio processing method, including the following steps S1 to S6:
step S1, acquiring voiceprint parameters of the singer collected by the vehicle-mounted sound collection device;
specifically, the voiceprint parameters characterize the singer's voice; in a specific example, they comprise a fundamental frequency parameter, a formant parameter, a harmonic amplitude parameter and a harmonic-to-noise ratio parameter of the singer's voice;
step S2, acquiring consecutive frame images of the singer's mouth shape collected by the vehicle-mounted camera device, and recognizing the consecutive mouth-shape frame images using a pre-trained deep learning network model to obtain mouth-shape acoustic parameters;
specifically, the mouth-shape acoustic parameters record the speech content that the singer's mouth shape is meant to express;
in a specific example, the mouth-shape acoustic parameters include, but are not limited to, a mouth-shape reliability parameter and a matching-degree parameter between the mouth shape and the lyrics of the current song; one mouth-shape action spanning multiple consecutive frames corresponds to one piece of lyric content, and the mouth-shape acoustic parameters of each mouth-shape action comprise a mouth-shape reliability parameter and a matching-degree parameter between the mouth shape and the lyrics of the current song;
it can be understood that a singer needs a certain amount of time to complete one mouth-shape action, during which the vehicle-mounted camera device captures multiple consecutive images; one mouth-shape action therefore corresponds to multiple consecutive frames, and in effect to one piece of lyric content, such as "I", "you" or "he";
the mouth-shape reliability parameter indicates how reliable the mouth-shape action is: for example, if the action is indistinct, the reliability is relatively low, and if the action is distinct, the reliability is relatively high; specifically, the mouth-shape reliability parameter is expressed as a value from 0 to 100%, and a higher value means higher reliability;
the matching-degree parameter between the mouth shape and the lyrics of the current song can be determined as follows: the lyric currently being played is determined from the timestamp of the image frames corresponding to the mouth shape, and this lyric is then matched against the lyric recognized from the mouth shape; the result of matching the two lyrics gives the matching-degree parameter; specifically, the matching-degree parameter is expressed as a value from 0 to 100%, and a higher value means a closer match;
it should be noted that a deep learning network model is a tool that can be used for image frame recognition and achieves its recognition purpose through training; only the input layer and the output layer of an existing deep learning network model need to be adjusted, so that the input layer corresponds to the consecutive mouth-shape frame images of this embodiment and the output layer corresponds to the mouth-shape acoustic parameters of this embodiment; given training samples, the model can then be trained to achieve the recognition required by this embodiment;
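As a non-limiting illustration, the timestamp-based determination of the matching-degree parameter described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the timed-lyric table, the function names `lyric_at` and `matching_degree`, and the use of the similarity ratio from Python's standard `difflib` module as the matching measure are all hypothetical choices made for the example.

```python
import bisect
from difflib import SequenceMatcher

# Hypothetical timed lyrics: (start time in seconds, lyric text), sorted by start time.
TIMED_LYRICS = [
    (0.0, "first line of the song"),
    (4.0, "second line of the song"),
]


def lyric_at(timestamp_s, timed_lyrics):
    """Return the lyric being played at the given image-frame timestamp.

    Assumes timed_lyrics is non-empty and sorted by start time.
    """
    starts = [start for start, _ in timed_lyrics]
    index = bisect.bisect_right(starts, timestamp_s) - 1
    return timed_lyrics[max(index, 0)][1]


def matching_degree(recognized, expected):
    """Return a value in [0.0, 1.0], corresponding to the 0-100% range.

    The difflib ratio is an assumed stand-in for the matching measure.
    """
    return SequenceMatcher(None, recognized, expected).ratio()


# Usage: compare the lyric recognized from the mouth shape with the lyric
# that the accompaniment has reached at the same timestamp.
expected = lyric_at(4.5, TIMED_LYRICS)
degree = matching_degree("second line of the song", expected)
```

A ratio of 1.0 corresponds to a matching degree of 100%; how the value is thresholded is left to step S3.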
step S3, obtaining corresponding singing content parameters according to the mouth shape acoustic parameters;
in a specific example, obtaining the corresponding singing content parameters from the mouth-shape acoustic parameters comprises:
determining, according to the mouth-shape reliability parameter and the matching-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct the lyric content corresponding to each mouth-shape action; wherein the correction comprises replacing the lyric content with the correct lyrics of the current song, or adjusting the lyric content so that its similarity to the correct lyrics of the current song is greater than a preset threshold;
specifically, whether to retain or correct the lyric content corresponding to a mouth-shape action is determined by comparing the mouth-shape reliability parameter and the matching-degree parameter with preset thresholds; for example, if the reliability parameter of the mouth-shape action is greater than the reliability threshold and the matching-degree parameter is greater than the matching-degree threshold, the lyric content corresponding to that mouth-shape action is retained; otherwise, it is corrected;
more specifically, the similarity of the lyrics may be computed as a text distance, the requirement being that the distance between the two words is smaller than a preset threshold; the distance may be a Euclidean distance, a Manhattan distance, or the like;
in another specific example, obtaining the corresponding singing content parameters from the mouth-shape acoustic parameters comprises:
determining, according to the mouth-shape reliability parameter and the matching-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct a line of lyric content corresponding to a plurality of mouth-shape actions; wherein the correction comprises replacing the lyric content with the correct lyric line of the current song, or adjusting the lyric content so that its similarity to the correct lyric line of the current song is greater than a preset threshold;
specifically, whether to retain or correct the line of lyric content corresponding to the mouth-shape actions is determined by comparing the mouth-shape reliability parameter and the matching-degree parameter with preset thresholds; for example, if the reliability parameter of the mouth-shape actions is greater than the reliability threshold and the matching-degree parameter is greater than the matching-degree threshold, the line of lyric content corresponding to those mouth-shape actions is retained; otherwise, it is corrected;
more specifically, the similarity of the lyrics may be computed as a text distance, the requirement being that the distance between the two lines is smaller than a preset threshold; the distance may be a Euclidean distance, a Manhattan distance, or the like.
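As a non-limiting illustration, the retain-or-correct decision of step S3 might be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation of the embodiment: the function names, the default threshold values (0.6, 0.7 and 2), and the choice of a Manhattan distance over character counts as the text distance are all hypothetical; the description only requires that some text distance be compared with a preset threshold.

```python
from collections import Counter


def manhattan_text_distance(a: str, b: str) -> int:
    """Manhattan distance between the character-count vectors of two texts."""
    counts_a, counts_b = Counter(a), Counter(b)
    return sum(abs(counts_a[ch] - counts_b[ch])
               for ch in set(counts_a) | set(counts_b))


def is_acceptable_adjustment(adjusted: str, reference: str,
                             max_distance: int = 2) -> bool:
    """True if the adjusted lyric is closer to the reference than the
    (assumed) preset distance threshold."""
    return manhattan_text_distance(adjusted, reference) < max_distance


def resolve_lyric(recognized: str, reference: str,
                  reliability: float, match: float,
                  reliability_threshold: float = 0.6,
                  match_threshold: float = 0.7) -> str:
    """Retain the lyric recognized from the mouth shape only if both the
    reliability and the matching degree exceed their thresholds; otherwise
    correct it by substituting the reference lyric of the current song."""
    if reliability > reliability_threshold and match > match_threshold:
        return recognized
    return reference
```

Here the correction branch uses only the replacement option; the alternative of adjusting the lyric content until `is_acceptable_adjustment` holds would need an adjustment strategy that the description does not specify.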
Step S4, generating a first audio signal according to the voiceprint parameter and the singing content parameter;
specifically, the first audio signal may be understood as the singing content as the singer would sing it in an ideal state, which improves the user experience of in-vehicle karaoke;
step S5, acquiring a second audio signal corresponding to the current song accompaniment music;
step S6, performing sound mixing processing on the first audio signal and the second audio signal to obtain a third audio signal, and sending the third audio signal to a vehicle-mounted audio playing device so that the vehicle-mounted audio playing device plays the third audio signal.
Specifically, steps S5 and S6 are conventional karaoke mixing; the method of this embodiment mainly improves how the singer's vocal audio signal is obtained, so that the singer can still sing the song properly even when singing too quietly, forgetting the words, or singing them wrongly, which improves the user experience of in-vehicle karaoke.
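As a non-limiting illustration, the mixing of steps S5 and S6 might be sketched as a weighted sum of the two signals. This is a minimal sketch under stated assumptions: both signals are taken to be mono sequences of floating-point samples in the range [-1.0, 1.0] at the same sample rate, and the function name `mix_signals` and the gain values are hypothetical; the description does not prescribe a particular mixing algorithm.

```python
def mix_signals(vocal, accompaniment, vocal_gain=0.6, accompaniment_gain=0.4):
    """Mix the first (vocal) and second (accompaniment) audio signals
    into a third signal, sample by sample."""
    length = max(len(vocal), len(accompaniment))
    mixed = []
    for i in range(length):
        # Treat the shorter signal as silence once it has ended.
        v = vocal[i] if i < len(vocal) else 0.0
        a = accompaniment[i] if i < len(accompaniment) else 0.0
        sample = vocal_gain * v + accompaniment_gain * a
        # Clip to the valid range to avoid overflow on playback.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed
```

Hard clipping is the simplest safeguard; a real system might instead normalize or apply a limiter.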
In one embodiment, the continuous frame images of the mouth are acquired at the same time as the voiceprint parameters, so that the voiceprint of the singer corresponds to the singing content.
Further, when the singer only mouths the words without making a sound, only the mouth-shape record can be obtained and the singer's voiceprint parameters cannot, which indicates that the singer may be singing too quietly or may have forgotten the words; in this case, the previously identified voiceprint parameters are used as the current singer's voiceprint parameters for subsequent audio signal processing.
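As a non-limiting illustration, falling back to the previously identified voiceprint parameters might be sketched as follows. This is a minimal sketch under stated assumptions: the class names `Voiceprint` and `VoiceprintTracker`, the field names and the units (Hz, dB) are hypothetical, and a missing measurement is represented by `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Voiceprint:
    """The four voiceprint parameters named in the description."""
    fundamental_frequency_hz: float
    formants_hz: List[float]
    harmonic_amplitudes: List[float]
    harmonic_to_noise_ratio_db: float


class VoiceprintTracker:
    """Remembers the most recent valid voiceprint of the singer."""

    def __init__(self) -> None:
        self._last_valid: Optional[Voiceprint] = None

    def update(self, measured: Optional[Voiceprint]) -> Optional[Voiceprint]:
        """Return the voiceprint to use for the current mouth-shape action.

        If nothing could be measured (the singer is silent), the previously
        identified voiceprint is returned instead.
        """
        if measured is not None:
            self._last_valid = measured
        return self._last_valid
```

If no voiceprint has been identified yet, `update` returns `None`, and the caller would have to decide how to proceed; the description does not cover that case.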
Referring to fig. 2, another embodiment of the present invention provides a car karaoke audio processing system, including:
a voiceprint acquisition unit 1, configured to acquire voiceprint parameters of a singer collected by a vehicle-mounted sound collection device;
an acoustic parameter acquisition unit 2, configured to acquire consecutive frame images of the singer's mouth shape collected by a vehicle-mounted camera device, and to recognize the consecutive mouth-shape frame images using a pre-trained deep learning network model to obtain mouth-shape acoustic parameters;
a singing content acquisition unit 3, configured to obtain corresponding singing content parameters from the mouth-shape acoustic parameters;
a first audio acquisition unit 4, configured to generate a first audio signal from the voiceprint parameters and the singing content parameters;
a second audio acquisition unit 5, configured to acquire a second audio signal corresponding to the accompaniment of the current song; and
a third audio acquisition unit 6, configured to mix the first audio signal and the second audio signal to obtain a third audio signal, and to send the third audio signal to a vehicle-mounted audio playback device so that the vehicle-mounted audio playback device plays the third audio signal.
In a specific example, the mouth-shape acoustic parameters comprise a mouth-shape reliability parameter and a matching-degree parameter between the mouth shape and the lyrics of the current song;
the mouth-shape acoustic parameters of each mouth-shape action comprise a mouth-shape reliability parameter and a matching-degree parameter between the mouth shape and the lyrics of the current song.
In a specific example, the acoustic parameter acquisition unit 2 is specifically configured to:
determine, according to the mouth-shape reliability parameter and the matching-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct the lyric content corresponding to each mouth-shape action, wherein the correction comprises replacing the lyric content with the correct lyrics of the current song, or adjusting the lyric content so that its similarity to the correct lyrics of the current song is greater than a preset threshold;
or determine, according to the mouth-shape reliability parameter and the matching-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct a line of lyric content corresponding to a plurality of mouth-shape actions, wherein the correction comprises replacing the lyric content with the correct lyric line of the current song, or adjusting the lyric content so that its similarity to the correct lyric line of the current song is greater than a preset threshold.
In a specific example, the continuous frame images of the mouth are acquired at the same time as the voiceprint parameters.
In a specific example, the voiceprint parameters include a fundamental frequency parameter, a formant parameter, a harmonic amplitude parameter, and a harmonic-to-noise ratio parameter.
The above-described system embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
It should be noted that the system described in the foregoing embodiment corresponds to the method described in the foregoing embodiment, and therefore, parts of the system described in the foregoing embodiment that are not described in detail may be obtained by referring to the content of the method described in the foregoing embodiment, that is, the specific step content of the method described in the foregoing embodiment may be understood as the functions that can be implemented by the system of this embodiment, and will not be described again here.
In addition, when the car karaoke audio processing system according to the above embodiment is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer readable storage medium.
Another embodiment of the present invention provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the steps of the vehicular karaoke audio processing method according to the above-described embodiment.
Specifically, the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (10)
1. A vehicle karaoke audio processing method, comprising:
acquiring voiceprint parameters of a singer collected by a vehicle-mounted sound collection device;
acquiring consecutive frame images of the singer's mouth shape collected by a vehicle-mounted camera device, and recognizing the consecutive mouth-shape frame images using a pre-trained deep learning network model to obtain mouth-shape acoustic parameters;
obtaining corresponding singing content parameters from the mouth-shape acoustic parameters;
generating a first audio signal from the voiceprint parameters and the singing content parameters;
acquiring a second audio signal corresponding to the accompaniment of the current song;
and mixing the first audio signal and the second audio signal to obtain a third audio signal, and sending the third audio signal to a vehicle-mounted audio playback device so that the vehicle-mounted audio playback device plays the third audio signal.
2. The vehicle karaoke audio processing method as claimed in claim 1, wherein the mouth-shape acoustic parameters comprise a mouth-shape reliability parameter and a matching-degree parameter between the mouth shape and the lyrics of the current song;
the mouth-shape acoustic parameters of each mouth-shape action comprise a mouth-shape reliability parameter and a matching-degree parameter between the mouth shape and the lyrics of the current song.
3. The vehicle karaoke audio processing method according to claim 2, wherein said obtaining corresponding singing content parameters from the mouth-shape acoustic parameters comprises:
determining, according to the mouth-shape reliability parameter and the matching-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct the lyric content corresponding to each mouth-shape action, wherein the correction comprises replacing the lyric content with the correct lyrics of the current song, or adjusting the lyric content so that its similarity to the correct lyrics of the current song is greater than a preset threshold.
4. The vehicle karaoke audio processing method according to claim 2, wherein said obtaining corresponding singing content parameters from the mouth-shape acoustic parameters comprises:
determining, according to the mouth-shape reliability parameter and the matching-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct a line of lyric content corresponding to a plurality of mouth-shape actions, wherein the correction comprises replacing the lyric content with the correct lyric line of the current song, or adjusting the lyric content so that its similarity to the correct lyric line of the current song is greater than a preset threshold.
5. The vehicle karaoke audio processing method as claimed in claim 2, wherein the consecutive mouth-shape frame images are captured at the same time as the voiceprint parameters.
6. The vehicle karaoke audio processing method as claimed in claim 2, wherein said voiceprint parameters comprise a fundamental frequency parameter, a formant parameter, a harmonic amplitude parameter, and a harmonic-to-noise ratio parameter.
7. A vehicle karaoke audio processing system, comprising:
a voiceprint acquisition unit, configured to acquire voiceprint parameters of a singer collected by a vehicle-mounted sound pickup device;
an acoustic parameter acquisition unit, configured to acquire consecutive-frame mouth-shape images of the singer captured by a vehicle-mounted camera device, and to recognize the consecutive-frame mouth-shape images using a pre-trained deep learning network model to obtain mouth-shape acoustic parameters;
a singing content acquisition unit, configured to obtain corresponding singing content parameters according to the mouth-shape acoustic parameters;
a first audio acquisition unit, configured to generate a first audio signal according to the voiceprint parameters and the singing content parameters;
a second audio acquisition unit, configured to acquire a second audio signal corresponding to the accompaniment music of the current song; and
a third audio acquisition unit, configured to mix the first audio signal and the second audio signal to obtain a third audio signal, and to send the third audio signal to a vehicle-mounted audio playback device so that the vehicle-mounted audio playback device plays the third audio signal.
8. The vehicle karaoke audio processing system of claim 7, wherein the mouth-shape acoustic parameters comprise a mouth-shape confidence parameter and a parameter of the degree of agreement between the mouth shape and the lyrics of the current song;
the mouth-shape acoustic parameters of each mouth-shape action comprise a mouth-shape confidence parameter and a parameter of the degree of agreement between the mouth shape and the lyrics of the current song.
9. The vehicle karaoke audio processing system according to claim 8, wherein said acoustic parameter acquisition unit is specifically configured to:
determine whether to retain or correct the lyric content corresponding to each mouth-shape action according to the mouth-shape confidence parameter and the parameter of the degree of agreement between the mouth shape and the lyrics of the current song, wherein the correction comprises selecting the correct lyrics of the current song to replace the lyric content, or adjusting the lyric content so that the similarity between the lyric content and the correct lyrics of the current song is greater than a preset threshold;
or, determine whether to retain or correct a piece of lyric content corresponding to a plurality of mouth-shape actions according to the mouth-shape confidence parameter and the parameter of the degree of agreement between the mouth shape and the lyrics of the current song, wherein the correction comprises selecting the correct lyrics of the current song to replace the lyric content, or adjusting the lyric content so that the similarity between the lyric content and the correct lyrics of the current song is greater than a preset threshold.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the vehicle karaoke audio processing method according to any one of claims 1 to 6.
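The retain-or-correct decision recited in claims 3, 4 and 9 can be illustrated with a minimal Python sketch. The claims do not state how the two parameters are combined, so the rule below (retain only when both parameters reach a minimum value, otherwise replace with the correct lyrics) is an assumption made for illustration. The names `MouthAction`, `retain_or_correct`, `conf_min` and `agree_min`, the default value 0.8, and the [0, 1] value range are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class MouthAction:
    """One recognized mouth-shape action (hypothetical container)."""
    lyric_content: str  # lyric content recognized from the mouth shape
    confidence: float   # mouth-shape confidence parameter, assumed in [0, 1]
    agreement: float    # agreement with the current song's lyrics, assumed in [0, 1]


def retain_or_correct(action: MouthAction, correct_lyric: str,
                      conf_min: float = 0.8, agree_min: float = 0.8) -> str:
    """Return the lyric content to use for this mouth-shape action.

    Assumed rule: retain the recognized content only when both parameters
    reach their minimum; otherwise apply the first correction option named
    in the claims, i.e. replace it with the correct lyrics of the song.
    """
    if action.confidence >= conf_min and action.agreement >= agree_min:
        return action.lyric_content
    return correct_lyric


# Usage: a well-recognized action is retained, a doubtful one is replaced.
clear = MouthAction("little star", confidence=0.95, agreement=0.90)
doubtful = MouthAction("litter stir", confidence=0.40, agreement=0.30)
print(retain_or_correct(clear, "little star"))
print(retain_or_correct(doubtful, "little star"))
```

Only the replacement option is sketched here. The alternative correction in the claims, adjusting the lyric content until its similarity to the correct lyrics exceeds a preset threshold, would additionally need a similarity measure, which the claims leave unspecified.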
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110153880.6A CN114863898A (en) | 2021-02-04 | 2021-02-04 | Vehicle karaoke audio processing method and system and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114863898A (en) | 2022-08-05 |
Family
ID=82623104
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110153880.6A Pending CN114863898A (en) | 2021-02-04 | 2021-02-04 | Vehicle karaoke audio processing method and system and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114863898A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008180794A (en) * | 2007-01-23 | 2008-08-07 | Yamaha Corp | Data reproducing apparatus |
US9853758B1 (en) * | 2016-06-24 | 2017-12-26 | Harman International Industries, Incorporated | Systems and methods for signal mixing |
CN109741723A (en) * | 2018-12-29 | 2019-05-10 | 广州小鹏汽车科技有限公司 | Karaoke audio optimization method and karaoke device |
CN109949783A (en) * | 2019-01-18 | 2019-06-28 | 苏州思必驰信息科技有限公司 | Song synthesis method and system |
US20200176017A1 (en) * | 2018-12-04 | 2020-06-04 | Samsung Electronics Co., Ltd. | Electronic device for outputting sound and operating method thereof |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP1679694A1 (en) | Improving error prediction in spoken dialog systems | |
CN105304080A (en) | Speech synthesis device and speech synthesis method | |
CN111261151B (en) | Voice processing method and device, electronic equipment and storage medium | |
CN112992109B (en) | Auxiliary singing system, auxiliary singing method and non-transient computer readable recording medium | |
CN106898339B (en) | Song chorusing method and terminal | |
CN109346043B (en) | Music generation method and device based on generation countermeasure network | |
CN112289300B (en) | Audio processing method and device, electronic equipment and computer readable storage medium | |
JP5598516B2 (en) | Voice synthesis system for karaoke and parameter extraction device | |
CN111370024A (en) | Audio adjusting method, device and computer readable storage medium | |
JP6721365B2 (en) | Voice dictionary generation method, voice dictionary generation device, and voice dictionary generation program | |
CN112908302B (en) | Audio processing method, device, equipment and readable storage medium | |
CN105895079A (en) | Voice data processing method and device | |
CN114863898A (en) | Vehicle karaoke audio processing method and system and storage medium | |
JP6406273B2 (en) | Karaoke device and program | |
JP6589521B2 (en) | Singing standard data correction device, karaoke system, program | |
CN110931020B (en) | Voice detection method and device | |
JP6252420B2 (en) | Speech synthesis apparatus and speech synthesis system | |
CN112562668A (en) | Semantic information deviation rectifying method and device | |
CN111429878A (en) | Self-adaptive speech synthesis method and device | |
JP6365483B2 (en) | Karaoke device, karaoke system, and program | |
JP6773840B1 (en) | Karaoke system | |
CN114464151B (en) | Sound repairing method and device | |
CN118245011A (en) | Vehicle-mounted method, system and related equipment based on multi-level music information | |
JP6260499B2 (en) | Speech synthesis system and speech synthesizer | |
CN117636903A (en) | Music score identification method and device, electronic equipment and medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||