CN114863898A - Vehicle karaoke audio processing method and system and storage medium - Google Patents

Info

Publication number
CN114863898A
Authority
CN
China
Prior art keywords
mouth
parameter
vehicle
audio signal
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110153880.6A
Other languages
Chinese (zh)
Inventor
李景俊
邓胜
谢鹏鹤
覃小艺
张剑锋
尹苍穹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Automobile Group Co Ltd
Original Assignee
Guangzhou Automobile Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Automobile Group Co Ltd
Priority to CN202110153880.6A
Publication of CN114863898A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

The invention relates to a vehicle karaoke audio processing method and system, and a storage medium. The method comprises: acquiring voiceprint parameters of a singer collected by a vehicle-mounted sound collection device; acquiring consecutive frame images of the singer's mouth shape captured by a vehicle-mounted camera device, and recognizing the images to obtain mouth-shape acoustic parameters; obtaining corresponding singing content parameters according to the mouth-shape acoustic parameters; generating a first audio signal according to the voiceprint parameters and the singing content parameters; acquiring a second audio signal corresponding to the accompaniment of the current song; and mixing the first audio signal and the second audio signal to obtain a third audio signal, and sending the third audio signal to a vehicle-mounted audio playback device so that the device plays it. The invention allows the song to be sung properly even when the singer forgets the words or sings them incorrectly, thereby improving the in-vehicle karaoke user experience.

Description

Vehicle karaoke audio processing method and system and storage medium
Technical Field
The invention relates to the technical field of audio processing, and in particular to a vehicle karaoke audio processing method and system and a computer-readable storage medium.
Background
Currently, in-vehicle karaoke mainly mixes the vocal signal input by the singer with the accompaniment audio signal, and then plays the mixed audio signal. In practice, however, the singer may sing too quietly, forget the words, or sing them incorrectly, and in these cases the in-vehicle karaoke experience is poor.
Disclosure of Invention
The invention aims to provide a vehicle karaoke audio processing method and system and a computer-readable storage medium, so that the song can be sung properly even when the singer sings too quietly, forgets the words, or sings them incorrectly, thereby improving the in-vehicle karaoke user experience.
A first aspect of the invention provides a vehicle karaoke audio processing method, comprising the following steps:
acquiring voiceprint parameters of a singer collected by a vehicle-mounted sound collection device;
acquiring consecutive frame images of the singer's mouth shape captured by a vehicle-mounted camera device, and recognizing the consecutive mouth-shape frame images using a pre-trained deep learning network model to obtain mouth-shape acoustic parameters;
obtaining corresponding singing content parameters according to the mouth-shape acoustic parameters;
generating a first audio signal according to the voiceprint parameters and the singing content parameters;
acquiring a second audio signal corresponding to the accompaniment of the current song;
and mixing the first audio signal and the second audio signal to obtain a third audio signal, and sending the third audio signal to a vehicle-mounted audio playback device so that the vehicle-mounted audio playback device plays the third audio signal.
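As a minimal sketch of how the steps of the first aspect fit together, the following Python function chains them in order. Every callable passed in (get_voiceprint, recognize_mouth, synthesize, and so on) is a hypothetical placeholder introduced for illustration only; the patent does not specify concrete interfaces or data types.

```python
from typing import Callable, List, Sequence

Samples = List[float]  # assumed representation: PCM samples as floats


def karaoke_pipeline(
    get_voiceprint: Callable[[], dict],
    get_mouth_frames: Callable[[], Sequence[object]],
    recognize_mouth: Callable[[Sequence[object]], list],
    to_singing_content: Callable[[list], list],
    synthesize: Callable[[dict, list], Samples],
    get_accompaniment: Callable[[], Samples],
    mix: Callable[[Samples, Samples], Samples],
    play: Callable[[Samples], None],
) -> Samples:
    """Run the six steps once and return the third (mixed) audio signal."""
    voiceprint = get_voiceprint()                # voiceprint parameters
    frames = get_mouth_frames()                  # consecutive mouth-shape frames
    mouth_params = recognize_mouth(frames)       # mouth-shape acoustic parameters
    content = to_singing_content(mouth_params)   # singing content parameters
    first = synthesize(voiceprint, content)      # first audio signal
    second = get_accompaniment()                 # second audio signal
    third = mix(first, second)                   # third audio signal
    play(third)                                  # hand over to the playback device
    return third
```

Passing the stages in as callables keeps the sketch independent of any particular recognition model, synthesizer or audio backend.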
Optionally, the mouth-shape acoustic parameters include a mouth-shape confidence parameter and a match-degree parameter between the mouth shape and the lyrics of the current song;
the mouth-shape acoustic parameters of each mouth-shape action comprise a mouth-shape confidence parameter and a match-degree parameter between the mouth shape and the lyrics of the current song.
Optionally, the obtaining of the corresponding singing content parameters according to the mouth-shape acoustic parameters includes:
determining, according to the mouth-shape confidence parameter and the match-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct the lyric content corresponding to each mouth-shape action, wherein the correction comprises selecting the correct lyrics of the current song to replace the lyric content, or adjusting the lyric content so that its similarity to the correct lyrics of the current song is greater than a preset threshold.
Optionally, the obtaining of the corresponding singing content parameters according to the mouth-shape acoustic parameters includes:
determining, according to the mouth-shape confidence parameter and the match-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct a sentence of lyrics corresponding to a plurality of mouth-shape actions, wherein the correction comprises selecting the correct sentence of lyrics of the current song to replace the sentence, or adjusting the sentence so that its similarity to the correct sentence of lyrics of the current song is greater than a preset threshold.
Optionally, the consecutive mouth-shape frame images are acquired at the same time as the voiceprint parameters.
Optionally, the voiceprint parameters include a fundamental frequency parameter, a formant parameter, a harmonic amplitude parameter, and a harmonic-to-noise ratio parameter.
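To make the fundamental frequency parameter concrete, the sketch below estimates it from a short block of audio samples by picking the autocorrelation peak. The autocorrelation method, the function name estimate_f0 and the 80 to 500 Hz search range are assumptions chosen for illustration; the patent does not state how any voiceprint parameter is computed, and the remaining parameters (formants, harmonic amplitudes, harmonic-to-noise ratio) are not covered here.

```python
import math
from typing import List


def estimate_f0(samples: List[float], sample_rate: int,
                f_min: float = 80.0, f_max: float = 500.0) -> float:
    """Return a rough fundamental-frequency estimate in Hz.

    Searches for the lag (in samples) that maximizes the autocorrelation
    within the lag range implied by [f_min, f_max].
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    n = len(samples)
    if n <= max_lag:
        raise ValueError("need more samples than the largest lag searched")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Usage: a synthetic 220 Hz tone sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 220.0 * i / sample_rate) for i in range(2048)]
f0 = estimate_f0(tone, sample_rate)  # close to 220 Hz, limited by integer lags
```

Because the lag is an integer number of samples, the estimate is quantized; a practical system would likely interpolate around the peak or use a dedicated pitch tracker.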
A second aspect of the present invention provides a vehicle karaoke audio processing system, including:
a voiceprint acquisition unit, configured to acquire voiceprint parameters of a singer collected by a vehicle-mounted sound collection device;
an acoustic parameter acquisition unit, configured to acquire consecutive frame images of the singer's mouth shape captured by a vehicle-mounted camera device, and to recognize the consecutive mouth-shape frame images using a pre-trained deep learning network model to obtain mouth-shape acoustic parameters;
a singing content acquisition unit, configured to obtain corresponding singing content parameters according to the mouth-shape acoustic parameters;
a first audio acquisition unit, configured to generate a first audio signal according to the voiceprint parameters and the singing content parameters;
a second audio acquisition unit, configured to acquire a second audio signal corresponding to the accompaniment of the current song; and
a third audio acquisition unit, configured to mix the first audio signal and the second audio signal to obtain a third audio signal, and to send the third audio signal to a vehicle-mounted audio playback device so that the vehicle-mounted audio playback device plays the third audio signal.
Optionally, the mouth-shape acoustic parameters include a mouth-shape confidence parameter and a match-degree parameter between the mouth shape and the lyrics of the current song;
the mouth-shape acoustic parameters of each mouth-shape action comprise a mouth-shape confidence parameter and a match-degree parameter between the mouth shape and the lyrics of the current song.
Optionally, the acoustic parameter acquisition unit is specifically configured to:
determine, according to the mouth-shape confidence parameter and the match-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct the lyric content corresponding to each mouth-shape action, wherein the correction comprises selecting the correct lyrics of the current song to replace the lyric content, or adjusting the lyric content so that its similarity to the correct lyrics of the current song is greater than a preset threshold;
or, determine, according to the mouth-shape confidence parameter and the match-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct a sentence of lyrics corresponding to a plurality of mouth-shape actions, wherein the correction comprises selecting the correct sentence of lyrics of the current song to replace the sentence, or adjusting the sentence so that its similarity to the correct sentence of lyrics of the current song is greater than a preset threshold.
A third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the vehicle karaoke audio processing method of the first aspect.
The above aspects of the present invention provide a vehicle karaoke audio processing method and system and a computer-readable storage medium which, when implemented, have at least the following advantages:
the singing content to be played is obtained by intelligent recognition of consecutive frame images of the singer's mouth shape, and may be a corrected or adjusted version of what the singer actually sang; combining it with the singer's unique voiceprint characteristics yields the singing content as the singer would deliver it under ideal conditions; finally, this content is mixed with the accompaniment, output and played, so that the song can be sung properly even when the singer sings too quietly, forgets the words, or sings them incorrectly, thereby improving the in-vehicle karaoke user experience.
Additional features and advantages of the invention will be set forth in the description which follows.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a vehicle karaoke audio processing method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a vehicle karaoke audio processing system according to another embodiment of the present invention.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In addition, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present invention. It will be understood by those skilled in the art that the present invention may be practiced without some of these specific details. In some instances, well known means have not been described in detail so as not to obscure the present invention.
Referring to Fig. 1, an embodiment of the present invention provides a vehicle karaoke audio processing method, including the following steps S1 to S6:
Step S1, acquiring voiceprint parameters of the singer collected by a vehicle-mounted sound collection device;
specifically, the voiceprint parameters characterize the singer's voice; in a specific example, they comprise a fundamental frequency parameter, a formant parameter, a harmonic amplitude parameter and a harmonic-to-noise ratio parameter of the singer's voice;
Step S2, acquiring consecutive frame images of the singer's mouth shape captured by a vehicle-mounted camera device, and recognizing the consecutive mouth-shape frame images using a pre-trained deep learning network model to obtain mouth-shape acoustic parameters;
specifically, the mouth-shape acoustic parameters record the vocal content that the singer's mouth shape is intended to express;
in a specific example, the mouth-shape acoustic parameters include, but are not limited to, a mouth-shape confidence parameter and a match-degree parameter between the mouth shape and the lyrics of the current song; one mouth-shape action spanning multiple consecutive frames corresponds to one lyric content, and the mouth-shape acoustic parameters of each mouth-shape action comprise a mouth-shape confidence parameter and a match-degree parameter between the mouth shape and the lyrics of the current song;
it can be understood that a singer needs a certain amount of time to complete one mouth-shape action, during which the vehicle-mounted camera device captures multiple consecutive images; one mouth-shape action therefore corresponds to multiple consecutive frames, and in practice corresponds to one lyric content, such as "I", "you" or "he";
the mouth-shape confidence parameter indicates how reliable the mouth-shape action is: if the action is indistinct, the confidence is relatively low, and if it is distinct, the confidence is relatively high; specifically, the parameter takes a value from 0 to 100%, a higher value indicating higher confidence;
the match-degree parameter between the mouth shape and the lyrics of the current song may be determined as follows: the lyrics being played are identified from the timestamp of the image frames corresponding to the mouth shape, and are then compared with the lyrics recognized from the mouth shape to yield the match-degree parameter; specifically, the parameter takes a value from 0 to 100%, a higher value indicating a closer match;
it should be noted that a deep learning network model is a tool for image-frame recognition that acquires its recognition capability through training; only the input layer and the output layer of an existing deep learning network model need to be adapted, so that the input layer corresponds to the consecutive mouth-shape frame images of this embodiment and the output layer corresponds to the mouth-shape acoustic parameters of this embodiment; given training samples, the model can then be trained to perform the recognition required by this embodiment;
Step S3, obtaining corresponding singing content parameters according to the mouth-shape acoustic parameters;
in a specific example, the obtaining of the corresponding singing content parameters according to the mouth-shape acoustic parameters includes:
determining, according to the mouth-shape confidence parameter and the match-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct the lyric content corresponding to each mouth-shape action; wherein the correction comprises selecting the correct lyrics of the current song to replace the lyric content, or adjusting the lyric content so that its similarity to the correct lyrics of the current song is greater than a preset threshold;
specifically, whether the lyric content corresponding to a mouth-shape action is retained or corrected is decided by comparing the mouth-shape confidence parameter and the match-degree parameter with preset thresholds; for example, if the confidence parameter of the mouth-shape action is greater than the confidence threshold and the match-degree parameter is greater than the match-degree threshold, the lyric content corresponding to the mouth-shape action is retained; otherwise, it is corrected;
more specifically, the similarity of the lyrics may be computed as a text distance, the adjustment making the distance between the two words smaller than a preset threshold; the distance may be a Euclidean distance, a Manhattan distance, or the like;
in another specific example, the obtaining of the corresponding singing content parameters according to the mouth-shape acoustic parameters includes:
determining, according to the mouth-shape confidence parameter and the match-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct a sentence of lyrics corresponding to a plurality of mouth-shape actions; wherein the correction comprises selecting the correct sentence of lyrics of the current song to replace the sentence, or adjusting the sentence so that its similarity to the correct sentence of lyrics of the current song is greater than a preset threshold;
specifically, whether the sentence of lyrics corresponding to the mouth-shape actions is retained or corrected is decided by comparing the mouth-shape confidence parameter and the match-degree parameter with preset thresholds; for example, if the confidence parameter of the mouth-shape actions is greater than the confidence threshold and the match-degree parameter is greater than the match-degree threshold, the sentence of lyrics corresponding to the mouth-shape actions is retained; otherwise, it is corrected;
more specifically, the similarity of the lyrics may be computed as a text distance, the adjustment making the distance between the two sentences smaller than a preset threshold; the distance may be a Euclidean distance, a Manhattan distance, or the like.
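The timestamp lookup, text distance and retain-or-correct rule described for steps S2 and S3 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the representation of timed lyrics as (start time, text) pairs, the use of character-count vectors for the Manhattan distance, and the threshold values 0.6 and 0.8 are all hypothetical; the patent only names the distance types and says the thresholds are preset. Confidence and match degree are written as fractions, so 0.6 stands for 60%.

```python
from bisect import bisect_right
from collections import Counter
from typing import List, Tuple


def lyric_at(timed_lyrics: List[Tuple[float, str]], t: float) -> str:
    """Return the reference lyric whose start time is the latest one <= t.

    timed_lyrics must be sorted by start time (seconds).
    """
    starts = [start for start, _ in timed_lyrics]
    idx = bisect_right(starts, t) - 1
    return timed_lyrics[max(idx, 0)][1]


def manhattan_distance(a: str, b: str) -> int:
    """Manhattan (L1) distance between the character-count vectors of a and b."""
    ca, cb = Counter(a), Counter(b)
    return sum(abs(ca[ch] - cb[ch]) for ch in set(ca) | set(cb))


def is_similar(a: str, b: str, max_distance: int = 2) -> bool:
    """Similarity above threshold is expressed as distance below threshold."""
    return manhattan_distance(a, b) < max_distance


def resolve_lyric(recognized: str, reference: str,
                  confidence: float, match_degree: float,
                  confidence_threshold: float = 0.6,
                  match_threshold: float = 0.8) -> str:
    """Retain the recognized lyric only if both parameters exceed their thresholds.

    Otherwise correct it, here by the first correction option in the text:
    replacing it with the correct lyric of the current song.
    """
    if confidence > confidence_threshold and match_degree > match_threshold:
        return recognized
    return reference


# Usage: a mouth-shape action observed 2.5 s into the song.
timed = [(0.0, "hello"), (2.0, "world"), (4.0, "again")]
reference = lyric_at(timed, 2.5)
sung = resolve_lyric("word", reference, confidence=0.9, match_degree=0.5)
```

Character counts ignore character order, so this distance is deliberately crude; it only shows how a distance threshold can stand in for the similarity threshold. The same functions apply at sentence level by passing whole sentences instead of single words.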
Step S4, generating a first audio signal according to the voiceprint parameters and the singing content parameters;
specifically, the first audio signal can be understood as the singing content as the singer would deliver it under ideal conditions, which improves the in-vehicle karaoke user experience;
Step S5, acquiring a second audio signal corresponding to the accompaniment of the current song;
Step S6, mixing the first audio signal and the second audio signal to obtain a third audio signal, and sending the third audio signal to a vehicle-mounted audio playback device so that the vehicle-mounted audio playback device plays the third audio signal.
Specifically, steps S5 and S6 are conventional karaoke mixing. The method of this embodiment mainly improves how the singer's vocal audio signal is obtained, so that the song can be sung properly even when the singer sings too quietly, forgets the words, or sings them incorrectly, thereby improving the in-vehicle karaoke user experience.
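For reference, conventional mixing of the kind used in steps S5 and S6 can be sketched as a sample-wise weighted sum with clipping. The function name mix, the gain values and the assumption that samples are floats in the range [-1.0, 1.0] are illustrative choices, not details taken from the patent.

```python
from typing import List


def mix(vocal: List[float], accompaniment: List[float],
        vocal_gain: float = 1.0, accompaniment_gain: float = 0.8) -> List[float]:
    """Mix two sample sequences into one, clipping the result to [-1.0, 1.0].

    The shorter input is treated as silence once it runs out.
    """
    n = max(len(vocal), len(accompaniment))
    mixed = []
    for i in range(n):
        v = vocal[i] if i < len(vocal) else 0.0
        a = accompaniment[i] if i < len(accompaniment) else 0.0
        s = vocal_gain * v + accompaniment_gain * a
        mixed.append(max(-1.0, min(1.0, s)))  # hard clip to avoid overflow
    return mixed
```

Hard clipping is the simplest safeguard against overflow; a production mixer would more likely normalize levels or apply a limiter.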
In one embodiment, the consecutive mouth-shape frame images are acquired at the same time as the voiceprint parameters, so that the singer's voiceprint corresponds to the singing content.
Further, when the singer only moves the mouth without making a sound, only the mouth-shape record can be acquired and the singer's voiceprint parameters cannot. This indicates that the singer may be singing too quietly or may have forgotten the words; in this case, the previously identified voiceprint parameters are used as the current singer's voiceprint parameters for subsequent audio signal processing.
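The fallback to previously identified voiceprint parameters can be sketched as a small cache. The class name VoiceprintCache, the use of None to signal that no voiceprint could be measured, and the dictionary representation of the parameters are assumptions made for illustration.

```python
from typing import Dict, Optional


class VoiceprintCache:
    """Remember the most recent voiceprint and reuse it when none is measured."""

    def __init__(self) -> None:
        self._last: Optional[Dict[str, float]] = None

    def update(self, measured: Optional[Dict[str, float]]) -> Optional[Dict[str, float]]:
        """Return the voiceprint to use for the current frame window.

        measured is None when the singer moved the mouth without producing
        sound; in that case the previously identified voiceprint is returned
        (or None if nothing has been identified yet).
        """
        if measured is not None:
            self._last = measured
        return self._last


# Usage: the second window has mouth movement but no sound.
cache = VoiceprintCache()
first_window = cache.update({"f0_hz": 220.0})
second_window = cache.update(None)  # falls back to the earlier voiceprint
```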
Referring to Fig. 2, another embodiment of the present invention provides a vehicle karaoke audio processing system, including:
a voiceprint acquisition unit 1, configured to acquire voiceprint parameters of a singer collected by a vehicle-mounted sound collection device;
an acoustic parameter acquisition unit 2, configured to acquire consecutive frame images of the singer's mouth shape captured by a vehicle-mounted camera device, and to recognize the consecutive mouth-shape frame images using a pre-trained deep learning network model to obtain mouth-shape acoustic parameters;
a singing content acquisition unit 3, configured to obtain corresponding singing content parameters according to the mouth-shape acoustic parameters;
a first audio acquisition unit 4, configured to generate a first audio signal according to the voiceprint parameters and the singing content parameters;
a second audio acquisition unit 5, configured to acquire a second audio signal corresponding to the accompaniment of the current song; and
a third audio acquisition unit 6, configured to mix the first audio signal and the second audio signal to obtain a third audio signal, and to send the third audio signal to a vehicle-mounted audio playback device so that the vehicle-mounted audio playback device plays the third audio signal.
In a specific example, the mouth-shape acoustic parameters comprise a mouth-shape confidence parameter and a match-degree parameter between the mouth shape and the lyrics of the current song;
the mouth-shape acoustic parameters of each mouth-shape action comprise a mouth-shape confidence parameter and a match-degree parameter between the mouth shape and the lyrics of the current song.
In a specific example, the acoustic parameter acquisition unit 2 is specifically configured to:
determine, according to the mouth-shape confidence parameter and the match-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct the lyric content corresponding to each mouth-shape action, wherein the correction comprises selecting the correct lyrics of the current song to replace the lyric content, or adjusting the lyric content so that its similarity to the correct lyrics of the current song is greater than a preset threshold;
or, determine, according to the mouth-shape confidence parameter and the match-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct a sentence of lyrics corresponding to a plurality of mouth-shape actions, wherein the correction comprises selecting the correct sentence of lyrics of the current song to replace the sentence, or adjusting the sentence so that its similarity to the correct sentence of lyrics of the current song is greater than a preset threshold.
In a specific example, the consecutive mouth-shape frame images are acquired at the same time as the voiceprint parameters.
In a specific example, the voiceprint parameters include a fundamental frequency parameter, a formant parameter, a harmonic amplitude parameter, and a harmonic-to-noise ratio parameter.
The above-described system embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
It should be noted that the system described in the foregoing embodiment corresponds to the method described in the foregoing embodiment, and therefore, parts of the system described in the foregoing embodiment that are not described in detail may be obtained by referring to the content of the method described in the foregoing embodiment, that is, the specific step content of the method described in the foregoing embodiment may be understood as the functions that can be implemented by the system of this embodiment, and will not be described again here.
In addition, when the vehicle karaoke audio processing system of the above embodiment is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
Another embodiment of the present invention provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the steps of the vehicle karaoke audio processing method of the above embodiment.
Specifically, the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A vehicle karaoke audio processing method, comprising:
acquiring voiceprint parameters of a singer collected by a vehicle-mounted sound collection device;
acquiring consecutive frame images of the singer's mouth shape captured by a vehicle-mounted camera device, and recognizing the consecutive mouth-shape frame images using a pre-trained deep learning network model to obtain mouth-shape acoustic parameters;
obtaining corresponding singing content parameters according to the mouth-shape acoustic parameters;
generating a first audio signal according to the voiceprint parameters and the singing content parameters;
acquiring a second audio signal corresponding to the accompaniment of the current song;
and mixing the first audio signal and the second audio signal to obtain a third audio signal, and sending the third audio signal to a vehicle-mounted audio playback device so that the vehicle-mounted audio playback device plays the third audio signal.
2. The vehicle karaoke audio processing method according to claim 1, wherein the mouth-shape acoustic parameters comprise a mouth-shape confidence parameter and a match-degree parameter between the mouth shape and the lyrics of the current song;
the mouth-shape acoustic parameters of each mouth-shape action comprise a mouth-shape confidence parameter and a match-degree parameter between the mouth shape and the lyrics of the current song.
3. The vehicle karaoke audio processing method according to claim 2, wherein the obtaining of corresponding singing content parameters according to the mouth-shape acoustic parameters comprises:
determining, according to the mouth-shape confidence parameter and the match-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct the lyric content corresponding to each mouth-shape action, wherein the correction comprises selecting the correct lyrics of the current song to replace the lyric content, or adjusting the lyric content so that its similarity to the correct lyrics of the current song is greater than a preset threshold.
4. The vehicle karaoke audio processing method according to claim 2, wherein the obtaining of corresponding singing content parameters according to the mouth-shape acoustic parameters comprises:
determining, according to the mouth-shape confidence parameter and the match-degree parameter between the mouth shape and the lyrics of the current song, whether to retain or correct a sentence of lyrics corresponding to a plurality of mouth-shape actions, wherein the correction comprises selecting the correct sentence of lyrics of the current song to replace the sentence, or adjusting the sentence so that its similarity to the correct sentence of lyrics of the current song is greater than a preset threshold.
5. The vehicle karaoke audio processing method according to claim 2, wherein the consecutive mouth-shape frame images are acquired at the same time as the voiceprint parameters.
6. The vehicle karaoke audio processing method according to claim 2, wherein the voiceprint parameters comprise a fundamental frequency parameter, a formant parameter, a harmonic amplitude parameter and a harmonic-to-noise ratio parameter.
7. A vehicle karaoke audio processing system, comprising:
a voiceprint acquisition unit, configured to acquire voiceprint parameters of a singer collected by a vehicle-mounted sound pickup device;
an acoustic parameter acquisition unit, configured to acquire continuous mouth-shape frame images of the singer captured by a vehicle-mounted camera device, and to recognize the continuous mouth-shape frame images using a pre-trained deep learning network model to obtain mouth-shape acoustic parameters;
a singing content acquisition unit, configured to obtain corresponding singing content parameters according to the mouth-shape acoustic parameters;
a first audio acquisition unit, configured to generate a first audio signal according to the voiceprint parameters and the singing content parameters;
a second audio acquisition unit, configured to acquire a second audio signal corresponding to the accompaniment of the current song; and
a third audio acquisition unit, configured to mix the first audio signal and the second audio signal to obtain a third audio signal, and to send the third audio signal to a vehicle-mounted audio playback device so that the vehicle-mounted audio playback device plays the third audio signal.
8. The vehicle karaoke audio processing system according to claim 7, wherein the mouth-shape acoustic parameters comprise a mouth-shape confidence parameter and a degree of agreement between the mouth shape and the lyrics of the current song;
the mouth-shape acoustic parameters of each mouth-shape action comprise a mouth-shape confidence parameter and a parameter indicating the degree of agreement between the mouth shape and the lyrics of the current song.
9. The vehicle karaoke audio processing system according to claim 8, wherein said acoustic parameter acquisition unit is specifically configured to:
determine whether to retain or correct the lyric content corresponding to each mouth-shape action according to the mouth-shape confidence parameter and the parameter indicating the degree of agreement between the mouth shape and the lyrics of the current song, wherein the correction comprises replacing the lyric content with the correct lyrics of the current song, or adjusting the lyric content so that its similarity to the correct lyrics of the current song is greater than a preset threshold;
or, determine whether to retain or correct one piece of lyric content corresponding to a plurality of mouth-shape actions according to the mouth-shape confidence parameter and the parameter indicating the degree of agreement between the mouth shape and the lyrics of the current song, wherein the correction comprises replacing the lyric content with the correct lyrics of the current song, or adjusting the lyric content so that its similarity to the correct lyrics of the current song is greater than a preset threshold.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the vehicle karaoke audio processing method according to any one of claims 1 to 6.
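The claims above describe two steps that can be illustrated in code: deciding whether to retain or correct lyric content from the confidence and agreement parameters, and mixing the first (vocal) and second (accompaniment) audio signals into a third. The following Python sketch is illustrative only and rests on assumptions not stated in the claims. The claims do not say how the two parameters are combined, so the rule used here (retain only when both reach a threshold, otherwise replace with the correct lyrics) is a guess. The `MouthAction` type, the function names, the threshold and gain values, the use of `difflib.SequenceMatcher` as the similarity measure, and the weighted-sum mix with clipping are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Sequence


@dataclass
class MouthAction:
    """One recognized mouth-shape action. All field names are hypothetical."""
    recognized_lyric: str  # lyric content inferred from the mouth shape
    confidence: float      # mouth-shape confidence parameter, assumed in [0, 1]
    agreement: float       # agreement with the current song's lyrics, assumed in [0, 1]


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: difflib's matching ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def resolve_lyric(action: MouthAction, correct_lyric: str,
                  min_confidence: float = 0.8,
                  min_agreement: float = 0.8) -> str:
    """Retain the recognized lyric, or correct it by replacement.

    Assumed rule: retain only if both parameters reach their thresholds.
    """
    if action.confidence >= min_confidence and action.agreement >= min_agreement:
        return action.recognized_lyric
    return correct_lyric


def mix(vocal: Sequence[float], accompaniment: Sequence[float],
        vocal_gain: float = 0.5, accompaniment_gain: float = 0.5) -> List[float]:
    """Mix two sample sequences (same sample rate, values in [-1, 1]).

    The shorter input is padded with silence; the weighted sum is clipped.
    """
    length = max(len(vocal), len(accompaniment))
    mixed = []
    for i in range(length):
        v = vocal[i] if i < len(vocal) else 0.0
        a = accompaniment[i] if i < len(accompaniment) else 0.0
        sample = vocal_gain * v + accompaniment_gain * a
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed


if __name__ == "__main__":
    unsure = MouthAction("twinkle little car", confidence=0.4, agreement=0.3)
    lyric = resolve_lyric(unsure, correct_lyric="twinkle little star")
    print(lyric)
    print(similarity(lyric, "twinkle little star") > 0.9)
    print(mix([0.2, -0.4], [0.6, 0.6, 0.6]))
```

Replacement is the simpler of the two correction options named in the claims; the `similarity` helper shows how the alternative option (adjusting the lyric content until it exceeds a preset threshold) could be checked. Clipping only keeps the mixed samples within the assumed [-1, 1] range and is not a substitute for proper gain control.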

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110153880.6A CN114863898A (en) 2021-02-04 2021-02-04 Vehicle karaoke audio processing method and system and storage medium

Publications (1)

Publication Number Publication Date
CN114863898A (en) 2022-08-05

Family

ID=82623104

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202110153880.6A Pending CN114863898A (en) 2021-02-04 2021-02-04 Vehicle karaoke audio processing method and system and storage medium

Country Status (1)

Country Link
CN (1) CN114863898A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008180794A (en) * 2007-01-23 2008-08-07 Yamaha Corp Data reproducing apparatus
US9853758B1 (en) * 2016-06-24 2017-12-26 Harman International Industries, Incorporated Systems and methods for signal mixing
CN109741723A (en) * 2018-12-29 2019-05-10 广州小鹏汽车科技有限公司 Karaoke audio optimization method and karaoke device
CN109949783A (en) * 2019-01-18 2019-06-28 苏州思必驰信息科技有限公司 Song synthesis method and system
US20200176017A1 (en) * 2018-12-04 2020-06-04 Samsung Electronics Co., Ltd. Electronic device for outputting sound and operating method thereof

Similar Documents

Publication Publication Date Title
EP1679694A1 (en) Improving error prediction in spoken dialog systems
CN105304080A (en) Speech synthesis device and speech synthesis method
CN111261151B (en) Voice processing method and device, electronic equipment and storage medium
CN112992109B (en) Auxiliary singing system, auxiliary singing method and non-transient computer readable recording medium
CN106898339B (en) Song chorusing method and terminal
CN109346043B (en) Music generation method and device based on generation countermeasure network
CN112289300B (en) Audio processing method and device, electronic equipment and computer readable storage medium
JP5598516B2 (en) Voice synthesis system for karaoke and parameter extraction device
CN111370024A (en) Audio adjusting method, device and computer readable storage medium
JP6721365B2 (en) Voice dictionary generation method, voice dictionary generation device, and voice dictionary generation program
CN112908302B (en) Audio processing method, device, equipment and readable storage medium
CN105895079A (en) Voice data processing method and device
CN114863898A (en) Vehicle karaoke audio processing method and system and storage medium
JP6406273B2 (en) Karaoke device and program
JP6589521B2 (en) Singing standard data correction device, karaoke system, program
CN110931020B (en) Voice detection method and device
JP6252420B2 (en) Speech synthesis apparatus and speech synthesis system
CN112562668A (en) Semantic information deviation rectifying method and device
CN111429878A (en) Self-adaptive speech synthesis method and device
JP6365483B2 (en) Karaoke device, karaoke system, and program
JP6773840B1 (en) Karaoke system
CN114464151B (en) Sound repairing method and device
CN118245011A (en) Vehicle-mounted method, system and related equipment based on multi-level music information
JP6260499B2 (en) Speech synthesis system and speech synthesizer
CN117636903A (en) Music score identification method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination