CN112017636B - User pronunciation simulation method, system, equipment and storage medium based on vehicle - Google Patents

Publication number
CN112017636B
Authority
CN
China
Prior art keywords
playing
vehicle
play
target
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010881113.2A
Other languages
Chinese (zh)
Other versions
CN112017636A (en)
Inventor
张文瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Volkswagen Mobvoi Beijing Information Technology Co Ltd
Original Assignee
Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority to CN202010881113.2A
Publication of CN112017636A
Application granted
Publication of CN112017636B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 Speech recognition
            • G10L15/01 Assessment or evaluation of speech recognition systems
            • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
            • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
              • G10L2015/223 Execution procedure of a spoken command
            • G10L15/26 Speech to text systems
          • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
            • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
              • G10L21/0208 Noise filtering
                • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
          • H04R3/00 Circuits for transducers, loudspeakers or microphones

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The embodiment of the invention discloses a vehicle-based user pronunciation simulation method, system, equipment and storage medium. The method comprises the following steps: acquiring target playing data from a pre-stored voice data set; determining, according to the vehicle environmental sound and/or semantic information of the target playing data, playing control parameters matched with a playing device in the vehicle, wherein the position of the playing device in the vehicle matches the position of a real human mouth; and controlling the playing device to play the target playing data according to the playing control parameters. Because the target playing data is played through the playing device, the prior-art approach of having a speaker record voice in the vehicle is replaced, which reduces the cost of pronunciation simulation, is simple to implement, and makes pronunciation simulation efficient. Because the playing control parameters of the playing device are determined from the vehicle environmental sound and/or the semantic information, the playback can reach the same quality as voice collected from a speaker sitting in the vehicle.

Description

User pronunciation simulation method, system, equipment and storage medium based on vehicle
Technical Field
The embodiment of the invention relates to the technical field of intelligent automobiles, in particular to a user pronunciation simulation method, system, equipment and storage medium based on a vehicle.
Background
With the development of intelligent vehicle applications, more and more automobile built-in voice recognition technologies are used for collecting user voices and recognizing voice commands, and corresponding functions are realized according to the voice commands.
In the prior art, to make voice recognition accurate, the voices of a number of speakers are usually collected in advance for experiments under different real environments. To reproduce the actual driving environment in the experiment, a speaker has to get into the vehicle for voice collection while the vehicle reaches a specified speed and the windows and air conditioner are set to a specified state.
However, the prior-art method occupies the speaker's time, including both the time spent waiting for the vehicle to reach the specified state and the voice collection time, so the speaker is occupied for a long time and the cost of paying the speaker is high. Moreover, in some driving states, collecting voice in the vehicle may cause the speaker discomfort. For example, when the vehicle jolts or the temperature in the vehicle is low, speakers are unwilling to take part, voice collection becomes difficult, and the number of collected samples suffers. In addition, voice recorded in one vehicle model cannot be reused for another, so the speaker has to get into a vehicle for collection many times, and voice collection is inefficient.
Disclosure of Invention
The embodiment of the invention provides a user pronunciation simulation method, system, equipment and storage medium based on a vehicle, which can reduce pronunciation simulation cost, improve pronunciation simulation efficiency and ensure pronunciation simulation quality.
In a first aspect, an embodiment of the present invention provides a vehicle-based user pronunciation simulation method, including:
acquiring target playing data from a pre-stored voice data set;
according to the vehicle environmental sound and/or the semantic information of the target playing data, determining a playing control parameter matched with playing equipment in the vehicle, wherein the position of the playing equipment in the vehicle is matched with the position of a real human mouth;
and controlling the playing equipment to play the target playing data according to the playing control parameters.
In a second aspect, an embodiment of the present invention further provides a vehicle-based user pronunciation simulation apparatus, where the apparatus includes:
the target playing data acquisition module is used for acquiring target playing data from a pre-stored voice data set;
the playing control parameter determining module is used for determining playing control parameters matched with playing equipment in the vehicle according to the environmental sound of the vehicle and/or the semantic information of the target playing data, and the position of the playing equipment in the vehicle is matched with the position of a real human mouth;
and the play control module is used for controlling the play equipment to play the target play data according to the play control parameters.
In a third aspect, an embodiment of the present invention further provides a vehicle-based user pronunciation simulation system, including: a processor, a playing device, a sound collecting assembly, a sound card, a digital signal processor and an audio detection module;
the playing device, the sound collecting assembly, the sound card, the digital signal processor and the audio detection module are all electrically connected with the processor; the playing device, the sound collecting assembly, the sound card, the digital signal processor and the audio detection module are electrically connected in sequence; the position of the playing device in the vehicle is matched with the position of the real human mouth;
the processor is used for acquiring target playing data from a pre-stored voice data set; determining a play control parameter matched with play equipment in the vehicle according to the vehicle environmental sound and/or semantic information of the target play data; transmitting the play control parameters and the target play data to the play equipment;
the playing device is used for playing the target playing data according to the received playing control parameters;
the sound collection component is used for collecting the audio of the target playing data played by the playing device, obtaining an audio signal corresponding to the target playing data and transmitting the audio signal to the sound card;
the sound card is used for converting the received audio signal into a digital signal and transmitting the digital signal to the digital signal processor;
the digital signal processor is used for carrying out noise reduction processing and/or echo cancellation processing on the received digital signals to obtain digital signals to be detected, and transmitting the digital signals to be detected to the audio detection module;
the audio detection module is used for displaying the received digital signal to be detected so as to determine whether the user pronunciation simulation system based on the vehicle is normal or not according to the displayed digital signal to be detected.
In a fourth aspect, an embodiment of the present invention further provides a vehicle-based user pronunciation simulation method, where the method includes:
acquiring target playing data from a pre-stored voice data set through a processor; determining a play control parameter matched with play equipment in the vehicle according to the vehicle environmental sound and/or semantic information of the target play data; transmitting the play control parameters and the target play data to the play equipment;
playing the target playing data through the playing equipment according to the received playing control parameters;
the target playing data played by the playing device are subjected to audio acquisition through a sound acquisition component, so that an audio signal corresponding to the target playing data is obtained and transmitted to a sound card;
converting the received audio signal into a digital signal through the sound card, and transmitting the digital signal to the digital signal processor;
carrying out noise reduction processing and/or echo cancellation processing on the received digital signals through the digital signal processor to obtain digital signals to be detected, and transmitting the digital signals to be detected to an audio detection module;
and displaying the received digital signal to be detected through the audio detection module so as to determine whether a user pronunciation simulation system based on the vehicle is normal or not according to the displayed digital signal to be detected.
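By way of non-limiting illustration, the signal chain of the fourth aspect (collection, analog-to-digital conversion, noise reduction, detection) might be sketched as follows. All function names, the 16-bit depth, and the thresholds are assumptions made for this sketch. A simple noise gate stands in for real noise reduction or echo cancellation, and an automatic peak check stands in for the audio detection module, which the text describes only as displaying the signal for inspection.

```python
from typing import List


def sound_card_adc(analog: List[float], bits: int = 16) -> List[int]:
    """Quantize samples in [-1.0, 1.0] to signed integers (stand-in for the sound card)."""
    full_scale = 2 ** (bits - 1) - 1
    # Clip out-of-range samples before scaling so the result fits the bit depth.
    return [round(max(-1.0, min(1.0, s)) * full_scale) for s in analog]


def dsp_noise_gate(digital: List[int], threshold: int = 200) -> List[int]:
    """Zero out low-level samples; a crude placeholder for real noise reduction."""
    return [s if abs(s) >= threshold else 0 for s in digital]


def audio_detect(digital: List[int], min_peak: int = 1000) -> bool:
    """Report whether the processed signal has a peak at or above min_peak."""
    return max((abs(s) for s in digital), default=0) >= min_peak


def run_chain(captured: List[float]) -> bool:
    """Pass captured samples through sound card, DSP, and detection in sequence."""
    return audio_detect(dsp_noise_gate(sound_card_adc(captured)))
```

In such a sketch, a `False` result from `run_chain` on audio that was actually played would suggest that some stage of the collection path is not working normally.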
In a fifth aspect, embodiments of the present invention also provide a computer apparatus, the apparatus comprising:
one or more processors;
the playing device is used for playing the set playing data according to the set playing control parameters;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a vehicle-based user pronunciation simulation method as described in any embodiment of the present invention.
In a sixth aspect, embodiments of the present invention further provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor implements a vehicle-based user pronunciation simulation method according to any embodiment of the present invention.
According to the technical scheme, target playing data are obtained from a pre-stored voice data set; determining a play control parameter matched with play equipment in the vehicle according to the vehicle environmental sound and/or semantic information of target play data; the playing device is controlled to play target playing data according to the playing control parameters, so that the problems that in the prior art, when a speaker gets on a vehicle to collect voice, the cost is high, the efficiency is low, and the voice collection is difficult due to the influence of collection conditions are solved; the playing control parameters of the playing equipment are determined through the vehicle environment sound and/or semantic information, so that the playing quality of the playing equipment can achieve the effect of the same quality as that of voice collection carried out by a speaker when the speaker gets on the vehicle.
Drawings
FIG. 1 is a flow chart of a vehicle-based user pronunciation simulation method according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a vehicle-based user pronunciation simulation method according to a second embodiment of the present invention;
FIG. 3 is a flow chart of a vehicle-based user pronunciation simulation method according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a vehicle-based user pronunciation simulation device according to a fourth embodiment of the present invention;
FIG. 5 is a schematic diagram of a vehicle-based user pronunciation simulation system according to a fifth embodiment of the present invention;
FIG. 6 is a flow chart of a vehicle-based user pronunciation simulation method according to a sixth embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a computer device according to a seventh embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
Fig. 1 is a flowchart of a vehicle-based user pronunciation simulation method according to a first embodiment of the present invention. The method is applicable to playing and collecting voice in a vehicle in order to detect whether the voice collecting device of the vehicle works normally. The method may be performed by a vehicle-based user pronunciation simulation device, which may be implemented in software and/or hardware and may be integrated in a controller of the vehicle. As shown in fig. 1, the method specifically includes:
Step 110, obtaining target playing data from a pre-stored voice data set.
In order to ensure high quality of each audio in the voice data set and avoid distortion of the audio, in the embodiment of the invention, the voice data set is a set formed by high-fidelity voice data pre-recorded by at least one speaker in a recording studio.
The speaker is a person who reads out specified voice data to produce audio. Recording in a studio ensures that no noise is present while the speaker is reading, which guarantees high audio quality. High-fidelity voice data refers to voice data produced by a device or carrier capable of faithfully reproducing the original sound. When the speaker's audio is collected to build the voice data set, a sampling frequency of 48 kilohertz may be used, which keeps the distortion of the voice data close to zero.
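As an illustrative aside, a recording could be checked against the 48 kilohertz sampling frequency mentioned above before it is admitted to the voice data set. The sketch below uses Python's standard `wave` module; the function names, the mono 16-bit format, and the choice of WAV as the container are assumptions, since the text does not specify a file format.

```python
import io
import wave

EXPECTED_RATE_HZ = 48_000  # sampling frequency named in the text


def has_expected_rate(wav_bytes: bytes, expected_hz: int = EXPECTED_RATE_HZ) -> bool:
    """Return True if the WAV data was recorded at the expected sampling frequency."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getframerate() == expected_hz


def make_silent_wav(rate_hz: int, n_frames: int = 480) -> bytes:
    """Build a short silent mono 16-bit WAV in memory (helper for trying the check)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()
```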
The voice data may be voice data corresponding to a functional instruction issued in the vehicle, for example, "turn on air conditioner", "query weather", or "turn on navigation to a place"; alternatively, the voice data may be voice data corresponding to a non-functional instruction issued in the vehicle, for example, "Who is your parent?", "Where are you?" or "Do you have a partner?". Collecting voice data accurately makes it easier to carry out users' functional instructions correctly and can help avoid unexpected safety incidents for users in the vehicle. The voice data set constituted by the voice data may be stored in advance in the memory of the vehicle. The same or different voice data sets may be stored for different models of vehicles. When vehicles of several models share the same voice data set, the voice data need not be recorded repeatedly, which reduces the cost of having a speaker record the voice data and improves recording efficiency.
The target playing data is any one or more items of voice data in the voice data set, namely, voice data pre-recorded by a speaker can be used as target playing data. The target playing data may be obtained by randomly selecting the voice data in the voice data set as the target playing data; alternatively, the voice data in the voice data set may be selected as the target play data according to a preset sequence. The preset sequence may be a storage sequence of voice data, or a sequence designated by a manager, etc., and the embodiment of the present invention is not particularly limited.
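The two selection modes described for step 110 (random selection, or selection in a preset sequence) might be sketched as below. The function names are hypothetical, and each utterance is represented simply as a string such as a file name; the text does not prescribe any data structure.

```python
import random
from typing import Iterator, List, Optional


def select_random(dataset: List[str], rng: Optional[random.Random] = None) -> str:
    """Pick one utterance at random from the pre-stored voice data set."""
    rng = rng or random.Random()
    return rng.choice(dataset)


def select_in_order(dataset: List[str], order: Optional[List[int]] = None) -> Iterator[str]:
    """Yield utterances in storage order, or in an operator-specified index order."""
    indices = order if order is not None else range(len(dataset))
    for i in indices:
        yield dataset[i]
```

Passing a seeded `random.Random` makes a random test run repeatable, which may be useful when the same utterances have to be replayed in several vehicle models.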
Step 120, determining a play control parameter matched with a play device in the vehicle according to the vehicle environmental sound and/or semantic information of the target play data.
The position of the playing device in the vehicle matches the position of a real human mouth. The playing device is an audio playing device that simulates a human mouth playing voice data, such as an artificial mouth, a hi-fi speaker system or a loudspeaker. The real human mouth refers to the mouth of a speaker who gets into the vehicle to speak.
In the embodiment of the invention, the playing device is optionally an artificial mouth. An artificial mouth, which can be understood as a special loudspeaker, plays back better than a hi-fi speaker system and can produce sound closer to that of a real person. To ensure that the playing device approximates a real person, its position in the vehicle is matched with the position of a real human mouth, that is, the playing device may be placed close to where a real person's mouth would be. For example, the position of the speaker's mouth when speaking in the vehicle can be determined, and the playing device placed at that same position.
The playing device may use different playing control parameters when playing the voice data, such as playing volume, playing sound source direction and playing speed. To ensure that the playing device has the same effect as the speaker, the playing speed in this embodiment may be the speed of the voice data itself, with no speed change processing. The playing volume and the playing sound source direction can be determined according to the vehicle environmental sound and the semantic information of the target playing data, respectively.
The vehicle environmental sound may be the noise volume inside the vehicle, for example the noise decibel value inside the vehicle. It may be obtained by a controller or processor of the vehicle receiving the noise volume collected by a microphone. The controller or processor may determine the playing control parameters of the playing device from the vehicle environmental sound; for example, the playing volume of the playing device may be determined from it. The louder the vehicle environmental sound, the larger the determined playing volume; the quieter the vehicle environmental sound, the smaller the determined playing volume.
Further, to control the playing volume of the playing device precisely, a mapping table between the vehicle environmental sound and the playing volume of the playing device may be stored in the memory of the vehicle in advance. Illustratively, Table 1 is such a mapping table. As Table 1 shows, a different playing volume can be determined for each vehicle environmental sound. The mapping table may be derived from the speaking habits of speakers under different vehicle environmental sounds in a real environment. This keeps the playing volume of the playing device consistent with the volume a speaker would use under the same conditions, keeps the sound emitted by the playing device close to that of a real person, and avoids distortion in voice collection.
TABLE 1
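The lookup from vehicle environmental sound to playing volume might be sketched as follows. The values of Table 1 are not reproduced in this text, so every threshold and volume below is a hypothetical placeholder, as are the names `NOISE_TO_VOLUME`, `MAX_VOLUME` and `playback_volume`. The only property taken from the description is that louder environmental sound maps to a larger playing volume.

```python
from bisect import bisect_left

# Hypothetical calibration bands: (upper bound of noise in dB, playing volume in dB).
# Placeholder numbers only; they are NOT the values of Table 1.
NOISE_TO_VOLUME = [(40.0, 60.0), (55.0, 65.0), (70.0, 72.0)]
MAX_VOLUME = 78.0  # hypothetical volume used when noise exceeds the last band


def playback_volume(noise_db: float) -> float:
    """Return the playing volume for the band that contains the measured noise level."""
    upper_bounds = [bound for bound, _ in NOISE_TO_VOLUME]
    # Index of the first band whose upper bound is >= noise_db.
    band = bisect_left(upper_bounds, noise_db)
    if band == len(NOISE_TO_VOLUME):
        return MAX_VOLUME
    return NOISE_TO_VOLUME[band][1]
```

A stepwise table is used here because the text describes a stored mapping table; interpolating between calibration points would be an alternative design that the text neither requires nor excludes.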
The semantic information of the target play data may include content classification of the target play data, for example, "turn on air conditioner" or "turn up temperature" etc. belonging to the air conditioner class; "inquiring weather" or "dressing index" etc. belong to the weather class; and "navigate to B menu" belongs to the navigation class, etc. The target play data for different content categories may have different play sound source directions when played by a real person. In order to ensure that the playing device is close to the effect of a real person, the playing sound source direction of the playing device can be determined according to semantic information of target playing data.
Illustratively, table 2 is a mapping table of semantic information and direction of a play sound source. The semantic information of the voice data in the pre-stored voice data set can be determined manually, that is, the voice data set can include at least one piece of voice data and the semantic information corresponding to each piece of voice data. The mapping table of semantic information and the direction of the playing sound source may be stored in the memory of the vehicle in advance. The processor or the controller of the vehicle can determine the playing sound source direction when the playing device plays the target playing data according to the mapping relation between the semantic information and the playing sound source direction in the table 2, and the playing device can be ensured to be close to the effect of a real person.
It should be noted that, in Table 2, the same semantic information may correspond to several playing sound source directions. For multiple pieces of voice data with the same semantic information, the playing sound source direction of each piece may be determined according to the proportions assigned to the corresponding directions. For example, if the air-conditioner class has 1000 pieces of voice data, any 500 of them may be played in the forward-and-downward direction according to the proportion, and the remaining 500 played straight ahead. The embodiment of the invention does not limit the specific way the voice data is divided or the order in which it is played.
TABLE 2
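The proportional assignment of playing sound source directions might be sketched as below. Table 2 is not reproduced in this text, so the class names, direction labels and shares in `DIRECTION_SHARES` are hypothetical; only the 500/500 split for the air-conditioner class follows the example in the description. Utterances are assumed to be unique strings.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping: semantic class -> [(direction, share)]; shares sum to 1.
DIRECTION_SHARES: Dict[str, List[Tuple[str, float]]] = {
    "air_conditioner": [("front_down", 0.5), ("front", 0.5)],
    "navigation": [("front", 1.0)],
}


def assign_directions(
    utterances: List[str],
    semantic_class: str,
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Split the utterances of one semantic class across directions by share."""
    rng = rng or random.Random()
    shuffled = list(utterances)
    rng.shuffle(shuffled)  # "any 500 pieces": the split itself is arbitrary
    shares = DIRECTION_SHARES[semantic_class]
    assignment: Dict[str, str] = {}
    start = 0
    cumulative = 0.0
    for k, (direction, share) in enumerate(shares):
        cumulative += share
        # The last direction takes whatever remains, so rounding never drops an item.
        is_last = k == len(shares) - 1
        end = len(shuffled) if is_last else round(cumulative * len(shuffled))
        for utterance in shuffled[start:end]:
            assignment[utterance] = direction
        start = end
    return assignment
```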
Step 130, controlling the playing device to play the target playing data according to the playing control parameters.
The controller or processor of the vehicle may transmit the playing control parameters to the playing device and control the playing device to play the target playing data accordingly, for example at the determined playing volume, playing sound source direction and playing speed. The playing volume of the playing device can be controlled with existing volume adjustment techniques, and the playing sound source direction can be adjusted with robot head-rotation techniques.
According to the technical scheme, target playing data are obtained from a pre-stored voice data set; determining a play control parameter matched with play equipment in the vehicle according to the vehicle environmental sound and/or semantic information of target play data; the playing device is controlled to play target playing data according to the playing control parameters, so that the problems that in the prior art, when a speaker gets on a vehicle to collect voice, the cost is high, the efficiency is low, and the voice collection is difficult due to the influence of collection conditions are solved; the playing control parameters of the playing equipment are determined through the vehicle environment sound and/or semantic information, so that the playing quality of the playing equipment can achieve the effect of the same quality as that of voice collection carried out by a speaker when the speaker gets on the vehicle.
Example 2
Fig. 2 is a flowchart of a user pronunciation simulation method based on a vehicle according to a second embodiment of the present invention. The present embodiment is a further refinement of the foregoing technical solution, and the technical solution in this embodiment may be combined with each alternative solution in one or more embodiments described above. As shown in fig. 2, the method includes:
Step 210, obtaining target playing data from a pre-stored voice data set.
The voice data set is a set formed by high-fidelity voice data pre-recorded by at least one speaker in a recording studio.
Step 220, obtaining a noise decibel value of the environmental sound of the vehicle, and determining the playing volume matched with the playing equipment in the vehicle according to the noise decibel value.
To obtain the noise decibel value of the vehicle environmental sound accurately, a sound pressure detection device such as a sound pressure meter may be installed at the in-vehicle microphone. The sound pressure meter at the microphone measures the volume of the sound collected there, and the noise decibel value it acquires is transmitted to a processor or controller of the vehicle. When no voice data is being played, the sound pressure meter at the microphone measures the noise decibel value of the vehicle environmental sound, so this value can be obtained before the voice data is played.
The processor or controller of the vehicle may transmit the received noise decibel value of the vehicle ambient sound to the playback device, which is controlled to automatically adjust to the matched playback volume. The mapping relationship between the noise decibel value of the vehicle environment sound and the playing volume of the playing device may be preset and stored in the memory of the vehicle. The processor or the controller of the vehicle can directly call the mapping relation from the memory, and the playing volume of the corresponding playing device is determined according to the noise decibel value of the vehicle environment sound so as to control the playing device to play according to the playing volume. Specifically, the mapping relationship between the noise decibel value of the vehicle environment sound and the playing volume of the playing device may be as shown in table 1.
To ensure that the actual playing volume of the playing device matches the noise decibel value of the environmental sound of the vehicle, in an alternative implementation of this embodiment, determining the playing volume matching the playing device in the vehicle according to the noise decibel value includes: adjusting the playing volume of playing equipment in the vehicle according to the noise decibel value; controlling the playing equipment to play the set audio according to the playing volume, and acquiring a playing decibel value matched with the set audio; if the playing decibel value is matched with the noise decibel value, determining that the current playing volume of the playing equipment corresponds to the noise decibel value; if the play decibel value is not matched with the noise decibel value, the play volume of the play device is readjusted, and then the operation of controlling the play device to play the set audio according to the play volume is performed.
The processor or controller of the vehicle can determine the noise decibel value from the sound pressure meter arranged at the microphone, determine the matching playing volume of the playing device from the mapping relationship between noise decibel value and playing volume, and adjust the playing device to that volume. Here, a playing volume matches a noise decibel value when the two are linked by the mapping relationship. The playing device may then be controlled to play the set audio at the matched playing volume. The set audio may be preset test audio or any voice data in the voice data set, which is not specifically limited in the embodiment of the present invention.
A sound pressure device, such as a sound pressure meter, may be mounted at the playing device. This sound pressure meter can detect the playing decibel value when the playing device plays the set audio. The playing decibel value matches the noise decibel value when the level measured while the set audio is played at the current playing volume meets the playing level required by the current noise decibel value. If the playing decibel value matches the noise decibel value, the current playing volume meets the expected requirement, and the playing device can keep using it under the current environmental conditions of the vehicle without adjustment. If the playing decibel value does not match the noise decibel value, the matching playing decibel value can be determined from the noise decibel value, and the current playing decibel value is compared with it to determine whether it is too large or too small. The playing volume of the playing device is then adjusted accordingly: when the playing decibel value is too large, the playing volume may be reduced; when it is too small, the playing volume may be increased. After the playing volume has been re-determined, the playing device can play the set audio again and the playing decibel value can be measured again, and this is repeated until the playing decibel value measured at the adopted playing volume matches the noise decibel value.
The mapping relationship between the playing decibel value of the playing device when playing the set audio and the noise decibel value can be obtained in advance by experiment. Table 3 is a mapping table of playing decibel values and noise decibel values. As shown in Table 3, when the sound pressure meter at the microphone detects the noise decibel value, the playing volume of the playing device may be predetermined according to the mapping table. The sound pressure meter at the playing device then detects the playing decibel value when the playing device plays the set audio at the predetermined playing volume, and it is determined whether that value matches the noise decibel value and hence whether the playing volume of the playing device needs to be adjusted.
Normally, when the playing device plays the set audio at the predetermined playing volume, a playing decibel value that matches the noise decibel value is achieved. In some cases, however, the playing device may have a fault, for example a volume deviation caused by long-term use, so that playing the set audio at the predetermined playing volume does not reach the matching playing decibel value, and the playing volume of the playing device needs to be adjusted.
Table 3
Illustratively, as shown in Table 3, in a first scenario the vehicle is parked, and the sound pressure meter at the microphone detects that the noise decibel value of the vehicle ambient sound is within the interval of 53-58 decibels. Under these conditions, a real person speaking normally produces 72-77 decibels, at which the microphone can collect clear audio. Therefore, when the noise decibel value of the vehicle ambient sound in the first scenario is within the interval of 53-58 decibels, the matching playing decibel value of the playing device is 72-77 decibels. According to the playing decibel value of 72-77 decibels, the playing volume of the playing device can be predetermined to be the 3rd gear. The playing device can be controlled to play the set audio at the 3rd-gear playing volume, and the sound pressure meter at the playing device can detect whether the current playing decibel value of the playing device is within the interval of 72-77 decibels.
If the current playing decibel value of the playing device is within the interval of 72-77 decibels, it is determined that the current playing volume of the playing device matches the noise decibel value. In the first scenario, the playing device may then play the voice data in the voice data set at the 3rd-gear playing volume. If the current playing decibel value of the playing device is not within the interval of 72-77 decibels, it can be determined whether it is greater than 77 decibels or less than 72 decibels.
If the current playing decibel value of the playing device is greater than 77 decibels, the current playing volume of the playing device is reduced, for example to the 2nd gear. If the current playing decibel value of the playing device is less than 72 decibels, the current playing volume of the playing device is increased, for example to the 4th gear. The playing decibel value at the adjusted playing volume can then be measured by the sound pressure meter at the playing device. If the playing decibel value is within the interval of 72-77 decibels, it is determined that the adjusted playing volume matches the noise decibel value. If the playing decibel value is not within the interval of 72-77 decibels, the playing volume of the playing device can be adjusted further until the playing decibel value detected by the sound pressure meter at the playing device is within the interval of 72-77 decibels.
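By way of illustration only, the adjust-and-measure loop described above might be sketched as follows. The function `measure_play_db` is a hypothetical stand-in for "play the set audio at this gear and read the sound pressure meter at the playing device"; the gear limits of 0 to 10 and the one-gear step are likewise assumptions, since the description gives only the 2nd, 3rd and 4th gears as examples.

```python
from typing import Callable, Tuple


def calibrate_volume(
    measure_play_db: Callable[[int], float],  # hypothetical: plays the set audio, returns measured dB
    target_range: Tuple[float, float],        # e.g. (72, 77) for ambient noise of 53-58 dB
    initial_gear: int,                        # e.g. 3, the predetermined gear
    min_gear: int = 0,                        # assumed lower gear limit
    max_gear: int = 10,                       # assumed upper gear limit
) -> int:
    """Return a gear whose measured playing level lies inside target_range."""
    low, high = target_range
    gear = initial_gear
    visited = set()
    while gear not in visited:
        visited.add(gear)
        level = measure_play_db(gear)
        if low <= level <= high:
            return gear
        # Too loud: step down one gear; too quiet: step up one gear.
        gear = gear - 1 if level > high else gear + 1
        if not min_gear <= gear <= max_gear:
            break
    # Either the gear limits were reached or the search began to oscillate.
    raise RuntimeError("no gear produces a playing level in the target range")
```

For a device whose output has drifted low, `calibrate_volume(lambda g: 58 + 4 * g, (72, 77), 3)` would step from the 3rd gear up to the 4th. The sketch raises an error instead of looping forever when two neighbouring gears straddle the target interval, a case the description does not address.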
Step 230, controlling the playing device to play the target playing data according to the playing volume.
The controller or processor of the vehicle can control the playing device to play the target playing data at the playing volume matched to the noise decibel value of the vehicle ambient sound. This achieves the same effect as a real person speaking, avoids having a real person get into the vehicle to speak for voice collection, reduces cost, and improves efficiency.
According to the technical solution of this embodiment, the noise decibel value of the vehicle ambient sound is determined by the sound pressure device, the playing volume matching the playing device in the vehicle is determined according to the noise decibel value, and the playing device is controlled to play the target playing data at the matched playing volume. This solves the problems in the prior art that having a speaker get into the vehicle for voice collection is costly and inefficient, and that collection is made difficult by the collection conditions. An accurate playing volume is determined for the playing device, playing the voice data at that volume achieves the same effect as a speaker getting into the vehicle for voice collection, and the cost of pronunciation simulation is reduced while its efficiency is improved.
Example III
Fig. 3 is a flowchart of a user pronunciation simulation method based on a vehicle according to a third embodiment of the present invention. The present embodiment is a further refinement of the foregoing technical solution, and the technical solution in this embodiment may be combined with each alternative solution in one or more embodiments described above. As shown in fig. 3, the method includes:
Step 310, obtaining target playing data from a pre-stored voice data set.
The voice data set is a set formed by high-fidelity voice data pre-recorded by at least one speaker in a recording studio.
Step 320, obtaining semantic information of the target playing data, and determining a playing sound source direction matched with playing equipment in the vehicle according to the semantic information.
The semantic information may be acquired in various ways: for example, it may be set manually in advance, or it may be obtained by semantic analysis techniques. A chip running a semantic analysis program may be provided in the processor or controller of the vehicle, or in the playing device. The semantic information of the voice data in the voice data set, for example the content classification of the voice data, can be determined in real time by the semantic analysis program. When the playing device acquires the target playing data, the semantic information of the target playing data can be determined at the same time by semantic analysis.
The matching relationship between the semantic information and the direction of the playing sound source can be implemented in various ways, for example, can be manually predetermined, or can be determined in real time by a processor or a controller of the vehicle according to the semantic information.
For example, when the semantic information of the target playing data is the air conditioning class, the processor or controller of the vehicle may determine in real time, according to that class, that the playing sound source direction of the playing device is directly ahead or ahead and downward. For a plurality of pieces of target playing data whose semantic information is the air conditioning class, the processor or controller of the vehicle can control the playing sound source direction of the playing device to switch back and forth between directly ahead and ahead and downward.
So that the playing sound source direction determined from the semantic information in the user pronunciation simulation reflects how a real user behaves, in an optional implementation of this embodiment, the semantic information includes at least one of the following: air conditioning, chat, weather, smart home or navigation. Determining the playing sound source direction matching the playing device in the vehicle according to the semantic information includes: if the semantic information is the air conditioning class, determining that the playing sound source direction matching the playing device in the vehicle is ahead and downward or directly ahead; if the semantic information is the chat class, determining that the playing sound source direction is the vehicle terminal control direction or directly ahead; if the semantic information is the weather class, determining that the playing sound source direction is out of the window, directly ahead, or the vehicle terminal control direction; if the semantic information is the smart home class, determining that the playing sound source direction is directly ahead or the vehicle terminal control direction; and if the semantic information is the navigation class, determining that the playing sound source direction is directly ahead or the vehicle terminal control direction.
For the same semantic information, there may be a plurality of corresponding pieces of voice data, and their playing sound source directions may be the same or different; that is, one piece of semantic information may correspond to one or more playing sound source directions. To determine the specific playing sound source direction of each piece of voice data, a proportion can be set for each playing sound source direction so that the real playing situation is reflected. Specifically, the mapping relationship between the semantic information and the playing sound source direction may be as shown in Table 2. Once the semantic information of the target playing data is determined, the playing sound source direction of the target playing data can be determined from the playing sound source directions corresponding to that semantic information and their proportions.
The semantic information may represent the meaning classification of the voice data. For example, the air conditioning class means that the voice data relates to the air conditioner in the vehicle, such as adjusting the temperature in the vehicle or turning the air conditioner on or off; the chat class means that the voice data is produced during casual conversation and does not trigger a vehicle function, such as discussion of family or work; the weather class means that the voice data relates to weather, such as querying the weather; the smart home class means that the voice data controls home devices connected through the vehicle, such as turning on the air conditioner or the washing machine at home; and the navigation class means that the voice data relates to navigation, such as navigating to a place or opening a certain navigation application.
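As a minimal sketch, the correspondence between semantic class and candidate playing sound source directions listed in this embodiment could be held in a lookup table. The string keys and direction labels below are hypothetical names chosen for illustration (`chat` stands for the casual-conversation class); the description does not prescribe any particular data format.

```python
# Candidate playing sound source directions per semantic class, in the order
# the embodiment lists them. All identifiers are illustrative assumptions.
DIRECTIONS_BY_CLASS = {
    "air_conditioning": ("front_down", "front"),
    "chat": ("vehicle_terminal", "front"),
    "weather": ("window", "front", "vehicle_terminal"),
    "smart_home": ("front", "vehicle_terminal"),
    "navigation": ("front", "vehicle_terminal"),
}


def candidate_directions(semantic_class: str) -> tuple:
    """Return the candidate directions for one semantic class."""
    try:
        return DIRECTIONS_BY_CLASS[semantic_class]
    except KeyError:
        raise ValueError(f"unknown semantic class: {semantic_class!r}") from None
```

Which candidate is used for a given piece of voice data would then be decided by the proportions set for each direction.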
For example, when the semantic information of the target playing data is the weather class, the processor or controller of the vehicle can control the playing device to switch among three directions, namely out of the window, directly ahead, and the vehicle terminal control direction, in a ratio of 5:3:2. To do so, the memory of the vehicle may record the playing sound source direction used for each piece of weather-class target playing data, and the weather-class target playing data may be handled in groups of 10. If no more than 4 pieces of weather-class target playing data in the current group have been played out of the window, the next piece in the group may still be played out of the window. If 5 pieces in the group have already been played out of the window, the next piece may be played directly ahead or toward the vehicle terminal control. Likewise, if no more than 2 pieces in the group have been played directly ahead, the next piece may still be played directly ahead; if 3 pieces in the group have already been played directly ahead, the next piece may be played toward the vehicle terminal control.
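The grouping rule above can be sketched as a per-group quota counter. This is one possible reading, not the only one: the sketch fills the directions in a fixed priority order within each group of 10, whereas the description only requires that the counts per group respect the 5:3:2 ratio. The direction labels are hypothetical names.

```python
from collections import Counter
from typing import Dict, List

# 5:3:2 ratio for the weather class; labels are illustrative assumptions.
WEATHER_QUOTAS = {"window": 5, "front": 3, "vehicle_terminal": 2}


def assign_directions(count: int, quotas: Dict[str, int]) -> List[str]:
    """Assign a direction to each of `count` clips, honouring per-group quotas."""
    group_size = sum(quotas.values())
    if group_size <= 0:
        raise ValueError("quotas must contain at least one positive entry")
    assigned: List[str] = []
    used: Counter = Counter()
    for index in range(count):
        if index % group_size == 0:
            used = Counter()  # a new group starts: reset the per-group tallies
        # Take the first direction, in priority order, that still has quota left.
        for direction, quota in quotas.items():
            if used[direction] < quota:
                used[direction] += 1
                assigned.append(direction)
                break
    return assigned
```

With `assign_directions(10, WEATHER_QUOTAS)`, the first five clips are played out of the window, the next three directly ahead, and the last two toward the vehicle terminal control. An implementation could equally shuffle the order inside each group as long as the tallies are kept.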
Step 330, the playing device is controlled to play the target playing data according to the playing sound source direction.
The processor or controller of the vehicle can control the playing device to play the target playing data in the playing sound source direction matched to the semantic information of the target playing data, so that the sound source direction of the playing device stays consistent with the direction in which a real person would speak.
According to the technical solution of this embodiment, the semantic information of the target playing data is obtained, the playing sound source direction matching the playing device in the vehicle is determined according to the semantic information, and the playing device is controlled to play the target playing data in that direction. This solves the problems in the prior art that having a speaker get into the vehicle for voice collection is costly and inefficient, and that collection is made difficult by the collection conditions. Because the playing device plays the target playing data in the direction matched to its semantic information, the playing direction is consistent with that of a real speaker, while the cost of pronunciation simulation is reduced, the implementation is simple, and the simulation is efficient.
On the basis of the foregoing embodiment, optionally, before controlling the playing device to play the target playing data according to the playing control parameter, the method further includes: detecting the jolt state of the vehicle, and determining a play frequency matched with play equipment in the vehicle according to the jolt state; controlling the playing device to play the target playing data according to the playing control parameters, including: and controlling the playing equipment to play the target playing data according to the playing frequency.
The jolt state of the vehicle describes how the vehicle bumps while running, for example whether it bumps at all and how severely. The jolt state may be determined in various ways, for example by a detecting instrument such as an automobile bump tester, or by an experimental test. In the experimental test, the vehicle carries a product under test placed on a tray, and the jolt state of the vehicle is determined by monitoring the state of the product in real time.
There may also be a mapping relationship between the jolt state of the vehicle and the playing frequency of the playing device; for example, the more the vehicle bumps, the higher the playing frequency. By way of example, the jolt state of the vehicle may be divided into classes: slightly bumpy, with a playing frequency of 20 Hz to 200 Hz; moderately bumpy, with a playing frequency of 200 Hz to 2000 Hz; and severely bumpy, with a playing frequency of 2000 Hz to 20000 Hz. As a further example, the vibration frequency of the vehicle may be measured by an instrument and used as the jolt state, and the playing frequency may be determined from the vibration frequency: when the vibration frequency is less than 1 Hz, the playing frequency is 20 Hz to 200 Hz; when the vibration frequency is 1 Hz to 10 Hz, the playing frequency is 200 Hz to 2000 Hz; and when the vibration frequency is more than 10 Hz, the playing frequency is 2000 Hz to 20000 Hz.
The playing frequency of the playing device in the current jolt state can be determined according to the mapping relationship between jolt state and playing frequency. The playing device can then play the target playing data at that playing frequency, which keeps the playback clear while the vehicle is bumping and achieves the effect of a real person speaking.
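The vibration-frequency example above amounts to a three-band threshold lookup, which might be sketched as follows. The thresholds and bands are taken from the description; the function name and the tuple return type are illustrative assumptions.

```python
from typing import Tuple


def play_frequency_band(vibration_hz: float) -> Tuple[float, float]:
    """Map a measured vehicle vibration frequency to a playing frequency band in Hz."""
    if vibration_hz < 0:
        raise ValueError("vibration frequency cannot be negative")
    if vibration_hz < 1.0:        # less than 1 Hz
        return (20.0, 200.0)
    if vibration_hz <= 10.0:      # 1 Hz to 10 Hz
        return (200.0, 2000.0)
    return (2000.0, 20000.0)      # more than 10 Hz
```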
Example IV
Fig. 4 is a schematic structural diagram of a vehicle-based user pronunciation simulation device according to a fourth embodiment of the present invention. Referring to fig. 4, the apparatus includes: a target play data acquisition module 410, a play control parameter determination module 420 and a play control module 430.
The target play data obtaining module 410 is configured to obtain target play data from a pre-stored voice data set;
the play control parameter determining module 420 is configured to determine, according to the environmental sound of the vehicle and/or semantic information of the target play data, a play control parameter that matches with a play device in the vehicle, where the position of the play device in the vehicle matches with the position of the real human mouth;
and the play control module 430 is used for controlling the play device to play the target play data according to the play control parameters.
Optionally, the play control parameter determining module 420 includes:
And the play volume determining unit is used for acquiring the noise decibel value of the environmental sound of the vehicle and determining the play volume matched with the play equipment in the vehicle according to the noise decibel value.
Optionally, the play control module 430 includes:
and the play volume control unit is used for controlling the play equipment to play the target play data according to the play volume.
Optionally, the play volume determining unit is specifically configured to:
adjusting the playing volume of playing equipment in the vehicle according to the noise decibel value;
controlling the playing equipment to play the set audio according to the playing volume, and acquiring a playing decibel value matched with the set audio;
if the playing decibel value is matched with the noise decibel value, determining that the current playing volume of the playing equipment corresponds to the noise decibel value;
if the play decibel value is not matched with the noise decibel value, the play volume of the play device is readjusted, and then the operation of controlling the play device to play the set audio according to the play volume is performed.
Optionally, the play control parameter determining module 420 includes:
the playing sound source direction determining unit is used for acquiring semantic information of the target playing data and determining a playing sound source direction matched with playing equipment in the vehicle according to the semantic information.
Optionally, the play control module 430 includes:
and the playing sound source direction control unit is used for controlling the playing device to play the target playing data according to the playing sound source direction.
Optionally, the semantic information includes at least one of: air conditioning, chat, weather, smart home or navigation;
the play sound source direction determining unit is specifically configured to:
if the semantic information is the air conditioning class, determining that the playing sound source direction matching the playing device in the vehicle is ahead and downward or directly ahead;
if the semantic information is the chat class, determining that the playing sound source direction matching the playing device in the vehicle is the vehicle terminal control direction or directly ahead;
if the semantic information is the weather class, determining that the playing sound source direction matching the playing device in the vehicle is out of the window, directly ahead, or the vehicle terminal control direction;
if the semantic information is the smart home class, determining that the playing sound source direction matching the playing device in the vehicle is directly ahead or the vehicle terminal control direction;
if the semantic information is the navigation class, determining that the playing sound source direction matching the playing device in the vehicle is directly ahead or the vehicle terminal control direction.
Optionally, the voice data set is a set of high-fidelity voice data pre-recorded by at least one speaker at a recording studio.
Optionally, the apparatus further comprises:
the playing frequency determining module is used for detecting the jolt state of the vehicle and determining the playing frequency matched with the playing equipment in the vehicle according to the jolt state before controlling the playing equipment to play the target playing data according to the playing control parameters;
the play control module 430 includes:
and the playing frequency control unit is used for controlling the playing equipment to play the target playing data according to the playing frequency.
Optionally, the playing device is an artificial mouth.
The user pronunciation simulation device based on the vehicle provided by the embodiment of the invention can execute the user pronunciation simulation method based on the vehicle provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example V
Fig. 5 is a schematic structural diagram of a vehicle-based user pronunciation simulation system according to a fifth embodiment of the present invention. With reference to fig. 5, the system comprises: processor (or controller) 510, playback device 520, sound collection assembly 530, sound card 540, digital signal processor 550, and audio detection module 560.
Wherein the playing device 520, the sound collection assembly 530, the sound card 540, the digital signal processor 550 and the audio detection module 560 are all electrically connected with the processor 510; the playing device 520, the sound collection assembly 530, the sound card 540, the digital signal processor 550 and the audio detection module 560 are electrically connected in sequence; the position of the playback device 520 in the vehicle matches the position of the real human mouth.
A processor 510 for acquiring target play data from a pre-stored voice data set; determining a play control parameter matched with play equipment in the vehicle according to the vehicle environmental sound and/or semantic information of target play data; and sending the play control parameters and the target play data to the playing device.
The voice data set is a set formed by high-fidelity voice data pre-recorded by at least one speaker in a recording studio.
Optionally, determining the playback control parameters that match the playback device 520 in the vehicle according to the vehicle environmental sound, and/or semantic information of the target playback data, includes:
acquiring a noise decibel value of the environmental sound of the vehicle, and determining a playing volume matched with playing equipment 520 in the vehicle according to the noise decibel value;
transmitting the playback control parameters and the target playback data to the playback device 520, including:
the playback volume and the target playback data are sent to the playback device 520.
Optionally, determining the play volume matching the play device 520 in the vehicle according to the noise decibel value includes:
adjusting the play volume of the play device 520 in the vehicle according to the noise decibel value;
controlling the playing device 520 to play the set audio according to the playing volume, and obtaining a playing decibel value matched with the set audio;
If the play decibel value matches the noise decibel value, determining that the current play volume of the play device 520 corresponds to the noise decibel value;
if the play db value does not match the noise db value, the play volume of the play device 520 is readjusted, and then the operation of controlling the play device 520 to play the set audio according to the play volume is performed.
Optionally, determining the playback control parameters that match the playback device 520 in the vehicle according to the vehicle environmental sound, and/or semantic information of the target playback data, includes:
acquiring semantic information of target playing data, and determining a playing sound source direction matched with playing equipment 520 in a vehicle according to the semantic information;
transmitting the playback control parameters and the target playback data to the playback device 520, including:
the direction of the play sound source and the target play data are transmitted to the play device 520.
Optionally, the semantic information includes at least one of: air conditioning, chat, weather, smart home or navigation;
determining a playing sound source direction matching the playing device 520 in the vehicle according to the semantic information includes:
if the semantic information is the air conditioning class, determining that the playing sound source direction matching the playing device in the vehicle is ahead and downward or directly ahead;
if the semantic information is the chat class, determining that the playing sound source direction matching the playing device in the vehicle is the vehicle terminal control direction or directly ahead;
if the semantic information is the weather class, determining that the playing sound source direction matching the playing device in the vehicle is out of the window, directly ahead, or the vehicle terminal control direction;
if the semantic information is the smart home class, determining that the playing sound source direction matching the playing device in the vehicle is directly ahead or the vehicle terminal control direction;
if the semantic information is the navigation class, determining that the playing sound source direction matching the playing device in the vehicle is directly ahead or the vehicle terminal control direction.
And the playing device 520 is configured to play the target playing data according to the received playing control parameter.
Optionally, the playing device 520 specifically is configured to: and playing the target playing data according to the received playing volume and/or the playing sound source direction.
Optionally, the processor 510 is further configured to detect a jolt state of the vehicle, and determine a play frequency matched with a play device in the vehicle according to the jolt state; transmitting the playing frequency to playing equipment;
the playing device 520 is specifically further configured to: and playing the target playing data according to the received playing frequency.
Optionally, the playing device 520 is an artificial mouth.
The sound collection component 530 is configured to perform audio collection on the target playing data played by the playing device, obtain an audio signal corresponding to the target playing data, and transmit the audio signal to the sound card 540.
The sound collection assembly 530 may be a microphone or a microphone array. The sound collection assembly 530 may be mounted in a vehicle terminal control accessory in the vehicle, to facilitate collection of voice data spoken by a person or played by the playing device.
The sound card 540 converts the received audio signal into a digital signal and transmits the digital signal to the digital signal processor 550.
The digital signal processor 550 is configured to perform noise reduction processing and/or echo cancellation processing on the received digital signal, obtain a digital signal to be detected, and transmit the digital signal to be detected to the audio detection module 560.
The audio signal collected by the sound collection component 530 contains various noise signals of different frequencies and intensities, such as wind noise, tire noise, air conditioning noise, noise from nearby vehicles, and other noise from outside the vehicle. The digital signal processor 550 can attenuate this noise to a certain extent and reduce its interference, so that the digital signal to be detected has a higher signal-to-noise ratio and the target playing data played by the playing device sounds clearer and easier to distinguish.
The audio detection module 560 is configured to display the received digital signal to be detected, so that whether the vehicle-based user pronunciation simulation system is normal can be determined according to the displayed digital signal to be detected.
The audio detection module 560 may include a display, which may be a stand-alone module, a display installed in the vehicle, or a computer display. Related results for the digital signal to be detected, such as its audio waveform and spectrogram, can be displayed by the audio detection module 560. The person monitoring voice acquisition can judge from the displayed waveform, spectrogram and the like whether the digital signal to be detected is normal, for example whether clipping is present, whether the audio sampling rate is correct, or whether there is signal interference at a particular frequency. From the result of this judgment, whether the vehicle-based user pronunciation simulation system is normal can be determined in real time.
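In the description, these checks are made by a person looking at the displayed waveform and spectrogram. Purely as an illustrative assumption, one of them, the clipping check, could also be pre-screened in software before display. The sketch below assumes signed integer PCM samples; the 16-bit default and the 0.1% threshold are arbitrary example values, not taken from the source.

```python
from typing import Sequence


def clipping_ratio(samples: Sequence[int], bits: int = 16) -> float:
    """Fraction of samples sitting at the full-scale limits of signed PCM."""
    if not samples:
        return 0.0
    max_value = 2 ** (bits - 1) - 1      # 32767 for 16-bit audio
    min_value = -(2 ** (bits - 1))       # -32768 for 16-bit audio
    clipped = sum(1 for s in samples if s >= max_value or s <= min_value)
    return clipped / len(samples)


def looks_clipped(samples: Sequence[int], bits: int = 16,
                  threshold: float = 0.001) -> bool:
    """Flag a recording when more than `threshold` of its samples are at full scale."""
    return clipping_ratio(samples, bits) > threshold
```

A flagged recording would still be shown to the person monitoring acquisition; the pre-screen only draws attention to it.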
For example, the environment is relatively complex when voice collection is performed in a vehicle. A jolt from a speed bump may unexpectedly disturb the voice collection. An engine stall or vehicle bumping may disconnect a wire between devices in the vehicle-based user pronunciation simulation system and interrupt the voice acquisition. As another example, the battery in a new energy vehicle may produce steady interference at a certain frequency.
In the embodiment of the invention, voice acquisition and detection of the target playing data played by the playing device can both be performed in real time. Compared with batch processing, real-time detection avoids the situation in which a problem in the voice acquisition is not found in time, the vehicle-based user pronunciation simulation system cannot be adjusted in time, and the voice signal has to be collected repeatedly at low efficiency.
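The difference between real-time and batch checking can be sketched as a check that runs on each captured chunk as it arrives and stops at the first faulty one. This is an illustrative sketch: the silence test (standing in for a disconnected wire), the threshold and the function name are assumptions, not part of the embodiment.

```python
def first_faulty_chunk(chunks, silence_threshold=1e-4):
    """Return the index of the first chunk whose peak amplitude falls below
    the threshold (for example, a disconnected wire), or None if all pass."""
    for index, chunk in enumerate(chunks):
        peak = max(abs(s) for s in chunk)
        if peak < silence_threshold:
            return index
    return None

# Simulated capture: the third chunk is silent.
recorded = [[0.1, -0.2], [0.3, 0.1], [0.0, 0.0], [0.2, 0.2]]
consumed = []

def stream():
    """Yield chunks one at a time, recording how many were actually read."""
    for i, chunk in enumerate(recorded):
        consumed.append(i)
        yield chunk

fault_at = first_faulty_chunk(stream())
```

Because the check consumes the stream lazily, the fault is reported as soon as the silent chunk arrives and the remaining chunks are never read, whereas a batch check would only see the problem after the whole recording had finished.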
According to the technical scheme of this embodiment, a vehicle-based user pronunciation simulation system is adopted. The processor obtains target playing data from the high-fidelity voice data set and determines a play control parameter matched with the playing device in the vehicle according to the vehicle environmental sound and/or the semantic information of the target playing data; the playing device plays the target playing data according to the play control parameter; the sound collection component collects the target playing data played by the playing device to obtain an audio signal; the sound card converts the audio signal into a digital signal; the digital signal processor performs noise reduction and/or echo cancellation on the digital signal to obtain a digital signal to be detected; and the audio detection module displays the digital signal to be detected, so that whether the vehicle-based user pronunciation simulation system is normal can be determined from it. This solves the problems of high cost, low efficiency and difficult sampling that arise when a real person rides in the vehicle to provide voice data for checking whether the voice collection system is normal. The playing device replaces the real person in the vehicle and achieves the same effect as a real person speaking in the vehicle, while saving cost and sampling efficiently. The sampling scenarios can be richer because they are not limited by the real person, the voice data set can be reused many times, and problems such as unclear speech caused by fatigue during long working hours are avoided. Audio detection can also be carried out in real time, so that problems in the audio signal are found promptly and it can be determined whether the voice sampling device is faulty.
Example six
Fig. 6 is a flowchart of a vehicle-based user pronunciation simulation method according to a sixth embodiment of the present invention. Referring to fig. 6, the method includes:
Step 610, obtaining, by a processor, target play data from a pre-stored voice data set; determining a play control parameter matched with play equipment in the vehicle according to the vehicle environmental sound and/or semantic information of target play data; and sending the play control parameters and the target play data to the playing device.
The voice data set is a set formed by high-fidelity voice data pre-recorded by at least one speaker in a recording studio.
Optionally, determining the play control parameter matched with the play device in the vehicle according to the vehicle environmental sound and/or the semantic information of the target play data includes:
acquiring a noise decibel value of a vehicle environment sound, and determining a playing volume matched with playing equipment in a vehicle according to the noise decibel value;
transmitting the play control parameters and the target play data to the playing device, including:
and sending the playing volume and the target playing data to the playing equipment.
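Deriving a playing volume from a noise decibel value might look like the sketch below. The embodiment does not specify the mapping here, so the thresholds, the volume scale and the function names are assumptions for illustration only, and the level is computed relative to digital full scale rather than as a calibrated sound pressure level.

```python
import math

def level_db(samples, reference=1.0):
    """Level of float samples in dB relative to `reference` (RMS based)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12) / reference)

# Hypothetical lookup table: (upper bound of the noise level in dB, volume step).
VOLUME_TABLE = [(-40.0, 3), (-30.0, 5), (-20.0, 7)]
LOUDEST_VOLUME = 9  # used when the noise exceeds every bound in the table

def volume_for_noise(noise_db):
    """Pick the first volume step whose noise bound is above the measured level."""
    for upper_bound, volume in VOLUME_TABLE:
        if noise_db < upper_bound:
            return volume
    return LOUDEST_VOLUME

quiet_cabin = [0.001] * 1000          # about -60 dB relative to full scale
quiet_volume = volume_for_noise(level_db(quiet_cabin))
```

A table keeps the mapping easy to tune per vehicle model; a continuous function of the noise level would be an equally plausible choice.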
Optionally, determining the play volume matched with the play device in the vehicle according to the noise decibel value includes:
Adjusting the playing volume of playing equipment in the vehicle according to the noise decibel value;
controlling the playing equipment to play the set audio according to the playing volume, and acquiring a playing decibel value matched with the set audio;
if the playing decibel value is matched with the noise decibel value, determining that the current playing volume of the playing equipment corresponds to the noise decibel value;
if the play decibel value is not matched with the noise decibel value, the play volume of the play device is readjusted, and then the operation of controlling the play device to play the set audio according to the play volume is performed.
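The adjust, play, measure and compare loop described above can be sketched as follows. What counts as "matched" is not defined in this passage, so the sketch assumes a match means the measured playback level is within a tolerance of the noise level plus an optional margin. The callable `play_and_measure_db`, the parameter names and the one-step adjustment are hypothetical.

```python
def calibrate_volume(noise_db, play_and_measure_db, volume,
                     margin_db=0.0, tolerance_db=1.5,
                     min_volume=0, max_volume=10, max_rounds=20):
    """Adjust the volume until the measured playback level matches the target.

    play_and_measure_db(volume) is a hypothetical callable that plays the set
    audio at `volume` and returns the measured playback level in dB.
    Returns the matching volume, or None if no setting matches.
    """
    target_db = noise_db + margin_db
    for _ in range(max_rounds):
        measured_db = play_and_measure_db(volume)
        error_db = target_db - measured_db
        if abs(error_db) <= tolerance_db:
            return volume                      # matched: keep the current volume
        step = 1 if error_db > 0 else -1       # too quiet: raise; too loud: lower
        next_volume = min(max_volume, max(min_volume, volume + step))
        if next_volume == volume:
            break                              # already at the end of the range
        volume = next_volume
    return None

# Simulated playing device (assumption): each volume step adds 3 dB above 50 dB.
def fake_device(volume):
    return 50.0 + 3.0 * volume

matched_volume = calibrate_volume(65.0, fake_device, volume=2)
```

Bounding the number of rounds and stopping at the ends of the volume range keeps the loop from running indefinitely when the target level is unreachable.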
Optionally, determining the play control parameter matched with the play device in the vehicle according to the vehicle environmental sound and/or the semantic information of the target play data includes:
acquiring semantic information of target playing data, and determining a playing sound source direction matched with playing equipment in a vehicle according to the semantic information;
transmitting the play control parameters and the target play data to the playing device, including:
and transmitting the playing sound source direction and the target playing data to the playing device.
Optionally, determining, according to the semantic information, a direction of a playing sound source matched with a playing device in the vehicle includes:
and obtaining a preset mapping relation between the semantic information and the playing sound source direction, and determining the playing sound source direction matched with the semantic information according to the mapping relation.
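A preset mapping of this kind can be held in a simple lookup table. In the sketch below the category keys, the direction labels and the fallback value are illustrative assumptions, loosely following the content categories listed in claim 4; the actual mapping would be configured for the vehicle.

```python
# Hypothetical preset mapping from content category to playing sound source direction.
DIRECTION_BY_CATEGORY = {
    "air_conditioning": "front_lower",
    "weather": "window",
    "smart_home": "front",
    "navigation": "front",
}
DEFAULT_DIRECTION = "front"  # assumed fallback for categories without an entry

def direction_for(category):
    """Look up the playing sound source direction for a content category."""
    return DIRECTION_BY_CATEGORY.get(category, DEFAULT_DIRECTION)

weather_direction = direction_for("weather")
```

Keeping the mapping as data rather than branching logic means new content categories can be added without changing the lookup code.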
Step 620, playing the target playing data by the playing device according to the received playing control parameters.
Optionally, playing, by the playing device, the target playing data according to the received playing control parameter, including: and playing the target playing data according to the received playing volume and/or the playing sound source direction.
Step 630, performing audio acquisition on the target playing data played by the playing device through the sound collection component to obtain an audio signal corresponding to the target playing data, and transmitting the audio signal to the sound card.
Step 640, converting the received audio signal into a digital signal by the sound card, and transmitting the digital signal to the digital signal processor.
Step 650, performing noise reduction processing and/or echo cancellation processing on the received digital signal by using a digital signal processor to obtain a digital signal to be detected, and transmitting the digital signal to be detected to an audio detection module.
Step 660, displaying the received digital signal to be detected by the audio detection module, so as to determine whether the user pronunciation simulation system based on the vehicle is normal according to the displayed digital signal to be detected.
The user pronunciation simulation method based on the vehicle provided by the embodiment of the invention is an execution method corresponding to the user pronunciation simulation system based on the vehicle, and has the same or similar technical characteristics and beneficial effects as the user pronunciation simulation system based on the vehicle.
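Steps 620 to 660 can be read as a linear pipeline. The sketch below strings together simple stand-ins for each stage to show the data flow only. Every function here is a hypothetical placeholder: real playback, capture, analog-to-digital conversion and noise reduction are hardware and signal-processing tasks that this example does not attempt to reproduce.

```python
def to_int16(analog):
    """Step 640 stand-in: quantize floats to 16-bit integers, clamping out-of-range values."""
    return [max(-32768, min(32767, round(x * 32767))) for x in analog]

def remove_dc(digital):
    """Step 650 stand-in: subtract the mean; a real DSP would do far more."""
    mean = sum(digital) / len(digital)
    return [round(x - mean) for x in digital]

def summarize(digital):
    """Step 660 stand-in: values an operator might inspect on the display."""
    peak = max(abs(x) for x in digital)
    return {"peak": peak, "clipped": peak >= 32767}

def run_pipeline(target, volume, cabin_noise):
    played = [volume * x for x in target]                     # step 620: play at the given volume
    captured = [p + n for p, n in zip(played, cabin_noise)]   # step 630: capture with cabin noise
    digital = to_int16(captured)                              # step 640: sound card conversion
    cleaned = remove_dc(digital)                              # step 650: placeholder processing
    return summarize(cleaned)                                 # step 660: detection summary

target = [0.5, -0.5, 0.5, -0.5]
noise = [0.1, 0.1, 0.1, 0.1]
normal = run_pipeline(target, 1.0, noise)
too_loud = run_pipeline(target, 3.0, noise)
```

With the volume set too high the quantized signal saturates, which the final summary reports as clipping; this mirrors the kind of fault the audio detection module is meant to expose.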
Example seven
Fig. 7 is a schematic structural diagram of a computer device according to a seventh embodiment of the present invention, as shown in fig. 7, where the device includes:
one or more processors 510, one processor 510 being illustrated in fig. 7;
a playing device 520 for playing the set playing data according to the set playing control parameters;
a memory 720;
the apparatus may further include: an input device 730 and an output device 740.
The processor 510, memory 720, input device 730 and output device 740 in the device may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 7.
The memory 720 is a non-transitory computer readable storage medium, and may be used to store software programs, computer executable programs, and modules, such as program instructions/modules corresponding to a vehicle-based user pronunciation simulation method in an embodiment of the present invention (e.g., the target play data acquisition module 410, the play control parameter determination module 420, and the play control module 430 shown in fig. 4). The processor 510 executes various functional applications and data processing of the computer device by running software programs, instructions and modules stored in the memory 720, i.e. implements a vehicle-based user pronunciation simulation method of the above method embodiments, i.e.:
Acquiring target playing data from a pre-stored voice data set;
according to the vehicle environmental sound and/or the semantic information of the target playing data, determining a playing control parameter matched with playing equipment in the vehicle, wherein the position of the playing equipment in the vehicle is matched with the position of a real human mouth;
and controlling the playing equipment to play the target playing data according to the playing control parameters.
Memory 720 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the computer device, etc. In addition, memory 720 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 720 may optionally include memory located remotely from processor 510, which may be connected to the terminal device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 730 may be used to receive entered numeric or character information and to generate key signal inputs related to user settings and function control of the computer device. The output device 740 may include a display device such as a display screen.
Example eight
An eighth embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a vehicle-based user pronunciation simulation method as provided by the embodiments of the present invention:
acquiring target playing data from a pre-stored voice data set;
according to the vehicle environmental sound and/or the semantic information of the target playing data, determining a playing control parameter matched with playing equipment in the vehicle, wherein the position of the playing equipment in the vehicle is matched with the position of a real human mouth;
and controlling the playing equipment to play the target playing data according to the playing control parameters.
Any combination of one or more computer readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (11)

1. A vehicle-based user pronunciation simulation method, comprising:
acquiring target playing data from a pre-stored voice data set;
according to the vehicle environmental sound and the semantic information of the target playing data, determining playing control parameters matched with playing equipment in the vehicle, wherein the position of the playing equipment in the vehicle is matched with the position of a real human mouth; the semantic information of the target playing data comprises content classification of the target playing data;
controlling the playing device to play the target playing data according to the playing control parameters;
Wherein, according to the semantic information of the target playing data, determining the playing control parameters matched with the playing equipment in the vehicle comprises:
acquiring semantic information of the target playing data, and determining a playing sound source direction matched with playing equipment in the vehicle according to the semantic information;
wherein, the controlling the playing device to play the target playing data according to the playing control parameter includes:
and controlling the playing device to play the target playing data according to the playing sound source direction.
2. The method of claim 1, wherein determining playback control parameters that match playback devices in the vehicle based on the vehicle ambient sound comprises:
acquiring a noise decibel value of a vehicle environment sound, and determining a playing volume matched with playing equipment in the vehicle according to the noise decibel value;
the control of the playing device to play the target playing data according to the playing control parameters further includes:
and controlling the playing equipment to play the target playing data according to the playing volume.
3. The method of claim 2, wherein determining a playback volume that matches a playback device in the vehicle based on the noise decibel value comprises:
Adjusting the playing volume of playing equipment in the vehicle according to the noise decibel value;
controlling the playing equipment to play the set audio according to the playing volume, and acquiring a playing decibel value matched with the set audio;
if the playing decibel value is matched with the noise decibel value, determining that the current playing volume of the playing equipment corresponds to the noise decibel value;
and if the play decibel value is not matched with the noise decibel value, after the play volume of the play device is readjusted, returning to execute the operation of controlling the play device to play the set audio according to the play volume.
4. The method of claim 1, wherein the semantic information comprises at least one of: air conditioning, weather, smart home or navigation;
the determining, according to the semantic information, a playback sound source direction that matches a playback device in the vehicle includes:
if the semantic information is the air conditioner, determining that the direction of a playing sound source matched with playing equipment in the vehicle is right front downward or right front;
if the semantic information is the Tilt class, determining that the playing sound source direction matched with the playing equipment in the vehicle is the control direction of the vehicle terminal or the right front;
If the semantic information is the weather, determining that the playing sound source direction matched with the playing equipment in the vehicle is out of a window, right in front of the window or the control direction of a vehicle terminal;
if the semantic information is the intelligent home class, determining that a playing sound source direction matched with playing equipment in the vehicle is the right front direction or the vehicle terminal control direction;
and if the semantic information is the navigation class, determining that the playing sound source direction matched with the playing equipment in the vehicle is the front or the control direction of the vehicle terminal.
5. The method of claim 1, further comprising, prior to said controlling said playback device to play back said target playback data in accordance with said playback control parameter:
detecting a bump state of a vehicle, and determining a play frequency matched with a play device in the vehicle according to the bump state;
the control of the playing device to play the target playing data according to the playing control parameters further includes:
and controlling the playing equipment to play the target playing data according to the playing frequency.
6. The method of any one of claims 1-5, wherein the playback device is an artificial mouth.
7. A vehicle-based user pronunciation simulation system, comprising: the device comprises a processor, playing equipment, a sound collecting assembly, a sound card, a digital signal processor and an audio detection module;
the playing device, the sound collecting assembly, the sound card, the digital signal processor and the audio detection module are all electrically connected with the processor; the playing device, the sound collecting assembly, the sound card, the digital signal processor and the audio detection module are electrically connected in sequence; the position of the playing device in the vehicle is matched with the position of the real human mouth;
the processor is used for acquiring target playing data from a pre-stored voice data set; according to the vehicle environment sound and the semantic information of the target playing data, determining playing control parameters matched with playing equipment in the vehicle; transmitting the play control parameters and the target play data to the play equipment; the semantic information of the target playing data comprises content classification of the target playing data; the determining, according to the semantic information of the target playing data, a playing control parameter matched with a playing device in a vehicle includes: acquiring semantic information of the target playing data, and determining a playing sound source direction matched with playing equipment in the vehicle according to the semantic information; the sending the play control parameter and the target play data to the playing device includes: transmitting the playing sound source direction and the target playing data to the playing device;
The playing device is used for playing the target playing data according to the received playing control parameters; wherein playing the target playing data according to the received playing control parameters comprises: playing the target playing data according to the received playing sound source direction;
the sound collection component is used for collecting the audio frequency of the target playing data played by the playing device, obtaining an audio signal corresponding to the target playing data and transmitting the audio signal to the sound card;
the sound card is used for converting the received audio signal into a digital signal and transmitting the digital signal to the digital signal processor;
the digital signal processor is used for carrying out noise reduction processing and/or echo cancellation processing on the received digital signals to obtain digital signals to be detected, and transmitting the digital signals to be detected to the audio detection module;
the audio detection module is used for displaying the received digital signal to be detected so as to determine whether the user pronunciation simulation system based on the vehicle is normal or not according to the displayed digital signal to be detected.
8. A vehicle-based user pronunciation simulation apparatus, comprising:
The target playing data acquisition module is used for acquiring target playing data from a pre-stored voice data set;
the playing control parameter determining module is used for determining playing control parameters matched with playing equipment in the vehicle according to the vehicle environmental sound and the semantic information of the target playing data, and the position of the playing equipment in the vehicle is matched with the position of a real human mouth; the semantic information of the target playing data comprises content classification of the target playing data;
the playing control module is used for controlling the playing device to play the target playing data according to the playing control parameters;
the play control parameter determining module includes:
a play sound source direction determining unit, configured to obtain semantic information of the target play data, and determine a play sound source direction matched with a play device in the vehicle according to the semantic information;
the play control module comprises:
and the playing sound source direction control unit is used for controlling the playing device to play the target playing data according to the playing sound source direction.
9. The apparatus of claim 8, wherein the play control parameter determination module comprises:
The playing volume determining unit is used for obtaining a noise decibel value of the environmental sound of the vehicle and determining the playing volume matched with playing equipment in the vehicle according to the noise decibel value;
the play control module comprises:
and the play volume control unit is used for controlling the play equipment to play the target play data according to the play volume.
10. A computer device, comprising:
one or more processors;
the playing device is used for playing the set playing data according to the set playing control parameters;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a vehicle-based user pronunciation simulation method as claimed in any one of claims 1-6.
11. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements a vehicle-based user pronunciation simulation method as claimed in any one of claims 1-6.
CN202010881113.2A 2020-08-27 2020-08-27 User pronunciation simulation method, system, equipment and storage medium based on vehicle Active CN112017636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010881113.2A CN112017636B (en) 2020-08-27 2020-08-27 User pronunciation simulation method, system, equipment and storage medium based on vehicle

Publications (2)

Publication Number Publication Date
CN112017636A CN112017636A (en) 2020-12-01
CN112017636B true CN112017636B (en) 2024-02-23

Family

ID=73502609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010881113.2A Active CN112017636B (en) 2020-08-27 2020-08-27 User pronunciation simulation method, system, equipment and storage medium based on vehicle

Country Status (1)

Country Link
CN (1) CN112017636B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114061941B (en) * 2021-10-18 2023-12-19 吉林大学 Experimental environment adjustment test method and system for new energy vehicle gearbox and test box
CN113823334B (en) * 2021-11-22 2022-02-08 腾讯科技(深圳)有限公司 Environment simulation method applied to vehicle-mounted equipment, related device and equipment

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000002059U (en) * 1998-06-30 2000-01-25 전주범 Car audio control circuit with automatic volume control
JP2003235092A (en) * 2003-02-21 2003-08-22 Yamaha Corp Directive loudspeaker
CN101141118A (en) * 2007-10-25 2008-03-12 南京工业职业技术学院 Acoustics volume automatic controller along with environmental noise size for automobile
KR20090047643A (en) * 2007-11-08 2009-05-13 에스케이 텔레콤주식회사 Terminal for providing multimedia contents and method therefor
CN102695112A (en) * 2012-06-09 2012-09-26 九江妙士酷实业有限公司 Automobile player and volume control method thereof
CN102904536A (en) * 2011-07-28 2013-01-30 富泰华工业(深圳)有限公司 Volume adjusting device and volume adjusting method
CN105391837A (en) * 2014-09-01 2016-03-09 三星电子株式会社 Method and apparatus for managing audio signals
CN105632521A (en) * 2015-12-22 2016-06-01 深圳市智行畅联科技有限公司 Automobile-based random sound source automatic sound control device
CN105788588A (en) * 2014-12-23 2016-07-20 深圳市腾讯计算机***有限公司 Navigation voice broadcasting method and apparatus
CN108573718A (en) * 2017-03-10 2018-09-25 厦门歌乐电子企业有限公司 A kind of in-vehicle player
DE102017213252A1 (en) * 2017-08-01 2019-02-07 Bayerische Motoren Werke Aktiengesellschaft A method, apparatus and computer program for varying an audio content to be output in a vehicle
CN109979487A (en) * 2019-03-07 2019-07-05 百度在线网络技术(北京)有限公司 Voice signal detection method and device
CN111412587A (en) * 2020-03-31 2020-07-14 广东美的制冷设备有限公司 Voice processing method and device of air conditioner, air conditioner and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2956515A1 (en) * 2010-02-15 2011-08-19 France Telecom NAVIGATION METHOD IN SOUND CONTENT
KR20170101629A (en) * 2016-02-29 2017-09-06 한국전자통신연구원 Apparatus and method for providing multilingual audio service based on stereo audio signal
WO2018101459A1 (en) * 2016-12-02 2018-06-07 ヤマハ株式会社 Content playback device, sound collection device, and content playback system

Also Published As

Publication number Publication date
CN112017636A (en) 2020-12-01

Similar Documents

Publication Publication Date Title
CN103208287B (en) Enhance the method and system of voice dialogue using the relevant information of vehicles of sound
CN1941079B (en) Speech recognition method and system
CN105489063B (en) A kind of new-energy automobile pedestrian's sound alarming device with active forewarning
US10861480B2 (en) Method and device for generating far-field speech data, computer device and computer readable storage medium
CN106782589B (en) Mobile terminal and voice input method and device thereof
CN112017636B (en) User pronunciation simulation method, system, equipment and storage medium based on vehicle
CN109273006B (en) Voice control method of vehicle-mounted system, vehicle and storage medium
CN112435682B (en) Vehicle noise reduction system, method and device, vehicle and storage medium
CN111554317B (en) Voice broadcasting method, equipment, computer storage medium and system
CN109361995B (en) Volume adjusting method and device for electrical equipment, electrical equipment and medium
CN105761532B (en) Dynamic voice reminding method and onboard system
CN113674763B (en) Method, system, device and storage medium for identifying whistle by utilizing line spectrum characteristics
Genuit et al. Human hearing–related measurement and analysis of acoustic environments
CN109600703A (en) Sound reinforcement system and its public address method and computer readable storage medium
CN112995882B (en) Intelligent equipment audio open loop test method
CN115904299A (en) Method and device for adjusting audio playing volume and audio playing system
CN114187906A (en) Vehicle controller and voice awakening method
CN115811681A (en) Earphone working mode control method, device, terminal and medium
CN108919277B (en) Indoor and outdoor environment identification method and system based on sub-ultrasonic waves and storage medium
CN112147780A (en) Vehicle-mounted head-up display device, control system, control method, and storage medium
CN112003666B (en) Vehicle-mounted radio control method, device, equipment and storage medium
CN217124669U (en) Self-adaptive in-vehicle prompt tone system and vehicle
CN116665713A (en) Cabin voice test system, method, electronic equipment and readable storage medium
CN215552855U (en) Indicator sound control system of steering lamp and vehicle
CN111610947B (en) Vehicle-mounted end conversation volume automatic regulating system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant