CN109887493B - Character audio pushing method - Google Patents


Info

Publication number
CN109887493B
CN109887493B (application CN201910188890.6A)
Authority
CN
China
Prior art keywords
audio
sound
end point
characters
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910188890.6A
Other languages
Chinese (zh)
Other versions
CN109887493A (en)
Inventor
虞焰兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Semxum Information Technology Co ltd
Original Assignee
Anhui Semxum Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Semxum Information Technology Co ltd
Priority to CN201910188890.6A
Publication of CN109887493A
Application granted
Publication of CN109887493B
Legal status: Active

Landscapes

  • Auxiliary Devices For Music (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a text audio pushing method, which belongs to the technical field of audio processing and comprises: S1, sound processing; S2, segmented recognition; S3, audio memory; S4, recognizing audio according to probability; and S5, audio text pushing. The audio recognition device first processes the collected audio into a sound waveform, then segments and recognizes it using the front endpoint and rear endpoint set on the device as the recognition interval, and pushes each segment to the user as soon as it is recognized. Whenever the speaker pauses between sentences, the audio for that sentence is recognized into text and pushed out, so the text the user receives arrives in segments. Each segment occupies little capacity, so it can be pushed to the user quickly even when the network is slow, and the segmented text is easier for the user to read.

Description

Character audio pushing method
Technical Field
The invention relates to the technical field of audio processing, in particular to a text audio pushing method.
Background
Automatic speech recognition technology has developed rapidly in recent years, making it possible for people to communicate with computers by voice. Compared with traditional human-computer interaction devices such as the keyboard and mouse, speech provides a more natural interface. Automatic extraction of audio text is a core module of a speech recognition system: it forcibly aligns a reference text with the corresponding speech, thereby converting audio into text. As a common preprocessing technology in the speech recognition field, automatic audio-text extraction is widely applied in model training, multimedia retrieval, broadcast and television media, computer-aided language teaching, and so on; it can generate subtitles for live news, speeches, and meetings, build multimedia libraries for language teaching, games, and film production, and produce synchronized lyric displays for songs.
Traditional automatic speech recognition still has certain defects in use. It lacks the ability to recognize and push in segments, so a long passage is usually recognized and pushed as a single block; when the text content is large, the block occupies a large capacity, and under high network delay a large push reaches the user slowly, causing stutters. In addition, traditional speech recognition uses a manually set recognition interval. This interval suits most speakers, but because everyone's voice differs, the speech of a minority of speakers is easily misrecognized, and the accuracy is not high.
Disclosure of Invention
The invention aims to solve two problems of traditional speech recognition technology: without segmented recognition and pushing, text is pushed slowly when network delay is high, and the recognition accuracy is low. It provides a text audio pushing method with the advantages of heartbeat-like segmented recognition and pushing of audio and text, a memory function for audio recognition, and high recognition accuracy.
The invention achieves the above purpose through the following technical scheme. A text audio pushing method comprises the following steps:
S1, sound processing: an audio recognition device collects sound, processes the collected audio data with a speech coding technique, and generates a sound waveform, where the X axis of the waveform is time in milliseconds and the Y axis is volume in decibels;
S2, segmented recognition: the audio recognition device is set to recognize text within the interval from a front endpoint to a rear endpoint, both of which are time intervals between the appearance and the cessation of sound; the front endpoint is set to 100 milliseconds and the rear endpoint to 500 milliseconds, so that the device starts recognizing when the interval from silence to sound falls within 100 milliseconds and stops recognizing when the interval from sound to silence reaches 500 milliseconds;
S3, audio memory: each time a sound is recognized, the audio recognition device records its frequency from start to end and calculates the balance value of the audio through a distribution function; after many recognitions, the different probabilities of each balance value are stored in the internal processor in order from high to low;
S4, recognizing audio according to probability: because every person speaks with a different timbre, each audio segment uses its own front and rear endpoints; when the audio recognition device recognizes a segment of audio, it takes the balance value with the highest probability of occurrence as the basis for recognition; if the segment is new, S3 is repeated to memorize its balance value, and if the segment has already been memorized as in S3, its front and rear endpoints are selected directly;
S5, audio text pushing: the audio recognition device pushes the text recognized in steps S1-S4 to the user through an internal push module.
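Steps S1-S2 can be pictured as a small endpoint-driven segmenter. The sketch below is not from the patent — the function name and the 40 dB activity threshold are assumptions — but it shows how a per-millisecond decibel waveform would be cut into recognition intervals using the 100 ms front endpoint and 500 ms rear endpoint:

```python
def segment_waveform(volumes_db, active_db=40.0, front_ms=100, rear_ms=500):
    """Split a per-millisecond dB waveform (S1) into recognition segments (S2).

    active_db : assumed threshold separating sound from silence.
    front_ms  : front endpoint -- sound must persist this long to open a segment.
    rear_ms   : rear endpoint -- silence must persist this long to close it.
    Returns a list of (start_ms, end_ms) recognition intervals.
    """
    segments = []
    start = None          # opening time of the current segment, or None
    sound_run = 0         # consecutive milliseconds of sound
    silence_run = 0       # consecutive milliseconds of silence
    for t, v in enumerate(volumes_db):
        if v >= active_db:
            sound_run += 1
            silence_run = 0
            if start is None and sound_run >= front_ms:
                start = t - front_ms + 1   # segment began when sound first appeared
        else:
            silence_run += 1
            sound_run = 0
            if start is not None and silence_run >= rear_ms:
                segments.append((start, t - rear_ms + 1))  # close at start of pause
                start = None
    if start is not None:                  # audio ended mid-segment
        segments.append((start, len(volumes_db)))
    return segments
```

For example, 300 ms of speech, a 600 ms pause, then 200 ms more speech yields two segments, each of which would be recognized and pushed separately.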
Preferably, when the audio recognition device collects sound, the user's voice is collected through an external microphone or a recording device.
Preferably, in S2 the start sound is an effective sound audible to human ears, with a frequency of 20-20000 Hz, and the end sound is an ineffective sound inaudible to human ears, with a frequency of 0-20 Hz.
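The effective/ineffective distinction can be checked by locating a frame's dominant frequency; a sketch with NumPy, under the assumption of an 8000 Hz sample rate and one-second frames (any rate above the band of interest would do):

```python
import numpy as np

def is_effective(frame, sample_rate=8000):
    """Classify a frame as effective sound (dominant frequency in the
    audible 20-20000 Hz band) or ineffective (below 20 Hz), per S2."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    dominant = freqs[np.argmax(spectrum)]
    return 20.0 <= dominant <= 20000.0

t = np.arange(8000) / 8000.0                 # one second of samples
speech_like = np.sin(2 * np.pi * 440 * t)    # 440 Hz tone: audible, effective
rumble = np.sin(2 * np.pi * 5 * t)           # 5 Hz tone: inaudible, ineffective
```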
Preferably, the distribution function of S3 calculates the balance value by the following formulas.
Discrete: E(X) = Σᵢ xᵢ·pᵢ
Continuous: E(X) = ∫ x·f(x) dx
where E(X) is the balance value, x is the volume variable of the sound in both the discrete and the continuous formula, pᵢ is the probability of the volume value xᵢ, and f(x) is the probability density of the volume.
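Read as the expectation of the volume distribution, both forms of the balance value are easy to evaluate numerically. The distributions used below are illustrative, not from the patent:

```python
def balance_discrete(values, probs):
    """Discrete balance value: E(X) = sum of x_i * p_i over the
    observed volume values x_i and their probabilities p_i."""
    return sum(x * p for x, p in zip(values, probs))

def balance_continuous(density, lo, hi, steps=10000):
    """Continuous balance value: E(X) = integral of x * f(x) dx,
    approximated with the midpoint rule over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum((lo + (i + 0.5) * dx) * density(lo + (i + 0.5) * dx) * dx
               for i in range(steps))
```

For a volume distributed uniformly over [40, 60] dB, both forms give a balance value of 50 dB.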
Preferably, the push module of S5 consists of a main control chip, a decoder, and a wireless network transmission module, and pushes the audio and text to a server or a mobile terminal through a wireless network.
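The patent does not fix a wire format for the push module; the sketch below assumes a JSON payload per segment, with the actual transmission left to the wireless module. Because each payload carries one segment only, it stays small and travels quickly even on a slow network:

```python
import json

def build_push_payload(segment_index, text, start_ms, end_ms):
    """Package one recognized segment (S5) for transmission as UTF-8 JSON."""
    return json.dumps({
        "segment": segment_index,
        "text": text,
        "start_ms": start_ms,
        "end_ms": end_ms,
    }).encode("utf-8")

# Transmission itself belongs to the wireless network module, e.g. an HTTP
# POST to the server or mobile terminal (the endpoint is hypothetical):
# requests.post("http://example.com/push", data=build_push_payload(0, "hello", 0, 300))
```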
Compared with the prior art, the invention has the following beneficial effects. The audio recognition device first processes the collected audio into a sound waveform, then segments and recognizes it using the front and rear endpoints set on the device as the recognition interval, and pushes each segment to the user as soon as it is recognized. Every time the speaker pauses between sentences, that sentence's audio is recognized into text and pushed out, so the text received by the user arrives in segments; each segment occupies little capacity, can be pushed quickly even when the network is slow, and segmented text is easier for the user to read. The audio recognition also has a memory function: because every person's timbre and volume differ, after a new segment of audio is recognized, its balance value is calculated through the distribution function and stored, classified by the probability of that balance value. The next time audio with the same balance value is encountered, the front and rear endpoints of the corresponding probability are selected directly to recognize it. Therefore the more the device is used, the more endpoint-interval statistics it accumulates, and the more accurate the recognition becomes.
Drawings
Fig. 1 is a schematic flow chart of the heartbeat technique of the present invention.
Detailed Description
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the invention.
Referring to fig. 1, a text audio pushing method includes the following steps:
S1, sound processing: an audio recognition device collects sound, processes the collected audio data with a speech coding technique, and generates a sound waveform, where the X axis of the waveform is time in milliseconds and the Y axis is volume in decibels;
S2, segmented recognition: the audio recognition device is set to recognize text within the interval from a front endpoint to a rear endpoint, both of which are time intervals between the appearance and the cessation of sound; the front endpoint is set to 100 milliseconds and the rear endpoint to 500 milliseconds, so that the device starts recognizing when the interval from silence to sound falls within 100 milliseconds and stops recognizing when the interval from sound to silence reaches 500 milliseconds;
S3, audio memory: each time a sound is recognized, the audio recognition device records its frequency from start to end and calculates the balance value of the audio through a distribution function; after many recognitions, the different probabilities of each balance value are stored in the internal processor in order from high to low;
S4, recognizing audio according to probability: because every person speaks with a different timbre, each audio segment uses its own front and rear endpoints; when the audio recognition device recognizes a segment of audio, it takes the balance value with the highest probability of occurrence as the basis for recognition; if the segment is new, S3 is repeated to memorize its balance value, and if the segment has already been memorized as in S3, its front and rear endpoints are selected directly;
S5, audio text pushing: the audio recognition device pushes the text recognized in steps S1-S4 to the user through an internal push module.
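Steps S3-S4 amount to a memory table keyed by balance value. The class below is a sketch (the rounding granularity and method names are assumptions, not from the patent) that counts which endpoint pair each balance value was recognized with, returns the highest-probability pair, and falls back to the 100 ms / 500 ms defaults for new audio:

```python
from collections import Counter, defaultdict

class EndpointMemory:
    """Remember which (front, rear) endpoints were used for each balance
    value (S3) and reuse the most probable pair on later recognitions (S4)."""

    DEFAULT = (100, 500)   # default front/rear endpoints in milliseconds

    def __init__(self):
        self._table = defaultdict(Counter)

    def observe(self, balance, front_ms, rear_ms):
        # Round the balance value so nearby measurements share one entry.
        self._table[round(balance)][(front_ms, rear_ms)] += 1

    def endpoints(self, balance):
        counts = self._table.get(round(balance))
        if not counts:
            return self.DEFAULT             # new audio: fall back to defaults
        return counts.most_common(1)[0][0]  # highest-probability pair
```

After repeated observations the stored counts rank endpoint pairs from high to low probability, so recognition accuracy improves the more the device is used.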
When the audio recognition device collects sound, the user's voice is collected through an external microphone or a recording device. In S2, the start sound is an effective sound audible to human ears, with a frequency of 20-20000 Hz, and the end sound is an ineffective sound inaudible to human ears, with a frequency of 0-20 Hz.
The distribution function of S3 calculates the balance value by the following formulas.
Discrete: E(X) = Σᵢ xᵢ·pᵢ
Continuous: E(X) = ∫ x·f(x) dx
where E(X) is the balance value, x is the volume variable of the sound in both the discrete and the continuous formula, pᵢ is the probability of the volume value xᵢ, and f(x) is the probability density of the volume.
The push module of S5 consists of a main control chip, a decoder, and a wireless network transmission module, and pushes the audio and text to a server or a mobile terminal through a wireless network.
The working principle of the invention is as follows. After collecting sound, the audio recognition device generates a sound waveform through a coding technique. The set values of the front and rear endpoints are not fixed to one specific number; generally the front endpoint is set to 100 milliseconds and the rear endpoint to 500 milliseconds. When the device converts speech into text, conversion starts once the interval from invalid sound to valid sound falls within 100 milliseconds, and stops once the interval from valid sound to invalid sound reaches 500 milliseconds; the conversion then ends and the text is pushed to the user. After a new segment of audio data is collected, the device calculates the balance value of the frequency band within the recognizable interval according to the distribution function, and by accumulating balance-value statistics over many recognitions it computes the probability of each front and rear endpoint. The next time audio with the same balance value is encountered, the corresponding front and rear endpoints can be used to recognize the text, ensuring high accuracy of speech-to-text conversion.
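The working principle above can be condensed into one recognition pass: compute the segment's balance value, look up the most probable endpoints for it (defaults for a new value), and record the endpoints measured for this speaker so the next occurrence is recognized with the learned interval. A self-contained sketch, with `measured_endpoints` standing in as a hypothetical substitute for the device's own onset/pause timing:

```python
from collections import Counter, defaultdict

DEFAULT_ENDPOINTS = (100, 500)      # front/rear endpoints in milliseconds
memory = defaultdict(Counter)       # balance value -> endpoint-pair counts

def recognize_segment(volumes_db, measured_endpoints):
    """One pass of the working principle: the balance value selects the
    highest-probability endpoint pair (defaults for a new value), and the
    endpoints actually measured for this speaker are recorded for next time."""
    balance = round(sum(volumes_db) / len(volumes_db))   # mean volume as E(X)
    counts = memory[balance]
    used = counts.most_common(1)[0][0] if counts else DEFAULT_ENDPOINTS
    counts[measured_endpoints] += 1   # statistics grow with every use
    return used

first = recognize_segment([60] * 200, (80, 450))   # new balance value: defaults
second = recognize_segment([60] * 200, (80, 450))  # same value: learned pair
```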
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. The specification is written this way merely for clarity; those skilled in the art should take it as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other implementations understandable to those skilled in the art.

Claims (4)

1. A text audio pushing method is characterized by comprising the following steps:
S1, sound processing: an audio recognition device collects sound, processes the collected audio data with a speech coding technique, and generates a sound waveform, where the X axis of the waveform is time in milliseconds and the Y axis is volume in decibels;
S2, segmented recognition: the audio recognition device is set to recognize text within the interval from a front endpoint to a rear endpoint, both of which are time intervals between the appearance and the cessation of sound; the front endpoint is set to 100 milliseconds and the rear endpoint to 500 milliseconds, so that the device starts recognizing when the interval from silence to sound falls within 100 milliseconds and stops recognizing when the interval from sound to silence reaches 500 milliseconds;
S3, audio memory: each time a sound is recognized, the audio recognition device records its frequency from start to end and calculates the balance value of the audio through a distribution function; after many recognitions, the different probabilities of each balance value are stored in the internal processor in order from high to low;
S4, recognizing audio according to probability: because every person speaks with a different timbre, each audio segment uses its own front and rear endpoints; when the audio recognition device recognizes a segment of audio, it takes the balance value with the highest probability of occurrence as the basis for recognition; if the segment is new, S3 is repeated to memorize its balance value, and if the segment has already been memorized as in S3, its front and rear endpoints are selected directly;
S5, audio text pushing: the audio recognition device pushes the text recognized in steps S1-S4 to the user through an internal push module;
the distribution function of S3 calculates the balance value by the following formula.
Continuous: E(X) = ∫ x·f(x) dx
where E(X) is the balance value, x of the continuous formula is the volume variable of the sound, and f(x) is the probability density of the volume.
2. The text audio pushing method according to claim 1, wherein: when the audio recognition device collects sound, the user's voice is collected through an external microphone or a recording device.
3. The text audio pushing method according to claim 1, wherein: in S2, the start sound is an effective sound audible to human ears, with a frequency of 20-20000 Hz, and the end sound is an ineffective sound inaudible to human ears, with a frequency of 0-20 Hz.
4. The text audio pushing method according to claim 1, wherein: the push module of S5 consists of a main control chip, a decoder, and a wireless network transmission module, and pushes the audio and text to a server or a mobile terminal through a wireless network.
CN201910188890.6A 2019-03-13 2019-03-13 Character audio pushing method Active CN109887493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910188890.6A CN109887493B (en) 2019-03-13 2019-03-13 Character audio pushing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910188890.6A CN109887493B (en) 2019-03-13 2019-03-13 Character audio pushing method

Publications (2)

Publication Number Publication Date
CN109887493A CN109887493A (en) 2019-06-14
CN109887493B 2021-08-31

Family

ID=66932122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910188890.6A Active CN109887493B (en) 2019-03-13 2019-03-13 Character audio pushing method

Country Status (1)

Country Link
CN (1) CN109887493B (en)

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1779777B (en) * 2005-08-16 2011-01-05 万纳特科技(深圳)有限公司 Audio-frequency editing and converting method by cutting audio-frequency wave form
CN101299333A (en) * 2007-04-30 2008-11-05 张家港市思韵语音科技有限公司 Built-in speech recognition system and inner core technique thereof
GB2502944A (en) * 2012-03-30 2013-12-18 Jpal Ltd Segmentation and transcription of speech
CN104464723B (en) * 2014-12-16 2018-03-20 科大讯飞股份有限公司 A kind of voice interactive method and system
US10134425B1 (en) * 2015-06-29 2018-11-20 Amazon Technologies, Inc. Direction-based speech endpointing
CN106601243B (en) * 2015-10-20 2020-11-06 阿里巴巴集团控股有限公司 Video file identification method and device
US10090005B2 (en) * 2016-03-10 2018-10-02 Aspinity, Inc. Analog voice activity detection
US10867620B2 (en) * 2016-06-22 2020-12-15 Dolby Laboratories Licensing Corporation Sibilance detection and mitigation
JP6677126B2 (en) * 2016-08-25 2020-04-08 株式会社デンソー Interactive control device for vehicles
CN108008824A (en) * 2017-12-26 2018-05-08 安徽声讯信息技术有限公司 The method that official document takes down in short-hand the collection of this multilink data
CN108074570A (en) * 2017-12-26 2018-05-25 安徽声讯信息技术有限公司 Surface trimming, transmission, the audio recognition method preserved
CN108449629B (en) * 2018-03-31 2020-06-05 湖南广播电视台广播传媒中心 Audio voice and character synchronization method, editing method and editing system

Also Published As

Publication number Publication date
CN109887493A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
CN111063341B (en) Method and system for segmenting and clustering multi-person voice in complex environment
CN109509470B (en) Voice interaction method and device, computer readable storage medium and terminal equipment
CN105405439B (en) Speech playing method and device
CN111508498B (en) Conversational speech recognition method, conversational speech recognition system, electronic device, and storage medium
CN108711429B (en) Electronic device and device control method
CN111243590A (en) Conference record generation method and device
CN103151039A (en) Speaker age identification method based on SVM (Support Vector Machine)
CN104575504A (en) Method for personalized television voice wake-up by voiceprint and voice identification
CN109005419B (en) Voice information processing method and client
CN103700370A (en) Broadcast television voice recognition method and system
CN102723078A (en) Emotion speech recognition method based on natural language comprehension
US8078455B2 (en) Apparatus, method, and medium for distinguishing vocal sound from other sounds
CN102404278A (en) Song request system based on voiceprint recognition and application method thereof
CN104766608A (en) Voice control method and voice control device
CN106656767A (en) Method and system for increasing new anchor retention
JP6915637B2 (en) Information processing equipment, information processing methods, and programs
CN113823323B (en) Audio processing method and device based on convolutional neural network and related equipment
CN114242064A (en) Speech recognition method and device, and training method and device of speech recognition model
WO2024140430A1 (en) Text classification method based on multimodal deep learning, device, and storage medium
CN110211609A (en) A method of promoting speech recognition accuracy
CN111107284B (en) Real-time generation system and generation method for video subtitles
CN111798846A (en) Voice command word recognition method and device, conference terminal and conference terminal system
CN109657094B (en) Audio processing method and terminal equipment
CN110853669A (en) Audio identification method, device and equipment
CN109887493B (en) Character audio pushing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant