CN108597493B - The audio exchange method and audio exchange system of language semantic - Google Patents

Info

Publication number
CN108597493B
Authority
CN
China
Prior art keywords
language
voice
semantic
phoneme
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810264460.3A
Other languages
Chinese (zh)
Other versions
CN108597493A (en)
Inventor
孔繁泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201910143693.2A (CN109754780B)
Priority to CN201810264460.3A (CN108597493B)
Publication of CN108597493A
Priority to PCT/CN2019/079834 (WO2019184942A1)
Application granted
Publication of CN108597493B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

The audio exchange method, system, and audio coding graphic for language semantics of the invention address the technical problem in the prior art that semantic complexity causes slow data response and poor real-time performance during translation between languages. The method includes forming a voice mapping structure for each language from a minimum phoneme sequence and completing the semantic conversion between languages through these voice mapping structures. The minimum phoneme, which forms the shortest audio segment in a language, serves as the basic data exchange unit for semantic conversion between languages and as the coding basis for data exchange. This changes the underlying structure of speech recognition and improves the coding complexity and accuracy for the audio content of a language, so that encoding avoids the complex audio features produced by coupling composite information such as tone, scale, and range within speech fragments, which safeguards the speech recognition rate. The mapping structure between voice codes and text codes formed from minimum phonemes improves the efficiency of data exchange during translation.

Description

The audio exchange method and audio exchange system of language semantic
Technical field
The present invention relates to the field of information exchange, and in particular to an audio exchange method and an audio exchange system for language semantics.
Background technique
Current language translation mainly consists of speech recognition, semantic analysis, and sentence synthesis. Speech recognition uses high-sensitivity sensors to extract, from the frequency-domain or time-domain speech signal stream of the source language, the set of audio signals corresponding to the text of a sentence. Semantic analysis uses models such as hidden Markov models (HMM), self-learning models, and artificial neural networks (ANN) to identify and quantify the word sequences and semantic meaning in the audio signal set, so as to determine the expressed content as accurately as possible. Sentence synthesis forms the audio signal set or word sequence set of the target language from the identified and quantified expression content. Because of the complexity of the semantic analysis models, this process requires massive computing resources. Mobile applications therefore need a distributed computing architecture that reaches server-side computing resources over an internet connection with guaranteed bandwidth, which limits the real-time performance and accuracy of translation.
Patent document CN104637482B discloses a device that uses digital coding to convert speech to text. A phoneme storage unit stores phoneme feature data of a first language; a phoneme conversion unit converts a received phoneme signal sequence into first-language phonemes according to that feature data; a digital coding unit assigns a unique code to each first-language phoneme, forming a first-language phoneme code sequence; the phoneme code sequence is used to form the character pronunciation code sequences and word pronunciation code sequences of the first language; a text storage unit stores the characters, words, or graphics of the first language together with their code sequences; and a text conversion unit generates characters, words, graphics, and/or combinations thereof in the first language according to the correspondence between code sequences. That device establishes a basis for coding-based mapping between text and speech. How to use this coding-mapping basis to reduce the resource consumption of converting text, graphics, and audio with the same semantics between languages still calls for inventive improvement.
Summary of the invention
In view of this, embodiments of the present invention provide an audio exchange method and an audio exchange system for language semantics, to solve the technical problem in the prior art that semantic complexity causes slow data response and poor real-time performance during translation between languages.
The audio exchange method for language semantics of the embodiment of the present invention forms a voice mapping structure for each language from a minimum phoneme sequence and completes the semantic conversion between languages through these voice mapping structures.
The audio exchange system for language semantics of the embodiment of the present invention comprises:
A memory for storing the program code of the above audio exchange method for language semantics;
A processor for running the program code.
The audio exchange system for language semantics of the embodiment of the present invention is configured to form a voice mapping structure for each language from a minimum phoneme sequence and to complete the semantic conversion between languages through these voice mapping structures.
The basic voice coding graphic of the embodiment of the present invention is used for the graphical display of language phonemes and includes a basic framework. The basic framework includes a first adaptation column and a second adaptation column arranged side by side, and an adapter rod. An adaptation position group is arranged in each of the first adaptation column and the second adaptation column, each group including several adaptation positions, and each end of the adapter rod is connected to one adaptation position of one adaptation column.
In the audio exchange method, audio exchange system, and coding graphic for language semantics of the embodiment of the present invention, the minimum phoneme, which forms the shortest audio segment in a language, serves as the basic data exchange unit for semantic conversion between languages and as the coding basis for data exchange. This changes the underlying structure of speech recognition, shortens the code length and improves the coding efficiency for the audio content of a language, and thereby improves the efficiency of data exchange during translation. It has a positive effect on reducing the latency of real-time remote data responses and on increasing how much of the basic data structures and basic data can be stored on a local mobile terminal.
Detailed description of the invention
Fig. 1 is a schematic diagram of the data processing procedure of the audio exchange method for language semantics according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the coding procedure of the audio exchange method for language semantics according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a voice mapping structure of the audio exchange method for language semantics according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a voice mapping structure of the audio exchange method for language semantics according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of language conversion performed by the audio exchange method for language semantics according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of the architecture of the audio exchange system for language semantics according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of the graphic structure of a basic voice coding graphic in the audio exchange method for language semantics according to an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The audio exchange method for language semantics of the embodiment of the present invention comprises:
Forming a voice mapping structure for each language from a minimum phoneme sequence, and completing the semantic conversion between languages through these voice mapping structures.
Languages differ fundamentally in the written form and pronunciation with which they express the same semantics. Semantic conversion refers to converting between the different written and spoken forms that express the same semantics.
The pronunciation of the semantic text (a kind of graphic symbol) of a regionally common language is deterministic, and the pronunciation rules of words and sentences can be summarized as different combinations of syllables. Building every syllable from one set of basic minimum phonemes exploits the low signal load of a minimum phoneme to exclude redundant audio signals and interfering information, giving complex data exchange a simpler coding basis and a shorter code length.
According to statistical comparisons of regionally common languages made by those skilled in the art, the number of minimum phonemes that serve as basic elements of pronunciation, and their audio features, can be determined, and the number is less than 1000. The roughly 7000 languages of the world share about 800 distinct minimum phonemes in total; each Western language uses about 40 minimum phonemes, and Chinese uses no more than about 150. A complete index can therefore be built with a fixed-length code whose value range is in the hundreds or thousands, for example three or four decimal digits, or 10 or 20 binary digits.
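The code widths quoted above can be checked with a short calculation. The following is a minimal sketch, not part of the patent: the function names are hypothetical, and the inventory sizes are the approximate figures given in the paragraph above.

```python
def bits_needed(n_symbols: int) -> int:
    """Smallest fixed binary width that gives n_symbols distinct codes."""
    if n_symbols < 1:
        raise ValueError("need at least one symbol")
    # Codes run from 0 to n_symbols - 1, so the width is the bit length
    # of the largest code (at least one bit).
    return max(1, (n_symbols - 1).bit_length())


def decimal_digits_needed(n_symbols: int) -> int:
    """Smallest fixed decimal width that gives n_symbols distinct codes."""
    if n_symbols < 1:
        raise ValueError("need at least one symbol")
    return len(str(n_symbols - 1))


# Approximate inventory sizes taken from the text (assumed, not measured).
inventories = {
    "Western language": 40,
    "Chinese": 150,
    "all languages": 800,
    "upper bound": 1000,
}
widths = {
    name: (bits_needed(n), decimal_digits_needed(n))
    for name, n in inventories.items()
}
```

Under these figures, any inventory below 1000 phonemes fits in 10 binary digits or three decimal digits, which agrees with the smaller of the two widths mentioned in the text.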
The audio exchange method for language semantics of the embodiment of the present invention uses the minimum phoneme, which forms the shortest audio segment in a language, as the basic data exchange unit for semantic conversion between languages and as the coding basis for data exchange. This changes the underlying structure of speech recognition, shortens the code length, and improves the coding efficiency for the audio content of a language, so that the encoding of speech audio avoids the complex audio features produced by coupling composite information such as tone, scale, and range within speech fragments, which safeguards the speech recognition rate. The mapping structure between voice codes and text codes formed from minimum phonemes improves the efficiency of data exchange during translation. This has a positive effect on reducing the latency of real-time remote data responses and on increasing how much of the basic data structures and basic data can be stored on a local mobile terminal.
Fig. 1 is a schematic diagram of the data processing procedure of the audio exchange method for language semantics according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step 100: serialize all minimum phonemes.
Serialization may include identifying the syllables, phonemes, scales, and intonations in a language; describing the identified syllables, phonemes, scales, and intonations quantitatively, for example as time-domain or frequency-domain audio feature data; and storing the quantitative descriptions in structured form, for example by coding them to form a one-to-one index.
Step 200: form text-to-voice mapping data for each language from a subset of all minimum phonemes.
The pronunciation basis of each language is determined by a subset of all minimum phonemes. Combinations of the minimum phonemes in the subset form the voice identifiers of word pronunciations in a language, and the voice identifiers are then used to form mapping data that relates text to voice identifiers. The mapping data includes the data structure in which the data is stored, and may include both text-to-voice mapping data and voice-to-voice mapping data.
Step 300: form inter-voice mapping data between the languages according to language semantics.
Using the objectivity of semantics, mapping data is established between the voices that carry corresponding meanings in different languages. The mapping data includes the data structure in which the data is stored, and may also include text-to-voice mapping data.
Step 400: perform semantic language conversion using the corresponding inter-voice mapping data and text-to-voice mapping data.
In the audio exchange method for language semantics of the embodiment of the present invention, the text-to-voice mapping data ensures the continuity and correctness of text-to-speech conversion within one language. Combining the inter-voice mapping data with the text-to-voice mapping data supports diverse conversions between languages while maintaining conversion quality, and allows efficient exchange of basic language data during conversion. In addition, varying the mappings in the inter-voice mapping data and the text-to-voice mapping data can provide a further encryption effect.
Fig. 2 is a schematic diagram of the coding procedure of the audio exchange method for language semantics according to an embodiment of the present invention. As shown in Fig. 2, on the basis of the above embodiment, step 100 includes:
Step 110: acquire the minimum phonemes of each common language through speech recognition.
Based on human physiological characteristics and the evolution of language, the speech of a language can be decomposed structurally from sentence pronunciation to word pronunciation, from word pronunciation to syllables, and from syllables to their constituent phonemes. Those skilled in the art will understand that audio acquisition by computer, together with time-domain or frequency-domain feature analysis of audio fragments, can determine the audio features of characters, words, and phrases, including the features of minimum phonemes.
Step 120: form the minimum phonemes into a unified phoneme sequence.
Those skilled in the art will understand that speech recognition technology, combined with speech analysis and statistics over a sufficient amount of data, can identify and determine the audio features of the minimum phonemes used in each language. The audio features of each determined minimum phoneme are given a unified identifying code, forming the unified phoneme sequence of all minimum phonemes. The unified phoneme sequence allows the speech of a language to be decomposed accurately into a definite combination of at least one minimum phoneme, and the code sequence corresponding to that combination can be obtained from the unified phoneme sequence.
For example, Chinese forms syllables from initials and finals; an initial is formed by a single minimum phoneme or several single minimum phonemes, and a final is formed by one or several minimum phonemes. Similarly, English forms syllables from vowels and consonants; a vowel is formed by a single minimum phoneme or several single minimum phonemes, and a consonant is formed by one or several minimum phonemes. Part of the resulting unified phoneme sequence can be as shown in the table:
Each single minimum phoneme in the table has a unique code in the unified phoneme sequence. For fewer than 1000 minimum phonemes, unique codes can be formed with a length of 10 bits.
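One way to picture the unified phoneme sequence is as a single code table shared by every language. The sketch below is illustrative only: the function names, the toy inventories, and the use of plain letters as phoneme labels are assumptions, and real inventories would come from the speech recognition of step 110.

```python
from itertools import chain

CODE_BITS = 10  # width stated in the text for fewer than 1000 minimum phonemes


def build_unified_sequence(inventories: dict[str, list[str]]) -> dict[str, int]:
    """Assign one code per distinct phoneme, shared by every language."""
    unified: dict[str, int] = {}
    for phoneme in chain.from_iterable(inventories.values()):
        if phoneme not in unified:
            # A phoneme seen in an earlier language keeps its existing code.
            unified[phoneme] = len(unified)
    if len(unified) > 2 ** CODE_BITS:
        raise ValueError("inventory does not fit in the chosen code width")
    return unified


def to_bits(code: int) -> str:
    """Render a code as a fixed-width binary string."""
    return format(code, f"0{CODE_BITS}b")


# Toy per-language inventories (hypothetical stand-ins for real phoneme sets).
toy_inventories = {"zh": ["m", "a", "n"], "en": ["n", "d", "a"]}
unified = build_unified_sequence(toy_inventories)
```

Here the phonemes shared by both toy languages receive a single code each, so each language's inventory is a subset of the same unified table.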
The audio exchange method for language semantics of the embodiment of the present invention forms the unified phoneme sequence as the basic information carrier for converting text or speech with the same or similar semantics between languages. This avoids the information interference caused by the excessive redundancy carried by other kinds of composite audio carriers (such as syllables), and helps improve the accuracy and efficiency of speech recognition. As minimum phonemes evolve with language, the unified phoneme sequence can be updated accordingly to stay synchronized with changes in the speech of each language.
As shown in Fig. 2, in the audio exchange method for language semantics according to an embodiment of the present invention, step 200 includes:
Step 210: use a part of the phonemes in the unified phoneme sequence to form a first basic voice coding sequence corresponding to the pronunciation of the characters or words of a first language.
The part of the phonemes includes all the minimum phonemes used in the pronunciation of one language; these phonemes form syllables and, in turn, the pronunciations of the characters or words of that language. Based on the codes of the minimum phonemes in the unified phoneme sequence, the basic voice code of each character or word in the first language is formed, and from these the basic voice coding sequence of all (or the main) characters or words.
For example, the Chinese character "妈" ("mother") has the pinyin "ma", which consists of the phonemes "m" and "a". If "m" is coded 120 and "a" is coded 010 in the unified phoneme sequence, the character is coded 120010 in the basic voice coding sequence of Chinese.
Embodiments of the present invention may also use other compressed coding schemes, for example adding up the codes of the phonemes contained in the character for "mother" to give the code 130, or representing the basic voice code graphically.
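The two coding variants in this example can be reproduced in a few lines. This is a minimal sketch under stated assumptions: only the codes 120 and 010 come from the text, while the table layout, the fixed three-digit width, and the function names are hypothetical.

```python
# Codes for "m" and "a" are the ones given in the example; the table is a stand-in
# for the relevant slice of the unified phoneme sequence.
PHONEME_CODES = {"m": "120", "a": "010"}
CODE_WIDTH = 3  # decimal digits per phoneme code (assumed fixed width)


def encode_concat(phonemes):
    """Concatenate fixed-width phoneme codes; order is preserved."""
    return "".join(PHONEME_CODES[p] for p in phonemes)


def decode_concat(code):
    """Split a concatenated code back into phonemes."""
    reverse = {value: key for key, value in PHONEME_CODES.items()}
    chunks = [code[i:i + CODE_WIDTH] for i in range(0, len(code), CODE_WIDTH)]
    return [reverse[chunk] for chunk in chunks]


def encode_sum(phonemes):
    """Add the phoneme codes together; shorter, but order is lost."""
    return sum(int(PHONEME_CODES[p]) for p in phonemes)
```

With these definitions, `encode_concat(["m", "a"])` gives `"120010"` and `encode_sum(["m", "a"])` gives `130`, as in the text. Note that the summed form is not reversible on its own: `["a", "m"]` also sums to 130, so a practical scheme would need some further means of keeping codes unique.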
Those skilled in the art will understand that the coding form in the example contains redundancy. Depending on the length of the minimum phoneme codes, a basic voice coding sequence stored in standard bytes can use compression coding to keep codes unique while shortening them.
Those skilled in the art will understand that different characters or words with the same pronunciation can share the same basic voice code, and that a character or word with several pronunciations can have several basic voice codes.
Step 220: use the first basic voice coding sequence to form a first voice mapping structure corresponding to the pronunciation of phrases or sentences in the first language.
On the basis of the basic voice coding sequence determined for characters or words, the voice mapping structure of phrases or sentences can be formed by extending the basic voice coding sequence.
The voice mapping structure can use addressable data structures, such as static or dynamic queues, arrays, heaps, stacks, linked lists, trees, or graphs, alone or in combination. Static or dynamic pointers can be used to perform address operations across different data structure types, and the data structures involved in a voice mapping structure may be nested or placed side by side.
In an embodiment of the present invention, the above data structures and pointers can be used to form a mapping structure between voice and semantics for semantically related characters, words, phrases, and sentences, by establishing a part-of-speech mapping structure that carries semantic meaning.
Fig. 3 is a schematic diagram of a voice mapping structure of the audio exchange method for language semantics according to an embodiment of the present invention. As shown in Fig. 3, taking Chinese as an example with the characters "发", "明", "创", and "造", each character, as a minimum semantic unit, has a corresponding basic voice code built from the phonemes of its pronunciation, and the basic voice codes of the characters are discrete from one another. Storing the characters in a linked list (only as an example) can ensure high-speed filtering on character codes (i.e., phoneme features). Each semantically meaningful word formed from characters, such as "发明" (invention) and "创造" (creation), is stored in another linked list; the basic voice code of each word is formed from the basic voice codes of the characters it contains, and the basic voice codes of the words are discrete from one another. Each semantically meaningful phrase formed from characters or words is stored in an array (only as an example), which supports fast addressing and efficient updates to the data topology; the basic voice codes of the phrases are discrete from one another.
According to the semantic relatedness of characters, words, and phrases, address pointers in the data structures are used to form a mapping structure tree or mapping structure graph that relates them, so that a mapping association is formed between voice and semantics. This association can be static or partly updated dynamically.
In the basic voice code data structure, the data unit of each character (or word, or phrase) can be extended, for example into a queue, to store characters (or words, or phrases) that share a pronunciation but differ in meaning, making the voice mapping structure multidimensional.
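The idea of extending a data unit into a queue for same-sounding entries can be sketched as follows. This is an illustrative structure rather than the patent's implementation: the class name, method names, example words, and all codes shown are hypothetical.

```python
from collections import defaultdict, deque


class VoiceMap:
    """Maps basic voice codes to words and back (hypothetical sketch)."""

    def __init__(self):
        # code -> queue of words that share that pronunciation
        self._by_code = defaultdict(deque)
        # word -> every code it can take (one per pronunciation)
        self._by_word = defaultdict(list)

    def add(self, word, code):
        if word not in self._by_code[code]:
            self._by_code[code].append(word)
        if code not in self._by_word[word]:
            self._by_word[word].append(code)

    def words_for(self, code):
        return list(self._by_code.get(code, ()))

    def codes_for(self, word):
        return list(self._by_word.get(word, ()))


vm = VoiceMap()
vm.add("two", "501")   # made-up code shared by two homophones
vm.add("too", "501")
vm.add("read", "601")  # made-up codes for two pronunciations of one word
vm.add("read", "602")
```

A lookup by code then returns every candidate word for a pronunciation, leaving the choice among them to the semantic mapping described above.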
By using a data storage structure that maps voice to text, the audio exchange method for language semantics of the embodiment of the present invention makes most of the voice mapping structure static. Structural optimization can be performed with the computing power of a server or the cloud, while a small amount of dynamic updating and supplementation can be completed on the client with few computing resources. Because the basic voice coding sequence is built from the phonemes of pronunciation, the complexity and data volume of the semantic voice mapping structure are greatly reduced, so that data storage and data processing for the voice mapping structure can respond with low latency on both the client and the server.
As shown in Fig. 2, in the audio exchange method for language semantics according to an embodiment of the present invention, step 200 further includes:
Step 230: use another part of the phonemes in the unified phoneme sequence to form a second basic voice coding sequence for the pronunciation of the characters or words of a second language.
The other part of the phonemes may partly overlap with the part of the phonemes used in step 210 above, or the same phoneme may be identified by different characters or symbols in different languages.
For example, the phonetic transcription of the English word "and" consists of three phonemes: a vowel followed by "n" and "d". If these are coded 018, 220, and 200 respectively in the unified phoneme sequence, "and" is coded 018220200 in the basic voice coding sequence of English.
Those skilled in the art will understand that the coding form in the example contains redundancy, and that compression coding can be used to keep codes unique while shortening them.
Those skilled in the art will understand that different characters or words with the same pronunciation can share the same basic voice code, and that a character or word with several pronunciations can have several basic voice codes.
Step 240: use the second basic voice coding sequence to form a second voice mapping structure corresponding to the pronunciation of phrases or sentences in the second language.
Texts (or symbols) with the same semantics in different languages may share a pronunciation; as the voice mapping structures of the two languages are formed, the shared pronunciation of different texts with the same semantics yields coding differences.
Fig. 4 is a schematic diagram of a voice mapping structure of the audio exchange method for language semantics according to an embodiment of the present invention. As shown in Fig. 4, taking English as an example with the words "invention" and "creation", each word, as a minimum semantic unit, has a corresponding basic voice code built from the phonemes of its pronunciation, and the basic voice codes of the words are discrete from one another. Storing the words in a database table (only as an example) can ensure high-speed filtering on word codes (i.e., phoneme features). Each semantically meaningful phrase formed from words is also stored in a database table (only as an example), which supports fast addressing and efficient updates to the data topology; the basic voice codes of the phrases are discrete from one another.
According to the semantic relatedness of words and phrases, address pointers in the data structures are used to form a mapping structure tree or mapping structure graph that relates them, so that a mapping association is formed between voice and semantics. This association can be static or partly updated dynamically.
In the basic voice code data structure, the data unit of each word or phrase can be extended into a queue to store words or phrases that share a pronunciation but differ in meaning, making the voice mapping structure multidimensional.
By using a data storage structure that maps voice to text, the audio exchange method for language semantics of the embodiment of the present invention makes most of the voice mapping structure static. Structural optimization can be performed with the computing power of a server or the cloud, while a small amount of dynamic updating and supplementation can be completed on the client with few computing resources. Because the basic voice coding sequence is built from the phonemes of pronunciation, the complexity and data volume of the semantic voice mapping structure are greatly reduced, so that data storage and data processing for the voice mapping structure can respond with low latency on both the client and the server.
As shown in Fig. 2, in the audio exchange method for language semantics according to an embodiment of the present invention, step 300 includes:
Step 310: using same or similar semantic information, form a primary voice conversion structure between the languages through the (first and second) voice mapping structures of the first language and the second language.
Between the languages to be translated, the voice mapping structures of the two languages are used, on the basis of same or similar semantic information, to form a primary voice conversion structure between characters or words of the same or similar meaning. It stores the corresponding basic voice codes of the characters, words, phrases, or sentences of the two languages. The primary voice conversion structure can be stored as "key: value" pairs to keep filtering efficient under a large number of concurrent requests.
For example:
Semantic: English basic voice code: Chinese basic voice code
Invention and creation: 092072069: 710169555614
The English basic voice code and the Chinese basic voice code can each serve as the key or the value of the other, for two-way translation.
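A two-way key and value arrangement of this kind can be sketched with a pair of dictionaries. The class and method names below are hypothetical, and the one-to-one pairing is a simplifying assumption; only the two codes come from the example above.

```python
class PrimaryConversion:
    """Two-way lookup between the basic voice codes of two languages (sketch)."""

    def __init__(self):
        self._first_to_second = {}
        self._second_to_first = {}

    def link(self, first_code, second_code):
        # Store the pair in both directions so either code can act as the key.
        self._first_to_second[first_code] = second_code
        self._second_to_first[second_code] = first_code

    def to_second(self, first_code):
        return self._first_to_second.get(first_code)

    def to_first(self, second_code):
        return self._second_to_first.get(second_code)


table = PrimaryConversion()
# English and Chinese basic voice codes from the example in the text.
table.link("092072069", "710169555614")
```

Looking up `"092072069"` with `to_second` returns the Chinese code, and `to_first` reverses the direction. Entries with several candidate translations would need lists of values rather than single codes.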
As shown in Fig. 2, step 300 of the audio exchange method of language semantics according to an embodiment of the present invention also includes:
Step 320: use the grammar rules of the first language and the second language to form an advanced voice conversion structure between the corresponding (i.e. first and second) voice mapping structures.
The grammar rules of each language include grammatical-category conversion structures established between characters or words according to their roots and parts of speech. Building on the primary voice conversion structure, the advanced voice conversion structure can also be stored as "key: value" pairs so that filtering remains efficient under a large number of concurrent requests.
For example:
Semantics: grammar: English basic voice coding
English "create (noun)" 0001:092072069;
English "create (verb)" 0002:092072069;
English "create (adverb)" 0003:092072069;
Semantics: grammar: Chinese basic voice coding
Chinese "create (noun)" 0001:710169555614;
Chinese "create (verb)" 0002:710169555614;
Chinese "create (adverb)" 0003:710169555614;
Grouping by grammar brings the basic voice codings of characters, words, or vocabulary with similar semantics in the two languages relatively close together. This raises the correlation between codings and improves filtering efficiency and the efficiency of machine translation algorithms during translation.
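A grammar-tagged key of this kind might be built and converted as sketched below. The helper names `advanced_key` and `convert` are hypothetical, and the sketch assumes, as the listed examples suggest, that the same grammar code (0001, 0002, 0003) is shared by both languages.

```python
# Grammar codes as they appear in the examples above.
GRAMMAR = {"noun": "0001", "verb": "0002", "adverb": "0003"}

# Primary conversion entry from the earlier example (English -> Chinese).
PRIMARY_EN_TO_ZH = {"092072069": "710169555614"}


def advanced_key(part_of_speech: str, coding: str) -> str:
    """Prefix a basic voice coding with its grammar code."""
    return f"{GRAMMAR[part_of_speech]}:{coding}"


def convert(key: str) -> str:
    """Map an English 'grammar:coding' key to its Chinese counterpart."""
    grammar_code, en_coding = key.split(":")
    # Assumption: the grammar code carries over unchanged between languages.
    return f"{grammar_code}:{PRIMARY_EN_TO_ZH[en_coding]}"
```

With these tables, `convert(advanced_key("verb", "092072069"))` yields `"0002:710169555614"`, the Chinese verb entry listed above.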
Fig. 5 is a schematic diagram of language conversion in the audio exchange method of language semantics according to an embodiment of the present invention. As shown in Fig. 5, step 400 includes:
Step 410: obtain the ordered phoneme set of an audio input segment in the first language using speech recognition;
Step 420: determine the first basic voice coding of the ordered phoneme set using the first basic voice coding sequence of the first language;
Step 430: determine the continuous speech coding of the ordered phoneme set using the first voice mapping structure of the first language and the first basic voice coding sequence;
Step 440: obtain the second basic voice coding of the second language using the primary voice conversion structure between the corresponding languages;
Step 450: obtain the continuous speech coding of the second language using the advanced voice conversion structure between the corresponding languages and the second basic voice coding sequence;
Step 460: form the spoken pronunciation from the continuous speech coding of the second language.
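Steps 420 to 440 can be sketched as a chain of table lookups. Several things here are assumptions rather than statements from the text: the phoneme labels `p1` to `p3` are placeholders, a word-level coding is taken to be the concatenation of three-digit phoneme codes (inferred from the nine-digit and twelve-digit examples), and the greedy longest-match grouping in `continuous_coding` is one possible strategy. Speech recognition (step 410), grammar handling (step 450), and synthesis (step 460) are left out.

```python
# Hypothetical phoneme labels; only the digit strings come from the text.
EN_PHONEME_CODES = {"p1": "092", "p2": "072", "p3": "069"}

# First voice mapping structure, reduced here to the set of known codings.
EN_VOICE_MAPPING = {"092072069"}

# Primary voice conversion structure: first-language -> second-language coding.
PRIMARY_CONVERSION = {"092072069": "710169555614"}


def basic_codes(phonemes):
    """Step 420: look up the basic voice coding of each recognized phoneme."""
    return [EN_PHONEME_CODES[p] for p in phonemes]


def continuous_coding(codes, known_units):
    """Step 430: greedily group phoneme codes into known word-level codings."""
    units, i = [], 0
    while i < len(codes):
        for j in range(len(codes), i, -1):  # try the longest span first
            candidate = "".join(codes[i:j])
            if candidate in known_units:
                units.append(candidate)
                i = j
                break
        else:
            raise KeyError(f"no mapping for codes starting at index {i}")
    return units


def translate(phonemes):
    """Steps 420 to 440 chained: phonemes -> second-language codings."""
    units = continuous_coding(basic_codes(phonemes), EN_VOICE_MAPPING)
    return [PRIMARY_CONVERSION[unit] for unit in units]
```

Because every stage is a dictionary or set lookup over short digit strings, the whole chain can run locally on a client, in line with the low-latency goal stated for the method.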
When performing language conversion, the audio exchange method of language semantics according to this embodiment of the present invention uses the chain formed by the phoneme sequence, the basic voice coding sequence, the voice mapping structure, and the inter-language conversion structures to complete reversible conversion between voice and text across the two languages, which helps voice conversion obtain the corresponding candidate text accurately or relatively accurately. The storage size of the data and data structures is limited and retrieval difficulty is low, so they are suitable for local storage and processing; the whole process places only modest real-time and bandwidth requirements on responses to server-side data requests.
Fig. 6 is a schematic architecture diagram of the audio exchange system of language semantics according to an embodiment of the present invention. As shown in Fig. 6, the audio exchange system of the embodiment of the present invention forms the voice mapping structure of each language using a minimum phoneme sequence and completes semantic conversion between languages through the voice mapping structures.
As shown in Fig. 6, the audio exchange system of the embodiment of the present invention includes:
A serializing device 1100, for serializing all minimum phonemes.
An intra-language phoneme mapping forming device 1200, for forming the text-voice mapping data of each language from a subset of all the minimum phonemes.
An inter-language phoneme mapping forming device 1300, for forming the inter-voice mapping data of the languages according to language semantics.
A language converting device 1400, for performing semantic language conversion using the corresponding inter-voice mapping data and text-voice mapping data.
As shown in Fig. 6, the serializing device 1100 in the audio exchange system of the embodiment of the present invention includes:
A phoneme recognition module 1110, for collecting the minimum phonemes of each commonly used language through speech recognition.
A phoneme encoding module 1120, for forming the minimum phonemes into a unified phoneme sequence.
As shown in Fig. 6, the intra-language phoneme mapping forming device 1200 in the audio exchange system of the embodiment of the present invention includes:
A first voice coding establishing module 1210, for forming, from one part of the phonemes in the unified phoneme sequence, the first basic voice coding sequence corresponding to the pronunciation of characters or words in the first language.
A first voice mapping establishing module 1220, for forming, from the first basic voice coding sequence, the first voice mapping structure corresponding to the pronunciation of phrases or sentences in the first language.
A second voice coding establishing module 1230, for forming, from another part of the phonemes in the unified phoneme sequence, the second basic voice coding sequence corresponding to the pronunciation of characters or words in the second language.
A second voice mapping establishing module 1240, for forming, from the second basic voice coding sequence, the second voice mapping structure corresponding to the pronunciation of phrases or sentences in the second language.
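The roles of the serializing device and of the per-language subsets can be sketched as below. This is an illustration under assumptions: the function names, the sample phoneme labels, and the fixed three-digit code width are hypothetical choices (the width follows the hundreds-range indexing mentioned in the claims).

```python
def build_unified_sequence(phonemes_by_language, width=3):
    """Assign each distinct minimum phoneme one fixed-width numeric code."""
    unified = {}
    for language in sorted(phonemes_by_language):
        for phoneme in phonemes_by_language[language]:
            # A phoneme shared by several languages keeps a single code.
            if phoneme not in unified:
                unified[phoneme] = str(len(unified) + 1).zfill(width)
    if len(unified) >= 10 ** width:
        raise ValueError("code width too small for the phoneme inventory")
    return unified


def language_subset(unified, phonemes):
    """Select the part of the unified sequence that one language uses."""
    return {p: unified[p] for p in phonemes}


# Illustrative inventories only; real inventories come from speech recognition.
inventory = {"en": ["s", "t"], "zh": ["t", "sh"]}
unified = build_unified_sequence(inventory)
zh_codes = language_subset(unified, inventory["zh"])
```

Sharing one code per phoneme across languages is what lets each language's basic voice coding sequence be expressed as a subset of the same unified sequence.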
As shown in Fig. 6, the inter-language phoneme mapping forming device 1300 in the audio exchange system of the embodiment of the present invention includes:
A language structure primary conversion module 1310, for using the same or similar semantic information to form a primary voice conversion structure between the corresponding languages from the (i.e. first and second) voice mapping structures of the first language and the second language.
A language structure advanced conversion module 1320, for using the grammar rules of the first language and the second language to form an advanced voice conversion structure between the corresponding (i.e. first and second) voice mapping structures.
As shown in Fig. 6, the language converting device 1400 in the audio exchange system of the embodiment of the present invention includes:
A phoneme recognition module 1410, for obtaining the ordered phoneme set of an audio input segment in the first language using speech recognition;
A first basic coding identification module 1420, for determining the first basic voice coding of the ordered phoneme set using the first basic voice coding sequence of the first language;
A first continuous speech coding module 1430, for determining the continuous speech coding of the ordered phoneme set using the first voice mapping structure of the first language and the first basic voice coding sequence;
A second basic coding identification module 1440, for obtaining the second basic voice coding of the second language using the primary voice conversion structure between the corresponding languages;
A second continuous speech coding module 1450, for obtaining the continuous speech coding of the second language using the advanced voice conversion structure between the corresponding languages and the second basic voice coding sequence;
A continuous coding conversion module 1460, for forming the spoken pronunciation from the continuous speech coding of the second language.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided herein, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
In the audio exchange method of language semantics according to an embodiment of the present invention, some of the minimum phonemes in the unified phoneme sequence are used to form the basic voice coding sequence for the pronunciation of characters or words in a language, and each basic voice coding in that sequence can additionally be given a graphic symbol corresponding to the character or word with that pronunciation. Rendering the basic voice coding graphically converts the pronunciation of a character or word formed from phonemes into a visual identifier, which helps link computer vision recognition with machine speech recognition, so that voice conversion of identical semantics between languages can build on computer vision recognition.
Fig. 7 is a schematic diagram of the graphic structure of a basic voice coding graphic in the audio exchange method of language semantics according to an embodiment of the present invention. As shown in part a of Fig. 7, the graphic structure includes an H-shaped basic frame 01. The basic frame includes a first adaptation column 10 (a bar pattern) and a second adaptation column 20 (a bar pattern) arranged side by side as vertical parallel columns, and an adapter rod 30 (a bar pattern) whose two ends connect to the first adaptation column and the second adaptation column respectively.
A first adaptation position group 11 is provided on the first adaptation column (on the left in the figure), a second adaptation position group 21 is provided on the second adaptation column (on the right in the figure), and a third adaptation position group 31 is provided on the adapter rod 30. Each end of the adapter rod 30 connects to an adaptation position on the adaptation column of the corresponding side. The adaptation position group of each adaptation column includes at least three adaptation positions (five are shown in the drawing).
Adjacent adaptation positions in the same adaptation position group are used to adjust the length of the adaptation column: overlapping adaptation positions changes the length of the corresponding column. At least two adaptation positions can be overlapped. An end of the adapter rod 30 can connect to an overlapped adaptation position of the adaptation column on the corresponding side.
In practical applications, the phoneme codes that form the pronunciation syllable of a character or word, or the syllable code formed from those phonemes, can be expressed through changes in the connected shape of the first adaptation column, the second adaptation column, and the adapter rod. The fixed locations of the adaptation positions and the variations produced by overlapping them yield enough permutations and combinations to express the encoded content of a syllable.
As shown in parts b and c of Fig. 7, an embodiment of the present invention may also include auxiliary adaptation symbols 40 connected to the adaptation positions. The auxiliary adaptation symbols 40 include vector line segments 41, which have a vector direction, and standard symbols 42, which do not. A vector line segment 41 can be a line segment or a minor arc, and a standard symbol 42 can be a circle or a ring; there can be one or more vector line segments and one or more standard symbols.
In practical applications, connecting additional vector line segments and standard symbols to the adaptation positions combines syllable-related audio features such as intonation and tone with the syllable code, which increases the information carried by the syllable code.
In practical applications, taking Chinese as an example, as shown in parts b and c of Fig. 7, part b shows the graphics for the voice coding of the characters "rear" and "time", and part c shows the graphics for the voice coding of the characters "mouth" and "bandit". For each of these characters, the initial of the pronunciation syllable is expressed by the length variation of the first adaptation column on the left of the basic frame together with vector line segments 41, and the final is expressed by the length variation of the second adaptation column on the right of the basic frame together with vector line segments 41 and standard symbols 42. Smoothing the basic frame and the auxiliary adaptation symbols both keeps the graphic visually pleasing and preserves computer vision recognition quality.
As shown in part d of Fig. 7, by using overlapped adaptation positions and the connection position between the adapter rod 30 and the adaptation positions, the basic frame 01 can be converted from an H shape to an n shape. As shown in part e of Fig. 7, the same means can convert the basic frame 01 from an H shape to a U shape.
As shown in part d of Fig. 7, the codes of the minimum phonemes are marked directly around the first and second adaptation columns of the basic frame (H-shaped, n-shaped, or U-shaped), and each code number corresponds to an adaptation position of the respective adaptation column. Displaying the codes of the minimum phonemes of a language's syllables directly gives a visual expression of the chain from the language's phonetic alphabet to phoneme codes to voice, so that the basic voice coding graphics of two languages can be converted by computer vision; during voice conversion, computer graphic recognition then helps guarantee the recognition rate of language recognition.
The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.
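One way to hold such a graphic in memory is sketched below. It is a simplified model, not the encoding defined by the figures: the class `Glyph`, its fields, and the rule in `frame_shape` (rod attached at the top of both columns gives an n shape, at the bottom a U shape, anywhere else an H shape) are assumptions made for illustration. The patent derives the shapes from overlapped adaptation positions together with the rod's connection position, which this sketch does not model.

```python
from dataclasses import dataclass

POSITIONS = 5  # adaptation positions per column, as drawn in Fig. 7


@dataclass(frozen=True)
class Glyph:
    """Simplified basic voice coding graphic (hypothetical model)."""

    left_rod_pos: int   # where the rod meets the first column (0 = top)
    right_rod_pos: int  # where the rod meets the second column (0 = top)
    aux: tuple = ()     # auxiliary symbols, e.g. (("vector", 1),)

    def __post_init__(self):
        # Reject rod positions that fall outside the adaptation position group.
        for pos in (self.left_rod_pos, self.right_rod_pos):
            if not 0 <= pos < POSITIONS:
                raise ValueError(f"adaptation position out of range: {pos}")

    def frame_shape(self) -> str:
        """Classify the frame by where the rod attaches (assumed rule)."""
        top, bottom = 0, POSITIONS - 1
        if self.left_rod_pos == top and self.right_rod_pos == top:
            return "n"
        if self.left_rod_pos == bottom and self.right_rod_pos == bottom:
            return "U"
        return "H"
```

A vision-based recognizer could then compare detected rod positions and auxiliary symbols against stored `Glyph` values instead of comparing raw pixels.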

Claims (7)

1. An audio exchange method of language semantics, characterized in that a voice mapping structure of each language is formed using a minimum phoneme sequence, and semantic conversion between languages is completed through the voice mapping structures; wherein
forming the voice mapping structure of each language using the minimum phoneme sequence comprises: serializing all minimum phonemes; forming text-voice mapping data of each language from a subset of all the minimum phonemes; and forming inter-voice mapping data of the languages according to language semantics; and
completing semantic conversion between languages through the voice mapping structures comprises: performing semantic language conversion using the corresponding inter-voice mapping data and the text-voice mapping data; wherein
performing semantic language conversion using the corresponding inter-voice mapping data and text-voice mapping data comprises: obtaining an ordered phoneme set of an audio input segment in a first language using speech recognition; determining a first basic voice coding of the ordered phoneme set using a first basic voice coding sequence of the first language; determining a continuous speech coding of the ordered phoneme set using a first voice mapping structure of the first language and the first basic voice coding sequence; obtaining a second basic voice coding of a second language using a primary voice conversion structure between the corresponding languages; obtaining a continuous speech coding of the second language using an advanced voice conversion structure between the corresponding languages and a second basic voice coding sequence; and forming a spoken pronunciation from the continuous speech coding of the second language.
2. The audio exchange method of language semantics according to claim 1, characterized in that serializing all minimum phonemes comprises:
collecting the minimum phonemes of each commonly used language through speech recognition;
forming the minimum phonemes into a unified phoneme sequence.
3. The audio exchange method of language semantics according to claim 2, characterized in that forming the text-voice mapping data of each language from a subset of all the minimum phonemes comprises:
forming, from one part of the phonemes in the unified phoneme sequence, a first basic voice coding sequence corresponding to the pronunciation of characters or words in the first language;
forming, from the first basic voice coding sequence, a first voice mapping structure corresponding to the pronunciation of phrases or sentences in the first language;
forming, from another part of the phonemes in the unified phoneme sequence, a second basic voice coding sequence corresponding to the pronunciation of characters or words in the second language;
forming, from the second basic voice coding sequence, a second voice mapping structure corresponding to the pronunciation of phrases or sentences in the second language.
4. The audio exchange method of language semantics according to claim 3, characterized in that forming the inter-voice mapping data of the languages according to language semantics comprises:
using the same or similar semantic information to form a primary voice conversion structure between the corresponding languages from the voice mapping structures of the first language and the second language;
using the grammar rules of each language to form an advanced voice conversion structure between the voice mapping structures of the first language and the second language.
5. The audio exchange method of language semantics according to claim 1, characterized in that the minimum phoneme sequence is indexed using segmented codes in a hundreds or thousands numeric range.
6. An audio exchange system of language semantics, characterized by comprising:
a memory, for storing program code of the audio exchange method of language semantics according to any one of claims 1 to 5;
a processor, for running the program code.
7. An audio exchange system of language semantics, which forms a voice mapping structure of each language using a minimum phoneme sequence and completes semantic conversion between languages through the voice mapping structures, comprising:
a serializing device, for serializing all minimum phonemes;
an intra-language phoneme mapping forming device, for forming text-voice mapping data of each language from a subset of all the minimum phonemes;
an inter-language phoneme mapping forming device, for forming inter-voice mapping data of the languages according to language semantics;
a language converting device, for performing semantic language conversion using the corresponding inter-voice mapping data and text-voice mapping data; wherein
the language converting device comprises:
a phoneme recognition module, for obtaining an ordered phoneme set of an audio input segment in a first language using speech recognition;
a first basic coding identification module, for determining a first basic voice coding of the ordered phoneme set using a first basic voice coding sequence of the first language;
a first continuous speech coding module, for determining a continuous speech coding of the ordered phoneme set using a first voice mapping structure of the first language and the first basic voice coding sequence;
a second basic coding identification module, for obtaining a second basic voice coding of a second language using a primary voice conversion structure between the corresponding languages;
a second continuous speech coding module, for obtaining a continuous speech coding of the second language using an advanced voice conversion structure between the corresponding languages and a second basic voice coding sequence;
a continuous coding conversion module, for forming a spoken pronunciation from the continuous speech coding of the second language.
CN201810264460.3A 2018-03-28 2018-03-28 The audio exchange method and audio exchange system of language semantic Active CN108597493B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201910143693.2A CN109754780B (en) 2018-03-28 2018-03-28 Basic speech coding graphics and audio exchange method
CN201810264460.3A CN108597493B (en) 2018-03-28 2018-03-28 The audio exchange method and audio exchange system of language semantic
PCT/CN2019/079834 WO2019184942A1 (en) 2018-03-28 2019-03-27 Audio exchanging method and system employing linguistic semantics, and coding graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810264460.3A CN108597493B (en) 2018-03-28 2018-03-28 The audio exchange method and audio exchange system of language semantic

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201910143693.2A Division CN109754780B (en) 2018-03-28 2018-03-28 Basic speech coding graphics and audio exchange method

Publications (2)

Publication Number Publication Date
CN108597493A CN108597493A (en) 2018-09-28
CN108597493B true CN108597493B (en) 2019-04-12

Family

ID=63624812

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910143693.2A Active CN109754780B (en) 2018-03-28 2018-03-28 Basic speech coding graphics and audio exchange method
CN201810264460.3A Active CN108597493B (en) 2018-03-28 2018-03-28 The audio exchange method and audio exchange system of language semantic

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910143693.2A Active CN109754780B (en) 2018-03-28 2018-03-28 Basic speech coding graphics and audio exchange method

Country Status (2)

Country Link
CN (2) CN109754780B (en)
WO (1) WO2019184942A1 (en)

Also Published As

Publication number Publication date
CN108597493A (en) 2018-09-28
CN109754780A (en) 2019-05-14
CN109754780B (en) 2020-08-04
WO2019184942A1 (en) 2019-10-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1261697

Country of ref document: HK