WO2024051729A1 - Transliteration method and electronic device

Info

Publication number
WO2024051729A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
transliteration
character
electronic device
user
Prior art date
Application number
PCT/CN2023/117202
Other languages
French (fr)
Chinese (zh)
Inventor
丁建邦 (Ding Jianbang)
凌雪 (Ling Xue)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2024051729A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; database structures therefor; file system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using context dependencies, e.g. language models
    • G10L15/26 Speech-to-text systems

Definitions

  • The present application relates to the field of computer technology, and in particular to a transliteration method and an electronic device.
  • Transliteration refers to searching, based on the pronunciation of information in a source language, for information with a similar pronunciation in a target language to serve as the translation; for example, using Chinese characters with a similar pronunciation to translate an English word. Transliteration is usually used for translating information such as names of people, places, and countries, borrowed words, and the names of literary works, movies, music, and other works.
  • However, existing electronic devices can often only transliterate sound by sound and section by section, so the transliteration result returned to the user is single and of low quality, and may fail to meet the user's transliteration needs.
  • This application discloses a transliteration method and an electronic device, which can improve the quality of transliteration results and return multiple transliteration results at one time for the user's reference, effectively meeting users' transliteration needs.
  • The present application provides a transliteration method applied to an electronic device.
  • The method includes: receiving first information in a first language input by a user; transliterating the first information to obtain a plurality of second information in a second language, where the plurality of second information includes third information and fourth information of different lengths; and displaying the plurality of second information.
  • For example, the first information is "harmony", and the plurality of second information includes transliterations of length 2 (such as "Hongmeng") and a transliteration of length 3.
  • In this way, when the electronic device transliterates the first information, it can output multiple transliteration results of different lengths (that is, multiple second information) at one time for the user's reference. The range of transliteration results available for the user to choose from is therefore greatly increased, raising the probability that the user obtains the required transliteration result and effectively meeting users' transliteration needs.
  • the first information is a company name, a brand name, a trademark name, a product name, a person's name, a place name, a country name, a borrowed word, the name of a literary work, the name of a movie, the name of music, or a transliteration hot word.
  • The above types of first information differ from commonly translated sentences.
  • For commonly translated sentences, transliterating sound by sound and section by section and returning a single transliteration result is likely to meet the user's needs (because this is often how manual transliteration handles them). For the above types of first information, however, manual transliteration usually applies techniques such as homophone conversion, unvoiced/voiced consonant conversion, initial consonant optimization, and final consonant omission, and under different users and/or different scenarios, even for the same first information, the transliteration results required by the user may differ.
  • the electronic device outputs multiple transliteration results of different lengths at one time for users to choose, which can meet the personalized needs of different users and/or different scenarios and improve user experience.
  • In one implementation, the method further includes: receiving fifth information in a third language input by the user; performing literal translation or free translation on the fifth information to obtain sixth information in a fourth language; transliterating the sixth information to obtain at least one seventh information in the third language; and displaying the at least one seventh information.
  • For example, the fifth information is "gene knee tie may", the sixth information is "gene knee tie may", and the at least one seventh information includes "chicken you are so beautiful".
  • In this way, the electronic device can first perform literal or free translation of the fifth information and then transliterate the translation result, which adapts well to some specific scenarios (for example, when the fifth information is a transliteration hot word) and satisfies the user's personalized needs.
  • In one implementation, before displaying the plurality of second information, the method further includes: determining whether the second information includes a character in a blacklist; and when the second information includes a first character in the blacklist, replacing the first character with a second character in a whitelist, where the second character is a character in the whitelist whose pronunciation similarity to the first character is greater than or equal to a first threshold.
  • In one implementation, the whitelist includes auspicious characters with good connotations (such as " ⁇ " and " ⁇ ") and characters actually used for transliteration, for example in company names, brand names, trademark names, product names, people's names, place names, country names, borrowed words, names of literary works, names of movies, names of music, and transliteration hot words.
  • the blacklist includes unlucky characters with bad connotations (such as " ⁇ " and " ⁇ ") and characters that are not actually used for transliteration.
  • In this way, the electronic device can use characters in the whitelist to replace characters in the second information that belong to the blacklist, for example, replacing unlucky characters with auspicious characters. The content of the transliteration result is thus highly controllable and more in line with the habits of manual transliteration, further improving the quality of transliteration results.
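  • The replacement step above can be sketched in code. This is a minimal illustration under stated assumptions: the blacklist, whitelist, pinyin table, and similarity function below are all toy stand-ins, not the patent's actual data or metric.

```python
# Hypothetical sketch of the blacklist/whitelist post-processing described
# above. All data and the similarity metric are illustrative assumptions.

BLACKLIST = {"亡", "衰"}        # characters with bad connotations (assumed)
WHITELIST = {"旺", "帅", "福"}  # auspicious / usable characters (assumed)

# Toy pronunciation table: character -> pinyin (illustrative values).
PINYIN = {"亡": "wang", "旺": "wang", "衰": "shuai", "帅": "shuai", "福": "fu"}

def pronunciation_similarity(a: str, b: str) -> float:
    """Toy similarity: fraction of aligned matching letters in the pinyin."""
    pa, pb = PINYIN.get(a, ""), PINYIN.get(b, "")
    if not pa or not pb:
        return 0.0
    return sum(x == y for x, y in zip(pa, pb)) / max(len(pa), len(pb))

def sanitize(result: str, first_threshold: float = 0.8) -> str:
    """Replace each blacklisted character (the 'first character') with the
    whitelisted character (the 'second character') whose pronunciation
    similarity to it is greatest and at least the first threshold."""
    out = []
    for ch in result:
        if ch in BLACKLIST:
            best = max(WHITELIST, key=lambda w: pronunciation_similarity(ch, w))
            if pronunciation_similarity(ch, best) >= first_threshold:
                ch = best
        out.append(ch)
    return "".join(out)

print(sanitize("亡海"))  # "亡" (wang) is replaced by "旺" (wang)
```

  • Note that the whitelist candidate is chosen by maximum pronunciation similarity and only substituted when it clears the threshold, mirroring the "greater than or equal to the first threshold" condition above.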
  • In one implementation, the method further includes: receiving a first instruction input by the user, the first instruction indicating that the first character in the transliteration result of the first information is a third character; the plurality of second information is determined based on the first instruction, and the first character in each second information is the third character.
  • the user can customize the first character of the transliteration result, and the content of the transliteration result is highly controllable, which can meet the user's personalized needs and improve the user experience.
  • In one implementation, the method further includes: receiving a second instruction input by the user, the second instruction indicating that the last character in the transliteration result of the first information is a fourth character; the plurality of second information is determined based on the second instruction, and the last character in each second information is the fourth character.
  • the user can customize the suffix of the transliteration result, and the content of the transliteration result is highly controllable, which can meet the user's personalized needs and improve the user experience.
  • In one implementation, the method further includes: receiving a third instruction input by the user, the third instruction indicating that the transliteration result corresponding to the first information includes a fifth character; the plurality of second information is determined based on the third instruction, and each second information includes the fifth character.
  • the user can customize the words included in the transliteration result, and the content of the transliteration result is highly controllable, which can meet the user's personalized needs and improve the user experience.
  • In one implementation, transliterating the first information and obtaining a plurality of second information in a second language includes: transliterating the first information to obtain a plurality of eighth information in the second language; and replacing a sixth character in the eighth information with the fifth character indicated by the third instruction, where the second information is the replaced eighth information, and the sixth character is the character in the eighth information whose pronunciation similarity to the fifth character is greatest.
  • In this way, the electronic device can replace the character in the transliterated eighth information whose pronunciation is most similar to the user-specified character with the user-specified character, instead of replacing a character at a preset position. This ensures personalized needs while preserving the quality of the transliteration result and improving user experience.
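  • The similarity-driven placement described above can be sketched as follows; the syllable representation and similarity function are toy assumptions, not the patent's model.

```python
# Hypothetical sketch: the user-specified character (the fifth character)
# replaces whichever candidate character sounds most like it (the sixth
# character), rather than a character at a preset position.

def sim(a: str, b: str) -> float:
    """Toy similarity over romanized syllables: aligned matching letters."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

def place_user_char(candidate: list, user_char: str) -> list:
    # Index of the candidate syllable most similar in pronunciation to the
    # user-specified one.
    idx = max(range(len(candidate)), key=lambda i: sim(candidate[i], user_char))
    out = list(candidate)
    out[idx] = user_char
    return out

print(place_user_char(["ha", "mo", "ni"], "li"))  # "ni" is closest to "li"
```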
  • In one implementation, the method further includes: receiving ninth information in a fifth language input by the user and receiving a first length input by the user; transliterating the ninth information to obtain at least one tenth information in a sixth language, where the length of the tenth information is the first length; and displaying the at least one tenth information.
  • the user can customize the length of the transliteration result, and the length of the transliteration result is highly controllable, which can meet the user's personalized needs and improve the user experience.
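  • Length control as described reduces, at its simplest, to filtering candidates by the user-supplied first length; candidate generation itself is omitted here, and the candidate strings are placeholders.

```python
# Minimal sketch: keep only transliteration candidates whose length equals the
# user-specified first length. Candidates here are placeholder strings.

def filter_by_length(candidates: list, first_length: int) -> list:
    return [c for c in candidates if len(c) == first_length]

candidates = ["ab", "abc", "xy", "wxyz"]
print(filter_by_length(candidates, 2))  # ['ab', 'xy']
```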
  • In one implementation, transliterating the first information and obtaining a plurality of second information in a second language includes: using a pronunciation embedding layer to map the first information into eleventh information, where the distance between the eleventh information and twelfth information is smaller than the distance between the eleventh information and thirteenth information, and the pronunciation similarity between the eleventh information and the twelfth information is greater than the pronunciation similarity between the eleventh information and the thirteenth information; and using the eleventh information as the input of a transliteration model to obtain an output, the output being the plurality of second information.
  • In one implementation, the pronunciation embedding layer is trained based on multiple sentences, where the multiple sentences include a first sentence and a second sentence, both of which include N words, and the pronunciation similarity between the i-th word in the first sentence and the i-th word in the second sentence is greater than or equal to a second threshold, where N is a positive integer and i is a positive integer less than or equal to N.
  • the eleventh information is a high-dimensional vector.
  • the pronunciation embedding layer is trained based on multiple sentences with similar and/or identical pronunciation.
  • the distance between the information mapped by the pronunciation embedding layer is determined based on pronunciation similarity rather than semantic similarity.
  • In this way, the difference between transliteration needs and literal and/or free translation needs is fully taken into account, and the input of the transliteration model is information mapped by the pronunciation embedding layer. This greatly facilitates the transliteration model's learning of pronunciation splitting and combination rules, allowing it to fully capture transliteration techniques such as homophone conversion, unvoiced/voiced consonant conversion, initial consonant optimization, and final consonant omission, and reducing the quality gap between automatic transliteration and manual transliteration.
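  • The two-stage pipeline just described (pronunciation embedding, then transliteration model) can be sketched with toy data. The vectors and the ranking "model" below are stand-in assumptions; a real system would use a trained embedding layer and a neural decoder.

```python
# Toy pronunciation embeddings: near-homophones get nearby vectors (assumed
# values), so distance in the embedded space tracks pronunciation similarity,
# not semantic similarity.
import math

EMB = {
    "harmony":  [0.9, 0.1, 0.0],
    "hongmeng": [0.8, 0.2, 0.1],  # similar pronunciation -> small distance
    "freedom":  [0.0, 0.9, 0.5],  # different pronunciation -> large distance
}

def embed(text: str) -> list:
    return EMB[text]  # real system: a learned pronunciation embedding layer

def transliteration_model(vec, candidates=("hongmeng", "freedom")):
    """Stand-in decoder: rank candidate outputs by embedding distance."""
    return sorted(candidates, key=lambda c: math.dist(vec, embed(c)))

src = embed("harmony")
print(transliteration_model(src))  # nearest-pronunciation candidate first
```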
  • Embodiments of the present application provide an electronic device including a transceiver, a processor, and a memory; the memory is used to store a computer program, and the processor calls the computer program to execute the transliteration method provided by the first aspect and any implementation of the first aspect of the embodiments of the present application.
  • Embodiments of the present application further provide a computer storage medium storing a computer program; when the computer program is executed by a processor, the transliteration method provided by the first aspect and any implementation of the first aspect of the embodiments of the present application is performed.
  • Embodiments of the present application further provide a computer program product which, when run on an electronic device, causes the electronic device to execute the transliteration method provided by the first aspect and any implementation manner of the first aspect of the embodiments of the present application.
  • Embodiments of the present application further provide an electronic device configured to execute the method described in any embodiment of the present application.
  • the above-mentioned electronic device is, for example, a chip.
  • Figure 1 is a schematic diagram of the hardware structure of an electronic device provided by this application.
  • FIG. 2 is a schematic diagram of the software architecture of an electronic device provided by this application.
  • FIG. 3 is a schematic diagram of the software architecture of another electronic device provided by this application.
  • Figure 4 is a schematic flow chart of a transliteration method provided by this application.
  • Figure 5A is a schematic diagram of the training process of a pronunciation embedding layer provided by this application.
  • Figure 5B is a schematic diagram of training data for a pronunciation embedding layer provided by this application.
  • Figure 6 is a schematic diagram of a high-dimensional space provided by this application.
  • FIG. 7 is a schematic diagram of the software architecture of another electronic device provided by this application.
  • FIGS 8, 9A-9C, 10, and 11 are schematic diagrams of some user interface embodiments provided by this application.
  • The terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the embodiments of this application, unless otherwise specified, "plurality" means two or more.
  • The implementation of existing automatic transliteration technology is cumbersome, and one transliteration requires a large number of feature conversion processes.
  • For example, the process of transliterating an English word into Chinese characters includes: English word -> phoneme sequence -> initial and final sequence -> Chinese pinyin -> Chinese characters.
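  • The staged conversion above can be sketched as a chain of lookup-driven functions. Every table below is a toy assumption covering only the word "tank"; a real pipeline needs full grapheme-to-phoneme, syllabification, and lexicon stages.

```python
# Each stage of the cumbersome pipeline as its own function. All mappings are
# hard-coded toy data for one word; they are not a real G2P system or lexicon.

def to_phonemes(word: str) -> list:
    return {"tank": ["T", "AE", "NG", "K"]}[word]   # word -> phoneme sequence

def to_initials_finals(phonemes: list) -> list:
    # Resyllabify the phonemes into Chinese (initial, final) pairs.
    # Hard-coded for this toy example.
    return [("t", "an"), ("k", "e")]

def to_pinyin(pairs: list) -> list:
    return [i + f for i, f in pairs]                # -> ["tan", "ke"]

def to_hanzi(pinyin: list) -> str:
    table = {"tan": "坦", "ke": "克"}               # pinyin -> character
    return "".join(table[p] for p in pinyin)

result = to_hanzi(to_pinyin(to_initials_finals(to_phonemes("tank"))))
print(result)  # 坦克, the conventional Chinese transliteration of "tank"
```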
  • In addition, existing automatic transliteration technology can only translate sound by sound and section by section, returning a single, low-quality transliteration result. It cannot use transliteration techniques such as homophone conversion, unvoiced/voiced consonant conversion, initial consonant optimization, and final consonant omission, so its quality is far lower than that of manual transliteration, and it is likely unable to meet users' transliteration needs.
  • This application provides a transliteration method that can effectively meet the transliteration needs of users.
  • the method includes: the electronic device can transliterate the source language information through a deep learning model, and obtain one or more transliteration results in the target language and return them to the user.
  • the lengths of the one or more transliteration results are different.
  • the range of transliteration results available for users to choose has been greatly expanded, increasing the probability that users will obtain the desired transliteration results.
  • the deep learning model can be obtained by implicitly learning the phoneme combination and splitting rules in the transliteration process based on big data. Therefore, the transliteration skills used in actual transliteration can be fully learned and the quality of automatic transliteration can be improved.
  • this deep learning model implements end-to-end feature conversion, that is, source language information -> high-dimensional features -> target language information, eliminating the cumbersome feature conversion process and making it more efficient.
  • This application can be understood as realizing data-driven transliteration technology.
  • the electronic device can automatically correct the transliteration results obtained by the deep learning model, for example, replacing unlucky characters with auspicious characters to further improve the quality of transliteration.
  • the above-mentioned deep learning model can obtain matching transliteration results based on user instructions.
  • In one implementation, the user can customize at least one of the length, the first character, the last character, and an included character of the transliteration result. This can be understood as realizing user-oriented transliteration technology: the transliteration functions provided by the electronic device are richer, effectively meeting users' personalized needs and improving user experience.
  • In some embodiments, the electronic device may be a mobile phone, a tablet computer, a handheld computer, a desktop computer, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), a smart home device such as a smart TV or a smart camera, a wearable device such as a smart bracelet, a smart watch, or smart glasses, an extended reality (XR) device such as an augmented reality (AR), virtual reality (VR), or mixed reality (MR) device, a vehicle-mounted device, or a smart city device.
  • FIG. 1 exemplarily shows a schematic diagram of the hardware structure of an electronic device 100 .
  • The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc.
  • The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown in the figures, or some components may be combined, some components may be separated, or some components may be arranged differently.
  • the components illustrated may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • In one implementation, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the controller can generate operation control signals based on the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • In one implementation, the memory in the processor 110 is a cache memory. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory, avoiding repeated access and reducing the waiting time of the processor 110, thereby improving system efficiency.
  • processor 110 may include one or more interfaces.
  • Interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the charging management module 140 is used to receive charging input from the charger.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, the wireless communication module 160, and the like.
  • the wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the electronic device 100 may be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, antenna 1 can be multiplexed as a diversity antenna for a wireless local area network.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G/6G applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform filtering, amplification and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna 1 for radiation.
  • at least part of the functional modules of the mobile communication module 150 may be disposed in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs sound signals through audio devices (not limited to speaker 170A, receiver 170B, etc.), or displays images or videos through display screen 194.
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110 and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • The wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and other wireless communication solutions.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • The above-mentioned wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
  • the electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is an image processing microprocessor and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the display screen 194 is used to display images, videos, etc.
  • Display 194 includes a display panel.
  • The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the electronic device 100 can implement the shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • The ISP is used to process the data fed back by the camera 193. For example, when taking a photo, the shutter is opened, light is transmitted to the camera's photosensitive element through the lens, the optical signal is converted into an electrical signal, and the photosensitive element passes the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can also perform algorithm optimization on image noise, brightness, etc., and can optimize parameters such as the exposure and color temperature of the shooting scene. In one implementation, the ISP may be provided in the camera 193.
  • Camera 193 is used to capture still images or video.
  • the object passes through the lens to produce an optical image that is projected onto the photosensitive element.
  • the photosensitive element can be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • CMOS complementary metal-oxide-semiconductor
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other format image signals.
  • the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • an external memory card such as a Micro SD card
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the electronic device 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals. Audio module 170 may also be used to encode and decode audio signals.
  • Speaker 170A, also called a "horn", is used to convert audio electrical signals into sound signals.
  • Receiver 170B also called “earpiece” is used to convert audio electrical signals into sound signals.
  • Microphone 170C, also called a "mike" or a "mic", is used to convert sound signals into electrical signals.
  • the headphone interface 170D is used to connect wired headphones.
  • the pressure sensor 180A is used to sense pressure signals and can convert the pressure signals into electrical signals.
  • the pressure sensor 180A may be disposed on the display screen 194.
  • pressure sensors 180A such as resistive pressure sensors, inductive pressure sensors, capacitive pressure sensors, etc.
  • a capacitive pressure sensor may include at least two parallel plates of conductive material.
  • the electronic device 100 determines the intensity of the pressure based on the change in capacitance.
  • the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position based on the detection signal of the pressure sensor 180A.
  • the gyro sensor 180B may be used to determine the motion posture of the electronic device 100 .
  • the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) can be determined through the gyro sensor 180B.
  • Air pressure sensor 180C is used to measure air pressure.
  • Magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 may utilize the magnetic sensor 180D to detect opening and closing of the flip holster.
  • the acceleration sensor 180E can detect the acceleration of the electronic device 100 in various directions (generally three axes).
  • Distance sensor 180F for measuring distance.
  • Proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 100 emits infrared light outwardly through the light emitting diode.
  • Electronic device 100 uses photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100 . When insufficient reflected light is detected, the electronic device 100 may determine that there is no object near the electronic device 100 .
  • the ambient light sensor 180L is used to sense ambient light brightness.
  • Fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to achieve fingerprint unlocking, access to application locks, fingerprint photography, fingerprint answering of incoming calls, etc.
  • Temperature sensor 180J is used to detect temperature.
  • Touch sensor 180K also known as "touch device”.
  • the touch sensor 180K can be disposed on the display screen 194.
  • the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near the touch sensor 180K.
  • the touch sensor can pass the detected touch operation to the application processor to determine the touch event type.
  • Visual output related to the touch operation may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 in a position different from that of the display screen 194 .
  • Bone conduction sensor 180M can acquire vibration signals.
  • the buttons 190 include a power button, a volume button, etc.
  • the motor 191 can generate vibration prompts.
  • the indicator 192 may be an indicator light, which may be used to indicate charging status, power changes, or may be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card.
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the layered architecture software system can be the Android system, the Harmony operating system (operating system, OS), or other software systems.
  • the embodiment of this application takes the Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100 .
  • FIG. 2 exemplarily shows a schematic diagram of the software architecture of the electronic device 100 .
  • the layered architecture divides the software into several layers, and each layer has clear roles and division of labor.
  • the layers communicate through software interfaces.
  • the Android system is divided into four layers, from top to bottom: application layer, application framework layer, Android runtime and system libraries, and kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include camera, calendar, music, gallery, short message, call, navigation, translation, browser and other applications.
  • the application package in this application can also be replaced by other forms of software such as applets.
  • the application framework layer provides an application programming interface (API) and programming framework for applications in the application layer.
  • API application programming interface
  • the application framework layer includes some predefined functions.
  • the application framework layer can include a window manager, content provider, view system, phone manager, resource manager, notification manager, etc.
  • a window manager is used to manage window programs.
  • the window manager can obtain the display size, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make this data accessible to applications.
  • the data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and more.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, etc.
  • a view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of the electronic device 100 .
  • call status management including connected, hung up, etc.
  • the resource manager provides various resources to applications, such as localized strings, icons, pictures, layout files, video files, etc.
  • the notification manager allows applications to display notification information in the status bar, which can be used to convey notification-type messages and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also display notifications in the status bar at the top of the system in the form of charts or scroll-bar text (such as notifications for applications running in the background), or display notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a beep sounds, the electronic device vibrates, or the indicator light flashes.
  • Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
  • the core library contains two parts: one part is the function interfaces that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and application framework layer run in virtual machines.
  • the virtual machine executes the Java files of the application layer and application framework layer as binary files.
  • the virtual machine is used to perform object life cycle management, stack management, thread management, security and exception management, and garbage collection and other functions.
  • System libraries can include multiple functional modules. For example: surface manager (surface manager), media libraries (Media Libraries), 3D graphics processing libraries (for example: OpenGL ES), 2D graphics engines (for example: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as static image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, composition, and layer processing.
  • 2D Graphics Engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
  • the following exemplifies the workflow of the software and hardware of the electronic device 100 in conjunction with capturing the photographing scene.
  • the corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes the touch operation into a raw input event (including touch coordinates, a timestamp of the touch operation, and other information). The raw input event is stored at the kernel layer.
  • the application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking the touch operation being a tap operation and the control corresponding to the tap operation being the camera application icon control as an example, the camera application calls the interface of the application framework layer to start the camera application, then starts the camera driver by calling the kernel layer, and the camera 193 captures still images or video.
  • FIG. 3 exemplarily shows a schematic diagram of the software architecture of yet another electronic device 100 .
  • the electronic device 100 may include a pronunciation embedding layer 200 , a transliteration model 300 , and a transliteration knowledge base 400 .
  • the pronunciation embedding layer 200 can be understood as a high-dimensional matrix.
  • the pronunciation embedding layer 200 can receive source language information as input and map the source language information to a high-dimensional vector in the high-dimensional matrix (which can be called source language feature information) as output. For example, assuming that the pronunciation embedding layer 200 is a 2,000,000 × 300 matrix and the source language information input to the pronunciation embedding layer 200 includes character 1 and character 2, the pronunciation embedding layer 200 can map character 1 and character 2 to respective 300-dimensional vectors.
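As an illustration of this mapping, the lookup can be sketched as an embedding-matrix row lookup. The toy dimensions (5 × 4), the vocabulary, and the NumPy implementation below are illustrative assumptions, far smaller than the 2,000,000 × 300 matrix described:

```python
import numpy as np

# Toy pronunciation embedding: each of V characters maps to a D-dim row vector.
# V=5, D=4 here; the layer described above is e.g. a 2,000,000 x 300 matrix.
rng = np.random.default_rng(0)
V, D = 5, 4
embedding = rng.normal(size=(V, D))   # the "high-dimensional matrix"
vocab = {"char1": 0, "char2": 1}      # hypothetical character-to-row indices

def embed(chars):
    """Map each input character to its row vector E(x)."""
    return np.stack([embedding[vocab[c]] for c in chars])

E_X = embed(["char1", "char2"])
assert E_X.shape == (2, 4)            # two characters -> two D-dim vectors
```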
  • the transliteration model 300 can receive the source language feature information output by the pronunciation embedding layer 200 as input, and output one or more transliteration results of the target language (which may be called the transliteration set 1 of the target language).
  • when the electronic device receives a user instruction, which is, for example but not limited to, used to set at least one of the following: the length, first character, last character, and contained character of the transliteration result, the transliteration model 300 can obtain one or more matching transliteration results based on the user instruction.
  • the transliteration model 300 can encode the source language feature information output by the pronunciation embedding layer 200, and then decode it based on the user instruction to obtain one or more transliteration results.
  • the electronic device 100 may also include a transliteration knowledge base 400.
  • the transliteration knowledge base 400 may include transliterated characters from big data, such as but not limited to well-known transliterated brand names, trademarks, product names, function names, names of people, names of places, country names, popular vocabulary, etc.
  • the electronic device 100 can use the transliteration knowledge base 400 to modify one or more transliteration results output by the transliteration model 300, for example, using characters in the transliteration knowledge base 400 to replace characters with the same or similar pronunciation in the transliteration results output by the transliteration model 300, and thereby obtain one or more corrected transliteration results (which can be called the transliteration set 2 of the target language).
  • FIG. 4 is a schematic flow chart of a transliteration method provided by an embodiment of the present application.
  • This method can be applied to the electronic device 100 shown in FIG. 1 .
  • This method can be applied to the electronic device 100 shown in FIG. 2 .
  • This method can be applied to the electronic device 100 shown in FIG. 3 .
  • the method may include but is not limited to the following steps:
  • S101 The electronic device obtains source language information.
  • the electronic device receives source language information input by the user, and the form of the source language information includes, but is not limited to, characters, words, or sentences.
  • S102 The electronic device uses the pronunciation embedding layer to obtain the source language feature information corresponding to the source language information.
  • the pronunciation embedding layer can be trained based on a large amount of data with similar/identical pronunciation.
  • An example of the training process can be seen in Figure 5A below, which will not be described in detail for now.
  • the pronunciation embedding layer can be used to map input information into high-dimensional feature information/high-dimensional vectors. The position of the feature information mapped by the pronunciation embedding layer in the high-dimensional space is determined based on the pronunciation (for example, it can be understood as clustering based on the pronunciation similarity of the input information).
  • the pronunciation similarity of input information 1 (corresponding to feature information 1) and input information 2 (corresponding to feature information 2) is greater than the pronunciation similarity of input information 1 and input information 3 (corresponding to feature information 3). Therefore, in the high-dimensional space, the distance between feature information 1 and feature information 2 is smaller than the distance between feature information 1 and feature information 3.
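The relationship between pronunciation similarity and distance in the embedding space can be illustrated with invented 2-dimensional vectors (all values below are assumptions for illustration only):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: v1 and v2 stand for similar-sounding inputs
# (feature information 1 and 2), v3 for a dissimilar-sounding input.
v1 = np.array([1.0, 0.0])
v2 = np.array([0.9, 0.1])
v3 = np.array([0.0, 1.0])

# Higher pronunciation similarity corresponds to smaller distance.
assert cosine_sim(v1, v2) > cosine_sim(v1, v3)
assert np.linalg.norm(v1 - v2) < np.linalg.norm(v1 - v3)
```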
  • an example can be seen in Figure 6 below, which will not be described in detail for now.
  • the electronic device can use the pronunciation embedding layer to map the source language information into high-dimensional source language feature information. Assuming that the source language information is represented by X, the source language feature information corresponding to X mapped by the pronunciation embedding layer can be expressed as E(X).
  • the electronic device can also use the pronunciation embedding layer to map the preset length information into high-dimensional length feature information, and the length feature information can be used to predict the length of the transliteration result. Assuming that the length information is expressed as LEN, the length feature information corresponding to LEN mapped by the pronunciation embedding layer can be expressed as E(LEN).
  • the following embodiment takes the source language feature information as E(X) and the length feature information as E(LEN) as an example.
  • S103 The electronic device uses the source language feature information as the input of the transliteration model to obtain an output (i.e., the transliteration set 1 of the target language).
  • the transliteration model is a semi-autoregressive model.
  • the semi-autoregressive model is different from the autoregressive model and the non-autoregressive model.
  • the non-autoregressive model needs to first predict the length of the transliteration result (which can be referred to as the predicted length), and then decode the transliteration result of that length in one pass and in parallel. For example, when the non-autoregressive model transliterates "harmony", it can first predict that the length of the transliteration result is 2 and then decode "Hongmeng" in one pass, but usually the transliteration quality is not high.
  • when the autoregressive model transliterates, it does not need to predict the length of the transliteration result; instead, it decodes each character/word in the transliteration result in sequence. For example, when the autoregressive model transliterates "harmony", it decodes "Ha", "Mo", "Ni" from left to right, but the generation length is uncontrollable.
  • the semi-autoregressive model can predict the length of the transliteration result and then decode in parallel; the decoding process iterates multiple times and outputs a transliteration result of the target language with the predicted length. For example, when the semi-autoregressive model transliterates "harmony", it can first predict that the length of the transliteration result is 2.
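The semi-autoregressive flow described above can be sketched with stub predictors standing in for the real model; the function names, the fixed length 2, and the "Hong"/"Meng" outputs are illustrative assumptions, not the patent's model:

```python
MASK = "[mask]"

def predict_length(source):
    # Stub: a real model scores candidate lengths; here "harmony" -> 2.
    return 2

def predict_masked(source, sequence):
    # Stub: fill every masked slot in parallel; a real model decodes from
    # the encoded source H(X). The fill values are invented placeholders.
    fills = {0: "Hong", 1: "Meng"}
    return [fills[i] if tok == MASK else tok for i, tok in enumerate(sequence)]

def semi_autoregressive_decode(source, iterations=2):
    n = predict_length(source)        # 1) predict the output length
    seq = [MASK] * n                  # 2) start fully masked
    for _ in range(iterations):       # 3) iterate parallel decoding
        seq = predict_masked(source, seq)
    return seq

assert semi_autoregressive_decode("harmony") == ["Hong", "Meng"]
```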
  • the transliteration model is based on a Transformer model (such as a semi-autoregressive translation model).
  • the electronic device can use the source language feature information as the input of the transliteration model to obtain an output: one or more transliteration results of the target language, that is, the transliteration set 1 of the target language.
  • the electronic device can use the source language feature information and the length feature information together as input to the transliteration model.
  • the transliteration model can predict the length of the transliteration result corresponding to the source language feature information based on the length feature information (referred to as the predicted length for short); the number of predicted lengths can be one or more.
  • the length of each of the one or more transliteration results output by the transliteration model falls within the above predicted lengths. For example, if the source language information is "harmony" and the predicted lengths obtained by the transliteration model include 2 and 3, the transliteration model can output two transliteration results of length 2, "HaMo" and "HongMeng", and one transliteration result of length 3, "HaMoNi".
  • the transliteration model can encode the input information and obtain an encoded vector (such as a latent vector), and then decode the encoded vector and obtain one or more transliteration results in the target language, where:
  • the transliteration model can encode the source language feature information E(X) and obtain the source language encoding information.
  • the source language encoding information can be expressed as H(X), and H(X) can be used to decode to obtain the output.
  • the transliteration model can encode the length feature information E(LEN) and obtain the length encoding information, and the length encoding information can be expressed as H(LEN).
  • the transliteration model can generate the best K prediction lengths based on H(LEN), where K is a positive integer.
  • the above-mentioned best K predicted lengths are, for example, the predicted lengths ranked in the top K positions according to their predicted scores. The score corresponding to any predicted length can represent the probability that the length of the transliteration result is that predicted length; the greater the probability, the higher the score.
  • K is a settable parameter.
  • the electronic device sets K to 2 by default, or the electronic device can set K to a number input by the user in response to a user operation.
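Selecting the best K predicted lengths by score can be sketched as follows; the candidate lengths and score values are invented for illustration (a real model derives such scores from the length encoding information H(LEN)):

```python
import numpy as np

# Toy score distribution over candidate lengths 1..5 (illustrative values).
lengths = np.array([1, 2, 3, 4, 5])
scores = np.array([0.05, 0.50, 0.30, 0.10, 0.05])  # higher score = more probable

K = 2
# Sort scores descending and keep the best K predicted lengths.
top_k = lengths[np.argsort(scores)[::-1][:K]]
assert sorted(top_k.tolist()) == [2, 3]
```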
  • the transliteration model can decode the source language encoding information H(X) based on the above K predicted lengths to obtain one or more transliteration results whose lengths belong to these K predicted lengths.
  • the transliteration model can decode the source language encoding information H(X) in combination with the attention score, where the attention score is obtained based on the attention mechanism, which can be used to alleviate the long-distance dependence problem in natural language processing; decoding combined with the attention mechanism can improve the quality of transliteration.
  • the decoding process of the transliteration model can be iterated T times, where T is a positive integer greater than 1, which greatly improves the quality of the decoding results.
  • T is a settable parameter.
  • the electronic device sets T to 2 by default, or the electronic device can set T to a number input by the user in response to a user operation.
  • before S103, the method further includes: the electronic device receives a user instruction, and the user instruction is used to indicate the length of the transliteration result. Therefore, in S103, the transliteration model can generate, based on the user instruction, one or more transliteration results of the length indicated by the user instruction. It can be understood that when the user does not indicate the length of the transliteration results, the transliteration set 1 output by the transliteration model includes transliteration results of the predicted length; when the user indicates the length of the transliteration results, the transliteration set 1 output by the transliteration model includes transliteration results of the length indicated by the user.
  • the transliteration results can meet the different needs of users in different scenarios and improve user experience.
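A minimal illustration of the length constraint, with placeholder candidate character sequences standing in for transliteration set 1:

```python
# Hypothetical transliteration set 1, each result a tuple of target characters.
candidates = [("Ha", "Mo"), ("Hong", "Meng"), ("Ha", "Mo", "Ni")]

# If the user indicates a length, keep only results of that length.
user_len = 3
filtered = [c for c in candidates if len(c) == user_len]
assert filtered == [("Ha", "Mo", "Ni")]
```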
  • before S103, the method further includes: the electronic device receives a user instruction, and the user instruction is used to indicate the first character, last character and/or contained character of the transliteration result. Therefore, in S103, the transliteration model can generate one or more matching transliteration results based on the user instruction. Specific examples are as follows:
  • the user instruction is used to indicate that the transliteration result includes character 1, then the transliteration results in the transliteration set 1 all include character 1.
  • the user instruction is used to indicate that the first character of the transliteration result is character 2, then the first character of the transliteration results in transliteration set 1 is character 2.
  • the user instruction is used to indicate that the last character of the transliteration result is character 3, then the last character of the transliteration results in the transliteration set 1 is character 3.
  • the decoding process of the transliteration model is exemplarily shown.
  • the following example takes the decoding process of a transliteration result (which can be called a target transliteration sequence) in transliteration set 1 as an example.
  • the decoding process of other transliteration results in transliteration set 1 is similar.
  • Masking can be done by masking all the characters in the target transliteration sequence. Taking the special character [mask] as a placeholder for all the characters in the target transliteration sequence as an example, the target transliteration sequence masked in the first iteration can be expressed as the following formula (1):
    Y_mask^(1) = ([mask], [mask], …, [mask])   (1)
    where the sequence contains N placeholder positions.
  • N is the length of the target transliteration sequence.
  • N is the length of the transliteration result indicated by the user instruction received by the electronic device, or N is any one of the above K predicted lengths.
  • Prediction can be performed conditioned on the given source language information X: each masked position is predicted, that is, the character at that position in the transliteration result corresponding to X is predicted.
  • The predicted character ŷ_i^(1) at position i and the corresponding confidence c_i^(1) can be expressed as the following formula (2) and formula (3):
    ŷ_i^(1) = argmax_{y_i} P(y_i | X)   (2)
    c_i^(1) = max_{y_i} P(y_i | X)   (3)
  • c_i^(1) is a variable used to represent confidence: it is the maximum confidence corresponding to the character predicted at position i in the transliteration result corresponding to X when the first iteration of prediction is given the source language information X; ŷ_i^(1) is the character at position i with the highest confidence in the transliteration result corresponding to X predicted by the first iteration given the source language information X.
  • The confidence c_i^(1) can be characterized as the probability that the character at the i-th position in the target transliteration sequence in the first iteration is ŷ_i^(1).
  • The predicted target transliteration sequence can be expressed as Ŷ^(1) = (ŷ_1^(1), ŷ_2^(1), …, ŷ_N^(1)), and the corresponding confidences can be expressed as C^(1) = (c_1^(1), c_2^(1), …, c_N^(1)).
  • when the electronic device does not receive a user instruction, it can be understood that the user instruction is empty, so the electronic device does not need to perform the replacement step; it can be understood that the sequence replaced in the first iteration is the predicted sequence itself.
  • when the electronic device receives a user instruction, it can parse the user instruction. If the user instruction is used to indicate the first character, last character and/or contained character of the transliteration result, the electronic device can perform the replacement step, that is, use the character indicated by the user instruction to replace the corresponding character in the predicted target transliteration sequence. This can include, but is not limited to, the following three cases:
  • Case 1: The user instruction indicates that the first character of the transliteration result is z 1 .
  • the electronic device can replace the first character ŷ_1^(1) in the predicted target transliteration sequence with z_1. Therefore, the sequence replaced in the first iteration can be expressed as the following formula (4):
    Y_replace^(1) = (z_1, ŷ_2^(1), …, ŷ_N^(1))   (4)
  • Case 2: The user instruction indicates that the last character of the transliteration result is z 2 .
  • the electronic device can replace the N-th character ŷ_N^(1) in the predicted target transliteration sequence with z_2. Therefore, the sequence replaced in the first iteration can be expressed as the following formula (6):
    Y_replace^(1) = (ŷ_1^(1), …, ŷ_{N-1}^(1), z_2)   (6)
  • Case 3: The user instruction indicates that the transliteration result contains the character z 3 .
  • the electronic device can first calculate the pronunciation similarity between each character in the predicted target transliteration sequence and z_3, and then use z_3 to replace the character with the highest pronunciation similarity to z_3. Therefore, the sequence replaced in the first iteration can be expressed as the following formula (8):
    Y_replace^(1) = (ŷ_1^(1), …, ŷ_{j-1}^(1), z_3, ŷ_{j+1}^(1), …, ŷ_N^(1)), where j = argmax_i s_i   (8)
  • E(·) can represent mapping through the pronunciation embedding layer; therefore, E(ŷ_i^(1)) is the high-dimensional vector obtained by mapping ŷ_i^(1) through the pronunciation embedding layer, and E(z_3) is the high-dimensional vector obtained by mapping z_3 through the pronunciation embedding layer.
  • Sim(·) can represent the calculated similarity, for example, but not limited to, the Euclidean distance, cosine similarity or Pearson correlation coefficient. Therefore, s_i = Sim(E(ŷ_i^(1)), E(z_3)) can represent the pronunciation similarity of ŷ_i^(1) and z_3.
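Case 3 can be sketched with cosine similarity over invented pronunciation embeddings; the embedding values and the romanized characters (from the "Mercedes" example) are illustrative assumptions:

```python
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical pronunciation embeddings E(.) for the predicted characters and
# for the user-required character z3 ("De"); all vector values are invented.
E = {
    "Mei": np.array([1.0, 0.0]),
    "Sai": np.array([0.0, 1.0]),
    "Di":  np.array([0.6, 0.8]),
    "Si":  np.array([-1.0, 0.0]),
    "De":  np.array([0.55, 0.83]),   # chosen to be closest to "Di"
}
predicted = ["Mei", "Sai", "Di", "Si"]
z3 = "De"

# s_i = Sim(E(y_i), E(z3)); replace the most similar predicted character.
sims = [cosine_sim(E[y], E[z3]) for y in predicted]
j = int(np.argmax(sims))
replaced = predicted[:j] + [z3] + predicted[j + 1:]
assert replaced == ["Mei", "Sai", "De", "Si"]
```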
  • Re-masking can mask some of the characters in the target transliteration sequence obtained in the first iteration. The following takes masking the n lowest-confidence characters (excluding the characters indicated by the user instruction) as an example.
  • The electronic device can arrange the confidences predicted in the first iteration in order from small to large and mask the characters ranked in the first n positions. The masked sequence used in the second iteration can be expressed as the following formula (10):
    Y_mask^(2) = (ŷ_1^(1), …, [mask]_i, …, ŷ_N^(1))   (10)
  • [mask]_i indicates that the i-th position is occupied by the special character [mask], where i is a positive integer less than or equal to N.
  • The electronic device does not mask the characters indicated by the user instruction, that is, the n masked characters do not include the characters indicated by the user instruction. For example, assuming that the value range of the confidence is [0, 1], the electronic device can set the confidence corresponding to the characters indicated by the user instruction to 1, so that they are not among the n lowest-confidence characters.
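The re-masking step can be sketched as follows; the sequence, confidence values, and the count of characters to mask are invented for illustration (the user-specified character's confidence is pinned to 1 so it is never re-masked):

```python
import numpy as np

MASK = "[mask]"

# After iteration 1: predicted characters and their confidences (invented).
seq = ["Mei", "Sai", "De", "Si"]
conf = np.array([0.4, 0.3, 1.0, 0.2])  # "De" was user-specified: confidence 1

n_to_mask = 2                          # mask the n lowest-confidence positions
idx = set(np.argsort(conf)[:n_to_mask].tolist())

remasked = [MASK if i in idx else c for i, c in enumerate(seq)]
assert remasked == ["Mei", MASK, "De", MASK]
```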
  • Re-prediction can predict each masked position conditioned on the source language information X and the unmasked characters of the sequence Y_mask^(2) used in the second iteration, that is, predict the character at that position in the transliteration result corresponding to X.
  • The predicted character ŷ_i^(2) and the corresponding confidence c_i^(2) can be expressed as the following formulas (11) and (12):
    ŷ_i^(2) = argmax_{y_i} P(y_i | X, Y_mask^(2))   (11)
    c_i^(2) = max_{y_i} P(y_i | X, Y_mask^(2))   (12)
  • Y is the target transliteration sequence of length N.
  • c_i^(2) is a variable used to represent confidence: given the source language information X and Y_mask^(2), it is the maximum confidence corresponding to the character predicted at position i in the second iteration; ŷ_i^(2) is the character at position i with the highest confidence. The confidence c_i^(2) can be characterized as the probability that the i-th character in the target transliteration sequence during the second iteration is ŷ_i^(2).
  • the number of iterations T may be greater than 2.
  • the description of other iteration processes except the first iteration is similar to the description of the second iteration above and will not be described in detail.
  • each character in the target transliteration sequence is occupied by [mask].
  • the predicted target transliteration sequence (y 1 , y 2 , y 3 , y 4 ) is (Mei, Sai, Di, Si).
  • in the replacement phase of the first iteration, since the pronunciation similarity between the predicted y3 (i.e., "Di") and the character "De" indicated by the user instruction is the highest, "Di" is replaced by the character "De" indicated by the user instruction.
  • after replacement, the target transliteration sequence (y1, y2, y3, y4) is (Mei, Sai, De, Si).
  • in the masking phase of the first iteration, the lowest-confidence positions y2 and y4 are masked, so the target transliteration sequence (y1, y2, y3, y4) becomes (Mei, [mask], De, [mask]).
  • after the second iteration, the target transliteration sequence (y1, y2, y3, y4) is (Mei, Sai, De, Si), which is the target transliteration sequence obtained by this decoding process.
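The iterative decoding procedure above (predict all positions, keep high-confidence characters, re-mask the low-confidence ones, and re-predict) can be sketched as follows. This is a hedged illustration, not the patent's implementation: `predict` is a hypothetical stand-in for the decoder, returning one (character, confidence) pair per position conditioned on the current partially masked sequence.

```python
import math

def mask_predict(predict, length, iterations=2):
    """Confidence-based iterative decoding (a sketch of the procedure above).

    `predict(tokens)` stands in for the decoder: it returns one
    (character, confidence) pair per position, conditioned on the
    current partially masked sequence.
    """
    tokens = ["[mask]"] * length              # iteration 1: every position masked
    for t in range(1, iterations + 1):
        preds = predict(tokens)
        conf = []
        for i in range(length):
            if tokens[i] == "[mask]":
                tokens[i], c = preds[i]       # formulas (11)/(12): char + probability
                conf.append(c)
            else:
                conf.append(1.0)              # kept characters: confidence set to 1
        if t == iterations:
            break
        # formula (10): re-mask the n lowest-confidence positions
        n = max(1, math.floor(length * (iterations - t) / iterations))
        for i in sorted(range(length), key=conf.__getitem__)[:n]:
            tokens[i] = "[mask]"
    return tokens
```

With a toy predictor that always proposes ("Mei", 0.9), ("Sai", 0.4), ("De", 0.95), ("Si", 0.3), two iterations reproduce the flow of the example above: the first pass fills all four positions, the two lowest-confidence positions are re-masked, and the second pass fills them again.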
  • S104 The electronic device corrects the transliteration set 1 based on the transliteration knowledge base to obtain the transliteration set 2 of the target language.
  • S104 is an optional step.
  • the transliteration knowledge base may include multiple characters, words and/or sentences, such as but not limited to company names, brand names, trade names, product names, names of people, place names, country names, borrowed words, names of literary works, movie names, music names, transliterated hot words on the Internet, etc.
  • the transliteration knowledge base may include a white list and a black list.
  • the white list includes auspicious characters with good connotations (such as " ⁇ " and " ⁇ ") and characters that are actually used for transliteration, such as but not limited to company names, brand names, trademark names, product names, names of people, places and countries, borrowed words, names of literary works, movie names, music names, transliterated hot words on the Internet, etc.
  • the blacklist includes unlucky characters with bad connotations (such as " ⁇ " and " ⁇ ") and characters that are not actually used for transliteration.
  • the electronic device can determine whether any transliteration result in the transliteration set 1 includes characters in the blacklist. When the determination result is yes, the electronic device can replace character 4 (a character in the transliteration result that belongs to the blacklist) with character 5 (a character in the white list that has the same/similar pronunciation as character 4). When the determination result is no, no replacement is performed. Understandably, the transliteration results provided by the electronic device will not include characters in the blacklist of the transliteration knowledge base.
  • for example, assume that the transliteration set 1 includes a transliteration result containing the character "Me", the blacklist of the transliteration knowledge base includes "Me", and the white list includes "Mei" and " ⁇ ", which have the same/similar pronunciation as "Me"; the electronic device can then replace "Me" in the transliteration result with "Mei" or " ⁇ ".
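The blacklist check and replacement of S104 can be sketched as below. This is a minimal illustration under assumptions: `same_pronunciation` is a hypothetical predicate standing in for the device's pronunciation comparison, and the characters in the usage example are placeholders, not the patent's actual lists.

```python
def correct_transliterations(results, blacklist, whitelist, same_pronunciation):
    """Sketch of step S104: replace blacklisted characters with a white-listed
    character of the same/similar pronunciation. `same_pronunciation(a, b)` is
    a hypothetical predicate standing in for the pronunciation comparison."""
    corrected = []
    for result in results:
        chars = list(result)
        for i, ch in enumerate(chars):
            if ch in blacklist:
                # character 4 -> character 5: first white-listed character
                # pronounced the same as / similarly to the blacklisted one
                replacement = next(
                    (w for w in whitelist if same_pronunciation(ch, w)), ch)
                chars[i] = replacement
        corrected.append("".join(chars))
    return corrected
```

For example, with a toy pronunciation table mapping both "霉" (blacklisted) and "美" (white-listed) to the syllable "mei", the blacklisted character is swapped out while other characters pass through unchanged.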
  • S105 The electronic device displays the transliteration set of the target language.
  • S105 is an optional step.
  • in one implementation, when S104 is not executed, the electronic device displays the transliteration set 1 of the target language. In another implementation, after S104 is executed, the electronic device displays the transliteration set 2 of the target language.
  • users can customize at least one of the length, first word, last word, and included word of the transliteration result, which can be understood as realizing a user-oriented personalized transliteration strategy that effectively meets the personalized needs of users.
  • Figure 5A exemplarily shows a schematic diagram of a training process of the pronunciation embedding layer.
  • the training process can be performed by the electronic device 100 itself.
  • the training process can be performed by a network device, and the electronic device 100 receives the pronunciation embedding layer sent by the network device.
  • FIG. 5A takes the electronic device 100 performing the training process as an example to illustrate.
  • the training process may, but is not limited to, include the following steps:
  • the electronic device 100 initializes the pronunciation embedding layer 200.
  • the electronic device 100 may first generate a high-dimensional matrix (ie, the initialized pronunciation embedding layer 200).
  • the initialized pronunciation embedding layer 200 is a 2,000,000 × 300 matrix, which may include 2 million 1 × 300 high-dimensional vectors, each of which can indicate a character.
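The initialization step can be sketched as follows; the Gaussian initialization scheme is an assumption (the text only specifies the matrix shape), and small sizes are used for illustration.

```python
import random

def init_pronunciation_embedding(vocab_size, dim, seed=0):
    """Randomly initialize the pronunciation embedding layer: one 1 x dim
    row vector per character. The Gaussian scheme here is an assumption;
    the text above only specifies the matrix shape."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(vocab_size)]

# The example above corresponds to init_pronunciation_embedding(2_000_000, 300):
# a 2,000,000 x 300 matrix, i.e. 2 million 1 x 300 vectors, one per character.
```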
  • the electronic device 100 trains the pronunciation embedding layer 200.
  • the electronic device 100 can use an automatic speech recognition (ASR) system to obtain a large amount of text with similar/identical pronunciations from big data for training the pronunciation embedding layer 200.
  • for example, the electronic device 100 can use ASR to obtain, from big data, words that are similar/identical in pronunciation to each word in the reference sentence "I write a book": "eye" related to "I"; "red", "read" and "white" related to "write"; "the" and "an" related to "a"; and "boot", "foot", "cook" and "root" related to "book". Then, the electronic device 100 may generate training data based on the reference sentence and the acquired words with similar/identical pronunciations.
  • the electronic device 100 can use the texts with similar/identical pronunciations obtained above as training data to train the pronunciation embedding layer 200 by training a continuous bag-of-words (CBOW) model, thereby updating the weights of the pronunciation embedding layer 200, where CBOW can be used to predict the center word corresponding to a given context. For example, assuming that the training data includes "I write a book", the context of the second word "write", i.e., "I _ a book", can be used as the input of CBOW, and CBOW can predict the second word represented by "_".
  • the electronic device 100 can use the context of any word in the training data as the input of CBOW, obtain the probability that CBOW outputs that word (referred to as the predicted probability for short), and then use the obtained predicted probability to update the weights of the pronunciation embedding layer 200. For example, the electronic device 100 can use the contexts of the second words "write", "red", "read" and "white" in the 120 sentences of the above example (which can be understood as replacing the second word in these 120 sentences with "_") as the input of CBOW, obtain the predicted probabilities with which CBOW outputs "write", "red", "read" and "white" respectively, and update the weights of the pronunciation embedding layer 200 based on the predicted probabilities via the back-propagation algorithm.
  • in this way, the pronunciation embedding layer can be trained based on a large amount of data with similar/identical pronunciations, which can be understood as learning a large amount of pronunciation-dimension information, rather than only learning semantic-dimension information like an ordinary embedding layer. For example, an ordinary embedding layer would only use "I write a book" (whose words are semantically correlated) as training data, and would not use the sentences other than "I write a book" among the above 120 sentences (whose words have no semantic correlation) as training data.
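The training-data construction described above (expanding a reference sentence with pronunciation-similar words, then forming CBOW context/center pairs) can be sketched as follows; the helper names are illustrative, and the similarity table is the toy example from the text.

```python
from itertools import product

def build_training_sentences(reference, similar_words):
    """Expand a reference sentence into every variant obtained by swapping in
    words of similar/identical pronunciation (the construction above)."""
    options = [[word] + similar_words.get(word, []) for word in reference.split()]
    return [" ".join(choice) for choice in product(*options)]

def cbow_pairs(sentence, window=1):
    """(context, center-word) pairs: CBOW predicts the center from the context."""
    words = sentence.split()
    pairs = []
    for i, center in enumerate(words):
        context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        pairs.append((tuple(context), center))
    return pairs

# Toy pronunciation-similarity table from the example above:
similar = {
    "I": ["eye"],
    "write": ["red", "read", "white"],
    "a": ["the", "an"],
    "book": ["boot", "foot", "cook", "root"],
}
sentences = build_training_sentences("I write a book", similar)
# 2 * 4 * 3 * 5 choices per word position gives the 120 sentences mentioned above
```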
  • the clustering methods based on the ordinary embedding layer and the pronunciation embedding layer are also different.
  • the ordinary embedding layer clusters based on semantic similarity.
  • the pronunciation embedding layer is clustered based on pronunciation similarity.
  • (A) of Figure 6 shows a schematic diagram of the feature information mapped by the pronunciation embedding layer in the high-dimensional space.
  • (B) of Figure 6 shows the feature information mapped by the ordinary embedding layer in the high-dimensional space.
  • a class may include more or less feature information.
  • in related technologies, the translation function usually only implements literal translation (translation that is both faithful to the content of the source language information and conforms to its structural form) and/or free translation (translation that, on the premise of being faithful to the content of the source language information, breaks free from the constraints of the source language information's structure so that the result conforms to the norms of the target language), and does not implement transliteration.
  • the learning goal of literal translation and/or free translation tasks is to learn a large number of grammatical rules and semantic knowledge, so as to output semantically correct and grammatically fluent target language content.
  • for example, when the target language content includes two characters and the decoder decodes one of them as " ⁇ ", the literal translation and/or free translation model is more likely to simultaneously decode the other character as a character semantically related to " ⁇ ", such as " ⁇ " or " ⁇ ".
  • the learning goal of the transliteration task is to learn pronunciation splitting and combination rules, so as to decode the target language content that has similar pronunciation and conforms to the user's actual pronunciation, without paying attention to semantics and grammar.
  • for example, when the target language content includes two characters and the decoder decodes one of them as " ⁇ ", the transliteration model often splits according to pronunciation and decodes the other character as "Si" or another character with the same/similar pronunciation.
  • the input of the transliteration model is the output of the pronunciation embedding layer
  • in addition, the big data used to train the pronunciation embedding layer can reflect characteristics of users' actual pronunciation, such as fuzzy defaults and strong/weak pronunciations, which greatly helps the transliteration model learn pronunciation splitting and combination rules. The transliteration model can thus fully capture transliteration techniques such as homophonic conversion, voiceless/voiced consonant conversion, initial-consonant optimization and final-consonant omission, effectively improving the quality of transliteration results and narrowing the quality gap between automatic transliteration and manual transliteration.
  • FIG. 7 exemplarily shows a schematic diagram of the software architecture of yet another electronic device 100 .
  • the electronic device 100 may include a pronunciation embedding layer 200 , a transliteration model 300 , a transliteration knowledge base 400 and a correction module 500 .
  • the transliteration model 300 may include an encoder 301, a length prediction module 302, an attention mechanism 303 and a decoder 304.
  • Transliteration knowledge base 400 may include whitelists and blacklists.
  • the pronunciation embedding layer 200 may receive source language information X and length information LEN as input, and output source language feature information E(X) and length feature information E(LEN).
  • the encoder 301 may receive E(X) and E(LEN) as input, encode E(X) and E(LEN) respectively, to output source language encoding information H(X) and length encoding information H(LEN).
  • the length prediction module 302 can receive H(LEN) as input, input H(LEN) into a pooling layer (Pooling) and a classifier in sequence, and output K predicted lengths, where the classifier includes, for example, a linear layer (Linear) and Softmax.
  • the attention mechanism 303 can receive H(X) as input and output an attention score.
  • the decoder 304 may receive H(X), the K predicted lengths, and the attention scores as input, iteratively decode H(X) based on the K predicted lengths and attention scores, and output one or more transliteration results of the target language whose lengths belong to these K predicted lengths (i.e., transliteration set 1 of the target language).
  • alternatively, the decoder 304 may receive as input H(X), a user instruction indicating the length of the transliteration result, and the attention scores, iteratively decode H(X) based on the user instruction and attention scores, and output one or more transliteration results of the target language with the length indicated by the user instruction (i.e., transliteration set 1 of the target language).
  • the decoder 304 may also receive a user instruction for indicating the content of the transliteration result, such as a user instruction indicating the first word, the last word and/or the contained words of the transliteration result, decode H(X) in combination with the user instruction, and output a transliteration set 1 that matches the user instruction.
  • the transliteration set 1 of the target language output by the decoder 304 can be checked to determine whether it hits the blacklist in the transliteration knowledge base 400, that is, whether any transliteration result in the transliteration set 1 includes characters in the blacklist.
  • when the determination result is yes, the transliteration set 1 can be input to the correction module 500, and the correction module 500 can use the white list in the transliteration knowledge base 400 to replace the blacklisted characters in the transliteration results of the transliteration set 1 with white-listed characters of the same/similar pronunciation, and output the transliteration set 2 of the target language to provide to the user. When the determination result is no, the transliteration set 1 of the target language can be directly output to be provided to the user.
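The data flow through the components of Figure 7 can be summarized in the sketch below. Every function is a toy stub standing in for the module of the same name (pronunciation embedding layer 200, encoder 301, length prediction module 302, attention mechanism 303, decoder 304, correction module 500); only the wiring between them reflects the text.

```python
BLACKLIST = {"bad"}                          # toy stand-in for knowledge base 400

def pronunciation_embedding(x, length):      # layer 200: E(X), E(LEN)
    return f"E({x})", f"E({length})"

def encoder(e_x, e_len):                     # encoder 301: H(X), H(LEN)
    return f"H({e_x})", f"H({e_len})"

def predict_lengths(h_len, k=2):             # length prediction module 302
    return [2, 3][:k]                        # K predicted lengths (toy values)

def attention(h_x):                          # attention mechanism 303
    return "scores"

def decoder(h_x, lengths, scores, user_instruction=None):   # decoder 304
    return ["bad-si", "hong-meng"]           # toy transliteration set 1

def correction(set1):                        # correction module 500
    return [r.replace("bad", "good") for r in set1]  # toy whitelist replacement

def transliterate(x, length="LEN", user_instruction=None):
    e_x, e_len = pronunciation_embedding(x, length)
    h_x, h_len = encoder(e_x, e_len)
    lengths = predict_lengths(h_len) if user_instruction is None else None
    set1 = decoder(h_x, lengths, attention(h_x), user_instruction)
    if any(ch in r for r in set1 for ch in BLACKLIST):   # blacklist hit?
        return correction(set1)              # transliteration set 2
    return set1                              # transliteration set 1
```

The branch at the end mirrors the text: set 1 goes through the correction module only when it hits the blacklist, otherwise it is output directly.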
  • Figure 8 exemplarily shows a schematic diagram of a user interface of a translation application.
  • the electronic device 100 may display a user interface 810 of the translation application.
  • the user interface 810 may include translation information 811, an input box 812, a translation option 813, a determination control 814 and a display box 815, where the translation information 811 may include a source language (eg English) and a target language (eg Chinese), and the electronic device 100 may In response to the operation on the translation information 811, the source language and/or the target language are switched.
  • the input box 812 can be used to input content to be translated.
  • the translation option 813 may indicate the type of translation, for example, the translation option 813 in the user interface 810 indicates "literal translation/free translation.”
  • the determination control 814 can be used to trigger translation of the content in the input box 812 (for example, specifically "literal translation/free translation” indicated by the translation option 813 ), and the translation result can be used to be displayed in the display box 815 .
  • the electronic device 100 may, in response to an operation on the translation option 813 (such as a touch operation, for example a click), switch the translation type, for example from "literal translation/free translation" to "transliteration", and display the user interface of the transliteration function; for details, refer to the user interface 820 shown in (B) of Figure 8.
  • the user interface 820 is similar to the user interface 810.
  • the difference is that the translation type indicated by the translation option 813 in the user interface 820 is "transliteration"; therefore, the determination control 814 can be used to trigger transliteration of the content in the input box 812, that is, to trigger execution of the transliteration method in the above embodiment, where the content in the input box 812 is the source language information in the above embodiment.
  • the transliteration result of the target language may be displayed in the display box 815.
  • the user interface 820 also includes a custom area 821.
  • the custom area 821 may include an input box 821A for setting the first word of the transliteration result, an input box 821B for setting the last word of the transliteration result, an input box 821C for setting a character that the transliteration result contains, and an input box 821D for setting the length of the transliteration result. The custom area 821 is, for example, used for the user to input the user instructions in the above embodiment.
  • the electronic device 100 can receive content input by the user based on the input box 812 in the user interface 820, assumed to be "Harmony" shown in the input box 812 in the user interface 910 shown in Figure 9A; that is, the electronic device 100 can execute S101 shown in Figure 4, where "Harmony" is the source language information. Then, in response to an operation on the determination control 814 in the user interface 910 (such as a touch operation, for example a click), the electronic device 100 may transliterate the content "Harmony" in the input box 812, that is, execute S102-S103 or S102-S104 shown in Figure 4. The obtained transliteration set of the target language includes three transliteration results: "Harmoni", "Hongmeng" and "Hameng". The electronic device can display these three transliteration results in the display box 815 in the user interface 910, that is, execute S105 shown in Figure 4.
  • the electronic device 100 can receive content input by the user based on the input box 812 in the user interface 820, assumed to be "Mercedes" shown in the input box 812 in the user interface 920 shown in Figure 9B; that is, the electronic device 100 can execute S101 shown in Figure 4, where "Mercedes" is the source language information. The electronic device 100 can also receive the content "De" input by the user based on the input box 821C in the custom area 821 included in the user interface 820; that is, the electronic device 100 can perform the receiving of the user instruction described in Figure 4, where the user instruction indicates that the transliteration result includes the character "De". In response to an operation on the determination control 814 in the user interface 920 (such as a touch operation, for example a click), the electronic device 100 may transliterate the content "Mercedes" in the input box 812, that is, execute S102-S103 or S102-S104 shown in Figure 4. The obtained transliteration set of the target language includes 3 transliteration results, each a different rendering of "Mercedes" that includes the character "De" indicated by the above user instruction. The electronic device can display these three transliteration results in the display box 815 in the user interface 920, that is, execute S105 shown in Figure 4.
  • similarly, the electronic device 100 can receive content input by the user based on the input box 812 in the user interface 820, assumed to be "Harmony" shown in the input box 812 in the user interface 930 shown in Figure 9C; that is, the electronic device 100 can execute S101 shown in Figure 4, where "Harmony" is the source language information. The electronic device 100 can also receive the content "2" input by the user based on the input box 821D in the custom area 821 included in the user interface 820; that is, the electronic device 100 can perform the receiving of the user instruction described in Figure 4, where the user instruction indicates that the length of the transliteration result is 2. In response to an operation on the determination control 814 in the user interface 930 (such as a touch operation, for example a click), the electronic device 100 may transliterate the content "Harmony" in the input box 812, that is, execute S102-S103 or S102-S104 shown in Figure 4. The obtained transliteration set of the target language includes two transliteration results: "Hongmeng" and "Hameng". The electronic device can display these two transliteration results in the display box 815 in the user interface 930, that is, execute S105 shown in Figure 4. Compared with Figure 9A, the electronic device 100 here also receives a user instruction indicating that the length of the transliteration result is 2; therefore, the transliteration set shown in Figure 9C includes the 2 transliteration results of length 2 contained in the transliteration set shown in Figure 9A.
  • in addition, the user can also set only the first word or the last word based on the custom area 821 in the user interface 820, or set at least two items among the first word, the last word, the contained word, and the length based on the custom area 821.
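The custom constraints (first word, last word, contained word, length) can be illustrated with a post-hoc filter. Note the patent applies these constraints during decoding rather than by filtering afterwards; this sketch only mirrors the observable effect, and the candidate strings are the examples from Figures 9A-9C.

```python
def filter_by_constraints(candidates, first=None, last=None,
                          contains=None, length=None):
    """Keep only the transliteration candidates that satisfy the user's
    custom constraints from area 821 (first word, last word, contained
    word, length). A post-hoc sketch; the patent applies constraints
    during decoding."""
    matched = []
    for cand in candidates:
        if first is not None and not cand.startswith(first):
            continue
        if last is not None and not cand.endswith(last):
            continue
        if contains is not None and contains not in cand:
            continue
        if length is not None and len(cand) != length:
            continue
        matched.append(cand)
    return matched
```

For example, filtering the three candidates of Figure 9A by length 2 keeps only the two results shown in Figure 9C.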
  • Figure 10 exemplarily shows a schematic diagram of a user interface of a browser application.
  • the electronic device 100 may display a user interface 1010 of a browser application.
  • the user interface 1010 may include a search box 1011, and the search box 1011 may include the characters "search or enter a URL" to prompt the user to enter a search word or the URL of a web page to view.
  • the electronic device 100 can receive content input by the user based on the search box 1011 in the user interface 1010, assumed to be "harmony" shown in the search box 1011 in the user interface 1020 shown in (B) of Figure 10; that is, the electronic device 100 can execute S101 shown in Figure 4, where "harmony" is the source language information. The electronic device 100 can transliterate the source language information "harmony", that is, execute S102-S103 or S102-S104 shown in Figure 4; the obtained transliteration set of the target language includes three transliteration results, for example "Harmoni", "Hongmeng" and "Hameng". The electronic device 100 can display these three transliteration results in the candidate list 1021 in the user interface 1020, that is, perform S105 shown in Figure 4. The candidate list 1021 may include multiple options, each of which includes content related to the content "harmony" in the search box 1011, such as but not limited to: option 1021A including "harmony", option 1021B including "Harmoni", option 1021C including "Hongmeng", option 1021D including "Hameng", option 1021E including "what does harmony mean", option 1021F including "harmonyOS", and option 1021G including "harmony adjective".
  • the electronic device 100 may, in response to an operation on any one of the multiple options (such as a touch operation, for example a click), search the Internet for information related to the content included in that option. For example, if the option is option 1021C, the electronic device 100 can display search results related to "Hongmeng" in response to the operation on option 1021C.
  • as another example, the electronic device 100 can receive content input by the user based on the search box 1011 in the user interface 1010, assumed to be the content shown in the search box 1011 in the user interface 1030 shown in (C) of Figure 10. The electronic device 100 can obtain the key content from the content in the search box 1011 and literally/freely translate it as "gene knee tie may"; that is, the electronic device 100 can execute S101 shown in Figure 4, where "gene knee tie may" is the source language information.
  • the electronic device 100 can transliterate the source language information "gene knee tie may", that is, execute S102-S103 or S102-S104 shown in Figure 4, and obtain the transliteration result in the target language as "chicken you are so beautiful”.
  • the electronic device 100 may display the transliteration result in the candidate list 1031 in the user interface 1030, that is, perform S105 shown in FIG. 4 .
  • the candidate list 1031 may include multiple options, such as but not limited to: option 1031A including "gene knee tie may translation", option 1031B including "gene knee tie may (chicken you are so beautiful)", option 1031C including "chicken you are so beautiful", and option 1031D including "gene knee tie may translated into English".
  • the electronic device 100 may, in response to an operation for any one of the multiple options, search the Internet for information related to content included in the option.
  • Figure 11 exemplarily shows a schematic diagram of a user interface of a browser application.
  • the electronic device 100 may display a user interface 1110 of a browser application.
  • the user interface 1110 may include a search box 1111 .
  • the search box 1111 may include a search control 1111A and a switching control 1111B, where the search control 1111A includes the characters "normal search", which can indicate that the current search type is "normal search", and the switching control 1111B can be used to switch the search type.
  • the electronic device 100 may receive the search term "harmony" input by the user based on the search box 1111 in the user interface 1110, and, in response to an operation on the search control 1111A in the user interface 1110 (such as a touch operation, for example a click), display the search results related to the search term "harmony"; for details, see the user interface 1120 shown in (B) of Figure 11.
  • the user interface 1120 may include a search box 1111 , a search summary 1121 and a search result list 1122 .
  • the search box 1111 is consistent with the search box 1111 in the user interface 1110 and will not be described again.
  • the search summary 1121 may include the characters: "10 results found for 'harmony' for you”.
  • Search results list 1122 may include multiple search results related to the search term "harmony.”
  • the electronic device 100 can switch the search type in response to an operation on the switching control 1111B in the user interface 1110 (such as a touch operation, for example a click), for example from "normal search" to "transliteration search"; at this time, the search control 1111A may include the characters "transliteration search".
  • then, the electronic device 100 can transliterate the content "harmony" in the search box 1111 in response to an operation on the search control 1111A (such as a touch operation, for example a click); that is, the electronic device 100 can perform the method shown in Figure 4, where "harmony" is the source language information, and the obtained transliteration set of the target language includes three transliteration results: "Harmoni", "Hongmeng" and "Hameng". The electronic device 100 can then display search results related to the search term "harmony" and the above three transliteration results; for details, see the user interface 1130 shown in (C) of Figure 11.
  • the user interface 1130 may include a search box 1111 , a search summary 1131 and a search result list 1132 .
  • the search box 1111 is similar to the search box 1111 in the user interface 1110; the difference is that the search control 1111A in the user interface 1130 includes the characters "transliteration search", which may indicate that the current search type is "transliteration search".
  • the search summary 1131 may include the characters "Found 100 related results for 'harmony', 'Harmoni', 'Hongmeng' and 'Hameng' for you".
  • the search result list 1132 may include multiple search results related to the search term "harmony" and the above three transliteration results.
  • for example, search result 1132A (including the characters "harmony introduction") is related to the search term "harmony"; search result 1132B (including the characters "Hongmeng - latest information") is related to the search term "harmony" and the transliteration result "Hongmeng"; search result 1132C (including the characters "the name Harmoni") is related to the transliteration result "Harmoni"; and search result 1132D (including the characters "the story of Hameng") is related to the transliteration result "Hameng".
  • the methods provided by the embodiments of this application can be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • when the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, a user equipment, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media. The available media can be magnetic media (for example, floppy disks, hard disks, tapes), optical media (for example, digital video discs (DVD)), or semiconductor media (for example, solid state disks (SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A transliteration method and an electronic device (100). The method is applied to the electronic device (100), the method comprising: receiving first information input by a user in a first language; transliterating the first information to obtain a plurality of pieces of second information in a second language, the plurality of pieces of second information comprising third information and fourth information, and the third information and the fourth information have different lengths; and displaying the plurality of pieces of second information. The present invention can implement transliteration at a relatively high quality by using an artificial intelligence (AI) technology and can return a plurality of transliteration results with different lengths at a time for users' reference, thereby effectively satisfying the user requirements for transliteration.

Description

一种音译方法及电子设备Transliteration method and electronic device
本申请要求于2022年09月07日提交中国专利局、申请号为202211089982.7、申请名称为“一种音译方法及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims priority to the Chinese patent application filed with the China Patent Office on September 7, 2022, with the application number 202211089982.7 and the application title "A transliteration method and electronic device", the entire content of which is incorporated into this application by reference. .
技术领域Technical field
本申请涉及计算机技术领域,尤其涉及一种音译方法及电子设备。The present application relates to the field of computer technology, and in particular to a transliteration method and electronic equipment.
Background
Transliteration refers to translating source-language information by substituting target-language information with a similar pronunciation, for example, rendering English words with Chinese characters that sound alike. Transliteration is typically used for names of people, places, and countries, loanwords, and the titles of literary works, films, music, and similar works.
However, when a user uses the transliteration function of an electronic device, the device can often only translate sound by sound and syllable by syllable; the transliteration result returned to the user is single and of low quality, and is likely to fail to meet the user's transliteration needs.
Summary
This application discloses a transliteration method and an electronic device, which can improve the quality of transliteration results and can return multiple transliteration results at a time for the user's reference, effectively meeting the user's transliteration needs.
According to a first aspect, this application provides a transliteration method applied to an electronic device. The method includes: receiving first information in a first language input by a user; transliterating the first information to obtain a plurality of pieces of second information in a second language, where the plurality of pieces of second information include third information and fourth information, and the third information and the fourth information have different lengths; and displaying the plurality of pieces of second information.
For example, the first information is "harmony", and the plurality of pieces of second information include "鸿蒙" (Hongmeng) and "哈梦" (Hameng), each of length 2, and "哈莫尼" (Hamoni), of length 3.
In the above method, when the electronic device transliterates the first information, it can output multiple transliteration results of different lengths (that is, the plurality of pieces of second information) at a time for the user's reference. The range of transliteration results available for the user to choose from is therefore greatly expanded, increasing the probability that the user obtains the desired transliteration result and effectively meeting the user's transliteration needs.
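The selection of one best candidate per distinct length described above can be sketched as follows. This is a minimal, hypothetical illustration: the candidate strings and scores stand in for the ranked outputs of a transliteration model.

```python
def top_candidates_by_length(candidates):
    """Keep the highest-scoring transliteration candidate for each
    distinct length, so results of several lengths are returned at once."""
    best = {}
    for text, score in candidates:
        n = len(text)
        if n not in best or score > best[n][1]:
            best[n] = (text, score)
    # Return one candidate per length, ordered by length.
    return [best[n][0] for n in sorted(best)]

# Hypothetical scored candidates for "harmony".
candidates = [("鸿蒙", 0.92), ("哈梦", 0.81), ("哈莫尼", 0.88)]
print(top_candidates_by_length(candidates))  # ['鸿蒙', '哈莫尼']
```

A real system would feed this function the n-best list produced by the model's decoding step; the fixed list here is for illustration only.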
In a possible implementation, the first information is a company name, a brand name, a trademark name, a product name, a person's name, a place name, a country name, a loanword, the title of a literary work, the title of a film, the title of a piece of music, or a trending transliterated term.
In the above method, the above types of first information differ from the sentences usually handled in translation. For an ordinary sentence, transliterating sound by sound and syllable by syllable and returning a single result is likely to meet the user's needs (because manual transliteration often works the same way). For the above types of first information, however, manual transliteration usually applies techniques such as homophone substitution, voiced/unvoiced consonant conversion, initial-sound optimization, and final-sound omission; moreover, for different users and/or different scenarios, the transliteration result the user needs may differ even for the same first information. Therefore, transliterating sound by sound and syllable by syllable and returning a single result is likely to fail to meet the user's needs. In this application, the electronic device outputs multiple transliteration results of different lengths at a time for the user to choose from, which can meet the personalized needs of different users and/or different scenarios and improve user experience.
In a possible implementation, the method further includes: receiving fifth information in a third language input by the user; performing literal or free translation on the fifth information to obtain sixth information in a fourth language; transliterating the sixth information to obtain at least one piece of seventh information in the third language; and displaying the at least one piece of seventh information.
For example, the fifth information is "基因膝盖领带五月" (word for word, "gene knee tie May"), the sixth information is "gene knee tie may", and the at least one piece of seventh information includes "鸡你太美", whose pronunciation approximates "gene knee tie may".
In the above method, the electronic device can first perform literal or free translation on the fifth information and then transliterate the translation result. This suits certain scenarios well (for example, when the fifth information is a trending transliterated term) and meets the user's personalized needs.
In a possible implementation, before the displaying of the plurality of pieces of second information, the method further includes: determining whether the second information includes a character on a blacklist; and when the second information includes a first character on the blacklist, replacing the first character in the second information with a second character on a whitelist, where the second character is a character on the whitelist whose pronunciation similarity to the first character is greater than or equal to a first threshold.
In some examples, the whitelist includes auspicious characters with positive connotations (for example, "美" and "斯") and characters actually used in transliterations, for example, in company names, brand names, trademark names, product names, people's names, place names, country names, loanwords, titles of literary works, films, and music, and trending transliterated terms. The blacklist includes inauspicious characters with negative connotations (for example, "没" and "死") and characters not actually used in transliterations.
In the above method, the electronic device can use characters on the whitelist to replace characters in the second information that are on the blacklist, for example, replacing inauspicious characters with auspicious ones. The content of the transliteration result is thus highly controllable and better matches the conventions of manual transliteration, further improving the quality of the transliteration result.
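As an illustration only, the blacklist/whitelist replacement could be sketched like this. The pinyin table and the `sim` function are hypothetical stand-ins for a real pronunciation-similarity measure.

```python
def sanitize(result, blacklist, whitelist, similarity, threshold=0.5):
    """Replace each blacklisted character in a transliteration result with
    the whitelisted character whose pronunciation is most similar, provided
    the similarity reaches the threshold (the 'first threshold' above)."""
    out = []
    for ch in result:
        if ch in blacklist:
            best, score = max(((w, similarity(ch, w)) for w in whitelist),
                              key=lambda pair: pair[1])
            out.append(best if score >= threshold else ch)
        else:
            out.append(ch)
    return "".join(out)

# Toy similarity: 1.0 when two characters share the same (hypothetical) pinyin.
PINYIN = {"没": "mei", "美": "mei", "死": "si", "斯": "si", "乐": "le"}
def sim(a, b):
    return 1.0 if PINYIN.get(a) is not None and PINYIN.get(a) == PINYIN.get(b) else 0.0

print(sanitize("哈没", {"没", "死"}, {"美", "斯", "乐"}, sim))  # 哈美
```

In practice the similarity function would compare full phonetic representations (including tone and finals) rather than a toy lookup table.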
In a possible implementation, the method further includes: receiving a first instruction input by the user, where the first instruction indicates that the first character in the transliteration result of the first information is a third character; the plurality of pieces of second information are determined based on the first instruction, and the first character in each piece of second information is the third character.
In the above method, the user can specify the first character of the transliteration result. The content of the transliteration result is highly controllable, which can meet the user's personalized needs and improve user experience.
In a possible implementation, the method further includes: receiving a second instruction input by the user, where the second instruction indicates that the last character in the transliteration result of the first information is a fourth character; the plurality of pieces of second information are determined based on the second instruction, and the last character in each piece of second information is the fourth character.
In the above method, the user can specify the last character of the transliteration result. The content of the transliteration result is highly controllable, which can meet the user's personalized needs and improve user experience.
In a possible implementation, the method further includes: receiving a third instruction input by the user, where the third instruction indicates that the transliteration result corresponding to the first information includes a fifth character; the plurality of pieces of second information are determined based on the third instruction, and each piece of second information includes the fifth character.
In the above method, the user can specify a character that the transliteration result must contain. The content of the transliteration result is highly controllable, which can meet the user's personalized needs and improve user experience.
In a possible implementation, the transliterating of the first information to obtain the plurality of pieces of second information in the second language includes: transliterating the first information to obtain eighth information in the second language; and replacing a sixth character in the eighth information with the fifth character indicated by the third instruction, where the second information is the eighth information after the replacement, and the sixth character is the character in the eighth information whose pronunciation similarity to the fifth character is the greatest.
In the above method, the electronic device can replace the sixth character, the character in the transliterated eighth information whose pronunciation is most similar to the user-specified character, with the user-specified character, rather than replacing the character at a preset position. This meets the user's personalized needs while preserving the quality of the transliteration result, improving user experience.
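The replacement of the most similar-sounding character described above can be sketched as follows; this is a minimal sketch in which `sim` and the pinyin readings are hypothetical placeholders for a real pronunciation-similarity measure.

```python
def apply_contains_constraint(result, wanted, similarity):
    """Place the user-specified character into a transliteration result by
    replacing whichever existing character sounds most like it, rather than
    the character at a fixed position."""
    if wanted in result:
        return result
    idx = max(range(len(result)), key=lambda i: similarity(result[i], wanted))
    return result[:idx] + wanted + result[idx + 1:]

# Toy similarity over hypothetical pinyin readings.
PINYIN = {"哈": "ha", "莫": "mo", "尼": "ni", "妮": "ni"}
def sim(a, b):
    return 1.0 if PINYIN.get(a) is not None and PINYIN.get(a) == PINYIN.get(b) else 0.0

# "尼" sounds like the requested "妮", so it is the character replaced.
print(apply_contains_constraint("哈莫尼", "妮", sim))  # 哈莫妮
```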
In a possible implementation, the method further includes: receiving ninth information in a fifth language input by the user and a first length input by the user; transliterating the ninth information to obtain at least one piece of tenth information in a sixth language, where the length of the tenth information is the first length; and displaying the at least one piece of tenth information.
In the above method, the user can specify the length of the transliteration result. The length of the transliteration result is highly controllable, which can meet the user's personalized needs and improve user experience.
In a possible implementation, the transliterating of the first information to obtain the plurality of pieces of second information in the second language includes: using a pronunciation embedding layer to map the first information to eleventh information, where the distance between the eleventh information and twelfth information is greater than the distance between the eleventh information and thirteenth information, and the pronunciation similarity between the eleventh information and the twelfth information is greater than the pronunciation similarity between the eleventh information and the thirteenth information; and using the eleventh information as the input of a transliteration model to obtain an output, where the output is the plurality of pieces of second information.
In a possible implementation, the pronunciation embedding layer is trained on a plurality of sentences, the plurality of sentences including a first sentence and a second sentence, where the first sentence and the second sentence each include N words, the pronunciation similarity between the i-th word in the first sentence and the i-th word in the second sentence is less than or equal to a second threshold, N is a positive integer, and i is a positive integer less than or equal to N.
For example, the eleventh information is a high-dimensional vector.
In the above method, the pronunciation embedding layer is trained on multiple sentences whose pronunciations are similar and/or identical, and the distance between pieces of information mapped by the pronunciation embedding layer is determined by pronunciation similarity rather than semantic similarity, fully accounting for the difference between transliteration needs and literal and/or free translation needs. Because the input of the transliteration model is information mapped by the pronunciation embedding layer, it is much easier for the transliteration model to learn the rules for splitting and combining pronunciations, so the model can fully capture transliteration techniques such as homophone substitution, voiced/unvoiced consonant conversion, initial-sound optimization, and final-sound omission, narrowing the quality gap between automatic and manual transliteration.
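One natural reading of the embedding property above, that distance tracks pronunciation rather than meaning, can be illustrated with a deliberately simple stand-in: a bag-of-phonemes count vector and Euclidean distance. The phoneme sequences below are hypothetical; the actual pronunciation embedding layer is a trained neural network.

```python
from collections import Counter
import math

def pron_vector(phonemes):
    """Bag-of-phonemes count vector: a toy stand-in for the learned
    pronunciation embedding."""
    return Counter(phonemes)

def distance(u, v):
    """Euclidean distance between two sparse count vectors
    (Counter returns 0 for missing keys)."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in keys))

# Hypothetical phoneme sequences.
harmony  = pron_vector(["HH", "AA", "R", "M", "AH", "N", "IY"])
harmonic = pron_vector(["HH", "AA", "R", "M", "AA", "N", "IH", "K"])
table    = pron_vector(["T", "EY", "B", "AH", "L"])

# Words that sound alike end up closer together than dissimilar ones.
print(distance(harmony, harmonic) < distance(harmony, table))  # True
```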
According to a second aspect, embodiments of this application provide an electronic device including a transceiver, a processor, and a memory, where the memory is configured to store a computer program, and the processor invokes the computer program to perform the transliteration method provided by the first aspect of the embodiments of this application or any implementation of the first aspect.
According to a third aspect, embodiments of this application provide a computer storage medium storing a computer program that, when executed by a processor, performs the transliteration method provided by the first aspect of the embodiments of this application or any implementation of the first aspect.
According to a fourth aspect, embodiments of this application provide a computer program product that, when run on an electronic device, causes the electronic device to perform the transliteration method provided by the first aspect of the embodiments of this application or any implementation of the first aspect.
According to a fifth aspect, embodiments of this application provide an electronic device that includes means for performing the method, or the apparatus, described in any embodiment of this application. The electronic device is, for example, a chip.
Description of Drawings
The drawings used in this application are introduced below.
Figure 1 is a schematic diagram of the hardware structure of an electronic device provided by this application;
Figure 2 is a schematic diagram of the software architecture of an electronic device provided by this application;
Figure 3 is a schematic diagram of the software architecture of another electronic device provided by this application;
Figure 4 is a schematic flowchart of a transliteration method provided by this application;
Figure 5A is a schematic diagram of the training process of a pronunciation embedding layer provided by this application;
Figure 5B is a schematic diagram of training data for a pronunciation embedding layer provided by this application;
Figure 6 is a schematic diagram of a high-dimensional space provided by this application;
Figure 7 is a schematic diagram of the software architecture of another electronic device provided by this application;
Figures 8, 9A-9C, 10, and 11 are schematic diagrams of some user interface embodiments provided by this application.
Detailed Description
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings. In the description of the embodiments of this application, unless otherwise stated, "/" means "or"; for example, A/B may represent A or B. "And/or" in this text merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent three cases: A alone, both A and B, and B alone. In addition, in the description of the embodiments of this application, "multiple" means two or more.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. Therefore, a feature qualified by "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of this application, unless otherwise stated, "multiple" means two or more.
At present, automatic transliteration is cumbersome to implement: a single transliteration requires many feature conversion steps. For example, transliterating an English word into Chinese characters proceeds as: English word -> phoneme sequence -> initial and final sequence -> Chinese pinyin -> Chinese characters. Moreover, automatic transliteration can only translate sound by sound and syllable by syllable and returns a single, low-quality result; it cannot use transliteration techniques such as homophone substitution, voiced/unvoiced consonant conversion, initial-sound optimization, and final-sound omission. Its quality is far below that of manual transliteration and is likely to fail to meet users' transliteration needs.
This application provides a transliteration method that can effectively meet users' transliteration needs. In the method, an electronic device can transliterate source-language information through a deep learning model, obtain one or more transliteration results in the target language, and return them to the user. In some examples, the one or more transliteration results have different lengths, greatly expanding the range of transliteration results available for the user to choose from and increasing the probability that the user obtains the desired result. The deep learning model can be obtained by implicitly learning, from big data, the phoneme combination and splitting rules of the transliteration process, so it can fully learn the transliteration techniques used in real transliteration, improving the quality of automatic transliteration. In addition, the deep learning model performs end-to-end feature conversion, that is, source-language information -> high-dimensional features -> target-language information, discarding the cumbersome multi-stage conversion process and achieving higher efficiency. This application can be understood as implementing data-driven transliteration.
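The contrast between the conventional multi-stage pipeline and the end-to-end mapping can be sketched as function composition. All stage functions and lookup tables below are hypothetical placeholders; in a real system each stage (or the single end-to-end map) would be a model, not a table.

```python
# Conventional multi-stage transliteration: each arrow is a separate stage.
def word_to_phonemes(word):
    # English word -> hypothetical phoneme sequence
    return {"harmony": ["HH", "AA", "R", "M", "AH", "N", "IY"]}[word]

def phonemes_to_pinyin(phonemes):
    # phoneme sequence -> hypothetical pinyin syllables
    return {"HH AA R M AH N IY": ["ha", "mo", "ni"]}[" ".join(phonemes)]

def pinyin_to_hanzi(pinyin):
    # pinyin syllables -> characters, one lookup per syllable
    table = {"ha": "哈", "mo": "莫", "ni": "尼"}
    return "".join(table[p] for p in pinyin)

def pipeline(word):
    """Sound-by-sound, syllable-by-syllable conversion: a single result."""
    return pinyin_to_hanzi(phonemes_to_pinyin(word_to_phonemes(word)))

def end_to_end(word):
    """End-to-end mapping, sketched as one lookup standing in for a learned
    model that can return several candidates of different lengths."""
    return {"harmony": ["鸿蒙", "哈梦", "哈莫尼"]}[word]

print(pipeline("harmony"))    # 哈莫尼
print(end_to_end("harmony"))  # ['鸿蒙', '哈梦', '哈莫尼']
```

The sketch shows why the multi-stage route is locked into one literal rendering, while the end-to-end route can surface alternatives that apply transliteration techniques.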
In one implementation, the electronic device can automatically correct the transliteration results obtained by the deep learning model, for example, replacing inauspicious characters with auspicious ones, further improving transliteration quality.
In one implementation, the above deep learning model can obtain matching transliteration results based on user instructions; for example, the user can specify at least one of the length, the first character, the last character, and a contained character of the transliteration result. This can be understood as user-oriented transliteration: the electronic device provides richer transliteration functions, effectively meets users' personalized needs, and improves user experience.
In this application, the electronic device may be a mobile phone, a tablet computer, a handheld computer, a desktop computer, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), a smart home device such as a smart TV or a smart camera, a wearable device such as a smart band, a smart watch, or smart glasses, an extended reality (XR) device such as an augmented reality (AR), virtual reality (VR), or mixed reality (MR) device, a vehicle-mounted device, or a smart city device. The embodiments of this application place no particular restriction on the specific type of the electronic device.
Next, an exemplary electronic device 100 provided by an embodiment of this application is introduced.
Figure 1 exemplarily shows a schematic diagram of the hardware structure of an electronic device 100.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in this embodiment of the invention does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than shown in the figure, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated in one or more processors.
The controller can generate operation control signals based on instruction opcodes and timing signals, completing the control of instruction fetching and execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving system efficiency.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
The charging management module 140 is configured to receive charging input from a charger.
The power management module 141 is configured to connect the battery 142 and the charging management module 140 to the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are configured to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover one or more communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization; for example, the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network.
The mobile communication module 150 can provide wireless communication solutions applied to the electronic device 100, including 2G/3G/4G/5G/6G. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify signals modulated by the modem processor and convert them into electromagnetic waves through the antenna 1 for radiation. In one implementation, at least some of the functional modules of the mobile communication module 150 may be provided in the processor 110. In one implementation, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is configured to modulate a low-frequency baseband signal to be sent into a medium- or high-frequency signal. The demodulator is configured to demodulate a received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, and the like), or displays an image or video through the display screen 194. In one implementation, the modem processor may be an independent device. In another implementation, the modem processor may be independent of the processor 110 and provided in the same device as the mobile communication module 150 or another functional module.
The wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite systems (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive a signal to be sent from the processor 110, frequency-modulate and amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
In one implementation, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
The electronic device 100 implements display functions through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and connects the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light emitting diodes (QLED), etc. In one implementation, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The electronic device 100 can implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted through the lens to the camera's photosensitive element, the optical signal is converted into an electrical signal, and the photosensitive element passes the electrical signal to the ISP, which converts it into an image visible to the naked eye. The ISP can also perform algorithmic optimization of image noise, brightness, and so on. The ISP can also optimize parameters such as the exposure and color temperature of the shooting scene. In one implementation, the ISP may be disposed in the camera 193.
The camera 193 is used to capture still images or video. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and passes the electrical signal to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In one implementation, the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121 and/or the instructions stored in a memory provided in the processor.
The electronic device 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, the application processor, and the like.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals.
The speaker 170A, also called a "loudspeaker", is used to convert an audio electrical signal into a sound signal.
The receiver 170B, also called an "earpiece", is used to convert an audio electrical signal into a sound signal.
The microphone 170C, also called a "mic", is used to convert a sound signal into an electrical signal.
The headphone interface 170D is used to connect wired headphones.
The pressure sensor 180A is used to sense pressure signals and can convert a pressure signal into an electrical signal. In one implementation, the pressure sensor 180A may be disposed on the display screen 194. There are many types of pressure sensors 180A, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates with conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the intensity of the pressure based on the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation through the pressure sensor 180A. The electronic device 100 may also calculate the touch position based on the detection signal of the pressure sensor 180A.
The gyroscope sensor 180B may be used to determine the motion posture of the electronic device 100. In one implementation, the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined through the gyroscope sensor 180B.
The barometric pressure sensor 180C is used to measure air pressure.
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of a flip leather case.
The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally along three axes).
The distance sensor 180F is used to measure distance.
The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The electronic device 100 emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100; when insufficient reflected light is detected, the electronic device 100 can determine that there is no object near the electronic device 100.
The ambient light sensor 180L is used to sense the brightness of ambient light.
The fingerprint sensor 180H is used to collect fingerprints. The electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application-lock access, fingerprint photographing, fingerprint call answering, and so on.
The temperature sensor 180J is used to detect temperature.
The touch sensor 180K is also called a "touch device". The touch sensor 180K may be disposed on the display screen 194; the touch sensor 180K and the display screen 194 form a touchscreen, also called a "touch screen". The touch sensor 180K is used to detect a touch operation on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation may be provided through the display screen 194. In another implementation, the touch sensor 180K may also be disposed on the surface of the electronic device 100 in a position different from that of the display screen 194.
The bone conduction sensor 180M can acquire vibration signals.
The buttons 190 include a power button, volume buttons, and so on.
The motor 191 can generate vibration prompts.
The indicator 192 may be an indicator light, which can be used to indicate the charging status and battery changes, or to indicate messages, missed calls, notifications, and so on.
The SIM card interface 195 is used to connect a SIM card.
The software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. For example, a software system with a layered architecture may be the Android system, the HarmonyOS operating system (OS), or another software system. The embodiments of this application take the Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
FIG. 2 exemplarily shows a schematic diagram of a software architecture of the electronic device 100.
The layered architecture divides the software into several layers, each with clear roles and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages.
As shown in FIG. 2, the application packages may include applications such as Camera, Calendar, Music, Gallery, Messaging, Phone, Navigation, Translation, and Browser. The application packages in this application may also be replaced by other forms of software, such as applets.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions.
As shown in FIG. 2, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and the like.
The window manager is used to manage window programs. The window manager can obtain the display screen size, determine whether there is a status bar, lock the screen, capture the screen, and so on.
The content provider is used to store and retrieve data and make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and so on.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system can be used to build applications. A display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
The phone manager is used to provide the communication functions of the electronic device 100, for example, management of call states (including connected, hung up, etc.).
The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files.
The notification manager allows applications to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify of download completion, message reminders, and so on. The notification manager may also present notifications that appear in the status bar at the top of the system in the form of a chart or scroll-bar text, such as notifications of applications running in the background, or notifications that appear on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is played, the electronic device vibrates, or the indicator light flashes.
The Android runtime includes core libraries and a virtual machine. The Android runtime is responsible for the scheduling and management of the Android system.
The core libraries contain two parts: one part is the functional functions that the Java language needs to call, and the other part is the core libraries of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system libraries may include multiple functional modules, for example, a surface manager, media libraries, a 3D graphics processing library (e.g., OpenGL ES), and a 2D graphics engine (e.g., SGL).
The surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of a variety of commonly used audio and video formats, as well as static image files. The media libraries can support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, layer processing, and so on.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is the layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
The following exemplarily describes the workflow of the software and hardware of the electronic device 100 in conjunction with a photo-capturing scenario.
When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and the timestamp of the touch operation). The raw input event is stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking the case where the touch operation is a tap and the control corresponding to the tap is the camera application icon as an example, the camera application calls the interface of the application framework layer to start the camera application, then starts the camera driver by calling the kernel layer, and captures a still image or video through the camera 193.
FIG. 3 exemplarily shows a schematic diagram of another software architecture of the electronic device 100.
As shown in FIG. 3, the electronic device 100 may include a pronunciation embedding layer 200, a transliteration model 300, and a transliteration knowledge base 400. The pronunciation embedding layer 200 can be understood as a high-dimensional matrix. The pronunciation embedding layer 200 can receive source language information as input and map the source language information to high-dimensional vectors in the matrix (which may be called source language feature information) as output. For example, assuming that the pronunciation embedding layer 200 is a 2,000,000 × 300 matrix and the source language information input to the pronunciation embedding layer 200 includes character 1 and character 2, the pronunciation embedding layer 200 can map character 1 and character 2 to vector 1 and vector 2 in the matrix, respectively, where vector 1 and vector 2 are both 1 × 300 high-dimensional vectors. The transliteration model 300 can receive the source language feature information output by the pronunciation embedding layer 200 as input, and output one or more transliteration results in the target language (which may be called transliteration set 1 of the target language). In one implementation, the electronic device receives a user instruction, which is, for example but not limited to, used to set at least one of the following: the length of the transliteration result, its first character, its last character, and a character it must contain. The transliteration model 300 can obtain one or more matching transliteration results based on the user instruction. In some examples, the transliteration model 300 can encode the source language feature information output by the pronunciation embedding layer 200 and then decode it based on the user instruction to obtain one or more transliteration results.
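The embedding lookup described above can be sketched as follows. The vocabulary, token names, and matrix size are illustrative stand-ins (the 2,000,000 × 300 shape from the example is shrunk only in vocabulary size), not the actual parameters of the patented system:

```python
import random

# Hypothetical pronunciation embedding layer: one 300-dimensional row per
# known character/token. The vocabulary here is a tiny illustrative stand-in.
EMBED_DIM = 300
VOCAB = ["char1", "char2", "char3"]
random.seed(0)
EMBEDDING = {tok: [random.random() for _ in range(EMBED_DIM)] for tok in VOCAB}

def embed(source_info):
    """Map each character of the source language information to its
    high-dimensional vector, i.e., one row of the embedding matrix."""
    return [EMBEDDING[tok] for tok in source_info]

# Characters 1 and 2 are mapped to vector 1 and vector 2, each of size 1 x 300.
vectors = embed(["char1", "char2"])
```

In the real system the rows are learned during training rather than random; the lookup itself is the same per-token row selection.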
In one implementation, the electronic device 100 may also include a transliteration knowledge base 400. The transliteration knowledge base 400 may include transliterated characters drawn from big data, such as but not limited to well-known transliterated brands, trademarks, products, features, names of people and places, country names, popular vocabulary, and so on. The electronic device 100 can use the transliteration knowledge base 400 to correct the one or more transliteration results output by the transliteration model 300, for example, by using characters in the transliteration knowledge base 400 to replace characters with the same or similar pronunciation in the transliteration results output by the transliteration model 300, obtaining one or more corrected transliteration results (which may be called transliteration set 2 of the target language).
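One simple way to realize the correction step described above is a same-pronunciation substitution against knowledge-base entries. This is a sketch under stated assumptions: the pronunciation table, the knowledge-base contents, and the exact matching rule (here, identical pronunciation per character) are invented for illustration, not taken from the patent:

```python
# Tiny illustrative pronunciation table and knowledge base.
PRONUNCIATION = {"鸿": "hong", "莫": "mo", "蒙": "meng", "梦": "meng"}
KNOWLEDGE_BASE = {"鸿蒙"}  # well-known transliterated names, brands, etc.

def correct(candidate):
    """Return a knowledge-base entry whose characters all have the same
    pronunciation as the candidate's; otherwise keep the candidate."""
    for entry in KNOWLEDGE_BASE:
        if len(entry) == len(candidate) and all(
            PRONUNCIATION.get(a) == PRONUNCIATION.get(b)
            for a, b in zip(candidate, entry)
        ):
            return entry
    return candidate

fixed = correct("鸿梦")  # "梦" and "蒙" share the pronunciation "meng"
```

A production system would additionally handle "similar" (not only identical) pronunciations, e.g., via a phonetic distance between syllables.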
The following introduces the transliteration method provided by the embodiments of this application.
Please refer to FIG. 4, which is a schematic flowchart of a transliteration method provided by an embodiment of this application. The method can be applied to the electronic device 100 shown in FIG. 1, FIG. 2, or FIG. 3, and may include, but is not limited to, the following steps:
S101: The electronic device obtains source language information.
In one implementation, the electronic device receives source language information input by the user; the source language information may take the form of, for example but not limited to, characters, words, or sentences.
S102: The electronic device uses the pronunciation embedding layer to obtain source language feature information corresponding to the source language information.
In one implementation, the pronunciation embedding layer can be trained based on a large amount of data with similar or identical pronunciations; an example of the training process can be seen in FIG. 5A below and is not detailed here. In one implementation, the pronunciation embedding layer can be used to map input information to high-dimensional feature information (high-dimensional vectors), and the position in the high-dimensional space of the feature information obtained through the pronunciation embedding layer is determined according to pronunciation (for example, this can be understood as clustering based on the pronunciation similarity of the input information). In some examples, the pronunciation similarity between input information 1 (corresponding to feature information 1) and input information 2 (corresponding to feature information 2) is greater than the pronunciation similarity between input information 1 and input information 3 (corresponding to feature information 3); therefore, in the high-dimensional space, the distance between feature information 1 and feature information 2 is smaller than the distance between feature information 1 and feature information 3. A specific example can be seen in FIG. 6 below and is not detailed here.
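The distance property just described can be illustrated numerically. The three vectors below are made-up low-dimensional stand-ins for the trained embedding outputs; only the relative distances matter:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy stand-ins: inputs 1 and 2 sound alike, input 3 does not.
feature1 = [0.9, 0.1, 0.0]
feature2 = [0.8, 0.2, 0.1]   # similar pronunciation -> nearby vector
feature3 = [0.0, 0.1, 0.9]   # different pronunciation -> distant vector

# Similar-sounding inputs end up closer together in the embedding space.
assert euclidean(feature1, feature2) < euclidean(feature1, feature3)
```

Cosine distance would serve equally well here; the key point is that the training objective places same-pronunciation inputs near each other.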
In one implementation, the electronic device can use the pronunciation embedding layer to map the source language information into high-dimensional source language feature information. Assuming the source language information is denoted X, the source language feature information corresponding to X obtained through the pronunciation embedding layer can be denoted E(X).
In one implementation, the electronic device can also use the pronunciation embedding layer to map preset length information into high-dimensional length feature information, which can be used to predict the length of the transliteration result. Assuming the length information is denoted LEN, the length feature information corresponding to LEN obtained through the pronunciation embedding layer can be denoted E(LEN).
For ease of description, the following embodiments take the case in which the source language feature information is denoted E(X) and the length feature information is denoted E(LEN) as an example.
S103: The electronic device uses the source language feature information as the input of the transliteration model and obtains its output (i.e., transliteration set 1 of the target language).
In one implementation, the transliteration model is a semi-autoregressive model. A semi-autoregressive model differs from both autoregressive and non-autoregressive models. A non-autoregressive model first predicts the length of the transliteration result (the predicted length) and then decodes a target-language transliteration result of the predicted length in a single parallel pass. For example, when a non-autoregressive model transliterates "harmony", it can first predict that the length of the transliteration result is 2 and then decode "鸿莫" in one pass, but the transliteration quality is usually not high. An autoregressive model does not need to predict the length of the transliteration result; instead, it decodes each character/word of the transliteration result in sequence. For example, when an autoregressive model transliterates "harmony", it decodes "哈", "莫", and "尼" from left to right, but the generated length is uncontrollable. A semi-autoregressive model can predict the length of the transliteration result and then decode in parallel, iterating the decoding process multiple times and outputting a target-language transliteration result of the predicted length. For example, when a semi-autoregressive model transliterates "harmony", it can first predict that the length of the transliteration result is 2; the first parallel decoding pass produces "鸿莫", and the second pass, correcting the first decoding result, produces "鸿蒙" in parallel. Not only is the length of the transliteration result predictable and controllable, but because of the multiple iterations, the quality of the transliteration result is also higher. In some examples, the transliteration model is based on a Transformer model (for example, a semi-autoregressive translation model).
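The iterate-and-refine loop above can be sketched as follows, in the spirit of mask-predict style refinement (one common way to realize semi-autoregressive decoding, not necessarily the patent's exact scheme). The per-position probability tables are invented for illustration; in the real model they come from the decoder conditioned on H(X) and, in later passes, on the previous draft:

```python
# Toy sketch: predict a length, decode all positions in parallel, then
# re-decode positions in a second pass conditioned on the first draft.

# Pass 1: per-position candidate distributions for "harmony", length 2.
PASS1 = [
    {"鸿": 0.9, "哈": 0.1},   # position 0: confident
    {"莫": 0.5, "蒙": 0.45},  # position 1: low confidence
]
# Pass 2: distributions after conditioning on the pass-1 draft.
PASS2 = [{"鸿": 0.99}, {"蒙": 0.8, "莫": 0.2}]

def decode(distributions):
    """Pick the best character at every position in parallel."""
    return "".join(max(d, key=d.get) for d in distributions)

draft = decode(PASS1)    # first parallel pass  -> "鸿莫"
refined = decode(PASS2)  # refinement iteration -> "鸿蒙"
```

The controllable part is that both passes decode exactly the predicted number of positions; the quality gain comes from letting later passes revise low-confidence positions.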
在一种实施方式中,电子设备可以将源语言特征信息作为音译模型的输入,得到输出:一个或多个目标语言的音译结果,即目标语言的音译集合1。In one implementation, the electronic device can use the source language feature information as the input of the transliteration model to obtain an output: one or more transliteration results of the target language, that is, the transliteration set 1 of the target language.
在一种实施方式中,电子设备可以将源语言特征信息和长度特征信息一起作为音译模型的输入,音译模型可以基于长度特征信息预测源语言特征信息对应的音译结果的长度(可简称为预测长度),预测长度的数量可以为一个或多个。在一些示例中,音译模型输出的一个或多个音译结果的长度属于上述预测长度。例如,源语言信息为“harmony”,音译模型得到的预测长度包括2和3,音译模型可以输出2个长度为2的音译结果“哈梦”、“鸿蒙”,以及1个长度为3的音译结果“哈莫尼”。In one implementation, the electronic device can use the source language feature information and the length feature information together as input to the transliteration model. The transliteration model can predict the length of the transliteration result corresponding to the source language feature information based on the length feature information (which can be referred to as predicted length for short). ), the number of prediction lengths can be one or more. In some examples, the length of one or more transliteration results output by the transliteration model falls within the above predicted length. For example, if the source language information is "harmony", the predicted lengths obtained by the transliteration model include 2 and 3. The transliteration model can output two transliteration results of length 2 "Harmony" and "Hongmeng", and one transliteration result of length 3 The result is "Harmony".
In one implementation, the transliteration model may encode the input information to obtain encoded vectors (for example, latent vectors), and then decode the encoded vectors to obtain one or more transliteration results in the target language, where:
In some examples, the transliteration model may encode the source language feature information E(X) to obtain source language encoding information, which may be denoted H(X); H(X) may be used for decoding to obtain the output transliteration set 1 of the target language.
In some examples, the transliteration model may encode the length feature information E(LEN) to obtain length encoding information, which may be denoted H(LEN). The transliteration model may generate the best K predicted lengths based on H(LEN), where K is a positive integer. The best K predicted lengths are, for example, the predicted lengths whose predicted scores rank in the top K positions; the score corresponding to any predicted length may represent the probability that the length of the transliteration result equals that predicted length, and the greater the probability, the higher the score. Optionally, K is a settable parameter; for example, the electronic device sets K to 2 by default, or the electronic device may, in response to a user operation, set K to a number input by the user.
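The top-K length selection described above can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: the function name, the raw-score inputs and the softmax scoring are assumptions standing in for the length prediction module's classifier.

```python
import math

def top_k_lengths(length_scores, k=2):
    """Return the K candidate lengths whose scores rank in the top K.
    `length_scores` maps candidate length -> raw score (toy values)."""
    # Convert raw scores to probabilities with softmax; a higher
    # probability corresponds to a higher score for that length.
    total = sum(math.exp(s) for s in length_scores.values())
    probs = {n: math.exp(s) / total for n, s in length_scores.items()}
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[:k]

# Toy scores for "harmony": lengths 2 and 3 rank highest.
print(top_k_lengths({1: -2.0, 2: 1.5, 3: 0.8, 4: -1.0}, k=2))  # [2, 3]
```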
In some examples, the transliteration model may decode the source language encoding information H(X) based on the above K predicted lengths to obtain one or more transliteration results whose lengths belong to the K predicted lengths.
In some examples, the transliteration model may decode the source language encoding information H(X) in combination with attention scores, where the attention scores are obtained according to an attention mechanism. The attention mechanism can be used to alleviate the long-distance dependency problem in natural language processing, and decoding combined with the attention mechanism can improve the transliteration quality.
In some examples, the decoding process of the transliteration model may be iterated T times, where T is a positive integer greater than 1, which greatly improves the quality of the decoding result. Optionally, T is a settable parameter; for example, the electronic device sets T to 2 by default, or the electronic device may, in response to a user operation, set T to a number input by the user.
Not limited to the above implementation, in another implementation, before S103 the method further includes: the electronic device receives a user instruction, and the user instruction indicates the length of the transliteration result. Therefore, in S103, the transliteration model may generate, based on the user instruction, one or more transliteration results of the length indicated by the user instruction. It can be understood that when the user does not indicate the length of the transliteration result, transliteration set 1 output by the transliteration model includes transliteration results of the predicted length; when the user indicates the length of the transliteration result, transliteration set 1 output by the transliteration model includes transliteration results of the length indicated by the user. This can meet different needs of users in different scenarios and improve the user experience.
In one implementation, before S103 the method further includes: the electronic device receives a user instruction, and the user instruction indicates the first character, the last character and/or a contained character of the transliteration result. Therefore, in S103, the transliteration model may generate one or more matching transliteration results based on the user instruction. Specific examples are as follows:
In some examples, the user instruction indicates that the transliteration result includes character 1; then every transliteration result in transliteration set 1 includes character 1.
In other examples, the user instruction indicates that the first character of the transliteration result is character 2; then the first character of every transliteration result in transliteration set 1 is character 2.
In other examples, the user instruction indicates that the last character of the transliteration result is character 3; then the last character of every transliteration result in transliteration set 1 is character 3.
Next, the decoding process of the transliteration model is shown by way of example. The following example uses T=2 decoding iterations for illustration, and takes the decoding process of one transliteration result in transliteration set 1 (which may be called the target transliteration sequence) as an example; the decoding processes of the other transliteration results in transliteration set 1 are similar.
The first iteration (current iteration round t=1) may include three steps: masking, prediction and replacement, where:
Masking may mask all characters of the target transliteration sequence; the following takes placing the special character [mask] at every position of the target transliteration sequence as an example. Therefore, the masked target transliteration sequence in the first iteration may be expressed as formula (1):

Y_mask^(1) = (y_1, y_2, ..., y_N) = ([mask], [mask], ..., [mask])    (1)

where y_i is the i-th position, occupied by the special character [mask], i is a positive integer less than or equal to N, and N is the length of the target transliteration sequence. In some examples, N is the length of the transliteration result indicated by the user instruction received by the electronic device, or N is any one of the above K predicted lengths.
Prediction may be, given the source language information X, predicting each masked position in Y_mask^(1), that is, predicting the character at that position in the transliteration result corresponding to X. In the first iteration, the character predicted for the i-th masked position y_i and its corresponding confidence may be expressed as formulas (2) and (3):

ŷ_i^(1) = argmax_ω P(y_i = ω | X)    (2)

c_i^(1) = max_ω P(y_i = ω | X)    (3)

where ω is a variable (ranging over candidate characters) used to characterize the confidence. c_i^(1) is the maximum confidence over candidate characters when, in the first iteration and given the source language information X, predicting the character at position y_i in the transliteration result corresponding to X. ŷ_i^(1) is the character at position y_i with that maximum confidence. The confidence c_i^(1) corresponding to ŷ_i^(1) may represent the probability that the character at the i-th position of the target transliteration sequence (referred to as the i-th character for short) in the first iteration is ŷ_i^(1).

Therefore, the predicted target transliteration sequence may be expressed as Ŷ^(1) = (ŷ_1^(1), ..., ŷ_N^(1)), and the corresponding confidences may be expressed as C^(1) = (c_1^(1), ..., c_N^(1)).
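The argmax prediction of formulas (2) and (3) can be illustrated with a small sketch. The per-position distributions below are invented toy numbers, not actual model outputs, and the function name is an assumption.

```python
def predict_masked(seq, dist):
    """For each [mask] position, take the argmax character as the
    prediction (formula (2)) and its probability as the confidence
    (formula (3)). `dist[i]` is a toy distribution P(y_i = w | X)."""
    chars, confs = [], []
    for i, tok in enumerate(seq):
        if tok == "[mask]":
            w, p = max(dist[i].items(), key=lambda kv: kv[1])
            chars.append(w)
            confs.append(p)
        else:
            # Unmasked positions keep their character unchanged.
            chars.append(tok)
            confs.append(1.0)
    return chars, confs

# Toy first-iteration distributions for X = "Mercedes", N = 4.
dist = [{"没": 0.6, "梅": 0.4}, {"塞": 0.7, "赛": 0.3},
        {"迪": 0.5, "德": 0.45}, {"斯": 0.8, "思": 0.2}]
chars, confs = predict_masked(["[mask]"] * 4, dist)
print(chars)  # ['没', '塞', '迪', '斯']
```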
Replacement is an optional step. When the electronic device has not received a user instruction, it can be understood that the user instruction is empty; therefore, the electronic device may skip the replacement step, which can be understood as the replaced sequence in the first iteration being Y_rep^(1) = Ŷ^(1). When the electronic device receives a user instruction, it may parse the user instruction; if the user instruction indicates the first character, the last character and/or a contained character of the transliteration result, the electronic device may perform the replacement step, i.e., replace the corresponding character in the predicted target transliteration sequence Ŷ^(1) with the character indicated by the user instruction, which may include, but is not limited to, the following three cases:
Case 1: the user instruction indicates that the first character of the transliteration result is z_1. The electronic device may replace the first character ŷ_1^(1) of the predicted target transliteration sequence with z_1. Therefore, the replaced sequence in the first iteration may be expressed as formula (4):

Y_rep^(1) = (y'_1, ŷ_2^(1), ..., ŷ_N^(1))    (4)

where y'_1 is given by formula (5):

y'_1 = z_1    (5)
Case 2: the user instruction indicates that the last character of the transliteration result is z_2. The electronic device may replace the N-th character ŷ_N^(1) of the predicted target transliteration sequence with z_2. The replaced sequence in the first iteration may be expressed as formula (6):

Y_rep^(1) = (ŷ_1^(1), ..., ŷ_{N-1}^(1), y'_N)    (6)

where y'_N is given by formula (7):

y'_N = z_2    (7)
Case 3: the user instruction indicates that the transliteration result contains the character z_3. The electronic device may first compute the pronunciation similarity between each character of the predicted target transliteration sequence Ŷ^(1) and z_3, and then replace the character with the highest pronunciation similarity to z_3 with z_3. Therefore, the replaced sequence in the first iteration may be expressed as formula (8):

Y_rep^(1) = (ŷ_1^(1), ..., ŷ_{i*-1}^(1), z_3, ŷ_{i*+1}^(1), ..., ŷ_N^(1)), with i* = argmax_i s_i    (8)

where s_i is given by formula (9):

s_i = Sim(E(ŷ_i^(1)), E(z_3))    (9)

where E(·) represents mapping through the pronunciation embedding layer; thus, E(ŷ_i^(1)) is the high-dimensional vector obtained by mapping ŷ_i^(1) through the pronunciation embedding layer, and E(z_3) is the high-dimensional vector obtained by mapping z_3 through the pronunciation embedding layer. Sim(·) represents a similarity computation, such as, but not limited to, Euclidean distance, cosine similarity or the Pearson correlation coefficient; therefore, s_i may represent the pronunciation similarity between ŷ_i^(1) and z_3. The character ŷ_{i*}^(1) with the highest pronunciation similarity to z_3 is replaced with z_3.
The second iteration (current iteration round t=2) may include two steps: re-masking and re-prediction, where:
Re-masking may mask some of the characters of the target transliteration sequence Y_rep^(1) obtained in the first iteration. The following example masks n of its characters (not including the characters replaced according to the user instruction), where n is obtained by rounding up (⌈·⌉ denotes rounding up). The electronic device may sort the confidences predicted in the first iteration in ascending order and mask the n characters of Y_rep^(1) whose confidences rank lowest. The masked sequence in the second iteration may be expressed as formula (10):

Y_mask^(2) = (y_1, ..., y_N), where y_i = [mask] if position i is among the n lowest-confidence positions, and y_i = ŷ_i^(1) otherwise    (10)

where y_i is the i-th position occupied by the special character [mask], and i is a positive integer less than or equal to N. The electronic device does not mask the characters replaced according to the user instruction, i.e., the masked positions of Y_mask^(2) do not include those characters; for example, assuming the confidence takes values in [0, 1], the electronic device may set the confidence corresponding to the replaced characters to 1.
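The re-masking step can be sketched as follows. The function name and the toy confidences are assumptions; as in the patent's example, user-specified (replaced) positions are protected by treating their confidence as 1.

```python
def remask(tokens, confidences, n, protected=()):
    """Re-mask the n tokens with the lowest first-iteration confidence,
    never masking protected (user-specified) positions, whose
    confidence is treated as 1.0."""
    conf = [1.0 if i in protected else c for i, c in enumerate(confidences)]
    # Positions sorted by confidence, lowest first.
    order = sorted(range(len(tokens)), key=lambda i: conf[i])
    out = list(tokens)
    for i in order[:n]:
        out[i] = "[mask]"
    return out

# y3 ("德") came from the user instruction, so it is protected.
print(remask(["没", "塞", "德", "斯"], [0.9, 0.3, 0.95, 0.4],
             n=2, protected={2}))
# ['没', '[mask]', '德', '[mask]']
```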
Re-prediction may be, given the source language information X and the sequence Y_obs^(2) that is not masked in the second iteration, predicting each masked position in Y_mask^(2), that is, predicting the character at that position in the transliteration result corresponding to X. In the second iteration, the character predicted for the i-th masked position y_i and its corresponding confidence may be expressed as formulas (11) and (12):

ŷ_i^(2) = argmax_ω P(y_i = ω | X, Y_obs^(2))    (11)

c_i^(2) = max_ω P(y_i = ω | X, Y_obs^(2))    (12)

where Y_obs^(2) is given by formula (13):

Y_obs^(2) = Y \ Y_mask^(2)    (13)

where Y is the target transliteration sequence of length N, i.e., Y_obs^(2) consists of the positions of the first-iteration result that are not re-masked. ω is a variable used to characterize the confidence. c_i^(2) is the maximum confidence over candidate characters when, in the second iteration and given the source language information X and Y_obs^(2), predicting the character at position y_i in the transliteration result corresponding to X. ŷ_i^(2) is the character at position y_i with that maximum confidence. The confidence c_i^(2) corresponding to ŷ_i^(2) may represent the probability that the i-th character of the target transliteration sequence in the second iteration is ŷ_i^(2).
For each position y_i that is not masked in the second iteration, the predicted character ŷ_i^(2) and its corresponding confidence c_i^(2) inherit the results of the previous iteration (i.e., the first iteration); therefore, they may be expressed as formulas (14) and (15):

ŷ_i^(2) = ŷ_i^(1)    (14)

c_i^(2) = c_i^(1)    (15)
The decoding process is not limited to the above example. In a specific implementation, the number of iterations T may be greater than 2; the description of the iterations other than the first is similar to the description of the second iteration above and is not detailed again.
In some examples, assume the source language information X = Mercedes, the length of the target transliteration sequence is 4, and the user instruction indicates that the transliteration result includes the character "德". Based on the description of the decoding process in the above examples, the decoding process of this target transliteration sequence is shown in Table 1 below.

Table 1

  Iteration   Step          Target transliteration sequence (y1, y2, y3, y4)
  t=1         Mask          ([mask], [mask], [mask], [mask])
  t=1         Predict       (没, 塞, 迪, 斯)
  t=1         Replace       (没, 塞, 德, 斯)
  t=2         Re-mask       (没, [mask], 德, [mask])
  t=2         Re-predict    (没, 赛, 德, 斯)

In the masking phase of the first iteration, each character of the target transliteration sequence is occupied by [mask]. In the prediction phase of the first iteration, each masked position of the target transliteration sequence is predicted; assume the predicted target transliteration sequence (y1, y2, y3, y4) is (没, 塞, 迪, 斯). In the replacement phase of the first iteration, because the predicted y3 (i.e., "迪") has the highest pronunciation similarity to the character "德" indicated by the user instruction, "迪" is replaced with "德"; at this point, the target transliteration sequence (y1, y2, y3, y4) is (没, 塞, 德, 斯). In the re-masking phase of the second iteration, the low-confidence characters of the target transliteration sequence are masked; assuming the confidences corresponding to y2 and y4 are low, the target transliteration sequence (y1, y2, y3, y4) becomes (没, [mask], 德, [mask]). In the re-prediction phase of the second iteration, the masked y2 and y4 are predicted; assuming "赛" and "斯" are predicted, the target transliteration sequence (y1, y2, y3, y4) is (没, 赛, 德, 斯), which is the target transliteration sequence obtained by this decoding process.
S104: The electronic device corrects transliteration set 1 based on the transliteration knowledge base to obtain transliteration set 2 of the target language.
In one implementation, S104 is an optional step.
In one implementation, the transliteration knowledge base may include multiple characters, words and/or sentences, such as, but not limited to, enterprise names, brand names, trademark names, product names, names of people, place names, country names, loanwords, names of literary works, names of movies, names of music, popular transliterated Internet terms, etc.
In one implementation, the transliteration knowledge base may include a whitelist and a blacklist. In some examples, the whitelist includes auspicious characters with good connotations (for example, "美", "斯") and characters actually used for transliteration, such as, but not limited to, those from enterprise names, brand names, trademark names, product names, names of people, place names, country names, loanwords, names of literary works, names of movies, names of music, popular transliterated Internet terms, etc. The blacklist includes unlucky characters with bad connotations (for example, "没", "死") and characters not actually used for transliteration.
In one implementation, the electronic device may determine whether any transliteration result in transliteration set 1 includes a character in the blacklist. When the determination result is yes, the electronic device may replace the blacklisted character 4 in that transliteration result with a character 5 in the whitelist whose pronunciation is the same as/similar to that of character 4; when the determination result is no, no replacement is performed. Understandably, in this way the transliteration results provided by the electronic device will not include characters in the blacklist of the transliteration knowledge base.
In some examples, assume transliteration set 1 includes the transliteration result "没赛德斯", the blacklist of the transliteration knowledge base includes "没", and the whitelist includes "梅" and "美", which have the same/similar pronunciation as "没". Therefore, the electronic device may replace "没" in the above transliteration result "没赛德斯" with "梅" or "美", so transliteration set 2 may include the replaced transliteration results "梅赛德斯" and/or "美赛德斯".
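The knowledge-base correction of S104 can be sketched as follows. The `homophones` mapping is an illustrative stand-in for the knowledge base's pronunciation lookup; the function name and data are assumptions.

```python
def correct(result, blacklist, homophones, whitelist):
    """Replace each blacklisted character with a whitelist character
    having the same/similar pronunciation; leave other characters as-is."""
    out = []
    for ch in result:
        if ch in blacklist:
            # Candidates with the same/similar pronunciation that are
            # also on the whitelist.
            candidates = [c for c in homophones.get(ch, []) if c in whitelist]
            out.append(candidates[0] if candidates else ch)
        else:
            out.append(ch)
    return "".join(out)

blacklist = {"没", "死"}
whitelist = {"梅", "美", "赛", "德", "斯"}
homophones = {"没": ["梅", "美"]}
print(correct("没赛德斯", blacklist, homophones, whitelist))  # 梅赛德斯
```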
S105: The electronic device displays the transliteration set of the target language.
In one implementation, S105 is an optional step.
In one implementation, when S104 is not executed, the electronic device displays transliteration set 1 of the target language; in another implementation, after S104 is executed, the electronic device displays transliteration set 2 of the target language.
In some examples, assume the source language information X = Mercedes, the transliteration model obtains two predicted lengths, 3 and 4, and the user instruction received by the electronic device indicates that the transliteration result includes the character "德". Therefore, the transliteration set of the target language includes two transliteration results of length 4, "梅赛德斯" and "美赛德斯", and one transliteration result of length 3, "美赛德".
In the above method, automatic transliteration is realized through the pronunciation embedding layer and the semi-autoregressive model trained on big data, which effectively improves the transliteration quality; moreover, multiple transliteration results in the target language can be output at once and provided to the user, greatly expanding the range of transliteration results the user can choose from. The electronic device may also use the transliteration knowledge base to automatically correct the transliteration results output by the transliteration model, further improving the quality of the transliteration results.
In addition, the user can customize at least one of the length, the first character, the last character and the contained characters of the transliteration result, which can be understood as realizing a user-oriented personalized transliteration strategy that effectively meets users' personalized needs.
Figure 5A exemplarily shows a schematic diagram of a training process of the pronunciation embedding layer. In one implementation, the training process may be performed by the electronic device 100 itself; in another implementation, the training process may be performed by a network device, and the electronic device 100 receives the pronunciation embedding layer sent by the network device. Figure 5A is described taking the electronic device 100 performing the training process as an example. The training process may include, but is not limited to, the following steps:
1. The electronic device 100 initializes the pronunciation embedding layer 200.
In one implementation, the electronic device 100 may first generate a high-dimensional matrix (i.e., the initialized pronunciation embedding layer 200). For example, the initialized pronunciation embedding layer 200 is a 2,000,000 × 300 matrix, which may include 2,000,000 high-dimensional 1 × 300 vectors, each of which may indicate one character.
2. The electronic device 100 trains the pronunciation embedding layer 200.
In one implementation, the electronic device 100 may use an automatic speech recognition (ASR) system to obtain, from big data, a large number of texts with similar/identical pronunciations for training the pronunciation embedding layer 200.
In some examples, taking "I write a book" as a reference sentence, the electronic device 100 may use ASR to obtain, from big data, words whose pronunciation is similar/identical to that of each word of the reference sentence: "eye" related to "I"; "red", "read", "white" related to "write"; "the", "an" related to "a"; and "boot", "foot", "cook", "root" related to "book". The electronic device 100 may then generate training data based on the reference sentence and the obtained similar/identical-sounding words. Specifically, using the format of the reference sentence as the baseline, each sentence in the training data includes four words, and each word is the word at the corresponding position of the reference sentence or a word with similar/identical pronunciation. That is, the first word has 2 possible values: "I" or "eye"; the second word has 4 possible values: "write", "red", "read" or "white"; the third word has 3 possible values: "a", "the" or "an"; and the fourth word has 5 possible values: "book", "boot", "foot", "cook" or "root". Therefore, the training data corresponding to the above reference sentence may include 2 × 4 × 3 × 5 = 120 sentences; see Figure 5B for details.
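The generation of the 120 training sentences above is a Cartesian product over the per-slot candidates, which can be sketched as:

```python
from itertools import product

# Sound-alike candidates per slot of the reference sentence
# "I write a book", taken from the example above.
slots = [
    ["I", "eye"],
    ["write", "red", "read", "white"],
    ["a", "the", "an"],
    ["book", "boot", "foot", "cook", "root"],
]

# Every combination of one candidate per slot: 2 x 4 x 3 x 5 = 120.
training_sentences = [" ".join(words) for words in product(*slots)]
print(len(training_sentences))   # 120
print(training_sentences[0])     # I write a book
```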
In one implementation, the electronic device 100 may use the above texts with similar/identical pronunciations as training data and train the pronunciation embedding layer 200 by training the continuous bag-of-words model (CBOW), thereby updating the weights of the pronunciation embedding layer 200, where CBOW can be used to predict the center word of a given context. For example, assuming the training data includes "I write a book", the context "I _ a book" of the second word "write" may be used as the input of CBOW, and CBOW may predict the second word represented by "_". In some examples, the electronic device 100 may use the context of any word in the training data as the input of CBOW, obtain the probability that CBOW outputs that word (referred to as the predicted probability for short), and then use the obtained predicted probability to update the weights of the pronunciation embedding layer 200. For example, the electronic device 100 may use the contexts of the second words "write", "red", "read", "white" in the 120 sentences of the above example (which can be understood as replacing the second word of these 120 sentences with "_") as the input of CBOW, obtain the predicted probabilities with which CBOW outputs "write", "red", "read", "white" respectively, and update the weights of the pronunciation embedding layer 200 according to the predicted probabilities based on the back-propagation algorithm.
In the training process shown in Figure 5A, the pronunciation embedding layer may be trained on a large amount of data with similar/identical pronunciations, which can be understood as learning a large amount of information along the pronunciation dimension rather than, like an ordinary embedding layer, learning only information along the semantic dimension. For example, an ordinary embedding layer would only use "I write a book" (whose words are semantically related) as training data, and would not use the sentences other than "I write a book" among the above 120 sentences (whose words are not semantically related) as training data.
Understandably, since the pronunciation embedding layer and an ordinary embedding layer differ in their training data and in the dimensions of the learned information, they also cluster differently: an ordinary embedding layer clusters based on semantic similarity, while the pronunciation embedding layer clusters based on pronunciation similarity; see Figure 6 for specific examples. Figure 6(A) shows a schematic diagram of feature information mapped by the pronunciation embedding layer in a high-dimensional space, and Figure 6(B) shows a schematic diagram of feature information mapped by an ordinary embedding layer in a high-dimensional space.
As shown in Figure 6(A), after "two" is mapped as input information by the pronunciation embedding layer, in the high-dimensional space it is close to "to" and "too", whose pronunciation is highly similar, and far from "one" and "three", whose semantics are highly similar; after "one" is mapped by the pronunciation embedding layer, it can be close in the high-dimensional space to "won", whose pronunciation is highly similar.
As shown in Figure 6(B), after "two" is mapped as input information by an ordinary embedding layer, in the high-dimensional space it is close to "one" and "three", whose semantics are highly similar, and far from "to" and "too", whose pronunciation is highly similar; after "to" is mapped by the ordinary embedding layer, it can be close in the high-dimensional space to "forth", whose semantics are highly similar, and after "too" is mapped by the ordinary embedding layer, it can be close in the high-dimensional space to "also", whose semantics are highly similar.
Not limited to the above examples, in other examples a class may include more or less feature information.
Understandably, a translation function usually only implements literal translation (a translation that is faithful both to the content and to the structural form of the source language information) and/or free translation (a translation that, on the premise of being faithful to the content of the source language information, can break free of the structural constraints of the source language information so that the result conforms to the conventions of the target language), and does not implement transliteration. The learning goal of a literal and/or free translation task is to learn a large number of grammatical rules and a large amount of semantic knowledge so as to output target language content that is semantically correct and grammatically fluent. For example, assuming the target language content includes two characters, when the decoder decodes one of them as "德", a literal and/or free translation model tends to simultaneously decode the other character as a character semantically related to "德", such as "行" or "品". The learning goal of a transliteration task is to learn pronunciation splitting and combination rules so as to decode target language content whose pronunciation is close to the user's actual pronunciation, without attention to semantics or grammar. For example, assuming the source language information is "Mercedes" and the target language content includes two characters, when the decoder decodes one of them as "德", the transliteration model tends to split according to pronunciation and simultaneously decode the other character as "斯" or another character with the same/similar pronunciation.
本申请中，音译模型的输入是发音嵌入层的输出，而用于训练发音嵌入层的大数据可以体现用户平时发音过程中的模糊缺省、强读弱读等特点，因此大大方便了音译模型学习发音拆分和组合规则，让音译模型可以充分捕捉到谐音转换、清/浊辅音转换、首音优化、尾音省略等音译技巧，有效提升音译结果的质量，减少自动音译和人工音译之间的质量差距。In this application, the input of the transliteration model is the output of the pronunciation embedding layer, and the big data used to train the pronunciation embedding layer can reflect characteristics of users' everyday pronunciation, such as elision and stressed/unstressed readings. This greatly helps the transliteration model learn pronunciation splitting and combination rules, allowing it to fully capture transliteration techniques such as homophone substitution, voiceless/voiced consonant conversion, initial-sound optimization, and final-sound omission, effectively improving the quality of transliteration results and narrowing the quality gap between automatic and manual transliteration.
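The contrast drawn above between an ordinary (semantic) embedding layer and a pronunciation embedding layer can be illustrated with a toy nearest-neighbour check. The vectors below are hand-made for illustration only and are not the patent's actual embeddings; the point is merely that the two spaces rank the neighbours of "two" differently.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hand-made 3-d vectors (hypothetical): an ordinary embedding clusters
# numerals together, a pronunciation embedding clusters homophones together.
semantic = {"two": (1.0, 0.1, 0.0), "one": (0.9, 0.2, 0.1), "three": (0.8, 0.0, 0.2),
            "to": (0.1, 1.0, 0.0), "too": (0.0, 0.9, 0.3)}
phonetic = {"two": (0.1, 1.0, 0.0), "one": (1.0, 0.0, 0.2), "three": (0.9, 0.3, 0.0),
            "to": (0.0, 0.95, 0.1), "too": (0.1, 0.9, 0.05)}

def nearest(space, word):
    """Return the nearest neighbour of `word` in the given embedding space."""
    others = [(w, cosine(space[word], v)) for w, v in space.items() if w != word]
    return max(others, key=lambda t: t[1])[0]

print(nearest(semantic, "two"))  # a numeral ("one" or "three")
print(nearest(phonetic, "two"))  # a homophone ("to" or "too")
```

Under the semantic vectors, "two" lands next to the other numerals; under the phonetic vectors, it lands next to its homophones, which is the behaviour the pronunciation embedding layer is trained to produce.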
图7示例性示出又一种电子设备100的软件架构示意图。FIG. 7 exemplarily shows a schematic diagram of the software architecture of yet another electronic device 100 .
如图7所示，电子设备100可以包括发音嵌入层200、音译模型300、音译知识库400和修正模块500，其中，音译模型300可以包括编码器301、长度预测模块302、注意力机制303和解码器304。音译知识库400可以包括白名单和黑名单。As shown in FIG. 7, the electronic device 100 may include a pronunciation embedding layer 200, a transliteration model 300, a transliteration knowledge base 400 and a correction module 500, where the transliteration model 300 may include an encoder 301, a length prediction module 302, an attention mechanism 303 and a decoder 304. The transliteration knowledge base 400 may include a whitelist and a blacklist.
发音嵌入层200可以接收源语言信息X和长度信息LEN作为输入，输出源语言特征信息E(X)和长度特征信息E(LEN)。编码器301可以接收E(X)和E(LEN)作为输入，分别对E(X)和E(LEN)进行编码，以输出源语言编码信息H(X)和长度编码信息H(LEN)。长度预测模块302可以接收H(LEN)作为输入，将H(LEN)依次输入到池化层(Pooling)和分类器中，并输出K个预测长度，其中，分类器例如包括线性层(Linear)和Softmax。注意力机制303可以接收H(X)作为输入，并输出注意力得分。The pronunciation embedding layer 200 may receive source language information X and length information LEN as input, and output source language feature information E(X) and length feature information E(LEN). The encoder 301 may receive E(X) and E(LEN) as input, and encode E(X) and E(LEN) respectively to output source language encoding information H(X) and length encoding information H(LEN). The length prediction module 302 may receive H(LEN) as input, feed H(LEN) successively into a pooling layer (Pooling) and a classifier, and output K predicted lengths, where the classifier includes, for example, a linear layer (Linear) and Softmax. The attention mechanism 303 may receive H(X) as input and output attention scores.
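The length prediction module 302 described above (Pooling, then Linear, then Softmax over H(LEN), keeping K candidate lengths) can be sketched as follows. The vector dimensions, the choice of mean pooling and the random weights are illustrative assumptions; the patent only specifies the pooling layer, the linear layer and the Softmax.

```python
import math
import random

def mean_pool(states):
    """Pool a sequence of hidden vectors H(LEN) into a single vector."""
    d = len(states[0])
    return [sum(s[i] for s in states) / len(states) for i in range(d)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def predict_lengths(states, weight, bias, k=3):
    """Pooling -> Linear -> Softmax, then keep the top-K candidate lengths.
    Class index i stands for an output length of i+1 characters."""
    pooled = mean_pool(states)
    logits = [sum(w_i * x for w_i, x in zip(row, pooled)) + b
              for row, b in zip(weight, bias)]
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [i + 1 for i in ranked[:k]]

# Toy run with random weights (hypothetical dimensions: d=4, max length 6).
random.seed(0)
H_LEN = [[random.random() for _ in range(4)] for _ in range(5)]
W = [[random.random() for _ in range(4)] for _ in range(6)]
b = [0.0] * 6
print(predict_lengths(H_LEN, W, b, k=3))
```

The decoder can then attempt one decoding pass per predicted length, which is how multiple candidate transliterations of different lengths arise.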
在一种实施方式中，解码器304可以接收H(X)、K个预测长度和注意力得分作为输入，基于K个预测长度和注意力得分对H(X)进行迭代解码，并输出一个或多个长度属于这K个预测长度的目标语言的音译结果（即目标语言的音译集合1）。在另一种实施方式中，解码器304可以接收H(X)、用于指示音译结果的长度的用户指令和注意力得分作为输入，基于该用户指令和注意力得分对H(X)进行迭代解码，并输出一个或多个长度为该用户指令指示的长度的目标语言的音译结果（即目标语言的音译集合1）。In one implementation, the decoder 304 may receive H(X), the K predicted lengths, and the attention scores as input, iteratively decode H(X) based on the K predicted lengths and the attention scores, and output one or more transliteration results in the target language whose lengths belong to the K predicted lengths (i.e., transliteration set 1 of the target language). In another implementation, the decoder 304 may receive as input H(X), a user instruction indicating the length of the transliteration result, and the attention scores, iteratively decode H(X) based on the user instruction and the attention scores, and output one or more transliteration results in the target language whose length is the length indicated by the user instruction (i.e., transliteration set 1 of the target language).
在一种实施方式中，解码器304还可以接收用于指示音译结果的内容的用户指令，例如用于指示音译结果的首字、尾字和/或包含字的用户指令，结合该用户指令对H(X)进行解码，并输出和该用户指令匹配的音译集合1。In one implementation, the decoder 304 may also receive a user instruction indicating the content of the transliteration result, for example a user instruction indicating the first character, the last character and/or a contained character of the transliteration result, decode H(X) in combination with this user instruction, and output a transliteration set 1 that matches the user instruction.
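In the patent the decoder applies these user instructions during decoding itself; a simpler post-hoc filter over already-decoded candidates illustrates the effect of the constraints on first character, last character, contained character and length (the function name and candidate list are hypothetical):

```python
def filter_candidates(candidates, first=None, last=None, contains=None, length=None):
    """Keep only transliteration candidates satisfying the user's constraints."""
    out = []
    for c in candidates:
        if first is not None and not c.startswith(first):
            continue  # first-character constraint not met
        if last is not None and not c.endswith(last):
            continue  # last-character constraint not met
        if contains is not None and contains not in c:
            continue  # contained-character constraint not met
        if length is not None and len(c) != length:
            continue  # length constraint not met
        out.append(c)
    return out

candidates = ["哈莫尼", "鸿蒙", "哈梦"]
print(filter_candidates(candidates, length=2))    # → ['鸿蒙', '哈梦']
print(filter_candidates(candidates, first="哈"))  # → ['哈莫尼', '哈梦']
```

Constraining during decoding rather than filtering afterwards lets the model redistribute probability mass to candidates that satisfy the instruction, but the observable result for the user is the same kind of restriction shown here.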
解码器304输出的目标语言的音译集合1可以进行判断：是否命中音译知识库400中的黑名单，即音译集合1中的音译结果是否包括黑名单中的字符。当判断结果为是时，音译集合1可以输入至修正模块500，修正模块500可以使用音译知识库400中的白名单，将音译集合1的音译结果中属于黑名单的字符替换为白名单中的字符，并输出目标语言的音译集合2以提供给用户。当判断结果为否时，可以直接输出目标语言的音译集合1以提供给用户。A judgment can be made on the transliteration set 1 of the target language output by the decoder 304: whether it hits the blacklist in the transliteration knowledge base 400, i.e., whether the transliteration results in transliteration set 1 include characters in the blacklist. When the judgment result is yes, transliteration set 1 can be input to the correction module 500, which can use the whitelist in the transliteration knowledge base 400 to replace the blacklisted characters in the transliteration results of transliteration set 1 with whitelist characters, and output a transliteration set 2 of the target language to provide to the user. When the judgment result is no, transliteration set 1 of the target language can be output directly to the user.
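The blacklist check and whitelist substitution performed by the correction module 500 can be sketched as below. The pinyin table and the similarity measure are hypothetical stand-ins for the pronunciation similarity the patent assumes; claim 4 additionally requires the similarity to exceed a threshold, which is omitted here for brevity.

```python
def correct(result, blacklist, whitelist, similarity):
    """Replace each blacklisted character in a transliteration result with the
    whitelist character whose pronunciation similarity to it is highest."""
    out = []
    for ch in result:
        if ch in blacklist:
            ch = max(whitelist, key=lambda w: similarity(ch, w))
        out.append(ch)
    return "".join(out)

# Hypothetical similarity based on a tiny pinyin lookup table.
pinyin = {"死": "si", "斯": "si", "思": "si", "德": "de"}

def sim(a, b):
    pa, pb = pinyin.get(a, a), pinyin.get(b, b)
    same = sum(x == y for x, y in zip(pa, pb))
    return same / max(len(pa), len(pb))

# "死" is assumed blacklisted (inauspicious); both whitelist characters are
# exact homophones, and the first maximal one is chosen.
print(correct("梅赛德死", blacklist={"死"}, whitelist=["斯", "思"], similarity=sim))  # → 梅赛德斯
```

A real implementation would use a proper grapheme-to-phoneme table for the similarity function, but the replace-with-nearest-homophone logic is the same.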
下面介绍本申请实施例涉及的应用场景以及该场景下的用户界面实施例。The application scenarios involved in the embodiments of this application and the user interface embodiments in this scenario are introduced below.
图8示例性示出一种翻译应用的用户界面的示意图。Figure 8 exemplarily shows a schematic diagram of a user interface of a translation application.
如图8的(A)所示，电子设备100可以显示翻译应用的用户界面810。用户界面810可以包括翻译信息811、输入框812、翻译选项813、确定控件814和显示框815，其中，翻译信息811可以包括源语言（例如英文）和目标语言（例如中文），电子设备100可以响应针对翻译信息811的操作，切换源语言和/或目标语言。输入框812可以用于输入待翻译的内容。翻译选项813可以指示翻译的类型，例如，用户界面810中的翻译选项813指示“直译/意译”。确定控件814可以用于触发对输入框812中的内容进行翻译（例如具体为翻译选项813指示的“直译/意译”），翻译结果可以用于在显示框815中显示。As shown in (A) of Figure 8, the electronic device 100 may display a user interface 810 of the translation application. The user interface 810 may include translation information 811, an input box 812, a translation option 813, a confirmation control 814 and a display box 815, where the translation information 811 may include a source language (e.g., English) and a target language (e.g., Chinese), and the electronic device 100 may switch the source language and/or the target language in response to an operation on the translation information 811. The input box 812 can be used to input the content to be translated. The translation option 813 may indicate the type of translation; for example, the translation option 813 in the user interface 810 indicates "literal/free translation". The confirmation control 814 can be used to trigger translation of the content in the input box 812 (specifically, for example, the "literal/free translation" indicated by the translation option 813), and the translation result can be displayed in the display box 815.
在一种实施方式中，电子设备100可以响应针对翻译选项813的操作（例如触摸操作，该触摸操作例如为点击），切换翻译类型，例如将“直译/意译”切换为“音译”，并显示音译功能的用户界面，具体可参见图8的(B)所示的用户界面820。In one implementation, the electronic device 100 may, in response to an operation on the translation option 813 (e.g., a touch operation such as a click), switch the translation type, for example from "literal/free translation" to "transliteration", and display the user interface of the transliteration function; for details, see the user interface 820 shown in (B) of Figure 8.
如图8的(B)所示，用户界面820和用户界面810类似，区别在于，用户界面820中的翻译选项813指示的翻译类型为“音译”，因此，确定控件814可以用于触发对输入框812中的内容进行音译，即用于触发执行以上实施例中的音译方法，此时，输入框812中的内容即为以上实施例中的源语言信息，以上实施例中的一个或多个目标语言的音译结果可以在显示框815中显示。并且，用户界面820还包括自定义区域821，自定义区域821可以包括用于设置音译结果的首字的输入框821A、用于设置音译结果的尾字的输入框821B、用于设置音译结果包含的字符的输入框821C、用于设置音译结果的长度的输入框821D，自定义区域821例如用于用户输入以上实施例中的用户指令。As shown in (B) of Figure 8, the user interface 820 is similar to the user interface 810, the difference being that the translation type indicated by the translation option 813 in the user interface 820 is "transliteration". Therefore, the confirmation control 814 can be used to trigger transliteration of the content in the input box 812, i.e., to trigger execution of the transliteration method in the above embodiments; at this time, the content in the input box 812 is the source language information in the above embodiments, and one or more transliteration results in the target language from the above embodiments can be displayed in the display box 815. In addition, the user interface 820 also includes a custom area 821, which may include an input box 821A for setting the first character of the transliteration result, an input box 821B for setting the last character of the transliteration result, an input box 821C for setting a character the transliteration result should contain, and an input box 821D for setting the length of the transliteration result; the custom area 821 is used, for example, for the user to input the user instructions in the above embodiments.
在一种实施方式中，电子设备100可以接收用户基于用户界面820中的输入框812输入的内容，假设为图9A所示的用户界面910中的输入框812所示的“Harmony”，即电子设备100可以执行图4所示的S101，“Harmony”即为源语言信息。然后，电子设备100可以响应针对用户界面910中的确定控件814的操作（例如触摸操作，该触摸操作例如为点击），对输入框812中的内容“Harmony”进行音译，即执行图4所示的S102-S103或者S102-S104，得到的目标语言的音译集合包括3个音译结果：“哈莫尼”、“鸿蒙”和“哈梦”，电子设备可以在用户界面910中的显示框815显示这3个音译结果，即执行图4所示的S105。In one implementation, the electronic device 100 may receive content input by the user in the input box 812 of the user interface 820, assumed to be "Harmony" as shown in the input box 812 of the user interface 910 shown in Figure 9A; that is, the electronic device 100 may execute S101 shown in Figure 4, and "Harmony" is the source language information. Then, in response to an operation on the confirmation control 814 in the user interface 910 (e.g., a touch operation such as a click), the electronic device 100 may transliterate the content "Harmony" in the input box 812, i.e., execute S102-S103 or S102-S104 shown in Figure 4. The obtained transliteration set of the target language includes three transliteration results: "哈莫尼", "鸿蒙" and "哈梦". The electronic device may display these three transliteration results in the display box 815 of the user interface 910, i.e., execute S105 shown in Figure 4.
在一种实施方式中，电子设备100可以接收用户基于用户界面820中的输入框812输入的内容，假设为图9B所示的用户界面920中的输入框812所示的“Mercedes”，即电子设备100可以执行图4所示的S101，“Mercedes”即为源语言信息。电子设备100还可以接收用户基于用户界面820包括的自定义区域821中的输入框821C输入的内容：“德”，即电子设备100可以执行图4所述的接收用户指令，并且该用户指令用于指示音译结果包括字符“德”。然后，电子设备100可以响应针对用户界面920中的确定控件814的操作（例如触摸操作，该触摸操作例如为点击），对输入框812中的内容“Mercedes”进行音译，即执行图4所示的S102-S103或者S102-S104，得到的目标语言的音译集合包括3个音译结果：“梅赛德斯”、“美赛德斯”和“美赛德”，这3个音译结果均包括上述用户指令指示的字符“德”。电子设备可以在用户界面920中的显示框815显示这3个音译结果，即执行图4所示的S105。In one implementation, the electronic device 100 may receive content input by the user in the input box 812 of the user interface 820, assumed to be "Mercedes" as shown in the input box 812 of the user interface 920 shown in Figure 9B; that is, the electronic device 100 may execute S101 shown in Figure 4, and "Mercedes" is the source language information. The electronic device 100 may also receive content input by the user in the input box 821C of the custom area 821 included in the user interface 820: "德"; that is, the electronic device 100 may receive the user instruction described in Figure 4, and this user instruction indicates that the transliteration result includes the character "德". Then, in response to an operation on the confirmation control 814 in the user interface 920 (e.g., a touch operation such as a click), the electronic device 100 may transliterate the content "Mercedes" in the input box 812, i.e., execute S102-S103 or S102-S104 shown in Figure 4. The obtained transliteration set of the target language includes three transliteration results: "梅赛德斯", "美赛德斯" and "美赛德", all of which include the character "德" indicated by the above user instruction. The electronic device may display these three transliteration results in the display box 815 of the user interface 920, i.e., execute S105 shown in Figure 4.
在一种实施方式中，电子设备100可以接收用户基于用户界面820中的输入框812输入的内容，假设为图9C所示的用户界面930中的输入框812所示的“Harmony”，即电子设备100可以执行图4所示的S101，“Harmony”即为源语言信息。电子设备100还可以接收用户基于用户界面820包括的自定义区域821中的输入框821D输入的内容：“2”，即电子设备100可以执行图4所述的接收用户指令，并且该用户指令用于指示音译结果的长度为2。然后，电子设备100可以响应针对用户界面930中的确定控件814的操作（例如触摸操作，该触摸操作例如为点击），对输入框812中的内容“Harmony”进行音译，即执行图4所示的S102-S103或者S102-S104，得到的目标语言的音译集合包括2个音译结果：“鸿蒙”和“哈梦”，电子设备可以在用户界面930中的显示框815显示这2个音译结果，即执行图4所示的S105。相比图9A所示的实施方式，图9C所示的实施方式中，电子设备100还接收到用于指示音译结果的长度为2的用户指令，因此，图9C所示的音译集合包括图9A所示的音译集合中长度为2的2个音译结果。In one implementation, the electronic device 100 may receive content input by the user in the input box 812 of the user interface 820, assumed to be "Harmony" as shown in the input box 812 of the user interface 930 shown in Figure 9C; that is, the electronic device 100 may execute S101 shown in Figure 4, and "Harmony" is the source language information. The electronic device 100 may also receive content input by the user in the input box 821D of the custom area 821 included in the user interface 820: "2"; that is, the electronic device 100 may receive the user instruction described in Figure 4, and this user instruction indicates that the length of the transliteration result is 2. Then, in response to an operation on the confirmation control 814 in the user interface 930 (e.g., a touch operation such as a click), the electronic device 100 may transliterate the content "Harmony" in the input box 812, i.e., execute S102-S103 or S102-S104 shown in Figure 4. The obtained transliteration set of the target language includes two transliteration results: "鸿蒙" and "哈梦". The electronic device may display these two transliteration results in the display box 815 of the user interface 930, i.e., execute S105 shown in Figure 4. Compared with the embodiment shown in Figure 9A, in the embodiment shown in Figure 9C the electronic device 100 additionally receives a user instruction indicating that the length of the transliteration result is 2; therefore, the transliteration set shown in Figure 9C includes the two transliteration results of length 2 from the transliteration set shown in Figure 9A.
不限于上述所示的实施方式，在另一种实施方式中，用户还可以基于用户界面820中的自定义区域821设置首字或者尾字，在另一种实施方式中，用户还可以基于用户界面820中的自定义区域821设置首字、尾字、包含字和长度中的至少两项，具体示例和上述实施方式类似，不再赘述。Not limited to the implementations shown above: in another implementation, the user may also set the first character or the last character in the custom area 821 of the user interface 820; in yet another implementation, the user may set at least two of the first character, the last character, the contained character and the length in the custom area 821 of the user interface 820. The specific examples are similar to the above embodiments and will not be described again.
图10示例性示出一种浏览器应用的用户界面的示意图。Figure 10 exemplarily shows a schematic diagram of a user interface of a browser application.
如图10的(A)所示，电子设备100可以显示浏览器应用的用户界面1010，用户界面1010可以包括搜索框1011，搜索框1011可以包括字符“搜索或输入网址”，以提示用户输入搜索词或者想查看的网页的网址。As shown in (A) of Figure 10, the electronic device 100 may display a user interface 1010 of a browser application. The user interface 1010 may include a search box 1011, and the search box 1011 may include the characters "搜索或输入网址" ("search or enter a URL") to prompt the user to enter a search term or the URL of a web page to view.
在一种实施方式中，电子设备100可以接收用户基于用户界面1010中的搜索框1011输入的内容，假设为图10的(B)所示的用户界面1020中的搜索框1011所示的“harmony”，即电子设备100可以执行图4所示的S101，“harmony”即为源语言信息。电子设备100可以对源语言信息“harmony”进行音译，即执行图4所示的S102-S103或者S102-S104，得到的目标语言的音译集合包括3个音译结果：“哈莫尼”、“鸿蒙”和“哈梦”。电子设备100可以在用户界面1020中的候选列表1021显示这3个音译结果，即执行图4所示的S105。如图10的(B)所示，候选列表1021可以包括多个选项，其中任意一个选项包括和搜索框1011中的内容“harmony”相关的内容，这多个选项例如但不限于包括：包括“harmony”的选项1021A、包括“哈莫尼”的选项1021B、包括“鸿蒙”的选项1021C、包括“哈梦”的选项1021D、包括“harmony是什么意思”的选项1021E、包括“harmonyOS”的选项1021F、包括“harmony形容词”的选项1021G。在一些示例中，电子设备100可以响应针对这多个选项中的任意一个选项的操作（例如触摸操作，该触摸操作例如为点击），在互联网中搜索和该选项包括的内容相关的信息，例如，该选项为选项1021C，则电子设备100可以响应于针对选项1021C的操作，显示和“鸿蒙”相关的搜索结果。In one implementation, the electronic device 100 may receive content input by the user in the search box 1011 of the user interface 1010, assumed to be "harmony" as shown in the search box 1011 of the user interface 1020 shown in (B) of Figure 10; that is, the electronic device 100 may execute S101 shown in Figure 4, and "harmony" is the source language information. The electronic device 100 may transliterate the source language information "harmony", i.e., execute S102-S103 or S102-S104 shown in Figure 4; the obtained transliteration set of the target language includes three transliteration results: "哈莫尼", "鸿蒙" and "哈梦". The electronic device 100 may display these three transliteration results in the candidate list 1021 of the user interface 1020, i.e., execute S105 shown in Figure 4. As shown in (B) of Figure 10, the candidate list 1021 may include multiple options, each of which includes content related to the content "harmony" in the search box 1011. These options include, for example but not limited to: an option 1021A including "harmony", an option 1021B including "哈莫尼", an option 1021C including "鸿蒙", an option 1021D including "哈梦", an option 1021E including "harmony是什么意思", an option 1021F including "harmonyOS", and an option 1021G including "harmony形容词". In some examples, the electronic device 100 may, in response to an operation on any one of these options (e.g., a touch operation such as a click), search the Internet for information related to the content included in that option; for example, if the option is the option 1021C, the electronic device 100 may display search results related to "鸿蒙" in response to the operation on the option 1021C.
在一种实施方式中，电子设备100可以接收用户基于用户界面1010中的搜索框1011输入的内容，假设为图10的(C)所示的用户界面1030中的搜索框1011所示的“基因膝盖领带五月翻译”。电子设备100可以从搜索框1011中的内容获取得到关键内容“基因膝盖领带五月”，然后将“基因膝盖领带五月”直译/意译为“gene knee tie may”，即电子设备100可以执行图4所示的S101，“gene knee tie may”即为源语言信息。电子设备100可以对源语言信息“gene knee tie may”进行音译，即执行图4所示的S102-S103或者S102-S104，得到的目标语言的音译结果为“鸡你太美”。电子设备100可以在用户界面1030中的候选列表1031显示该音译结果，即执行图4所示的S105。如图10的(C)所示，候选列表1031可以包括多个选项，例如但不限于包括：包括“基因膝盖领带五月翻译”的选项1031A、包括“gene knee tie may（鸡你太美）”的选项1031B、包括“鸡你太美”的选项1031C、包括“基因膝盖领带五月翻译英文”的选项1031D、包括“基因膝盖领带五月”的选项1031E，其中，选项1031B和选项1031C包括的内容与音译结果相关。在一些示例中，电子设备100可以响应针对这多个选项中的任意一个选项的操作，在互联网中搜索和该选项包括的内容相关的信息。In one implementation, the electronic device 100 may receive content input by the user in the search box 1011 of the user interface 1010, assumed to be "基因膝盖领带五月翻译" as shown in the search box 1011 of the user interface 1030 shown in (C) of Figure 10. The electronic device 100 may obtain the key content "基因膝盖领带五月" from the content in the search box 1011, and then literally/freely translate "基因膝盖领带五月" into "gene knee tie may"; that is, the electronic device 100 may execute S101 shown in Figure 4, and "gene knee tie may" is the source language information. The electronic device 100 may transliterate the source language information "gene knee tie may", i.e., execute S102-S103 or S102-S104 shown in Figure 4, and the obtained transliteration result in the target language is "鸡你太美". The electronic device 100 may display this transliteration result in the candidate list 1031 of the user interface 1030, i.e., execute S105 shown in Figure 4. As shown in (C) of Figure 10, the candidate list 1031 may include multiple options, for example but not limited to: an option 1031A including "基因膝盖领带五月翻译", an option 1031B including "gene knee tie may（鸡你太美）", an option 1031C including "鸡你太美", an option 1031D including "基因膝盖领带五月翻译英文", and an option 1031E including "基因膝盖领带五月", where the content included in the option 1031B and the option 1031C is related to the transliteration result. In some examples, the electronic device 100 may, in response to an operation on any one of these options, search the Internet for information related to the content included in that option.
图11示例性示出一种浏览器应用的用户界面的示意图。Figure 11 exemplarily shows a schematic diagram of a user interface of a browser application.
如图11的(A)所示，电子设备100可以显示浏览器应用的用户界面1110，用户界面1110可以包括搜索框1111，搜索框1111可以包括搜索控件1111A和切换控件1111B，搜索控件1111A包括字符“普通搜索”，可以指示当前的搜索类型为“普通搜索”，切换控件1111B可以用于切换搜索类型。在一种实施方式中，电子设备100可以接收用户基于用户界面1110中的搜索框1111输入的搜索词“harmony”，并响应针对用户界面1110中的搜索控件1111A的操作（例如触摸操作，该触摸操作例如为点击），显示和搜索词“harmony”相关的搜索结果，具体可参见图11的(B)所示的用户界面1120。As shown in (A) of Figure 11, the electronic device 100 may display a user interface 1110 of a browser application. The user interface 1110 may include a search box 1111; the search box 1111 may include a search control 1111A and a switching control 1111B. The search control 1111A includes the characters "普通搜索" (normal search), which can indicate that the current search type is "normal search", and the switching control 1111B can be used to switch the search type. In one implementation, the electronic device 100 may receive the search term "harmony" input by the user in the search box 1111 of the user interface 1110, and, in response to an operation on the search control 1111A in the user interface 1110 (e.g., a touch operation such as a click), display search results related to the search term "harmony"; for details, see the user interface 1120 shown in (B) of Figure 11.
如图11的(B)所示,用户界面1120可以包括搜索框1111、搜索概要1121和搜索结果列表1122,其中,搜索框1111和用户界面1110中的搜索框1111一致,不再赘述。搜索概要1121可以包括字符:“为您找到‘harmony’的相关结果10个”。搜索结果列表1122可以包括多个和搜索词“harmony”相关的搜索结果。As shown in (B) of FIG. 11 , the user interface 1120 may include a search box 1111 , a search summary 1121 and a search result list 1122 . The search box 1111 is consistent with the search box 1111 in the user interface 1110 and will not be described again. The search summary 1121 may include the characters: "10 results found for 'harmony' for you". Search results list 1122 may include multiple search results related to the search term "harmony."
在一种实施方式中，电子设备100可以响应针对用户界面1110中的切换控件1111B的操作（例如触摸操作，该触摸操作例如为点击），切换搜索类型，例如将“普通搜索”切换为“音译搜索”，此时搜索控件1111A可以包括字符“音译搜索”。电子设备100可以响应针对搜索控件1111A的操作（例如触摸操作，该触摸操作例如为点击），对搜索框1111中的内容“harmony”进行音译，即电子设备100可以执行图4所示的方法，“harmony”即为源语言信息，得到的目标语言的音译集合包括3个音译结果：“哈莫尼”、“鸿蒙”和“哈梦”。然后，电子设备100可以显示与搜索词“harmony”和上述3个音译结果相关的搜索结果，具体可参见图11的(C)所示的用户界面1130。In one implementation, the electronic device 100 may, in response to an operation on the switching control 1111B in the user interface 1110 (e.g., a touch operation such as a click), switch the search type, for example from "普通搜索" (normal search) to "音译搜索" (transliteration search); at this time the search control 1111A may include the characters "音译搜索". In response to an operation on the search control 1111A (e.g., a touch operation such as a click), the electronic device 100 may transliterate the content "harmony" in the search box 1111; that is, the electronic device 100 may execute the method shown in Figure 4, with "harmony" as the source language information, and the obtained transliteration set of the target language includes three transliteration results: "哈莫尼", "鸿蒙" and "哈梦". Then, the electronic device 100 may display search results related to the search term "harmony" and the above three transliteration results; for details, see the user interface 1130 shown in (C) of Figure 11.
如图11的(C)所示，用户界面1130可以包括搜索框1111、搜索概要1131和搜索结果列表1132，其中，搜索框1111和用户界面1110中的搜索框1111类似，区别在于，用户界面1130中的搜索控件1111A包括字符“音译搜索”，可以指示当前的搜索类型为“音译搜索”。搜索概要1131可以包括字符“为您找到‘harmony’、‘哈莫尼’、‘鸿蒙’、‘哈梦’的相关结果100个”。搜索结果列表1132可以包括多个和搜索词“harmony”、上述3个音译结果相关的搜索结果，例如，搜索结果1132A（包括字符“鸿蒙harmony介绍”）、搜索结果1132B（包括字符“鸿蒙-最新资讯”）和搜索词“harmony”、音译结果“鸿蒙”相关，搜索结果1132C（包括字符“哈莫尼名字”）和音译结果“哈莫尼”相关，搜索结果1132D（包括字符“哈梦的故事”）和音译结果“哈梦”相关。As shown in (C) of Figure 11, the user interface 1130 may include a search box 1111, a search summary 1131 and a search result list 1132, where the search box 1111 is similar to the search box 1111 in the user interface 1110, the difference being that the search control 1111A in the user interface 1130 includes the characters "音译搜索", which can indicate that the current search type is "transliteration search". The search summary 1131 may include the characters "为您找到‘harmony’、‘哈莫尼’、‘鸿蒙’、‘哈梦’的相关结果100个" ("100 results found for 'harmony', '哈莫尼', '鸿蒙', '哈梦'"). The search result list 1132 may include multiple search results related to the search term "harmony" and the above three transliteration results; for example, a search result 1132A (including the characters "鸿蒙harmony介绍") and a search result 1132B (including the characters "鸿蒙-最新资讯") are related to the search term "harmony" and the transliteration result "鸿蒙", a search result 1132C (including the characters "哈莫尼名字") is related to the transliteration result "哈莫尼", and a search result 1132D (including the characters "哈梦的故事") is related to the transliteration result "哈梦".
本申请各实施例提供的方法中，可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时，可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时，全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、网络设备、用户设备或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中，或者从一个计算机可读存储介质向另一个计算机可读存储介质传输，例如，所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线（例如同轴电缆、光纤、数字用户线（digital subscriber line，DSL）或无线（例如红外、无线、微波等）方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机可以存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质（例如，软盘、硬盘、磁带）、光介质（例如，数字视频光盘（digital video disc，DWD）、或者半导体介质（例如，固态硬盘（solid state disk，SSD）等。以上所述，以上实施例仅用以说明本申请的技术方案，而非对其限制；尽管参照前述实施例对本申请进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。The methods provided in the embodiments of this application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, a user equipment, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), etc. The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them; although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments, or make equivalent substitutions for some of the technical features therein; and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.

Claims (11)

  1. 一种音译方法,其特征在于,应用于电子设备,所述方法包括:A transliteration method, characterized in that it is applied to electronic equipment, and the method includes:
    接收用户输入的第一语言的第一信息;receiving the first information in the first language input by the user;
    对所述第一信息进行音译并得到第二语言的多个第二信息,所述多个第二信息包括第三信息和第四信息,所述第三信息和所述第四信息的长度不同;The first information is transliterated to obtain a plurality of second information in a second language, the plurality of second information includes third information and fourth information, and the lengths of the third information and the fourth information are different. ;
    显示所述多个第二信息。Display the plurality of second information.
  2. 如权利要求1所述的方法，其特征在于，所述第一信息为企业名称、品牌名称、商标名称、产品名称、人名、地名、国名、舶来词、文学著作的名称、电影的名称、音乐的名称或者音译热词。The method of claim 1, wherein the first information is a company name, a brand name, a trademark name, a product name, a person's name, a place name, a country name, a loanword, the name of a literary work, the name of a movie, the name of a piece of music, or a transliterated buzzword.
  3. 如权利要求1或2所述的方法,其特征在于,所述方法还包括:The method according to claim 1 or 2, characterized in that the method further includes:
    接收用户输入的第三语言的第五信息;Receive fifth information in the third language input by the user;
    对所述第五信息进行直译或者意译并得到第四语言的第六信息;Perform literal translation or free translation of the fifth information and obtain the sixth information in the fourth language;
    对所述第六信息进行音译并得到所述第三语言的至少一个第七信息;Transliterate the sixth information and obtain at least one seventh information in the third language;
    显示所述至少一个第七信息。Displaying the at least one seventh piece of information.
  4. 如权利要求1-3任一项所述的方法,其特征在于,所述显示所述多个第二信息之前,所述方法还包括:The method according to any one of claims 1 to 3, characterized in that before displaying the plurality of second information, the method further includes:
    判断所述第二信息是否包括黑名单中的字符;Determine whether the second information includes characters in the blacklist;
    当所述第二信息包括黑名单中的第一字符时，将所述第二信息中的所述第一字符替换为白名单中的第二字符，所述第二字符是所述白名单中和所述第一字符的发音相似度大于或等于第一阈值的字符。When the second information includes a first character in the blacklist, replacing the first character in the second information with a second character in the whitelist, the second character being a character in the whitelist whose pronunciation similarity to the first character is greater than or equal to a first threshold.
  5. 如权利要求1-4任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1 to 4, characterized in that the method further includes:
    接收用户输入的第一指令,所述第一指令用于指示所述第一信息的音译结果中的第一个字符为第三字符;Receive a first instruction input by the user, the first instruction being used to indicate that the first character in the transliteration result of the first information is a third character;
    所述多个第二信息是基于所述第一指令确定的,所述第二信息中的第一个字符为所述第三字符。The plurality of second information is determined based on the first instruction, and the first character in the second information is the third character.
  6. 如权利要求1-5任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1 to 5, characterized in that the method further includes:
    接收用户输入的第二指令,所述第二指令用于指示所述第一信息的音译结果中的最后一个字符为第四字符;Receive a second instruction input by the user, the second instruction being used to indicate that the last character in the transliteration result of the first information is a fourth character;
    所述多个第二信息是基于所述第二指令确定的,所述第二信息中的最后一个字符为所述第四字符。The plurality of second information is determined based on the second instruction, and the last character in the second information is the fourth character.
  7. The method according to any one of claims 1 to 4, wherein the method further comprises:
    receiving a third instruction input by the user, the third instruction indicating that the transliteration result corresponding to the first information includes a fifth character;
    wherein the plurality of pieces of second information are determined based on the third instruction, and the second information includes the fifth character.
  8. The method according to claim 7, wherein transliterating the first information to obtain a plurality of pieces of second information in the second language comprises:
    transliterating the first information to obtain eighth information in the second language;
    replacing a sixth character in the eighth information with the fifth character indicated by the third instruction, the second information being the eighth information after the replacement, and the sixth character being the character in the eighth information whose pronunciation similarity to the fifth character is the greatest.
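The substitution in claim 8 can be sketched as below: locate the character in the eighth information most similar in pronunciation to the user's fifth character, then replace it. The `similarity` callable and its toy table are assumptions for illustration.

```python
def apply_preferred_character(eighth_info: str, fifth_char: str, similarity) -> str:
    """Replace the character of eighth_info most similar to fifth_char
    (the 'sixth character' in claim 8) with fifth_char itself."""
    idx = max(range(len(eighth_info)),
              key=lambda i: similarity(eighth_info[i], fifth_char))
    return eighth_info[:idx] + fifth_char + eighth_info[idx + 1:]
```

For example, if the user insists on "莎" and the raw transliteration is "沙拉", the most similar character "沙" is swapped out, yielding "莎拉".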
  9. The method according to any one of claims 1 to 8, wherein the method further comprises:
    receiving ninth information in a fifth language input by the user, and receiving a first length input by the user;
    transliterating the ninth information to obtain at least one piece of tenth information in a sixth language, the length of the tenth information being the first length;
    displaying the at least one piece of tenth information.
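The length constraint in claim 9 amounts to keeping only candidates of the user-supplied length. A minimal sketch, with hypothetical candidates:

```python
def filter_by_length(tenth_candidates, first_length: int):
    """Claim 9: retain only tenth-information candidates whose character
    count equals the user-supplied first length."""
    return [t for t in tenth_candidates if len(t) == first_length]
```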
  10. An electronic device, comprising a transceiver, a processor, and a memory, the memory being configured to store a computer program, and the processor invoking the computer program to perform the method according to any one of claims 1 to 9.
  11. A computer storage medium, storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 9.
PCT/CN2023/117202 2022-09-07 2023-09-06 Transliteration method and electronic device WO2024051729A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211089982.7 2022-09-07
CN202211089982.7A CN117672190A (en) 2022-09-07 2022-09-07 Transliteration method and electronic device

Publications (1)

Publication Number Publication Date
WO2024051729A1 (en)

Family ID=90079576

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/117202 WO2024051729A1 (en) 2022-09-07 2023-09-06 Transliteration method and electronic device

Country Status (2)

Country Link
CN (1) CN117672190A (en)
WO (1) WO2024051729A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050114145A1 (en) * 2003-11-25 2005-05-26 International Business Machines Corporation Method and apparatus to transliterate text using a portable device
CN102193643A (en) * 2010-03-15 2011-09-21 北京搜狗科技发展有限公司 Word input method and input method system having translation function
US8775165B1 (en) * 2012-03-06 2014-07-08 Google Inc. Personalized transliteration interface
US20140244234A1 (en) * 2013-02-26 2014-08-28 International Business Machines Corporation Chinese name transliteration
CN104111972A (en) * 2008-07-18 2014-10-22 谷歌公司 Transliteration For Query Expansion
CN104795082A (en) * 2015-03-26 2015-07-22 广州酷狗计算机科技有限公司 Player and audio subtitle display method and device
CN105070289A (en) * 2015-07-06 2015-11-18 百度在线网络技术(北京)有限公司 English name recognition method and device
CN113591491A (en) * 2020-04-30 2021-11-02 阿里巴巴集团控股有限公司 System, method, device and equipment for correcting voice translation text

Also Published As

Publication number Publication date
CN117672190A (en) 2024-03-08

Similar Documents

Publication Publication Date Title
CN110111787B (en) Semantic parsing method and server
CN110910872B (en) Voice interaction method and device
US20220172717A1 (en) Voice Interaction Method and Electronic Device
WO2021027476A1 (en) Method for voice controlling apparatus, and electronic apparatus
WO2022052776A1 (en) Human-computer interaction method, and electronic device and system
WO2023125335A1 (en) Question and answer pair generation method and electronic device
WO2021254411A1 (en) Intent recognigion method and electronic device
CN111970401B (en) Call content processing method, electronic equipment and storage medium
CN111881315A (en) Image information input method, electronic device, and computer-readable storage medium
US20210383798A1 (en) Human-Computer Interaction Method and Electronic Device
CN113495984A (en) Statement retrieval method and related device
CN114330374A (en) Fusion scene perception machine translation method, storage medium and electronic equipment
CN111768765B (en) Language model generation method and electronic equipment
CN112740148A (en) Method for inputting information into input box and electronic equipment
CN114691839A (en) Intention slot position identification method
WO2023179490A1 (en) Application recommendation method and an electronic device
CN113742460A (en) Method and device for generating virtual role
WO2024051729A1 (en) Transliteration method and electronic device
WO2021031862A1 (en) Data processing method and apparatus thereof
CN111695071B (en) Page display method and related device
CN115083401A (en) Voice control method and device
WO2023078221A1 (en) Language translation method and electronic device
WO2023197951A1 (en) Search method and electronic device
CN117170560B (en) Image transformation method, electronic equipment and storage medium
WO2023236908A1 (en) Image description method, electronic device and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23862423

Country of ref document: EP

Kind code of ref document: A1