CN114662485A - Translation model compression method, translation method and related device


Info

Publication number: CN114662485A
Authority: CN (China)
Prior art keywords: interval, quantization, vocabulary, translation model, vector
Legal status: Pending
Application number: CN202210344547.8A
Other languages: Chinese (zh)
Inventor: 徐浩广
Current Assignee: Alibaba China Co Ltd
Original Assignee: Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/40: Processing or translation of natural language
    • G06F 40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/284: Lexical analysis, e.g. tokenisation or collocates

Abstract

One or more embodiments of the present specification disclose a translation model compression method, a translation method, and a related apparatus. The method includes: extracting a vocabulary vector set from a target translation model; determining a quantization interval based on the normal distribution onto which the vocabulary vector set is mapped; calculating, according to a determined quantization precision, the number of segments to be divided; dividing the quantization interval equally into that number of quantization segments and allocating a unique interval number to each segment; replacing the high-bit original value of each element of the vocabulary vectors contained in each quantization segment with the low-bit interval number of that segment; and writing the vocabulary vectors represented by interval numbers into the target translation model. Compression of the target translation model is thereby achieved.

Description

Translation model compression method, translation method and related device
Technical Field
The present document relates to the field of artificial intelligence technologies, and in particular, to a translation model compression method, a translation method, and a related apparatus.
Background
With the continuous development of machine translation technology, online translation has been widely adopted. Meanwhile, as services are upgraded, offline translation is required in more and more application scenarios: terminal devices such as scanning (translation) pens, handheld translators, and simultaneous-interpretation earphones typically use the translation function in wireless or weak-network environments.
However, the memory resources of such terminal devices are generally limited, and the memory on a terminal device is usually shared by a system module, a translation module, an optical character recognition module, a human-machine dialogue module, and the like, so memory occupation is a significant concern.
At present, many compression schemes for translation models have been proposed in the industry, but because the compression methods are not well designed, they tend to sacrifice translation quality even though they reduce memory usage.
Disclosure of Invention
One or more embodiments of the present disclosure provide a translation model compression method, a translation method, and a related apparatus, so as to compress a translation model by quantizing its vocabulary and to reduce the memory occupied by the translation model while keeping the loss of translation quality small.
To solve the above technical problem, one or more embodiments of the present specification are implemented as follows:
in a first aspect, a translation model compression method is provided, including:
extracting a vocabulary vector set from a trained target translation model, wherein the elements of the vocabulary vectors in the vocabulary vector set approximately follow a normal distribution;
determining a quantization interval based on the normal distribution onto which the vocabulary vector set is mapped;
calculating, according to a determined quantization precision, the number of segments to be divided, dividing the quantization interval equally into that number of quantization segments, and allocating a unique interval number to each quantization segment;
replacing the original values of the elements of the vocabulary vectors contained in each quantization segment with the interval number of the corresponding quantization segment, and writing the vocabulary vectors represented by interval numbers into the target translation model;
wherein the interval number has fewer bits than the original value.
In a second aspect, a translation method is provided, including:
determining a target original text to be translated;
searching, based on the target original text, a target translation model for the vocabulary vector corresponding to each of at least one vocabulary contained in the target original text, wherein the vocabulary vectors stored in the target translation model are vocabulary vectors represented by interval numbers after quantization by the translation model compression method of the first aspect;
determining, by inverse quantization, the final value of each element in the vocabulary vector according to the vocabulary vector and a locally stored correspondence, wherein the correspondence is pre-stored by the translation model compression method of the first aspect, and the final value of each element determined by inverse quantization has the same number of bits as the original value;
and inputting the final values, determined by inverse quantization, of each vocabulary contained in the target original text into the target translation model for inference prediction, and outputting a translation result.
In a third aspect, a translation model compression apparatus is provided, including:
an extraction module, configured to extract a vocabulary vector set from a trained target translation model, wherein the elements of the vocabulary vectors in the vocabulary vector set approximately follow a normal distribution;
a determining module, configured to determine a quantization interval based on the normal distribution onto which the vocabulary vector set is mapped;
a dividing module, configured to calculate, according to a determined quantization precision, the number of segments to be divided, divide the quantization interval equally into that number of quantization segments, and allocate a unique interval number to each quantization segment;
a quantization module, configured to replace the original values of the elements of the vocabulary vectors contained in each quantization segment with the interval number of the corresponding quantization segment, and to write the vocabulary vectors represented by interval numbers into the target translation model;
wherein the interval number has fewer bits than the original value.
In a fourth aspect, a translation apparatus is provided, including:
the first determining module is used for determining a target original text to be translated;
a searching module, configured to search, based on the target original text, a target translation model for the vocabulary vector corresponding to each of at least one vocabulary contained in the target original text, wherein the vocabulary vectors stored in the target translation model are vocabulary vectors represented by interval numbers after quantization by the translation model compression method of the first aspect;
a second determining module, configured to determine, by inverse quantization, the final value of each element in the vocabulary vector according to the vocabulary vector and a locally stored correspondence, wherein the correspondence is pre-stored by the translation model compression method of the first aspect, and the final value of each element determined by inverse quantization has the same number of bits as the original value;
and a translation module, configured to input the final values, determined by inverse quantization, of each vocabulary contained in the target original text into the target translation model for inference prediction, and to output a translation result.
In a fifth aspect, an electronic device is provided, including:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the translation model compression method of the first aspect or the translation method of the second aspect.
In a sixth aspect, a computer-readable storage medium is provided, which stores one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the translation model compression method of the first aspect or the translation method of the second aspect.
As can be seen from the technical solutions provided in one or more embodiments of the present specification, a high-bit vocabulary vector extracted from a target translation model is quantized to obtain a low-bit vocabulary vector, so as to achieve the purpose of compressing the target translation model, thereby reducing the occupied space of the target translation model in a system memory. Moreover, the compression scheme does not involve the change of the calculation weight of the model network, so the influence on the subsequent translation quality is small.
Drawings
In order to more clearly illustrate one or more embodiments or prior art solutions of the present specification, reference will now be made briefly to the attached drawings, which are needed in the description of one or more embodiments or prior art, and it should be apparent that the drawings in the description below are only some of the embodiments described in the specification, and that other drawings may be obtained by those skilled in the art without inventive exercise.
Fig. 1 is a flowchart illustrating steps of a translation model compression method according to an embodiment of the present disclosure.
Fig. 2 is a flowchart illustrating steps of a translation method according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of normal distribution obtained by mapping elements in a vocabulary vector according to an embodiment of the present disclosure.
Fig. 4 is a schematic structural diagram of a translation model compression apparatus according to an embodiment of the present disclosure.
Fig. 5 is a schematic structural diagram of a translation apparatus provided in an embodiment of this specification.
Fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present specification.
Fig. 7 is a schematic structural diagram of an electronic device provided in another embodiment of the present specification.
Detailed Description
In order to make the technical solutions in the present specification better understood, the technical solutions in one or more embodiments of the present specification will be clearly and completely described below with reference to the accompanying drawings in one or more embodiments of the present specification, and it is obvious that the one or more embodiments described are only a part of the embodiments of the present specification, and not all embodiments. All other embodiments that can be derived by a person skilled in the art from one or more of the embodiments described herein without making any inventive step shall fall within the scope of protection of this document.
Driven by the demand for offline translation on terminal devices and by continual service upgrades, the memory resource occupation of terminal devices has become a concern. To plan memory usage reasonably, especially on terminal devices with limited memory, the translation model used on the device generally needs to be compressed so that it occupies fewer system memory resources. Although the translation model compression schemes proposed so far can reduce system memory occupation to a certain extent, they modify the network computation inside the translation model, in particular the network computation weights, so the translation quality is noticeably affected and high translation quality and accuracy cannot be guaranteed.
Therefore, the embodiments of the present specification provide a translation model compression scheme. Considering that the vocabulary vectors account for about half of the total size of a translation model, high-bit vocabulary vectors are quantized into low-bit vocabulary vectors to compress the translation model and reduce its occupation of system memory. Because the vocabulary vectors have little influence on the network computation inside the translation model, the network computation weights are hardly changed, and therefore the impact on the final translation quality is small.
It should be understood that, in the embodiments of the present specification, quantizing a vocabulary vector means converting a high-bit vocabulary vector into a low-bit vocabulary vector, thereby making the vocabulary vectors, and hence the translation model, lighter.
The technical means referred to in the present specification will be described in detail below with reference to specific examples.
Example one
Referring to fig. 1, which is a flowchart of the steps of a translation model compression method provided in an embodiment of this specification, the execution subject of the compression method may be a hardware device (e.g., a smartphone, a personal computer, a wearable device, a tablet computer, etc.) or a software device (e.g., a client integrated on the foregoing hardware devices, or a combination of software modules), which is not limited in this specification. The translation model compression method may include the following steps:
step 102: and extracting a word list vector set from the target translation model obtained by training, wherein elements of the word list vectors in the word list vector set are in normal distribution.
A vocabulary vector (word embedding) is the matrix of values obtained after the source-side vocabulary and the target-side vocabulary of a translation model are respectively vectorized. A trained target translation model may contain a large number of vocabularies and, correspondingly, a large number of vocabulary vectors, so the vocabulary vectors can be combined into a vocabulary vector set and stored in the target translation model.
It should be understood that a vocabulary vector may contain M feature dimensions (i.e., elements). If the target translation model contains 6000 vocabularies and the vocabulary vector of each vocabulary has 512 dimensions, the target translation model contains 6000 x 512 feature dimensions. When the vocabulary vectors are mapped into a coordinate system, the elements of the vocabulary vectors fall into a concentrated distribution interval that is generally normal in shape. Strictly speaking, the distribution referred to here is approximately normal rather than a standard normal distribution: the distribution of the elements contained in the vocabulary vectors tends toward a normal distribution. In the examples of the present specification, a normal distribution is used for illustration.
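By way of illustration only (the 6000 x 512 shape, the synthetic data, and the use of Python/NumPy are assumptions, not part of this disclosure), a stand-in vocabulary vector set can be built and its element distribution inspected as follows:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Stand-in for the vocabulary vectors of a trained model:
# 6000 vocabulary entries, each a 512-dimensional 32-bit float vector.
vocab_vectors = rng.normal(loc=0.0, scale=0.2, size=(6000, 512)).astype(np.float32)

elements = vocab_vectors.ravel()
print("element count:", elements.size)              # 6000 * 512 = 3,072,000
print("mean / std:", elements.mean(), elements.std())
print("min / max :", elements.min(), elements.max())
# Plotting a histogram of `elements` would show the roughly bell-shaped,
# zero-centered distribution that the compression scheme exploits.
```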
Step 104: and determining a quantization interval based on the normal distribution result mapped by the vocabulary vector set.
In this embodiment, a vocabulary vector distribution interval satisfying a preset constraint condition may be selected as the quantization interval based on the normal distribution onto which the vocabulary vector set is mapped. The preset constraint condition can be set flexibly according to business requirements. Below, two examples of the preset constraint condition are given: selecting the distribution interval covering all vocabulary vectors in the vocabulary vector set, and selecting the distribution interval whose concentration degree satisfies a constraint sub-condition; in both cases, the vocabulary vector distribution interval satisfying the preset constraint condition is selected from the normal distribution result as the quantization interval.
- Preset constraint condition 1: select the distribution interval corresponding to all vocabulary vectors in the vocabulary vector set.
Correspondingly, when a vocabulary vector distribution interval meeting preset constraint condition 1 is selected as the quantization interval from the normal distribution onto which the vocabulary vector set is mapped, the distribution interval covering all the vocabulary vectors may be selected as the quantization interval, where the maximum boundary of the quantization interval is the original value of the largest element in the vocabulary vector set and the minimum boundary is the original value of the smallest element in the vocabulary vector set.
For example, assume that in the normal distribution result obtained by mapping the vocabulary vector set, the elements of the vocabulary vector set fall within the distribution interval [-0.809, 0.758]. If preset constraint condition 1 is used as the determination condition, the distribution interval corresponding to all the vocabulary vectors, namely [-0.809, 0.758], is selected as the quantization interval. Subsequent quantization compression is then carried out on the vocabulary vectors in the whole vocabulary vector set.
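A minimal sketch of preset constraint condition 1, under the assumption that the vocabulary vectors are held in a NumPy array (the helper name is illustrative, not from this disclosure):

```python
import numpy as np

def full_range_quantization_interval(vocab_vectors: np.ndarray) -> tuple[float, float]:
    """Quantization interval covering every element of every vocabulary vector."""
    elements = vocab_vectors.ravel()
    return float(elements.min()), float(elements.max())

# For the distribution described in the text this would return roughly (-0.809, 0.758).
```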
Although the quantization interval determined by preset constraint condition 1 covers all vocabulary vectors in the vocabulary vector set, which widens the compression range, improves the compression effect, and further reduces the memory occupied by the translation model, it should be considered that a vocabulary vector set contains many vocabulary vectors, each vocabulary vector contains many elements, the elements are normally distributed, and most of them are concentrated within a certain distribution interval. Therefore, preset constraint condition 2 can be used as the selection condition: a distribution interval in which the vocabulary vector elements are concentrated is selected from the normal distribution result as the quantization interval. In this way the quantization interval to be compressed is determined reasonably, inefficient compression caused by an overly wide quantization interval is avoided, and the precision loss caused by scattered element values inside the quantization interval is reduced.
- Preset constraint condition 2: select the vocabulary vector distribution interval whose concentration degree satisfies the constraint sub-condition.
Correspondingly, when a vocabulary vector distribution interval meeting a preset constraint condition is selected as the quantization interval from the normal distribution onto which the vocabulary vector set is mapped, the vocabulary vector distribution interval whose concentration degree satisfies the constraint sub-condition can be selected as the quantization interval. The constraint sub-condition may include: the value of the element at the maximum boundary of the vocabulary vector distribution interval satisfies a first threshold and/or the value of the element at the minimum boundary satisfies a second threshold; or, the density of the elements at the maximum boundary of the vocabulary vector distribution interval satisfies a third threshold and/or the density of the elements at the minimum boundary satisfies a fourth threshold.
For example, assume again that the elements of the vocabulary vector set fall within the distribution interval [-0.809, 0.758]. If preset constraint condition 2 is used as the determination condition, then when the constraint sub-condition is that the element value at one boundary of the quantization interval satisfies the first threshold, the quantization interval may be [-0.5, 0.758]; when the constraint sub-condition is that the element value at the other boundary satisfies the second threshold, the quantization interval may be [-0.809, 0.6]; and when both sub-conditions apply, the quantization interval is [-0.5, 0.6]. Here the first threshold is taken as -0.5 and the second threshold as 0.6 by way of example. The first and second thresholds can be determined by analyzing the concentration of the element distribution in the normal distribution result: in general, most elements in a vocabulary vector set are distributed within [-0.5, 0.5], where the element distribution is relatively concentrated and accounts for roughly 60%-80% or more of all vocabulary vectors, whereas the elements in other distribution intervals are more scattered; the first and second thresholds can therefore be determined through this analysis.
Similarly, when the constraint sub-condition is that the element density at one boundary of the distribution interval satisfies the third threshold, the quantization interval may be [-0.6, 0.758]; when the constraint sub-condition is that the element density at the other boundary satisfies the fourth threshold, the quantization interval may be [-0.809, 0.7]; and when both sub-conditions apply, the quantization interval may be [-0.6, 0.7]. Here the third threshold is taken as -0.6 and the fourth threshold as 0.7 by way of example. In fact, the third and fourth thresholds can also be determined by analyzing the concentration of the element distribution in the normal distribution result: in general, most elements in a vocabulary vector set are concentrated within [-0.6, 0.7], where the element density at each node is relatively high (for example, more than 20% of the total number of elements), whereas outside this interval the element density at each node is low and the elements are scattered sporadically.
It should be understood that, in the embodiments of the present specification, the first threshold and the second threshold may be equal or different in value; they are determined according to the element distribution of the vocabulary vector set in each translation model to be quantized and are not specifically limited here. Likewise, the third threshold and the fourth threshold may be equal or different. In other words, the quantization interval may be a symmetrically or an asymmetrically distributed interval.
It should be emphasized that the first, second, third, and fourth thresholds may be set from empirical values or adjusted flexibly according to the current business requirements, as long as the element distribution (which may loosely be understood as the vocabulary vector distribution) within the quantization interval determined by the thresholds is relatively concentrated.
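One possible way to realize preset constraint condition 2 is sketched below; selecting the boundaries by quantiles is an assumption made purely for illustration, since the disclosure leaves the thresholds to empirical analysis of the element distribution:

```python
import numpy as np

def concentrated_quantization_interval(vocab_vectors: np.ndarray,
                                       lower_quantile: float = 0.10,
                                       upper_quantile: float = 0.90) -> tuple[float, float]:
    """Pick the interval holding the bulk of the elements, discarding the sparse tails."""
    elements = vocab_vectors.ravel()
    lo = float(np.quantile(elements, lower_quantile))  # plays the role of one boundary threshold
    hi = float(np.quantile(elements, upper_quantile))  # plays the role of the other boundary threshold
    return lo, hi

# With thresholds tuned as in the text, this could yield an interval such as [-0.5, 0.5].
```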
Step 106: according to the determined quantization precision, the number of the to-be-divided sections is calculated, the quantization interval is divided into a plurality of quantization sections with the same number as the sections in an equal division mode, and a unique interval number is allocated to each quantization section.
Usually, the precision of the vocabulary vectors after a translation model is trained is 32-bit floating point. Quantization maps this 32-bit high precision to a lower-precision value, for example an 8-bit or 16-bit integer. Different quantization precisions bring different precision losses and memory compression effects: the lower the quantization precision, the more memory is saved, but the greater the quality loss. In the embodiments of the present specification, the quantization precision may be any precision, reasonably chosen according to the precision of the original element values of the vocabulary vectors in the trained translation model, that uses fewer bits than the current original values; the specific choice is not limited and can be made flexibly according to requirements.
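As a rough illustration of this trade-off (the vocabulary size and dimension reuse the assumed 6000 x 512 figures from above):

```python
vocab_size, dim = 6000, 512
fp32_bytes = vocab_size * dim * 4          # 32-bit floats
int8_bytes = vocab_size * dim * 1          # 8-bit interval numbers
print(fp32_bytes / 2**20, "MiB ->", int8_bytes / 2**20, "MiB")
# ~11.7 MiB -> ~2.9 MiB: storing interval numbers instead of fp32 originals
# saves roughly 3/4 of the vocabulary-vector memory, at the cost of some precision.
```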
In the embodiments of the present specification, after the quantization interval is determined, it may be divided equally into a number of quantization segments calculated from the quantization precision. Meanwhile, each quantization segment is assigned an interval number that uniquely identifies it. The interval number may be a number, a character, a letter, or a combination of numbers, characters, and letters. In an alternative embodiment, the interval number is a number, and consecutive segments are identified by consecutive numbers.
The following describes a specific implementation of step 106 in detail, taking the quantization interval determined based on the preset constraint condition as an example.
Corresponding to the quantization interval determined based on preset constraint condition 1, define the number of segments as N, where N is a positive integer. Raise 2 to the power of the determined quantization precision and subtract 1 from the result to obtain the number N of segments to be divided; divide the quantization interval evenly into N quantization segments; and assign each quantization segment an interval number using the integers 0 to N-1.
For example, assuming the original element values are 32-bit floating point numbers, the determined quantization precision may be 8-bit integer; the number of segments is then calculated as 2^8 - 1 = 255. The quantization interval may then be divided evenly into 255 quantization segments, and each segment is assigned an interval number from 0 to 254: for example, going from left to right, the interval number of the first quantization segment is 0, that of the second is 1, that of the third is 2, ..., and that of the 255th quantization segment is 254.
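A sketch of this segment construction, with illustrative helper names and the 8-bit case assumed:

```python
import numpy as np

def make_quantization_segments(lo: float, hi: float, precision_bits: int = 8):
    """Divide [lo, hi] into 2**precision_bits - 1 equal segments, numbered 0..N-1."""
    n_segments = 2 ** precision_bits - 1          # 255 for 8-bit precision
    edges = np.linspace(lo, hi, n_segments + 1)   # N segments -> N + 1 edges
    numbers = np.arange(n_segments)               # interval numbers 0, 1, ..., 254
    return edges, numbers

edges, numbers = make_quantization_segments(-0.809, 0.758)
print(len(numbers), numbers[0], numbers[-1])      # 255 0 254
```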
Correspondingly, for the quantization interval determined based on preset constraint condition 2, define the number of segments as N, where N is a positive integer. Raise 2 to the power of the determined quantization precision and subtract 1 from the result to obtain the number N of segments to be divided; divide the quantization interval evenly into N quantization segments; assign each quantization segment an interval number using the integers 0 to N-1; and uniformly assign the special number N to the distribution intervals other than the quantization interval.
For example, assuming the original element values are 32-bit floating point numbers, the determined quantization precision may be 8-bit integer; the number of segments is then calculated as 2^8 - 1 = 255. The quantization interval may then be divided evenly into 255 quantization segments, and each segment is assigned an interval number from 0 to 254. Since the quantization interval determined by preset constraint condition 2 covers only part of the distribution, the distribution intervals that do not need to be quantized must also be identified, so the distribution intervals other than the quantization interval can be uniformly marked with 255 as a special number. For example, going from left to right, the interval number of the first quantization segment is 0, that of the second is 1, that of the third is 2, ..., and that of the 255th quantization segment is 254; the distribution intervals other than the quantization interval are uniformly denoted by the special number 255.
Step 108: respectively replacing the original values corresponding to the elements of the vocabulary vectors contained in each quantization interval with the interval numbers of the corresponding quantization interval, and writing the vocabulary vectors represented by the interval numbers into the target translation model; and the bit of the interval number is smaller than the bit of the original value.
After each quantization segment obtained by equal division has been assigned an interval number, the original value of each element of the vocabulary vectors contained in each quantization segment can be replaced by the interval number of that segment. For example, if an element of a vocabulary vector (original value 0.3456) falls within the quantization segment with interval number 128, that element is represented by 128 instead, and all vocabulary vectors represented by interval numbers are written into the target translation model. The vocabulary vectors in the target translation model are thus changed from the original floating point numbers to integers with fewer bits, so the vocabulary vectors are compressed to a certain extent, the target translation model is compressed, and the space it occupies in system memory is reduced. Moreover, as mentioned above, this compression scheme does not involve the network computation weights, so it does not unduly affect the computation network, and the translation quality loss after compression stays within a tolerable range.
Further, in the embodiments of the present specification, when the quantization interval is determined using preset constraint condition 2, after the vocabulary vectors represented by interval numbers are written into the target translation model, the vocabulary vector elements falling in the distribution intervals other than the quantization interval may be written into the target translation model as their original values and uniformly marked with the special number N. That is, if the quantization interval is [-0.5, 0.5], the elements of the vocabulary vectors in the distribution intervals (-∞, -0.5) and (0.5, +∞) are written into the target translation model as they are, i.e., without quantization. However, the vocabulary vectors falling within these distribution intervals are uniformly marked with a special number, such as 255 as mentioned above. In fact, any other number may be used, as long as the method defines that number to specially mark the distribution intervals other than the quantization interval.
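A possible realization of steps 106-108 for the constraint-condition-2 case is sketched below; the function name, the use of NumPy, and the way out-of-interval originals are kept are assumptions made for illustration:

```python
import numpy as np

def quantize_vocab_vectors(vocab_vectors: np.ndarray,
                           lo: float, hi: float,
                           precision_bits: int = 8):
    """Replace in-interval element values by interval numbers (0..N-1).

    Returns the interval-number codes (the special number N marks out-of-interval
    elements; uint8 assumes 8-bit precision) and the originals kept for those elements.
    """
    n = 2 ** precision_bits - 1                          # e.g. 255 segments
    edges = np.linspace(lo, hi, n + 1)
    in_range = (vocab_vectors >= lo) & (vocab_vectors <= hi)

    # np.digitize against the interior edges maps in-range values to 0..n-1.
    idx = np.digitize(vocab_vectors, edges[1:-1])
    codes = np.where(in_range, idx, n).astype(np.uint8)  # special number n = 255
    kept_originals = np.where(in_range, 0.0, vocab_vectors).astype(np.float32)
    return codes, kept_originals
```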
It should be understood that, in the embodiments of the present specification, the above quantization compression scheme converts high-bit vocabulary vectors into low-bit vocabulary vectors. When the vocabulary vectors are later used in network computation, the vocabulary vectors represented in the translation model by low-bit interval numbers must be dequantized back to vocabulary vectors of the original bit width. Therefore, after the vocabulary vectors represented by interval numbers are written into the target translation model, the correspondence between each quantization segment and its allocated interval number may be stored, where each quantization segment is a distribution interval bounded by original element values of the vocabulary vectors.
It should be noted that this correspondence may be stored in storage space outside the system memory, i.e., not inside the translation model. It may be stored as a relational table or in a non-tabular manner, which is not limited in the embodiments of the present specification.
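For example, the correspondence might be persisted outside the model as a simple lookup table (illustrative only; the file name and JSON format are assumptions):

```python
import json
import numpy as np

def save_segment_table(edges: np.ndarray, path: str = "segment_table.json") -> None:
    """Persist {interval number: [segment lower bound, segment upper bound]} outside the model."""
    table = {int(i): [float(edges[i]), float(edges[i + 1])] for i in range(len(edges) - 1)}
    with open(path, "w") as f:
        json.dump(table, f)
```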
By the technical scheme, the high-bit word list vector extracted from the target translation model is quantized to obtain the low-bit word list vector, so that the target translation model is compressed, and the occupied space of the target translation model in a system memory is reduced. Moreover, the compression scheme does not involve the change of the calculation weight of the model network, so the influence on the subsequent translation quality is small.
Example two
Referring to fig. 2, which is a flowchart of the steps of a translation method provided in an embodiment of the present disclosure, the execution subject of the translation method may be a hardware device (e.g., a smartphone, a personal computer, a wearable device, a tablet computer, etc.) or a software device (e.g., a client or a combination of software modules integrated on the foregoing hardware devices) with certain computing and processing capabilities. The execution subject of the translation method may be the same as or different from that of the translation model compression method in the first embodiment, which is not limited in this specification. The translation method may include the following steps:
step 202: and determining a target original text to be translated.
The target original text can be any piece of text whose language is the same as the source-side language covered by the vocabulary vectors in the target translation model.
Step 204: and searching a word list vector corresponding to each word list in at least one word list contained in the target original text from a target translation model based on the target original text, wherein the word list vector stored in the target translation model is a word list vector which is represented by an interval number after quantization processing is carried out by using the translation model compression method in the steps 102-108.
Step 204 may be implemented with an existing vocabulary lookup approach, looking up in turn the vocabulary vector corresponding to each vocabulary contained in the target original text from the target translation model. The vocabulary vectors stored in the target translation model have been processed by the quantization compression described in the first embodiment, which is detailed there and not repeated here.
Step 206: and determining the final value of each element in the vocabulary vector by inverse quantization according to the corresponding relation between the vocabulary vector and the local storage, wherein the corresponding relation is pre-stored by the translation model compression method in the step 102 to the step 108, and the bits of the final value of each element in the vocabulary vector determined by inverse quantization are the same as the bits of the original value.
After the vocabulary vector corresponding to a vocabulary contained in the target original text is found, the final value of each element in the vocabulary vector can be determined by an inverse-quantization lookup based on the locally stored correspondence. The correspondence involved here is determined by the translation model compression scheme described in the first embodiment, which is detailed there and not repeated here.
In the embodiments of the present specification, when determining the final value of each element in the vocabulary vector by inverse quantization according to the vocabulary vector and the locally stored correspondence, if the vocabulary vector is represented by interval numbers, the quantization segment matching the interval number of each element in the vocabulary vector may be looked up in the locally stored correspondence, and the node value of that quantization segment conforming to a preset rule is determined as the final value of the element corresponding to that interval number. Optionally, the node value conforming to the preset rule may be the maximum boundary value, the minimum boundary value, or the median of the quantization segment. In fact, it may be any other value within the quantization segment, which is not limited here. It should be understood that, for convenience, the node values conforming to the preset rule may be chosen uniformly for all quantization segments, for example, uniformly taking the median of each quantization segment as the final value for that segment.
It is emphasized that the bits of the final value determined by inverse quantization are the same as the bits of the original value. For example, the original value of the element of the vocabulary vector is a 32-bit floating point number, and is represented by an 8-bit integer interval number after quantization, and is represented by the final value of the 32-bit floating point number after dequantization. Therefore, although the original values and the final values of the elements of the dequantized vocabulary vectors may have certain errors, the prediction distribution of the inference calculation in the later stage of the model is not greatly influenced.
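A sketch of this inverse quantization step, assuming the segment edges saved earlier and taking the median as the representative value (function and variable names are illustrative):

```python
import numpy as np

def dequantize(codes: np.ndarray, kept_originals: np.ndarray,
               edges: np.ndarray, special_number: int = 255) -> np.ndarray:
    """Map interval numbers back to fp32 final values (segment medians)."""
    medians = (edges[:-1] + edges[1:]) / 2.0         # one representative value per segment
    is_special = codes == special_number
    safe_codes = np.where(is_special, 0, codes)      # avoid out-of-bounds indexing
    final = medians[safe_codes].astype(np.float32)
    # Elements marked with the special number keep their stored original values.
    return np.where(is_special, kept_originals, final).astype(np.float32)
```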
If the vocabulary vector is marked with the special number, the original value of each element in the vocabulary vector is directly taken as its final value.
Step 208: and inputting the final value determined by inverse quantization of each word list contained in the target original text into the target translation model for reasoning and prediction, and outputting a translation result.
With the above technical solution, during translation the low-bit vocabulary vectors that are looked up are dequantized to high-bit final values using the correspondence stored during quantization. Although a final value may deviate somewhat from the original value, it has the same number of bits as the original value, and once the final values are fed into the translation model, where a large number of them participate in the network computation, the error can be neglected and the prediction obtained by inference does not deviate significantly. Therefore, while the translation model is compressed, the loss of translation quality is kept small; in particular, for devices with limited memory, the overall memory can be allocated more reasonably and translation efficiency is improved.
EXAMPLE III
The above translation model compression scheme and the translation scheme are described in detail below as a specific example.
Assume that a target translation model has been trained and that the distribution of the elements in its vocabulary vectors has the following characteristics: the elements follow an approximately normal distribution and cluster around 0; the distribution interval is narrow and the distribution is concentrated.
Referring to fig. 3, the elements of the vocabulary vectors are first mapped into the coordinate system shown in fig. 3, where the vocabulary vectors are distributed over an open interval. In practical projects, the elements are concentrated within [-0.5, 0.5], and a more concentrated interval is generally selected as the quantization interval so as to reduce the quantization precision loss caused by scattered data.
Usually, the precision of the vocabulary vector after model training is 32-bit floating point number, and the quantization process is to map 32-bit high precision to a low precision value, such as 8-bit or 16-bit integer, the precision loss and memory compression effect brought by different quantization precision are different, the lower the quantization precision, the more memory saving, but the larger the quality loss.
Quantization process:
Here, [-0.5, 0.5] may be selected as the quantization interval, and the quantization precision may be determined to be 8-bit integer. The quantization interval [-0.5, 0.5] is divided equally into 2^8 - 1 = 255 parts, yielding 255 quantization segments, and each quantization segment is assigned an interval number index, e.g., 0-254. The distribution intervals other than the quantization interval [-0.5, 0.5] are assigned the special number 255.
For an element of a vocabulary vector that lies within the quantization interval, the original value w is replaced by the interval number index and stored in the translation model, i.e., int8(index) replaces fp32(w). Specifically, as shown in the figure, int8(102) replaces fp32(-0.1) when written into the translation model; likewise, int8(204) replaces fp32(0.3).
For elements of vocabulary vectors that lie outside the quantization interval, the original value w is stored directly in the translation model and marked with the special number index; that is, fp32(w) is saved directly into the translation model and marked with index = 255.
Thus, the quantization processing of the vocabulary vectors is completed, namely, the compression of the translation model is realized.
Inverse quantization process:
after the target original text to be translated is determined, the word list vector of the word list in the target original text is searched from the translation model after the compression processing. During actual calculation, the fp 32-bit final value can be dequantized according to index value; if the index is the special number 255, the original value of the fp32 vocabulary vector can be directly read out as a final value, if the index is not the special number, the final value of the fp32 bits corresponding to the quantized block is taken out according to the value of the index, the final value can be the left boundary, the right boundary or the median of the block, and the final value has a certain error with the original value, but has little influence on the predictive analysis of the inference calculation at the later stage of the translation model, namely the bilingual evaluation index bleu value of the final translation result is slightly influenced but is greatly saved in the memory, the fp32 is quantized to int8, so that the memory of nearly 3/4 can be saved, and the influence on the equipment with limited memory resources is great. Moreover, in actual business, if the vocabulary vectors are more concentrated, the determined quantization interval is smaller, and further, the quality loss of the whole translation is smaller.
In this technical solution, based on the concentrated distribution of the elements in the vocabulary vectors and the characteristics of the translation model, the vocabulary vectors are quantized to compress the translation model, and inverse quantization is applied when the compressed translation model performs inference and prediction. The whole process achieves obvious compression of the translation model without any change to the network weights of the translation model, while the loss of translation quality is almost negligible. The compression process is easy to implement, effectively avoids the large translation quality loss of traditional compression schemes, and improves overall translation efficiency.
Example four
Referring to fig. 4, which is a schematic structural diagram of a translation model compression apparatus provided in an embodiment of the present disclosure, the translation model compression apparatus 400 may include:
the extracting module 402 extracts a vocabulary vector set from the trained target translation model, wherein elements of the vocabulary vectors in the vocabulary vector set are normally distributed.
A determining module 404, configured to determine a quantization interval based on the normal distribution onto which the vocabulary vector set is mapped.
A dividing module 406, configured to calculate, according to the determined quantization precision, the number of segments to be divided, divide the quantization interval equally into that number of quantization segments, and allocate a unique interval number to each quantization segment.
A quantization module 408, configured to replace the original values of the elements of the vocabulary vectors contained in each quantization segment with the interval number of the corresponding quantization segment, and to write the vocabulary vectors represented by interval numbers into the target translation model; the interval number has fewer bits than the original value.
Optionally, as an embodiment, when determining the quantization interval based on the normal distribution result mapped by the vocabulary vector set, the determining module 404 is specifically configured to select, as the quantization interval, a vocabulary vector distribution interval that meets a preset constraint condition from the normal distribution mapped by the vocabulary vector set.
In a specific implementation of the embodiments of the present specification, the preset constraint condition includes: all the vocabulary vectors in the vocabulary vector set. When selecting, from the normal distribution onto which the vocabulary vector set is mapped, a vocabulary vector distribution interval meeting the preset constraint condition as the quantization interval, the determining module 404 is specifically configured to select the distribution interval corresponding to all the vocabulary vectors as the quantization interval, where the maximum boundary of the quantization interval is the original value of the largest element in the vocabulary vector set and the minimum boundary is the original value of the smallest element in the vocabulary vector set.
In another specific implementation of the embodiments of the present specification, the preset constraint condition includes: the concentration degree of the vocabulary vectors satisfies a constraint sub-condition. When selecting, from the normal distribution onto which the vocabulary vector set is mapped, a vocabulary vector distribution interval meeting the preset constraint condition as the quantization interval, the determining module 404 is specifically configured to select, as the quantization interval, the vocabulary vector distribution interval whose concentration degree satisfies the constraint sub-condition. The constraint sub-condition includes: the value of the element at the maximum boundary of the vocabulary vector distribution interval satisfies a first threshold and/or the value of the element at the minimum boundary satisfies a second threshold; or, the density of the elements at the maximum boundary of the vocabulary vector distribution interval satisfies a third threshold and/or the density of the elements at the minimum boundary satisfies a fourth threshold.
In another specific implementation of the embodiments of the present specification, the number of segments is N, where N is a positive integer. When calculating the number of segments to be divided according to the determined quantization precision, dividing the quantization interval equally into that number of quantization segments, and allocating a unique interval number to each quantization segment, the dividing module 406 is specifically configured to raise 2 to the power of the determined quantization precision and subtract 1 from the result to obtain the number N of segments to be divided; divide the quantization interval evenly into N quantization segments; and assign each quantization segment an interval number using the integers 0 to N-1.
In another specific implementation of the embodiments of the present specification, the number of segments is N, where N is a positive integer. When calculating the number of segments to be divided according to the determined quantization precision, dividing the quantization interval equally into that number of quantization segments, and allocating a unique interval number to each quantization segment, the dividing module 406 is specifically configured to raise 2 to the power of the determined quantization precision and subtract 1 from the result to obtain the number N of segments to be divided; divide the quantization interval evenly into N quantization segments; assign each quantization segment an interval number using the integers 0 to N-1; and uniformly assign the special number N to the distribution intervals other than the quantization interval.
In another specific implementation manner of the embodiment of the present specification, the translation model compression apparatus further includes: and a marking module, configured to, after the quantization module 408 writes the vocabulary vectors represented by the interval numbers into the target translation model, write the vocabulary vectors included in the distribution intervals other than the quantization intervals into the target translation model as corresponding original values, and perform uniform marking by using a special number N.
In another specific implementation manner of the embodiments of the present specification, the quantization interval is a symmetric distribution interval, or an asymmetric distribution interval.
In another specific implementation manner of the embodiment of the present specification, the translation model compression apparatus further includes: a storage module, configured to store a correspondence between each quantization interval and an allocated interval number after the quantization module 408 writes the vocabulary vector represented by the interval number into the target translation model, where the quantization interval is a distribution interval of the vocabulary vector with an original value corresponding to an element of the vocabulary vector as a boundary.
With this technical solution, the translation model compression apparatus can quantize the high-bit vocabulary vectors of a translation model into a low-bit representation, thereby compressing the translation model and reducing the space it occupies in system memory. For devices with limited memory in particular, memory occupation is noticeably reduced and machine translation efficiency is improved. Moreover, because the compression scheme has little or no influence on the network computation weights of the model, the loss of translation quality remains small while the translation model is compressed.
EXAMPLE five
Referring to fig. 5, which is a schematic structural diagram of a translation apparatus provided in an embodiment of this specification, the translation apparatus 500 may include:
a first determining module 502, which determines a target original text to be translated;
a searching module 504, configured to search, based on the target original text, a vocabulary vector corresponding to each vocabulary in at least one vocabulary included in the target original text from a target translation model, where the vocabulary vector stored in the target translation model is a vocabulary vector expressed by an interval number after quantization processing is performed by using the translation model compression method described in the first embodiment;
a second determining module 506, configured to determine, by inverse quantization, the final value of each element in the vocabulary vector according to the vocabulary vector and a locally stored correspondence, where the correspondence is pre-stored by the translation model compression method described in the first embodiment, and the final value of each element determined by inverse quantization has the same number of bits as the original value;
and a translation module 508, configured to input the final values, determined by inverse quantization, of each vocabulary contained in the target original text into the target translation model for inference prediction and to output a translation result.
Optionally, as an embodiment, when determining the final value of each element in the vocabulary vector by inverse quantization according to the vocabulary vector and the locally stored correspondence, the second determining module 506 is specifically configured to look up, in the locally stored correspondence, the quantization segment matching the interval number of each element in the vocabulary vector, and to determine the node value of that quantization segment conforming to a preset rule as the final value of the element corresponding to that interval number.
In a specific implementation manner of the embodiment of the present specification, the node value meeting the preset rule is the maximum boundary value, the minimum boundary value, or the median of the quantization segment.
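The following minimal sketch illustrates this node-value rule (in Python; the function name, the `rule` parameter, and the segment representation are assumptions for illustration and are not specified by this embodiment):

```python
# Illustrative sketch of the node-value rule; names are hypothetical.

def segment_to_final_value(segment, rule="median"):
    """Map a quantization segment (lo, hi) to a single de-quantized final value."""
    lo, hi = segment
    if rule == "max":        # maximum boundary value of the segment
        return hi
    if rule == "min":        # minimum boundary value of the segment
        return lo
    return (lo + hi) / 2.0   # median (midpoint) of the segment


# Example: the segment (-0.05, 0.05) de-quantizes to 0.0 under the median rule.
print(segment_to_final_value((-0.05, 0.05)))
```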
It should be understood that, in this embodiment of the present specification, the translation apparatus 500 shown in fig. 5 may belong to the same apparatus as the translation model compression apparatus 400 shown in fig. 4, or may be different apparatuses, or both apparatuses may be integrated in the same electronic device, so as to cooperatively perform a translation service. The embodiments of the present description do not limit the relationship between the two.
With this technical solution, the translation apparatus performs a vocabulary lookup for the target original text, inversely quantizes the found vocabulary vectors, which are represented with quantized low-bit precision, back into high-bit final values, inputs the vocabulary vectors represented by these final values into the translation model for inference prediction, and outputs the translation result. Because the vocabulary vectors of the translation model used during translation have been quantized, they occupy little system memory, leaving sufficient memory for inference prediction. Although the high-bit final value obtained after inverse quantization is not the original value of the element in the vocabulary vector, the two have the same bit width, and each vocabulary vector contains a relatively large number of elements, so the difference between the final value and the original value can essentially be ignored in the subsequent model network computation. Therefore, the translation result obtained by the final inference does not deviate too much, and the loss in translation quality remains small.
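As a hedged numerical illustration of why this difference is small (the interval and precision below are assumptions, not values mandated by this specification): for a symmetric quantization interval of [-1, 1] and 8-bit quantization precision, the per-element error under the median rule is bounded by half the segment width:

$$ N = 2^{8} - 1 = 255, \qquad \Delta = \frac{1 - (-1)}{255} \approx 0.0078, \qquad \lvert v_{\text{final}} - v_{\text{orig}} \rvert \le \frac{\Delta}{2} \approx 0.0039. $$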
EXAMPLE six
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present specification. Referring to fig. 6, at the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 6, but that does not indicate only one bus or one type of bus.
The memory is used for storing a program. Specifically, the program may include program code, and the program code includes computer operating instructions. The memory may include an internal memory and a non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the translation model compression apparatus at the logic level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
extracting a vocabulary vector set from a trained target translation model, where the elements of the vocabulary vectors in the vocabulary vector set follow a normal distribution; determining a quantization interval based on the normal distribution result onto which the vocabulary vector set is mapped; calculating the number of segments to be divided according to the determined quantization precision, dividing the quantization interval into that number of quantization segments by equal division, and allocating a unique interval number to each quantization segment; replacing the original values corresponding to the elements of the vocabulary vectors contained in each quantization segment with the interval number of the corresponding quantization segment, and writing the vocabulary vectors represented by interval numbers into the target translation model; where the bit width of the interval number is smaller than the bit width of the original value.
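As a concrete illustration of these operations, the following is a minimal sketch assuming 8-bit quantization precision, a quantization interval spanning the full range of observed values, and NumPy as the numerical library; the function and variable names are hypothetical and not taken from this specification:

```python
# Illustrative sketch of the quantization steps described above (assumed
# NumPy-based; names such as quantize_vocab_vectors are hypothetical).
import numpy as np

def quantize_vocab_vectors(vocab_vectors: np.ndarray, precision_bits: int = 8):
    """Quantize floating-point vocabulary vectors into low-bit interval numbers."""
    # Quantization interval: here, the full range of observed element values
    # (one of the constraint options described in the embodiments).
    lo, hi = float(vocab_vectors.min()), float(vocab_vectors.max())

    # Number of segments to divide: 2 ** precision - 1, divided equally.
    n_segments = 2 ** precision_bits - 1
    step = (hi - lo) / n_segments

    # Replace each original value with the interval number (0 .. N-1) of the
    # quantization segment that contains it.
    interval_numbers = np.clip(((vocab_vectors - lo) / step).astype(np.int64),
                               0, n_segments - 1).astype(np.uint8)

    # Correspondence between each quantization segment and its interval number,
    # stored so that the translation side can de-quantize later.
    segments = {i: (lo + i * step, lo + (i + 1) * step) for i in range(n_segments)}
    return interval_numbers, segments
```

Under this sketch, only the small segment-to-number table needs to be stored alongside the quantized vocabulary vectors, so the extra bookkeeping cost is negligible compared with the saving from replacing high-bit original values with low-bit interval numbers.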
The method performed by the apparatus according to the embodiments shown in fig. 1 and fig. 3 of the present specification can be applied to, or implemented by, a processor. The processor may be an integrated circuit chip having signal processing capabilities. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic blocks disclosed in one or more embodiments of the present specification may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and so on. The steps of a method disclosed in connection with one or more embodiments of the present specification may be embodied directly as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random-access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may also execute the method shown in fig. 1 and fig. 3, and implement the functions of the corresponding apparatus in the embodiments shown in fig. 1 and fig. 3, which are not described herein again in this specification.
Of course, besides the software implementation, the electronic device of the embodiment of the present specification does not exclude other implementations, such as a logic device or a combination of software and hardware; that is, the execution subject of the above processing flow is not limited to individual logic units, and may also be hardware or a logic device.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present specification. Referring to fig. 7, at the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 7, but this does not indicate only one bus or one type of bus.
The memory is used for storing a program. Specifically, the program may include program code, and the program code includes computer operating instructions. The memory may include an internal memory and a non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the translation apparatus at the logic level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
determining a target original text to be translated; searching, based on the target original text, the target translation model for a vocabulary vector corresponding to each vocabulary in at least one vocabulary contained in the target original text, where the vocabulary vectors stored in the target translation model are vocabulary vectors represented by interval numbers after quantization by the translation model compression method described in the first embodiment; determining, by inverse quantization, the final value corresponding to each element in the vocabulary vector according to the vocabulary vector and a locally stored correspondence, where the correspondence is pre-stored by the translation model compression method described in the first embodiment, and the bit width of the final value of each element in the vocabulary vector determined by inverse quantization is the same as the bit width of the original value; and inputting the final values determined by inverse quantization for each vocabulary contained in the target original text into the target translation model for inference prediction, and outputting a translation result.
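To illustrate this translation-side flow, the sketch below continues the assumptions of the earlier compression sketch; `tokenize`, `vocab`, `model`, and the stored `segments` mapping are assumed to exist and are not defined by this specification:

```python
# Illustrative translation-side sketch; external objects (model, tokenize,
# vocab, segments) are assumed rather than specified by this embodiment.
import numpy as np

def dequantize(interval_numbers: np.ndarray, segments: dict) -> np.ndarray:
    """Map each interval number back to a high-bit final value (median rule)."""
    midpoints = np.array([(lo + hi) / 2.0 for lo, hi in segments.values()],
                         dtype=np.float32)
    return midpoints[interval_numbers]

def translate(source_text: str, quantized_embeddings: np.ndarray,
              segments: dict, vocab: dict, model) -> str:
    # Look up the quantized vocabulary vector of each token in the source text.
    token_ids = [vocab[token] for token in tokenize(source_text)]
    quantized = quantized_embeddings[token_ids]

    # Inverse quantization: interval numbers -> final values of the original bit width.
    final_values = dequantize(quantized, segments)

    # Inference prediction on the de-quantized vocabulary vectors.
    return model.predict(final_values)
```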
The method performed by the apparatus according to the embodiments shown in fig. 2 and fig. 3 of the present specification can be applied to, or implemented by, a processor. The processor may be an integrated circuit chip having signal processing capabilities. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic blocks disclosed in one or more embodiments of the present specification may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and so on. The steps of a method disclosed in connection with one or more embodiments of the present specification may be embodied directly as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random-access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may also execute the method in fig. 2 and fig. 3, and implement the functions of the corresponding apparatus in the embodiments shown in fig. 2 and fig. 3, which are not described herein again in this specification.
Of course, besides the software implementation, the electronic device of the embodiment of the present specification does not exclude other implementations, such as a logic device or a combination of software and hardware; that is, the execution subject of the above processing flow is not limited to individual logic units, and may also be hardware or a logic device.
Embodiments of the present specification also provide a computer-readable storage medium storing one or more programs. The one or more programs comprise instructions which, when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the methods of the embodiments shown in fig. 1 to fig. 3, and in particular to perform the following:
extracting a vocabulary vector set from a trained target translation model, where the elements of the vocabulary vectors in the vocabulary vector set follow a normal distribution; determining a quantization interval based on the normal distribution result onto which the vocabulary vector set is mapped; calculating the number of segments to be divided according to the determined quantization precision, dividing the quantization interval into that number of quantization segments by equal division, and allocating a unique interval number to each quantization segment; replacing the original values corresponding to the elements of the vocabulary vectors contained in each quantization segment with the interval number of the corresponding quantization segment, and writing the vocabulary vectors represented by interval numbers into the target translation model; where the bit width of the interval number is smaller than the bit width of the original value.
or:
determining a target original text to be translated; searching, based on the target original text, the target translation model for a vocabulary vector corresponding to each vocabulary in at least one vocabulary contained in the target original text, where the vocabulary vectors stored in the target translation model are vocabulary vectors represented by interval numbers after quantization by the translation model compression method described in the first embodiment; determining, by inverse quantization, the final value corresponding to each element in the vocabulary vector according to the vocabulary vector and a locally stored correspondence, where the correspondence is pre-stored by the translation model compression method described in the first embodiment, and the bit width of the final value of each element in the vocabulary vector determined by inverse quantization is the same as the bit width of the original value; and inputting the final values determined by inverse quantization for each vocabulary contained in the target original text into the target translation model for inference prediction, and outputting a translation result.
In summary, the above descriptions are only preferred embodiments of the present specification and are not intended to limit the protection scope of the present specification. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present specification shall fall within the protection scope of the present specification.
The system, apparatus, module or unit illustrated in one or more embodiments above may be implemented by a computer chip or an entity, or by an article of manufacture with a certain functionality. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Claims (16)

1. A translation model compression method, comprising:
extracting a vocabulary vector set from a trained target translation model, wherein elements of the vocabulary vectors in the vocabulary vector set follow a normal distribution;
determining a quantization interval based on the normal distribution result onto which the vocabulary vector set is mapped;
calculating the number of segments to be divided according to the determined quantization precision, dividing the quantization interval into that number of quantization segments by equal division, and allocating a unique interval number to each quantization segment;
replacing the original values corresponding to the elements of the vocabulary vectors contained in each quantization segment with the interval number of the corresponding quantization segment, and writing the vocabulary vectors represented by interval numbers into the target translation model;
wherein the bit width of the interval number is smaller than the bit width of the original value.
2. The translation model compression method of claim 1, wherein determining a quantization interval based on the normal distribution result onto which the vocabulary vector set is mapped comprises:
selecting, from the normal distribution onto which the vocabulary vector set is mapped, a vocabulary vector distribution interval that meets a preset constraint condition as the quantization interval.
3. The translation model compression method of claim 2, wherein the preset constraint condition comprises: covering all vocabulary vectors in the vocabulary vector set;
and selecting, from the normal distribution onto which the vocabulary vector set is mapped, a vocabulary vector distribution interval that meets the preset constraint condition as the quantization interval comprises:
selecting the vocabulary vector distribution interval that covers all vocabulary vectors as the quantization interval based on the normal distribution onto which the vocabulary vector set is mapped, wherein the maximum boundary of the quantization interval is the original value of the largest element in the vocabulary vector set, and the minimum boundary is the original value of the smallest element in the vocabulary vector set.
4. The translation model compression method of claim 2, wherein the preset constraint condition comprises: the vocabulary vector concentration degree meets a constraint sub-condition;
and selecting, from the normal distribution onto which the vocabulary vector set is mapped, a vocabulary vector distribution interval that meets the preset constraint condition as the quantization interval comprises:
selecting, from the normal distribution onto which the vocabulary vector set is mapped, a vocabulary vector distribution interval whose vocabulary vector concentration degree meets the constraint sub-condition as the quantization interval;
wherein the constraint sub-condition comprises: the value of the element at the maximum boundary of the vocabulary vector distribution interval meets a first threshold and/or the value of the element at the minimum boundary meets a second threshold; or the density of the elements at the maximum boundary of the vocabulary vector distribution interval meets a third threshold and/or the density of the elements at the minimum boundary meets a fourth threshold.
5. The translation model compression method of any one of claims 1-3, wherein the number of segments is N, and N is a positive integer;
calculating the number of segments to be divided according to the determined quantization precision, dividing the quantization interval into that number of quantization segments by equal division, and allocating a unique interval number to each quantization segment comprises:
raising 2 to the power of the determined quantization precision and subtracting 1 from the result to obtain the number N of segments to be divided; dividing the quantization interval evenly into N quantization segments by equal division, and assigning each quantization segment an interval number using the integers from 0 to N-1.
6. The translation model compression method of any one of claims 1-2, 4, wherein the number of segments is N, and N is a positive integer;
calculating the number of segments to be divided according to the determined quantization precision, dividing the quantization interval into that number of quantization segments by equal division, and allocating a unique interval number to each quantization segment comprises:
raising 2 to the power of the determined quantization precision and subtracting 1 from the result to obtain the number N of segments to be divided; dividing the quantization interval evenly into N quantization segments by equal division, and assigning each quantization segment an interval number using the integers from 0 to N-1; and uniformly assigning a special number N to the distribution intervals outside the quantization interval.
7. The translation model compression method of claim 6, wherein, after the vocabulary vectors represented by interval numbers are written into the target translation model, the method further comprises:
writing the vocabulary vectors contained in the distribution intervals outside the quantization interval into the target translation model with their corresponding original values, and uniformly marking them with the special number N.
8. The translation model compression method according to any one of claims 1 to 4, wherein the quantization interval is a symmetrically distributed interval or an asymmetrically distributed interval.
9. The translation model compression method of any one of claims 1-4, wherein, after the vocabulary vectors represented by interval numbers are written into the target translation model, the method further comprises:
storing the correspondence between each quantization segment and its allocated interval number, wherein the quantization segment is a vocabulary vector distribution interval whose boundaries are original values corresponding to elements of the vocabulary vectors.
10. A method of translation, comprising:
determining a target original text to be translated;
searching, based on the target original text, the target translation model for a vocabulary vector corresponding to each vocabulary in at least one vocabulary contained in the target original text, wherein the vocabulary vectors stored in the target translation model are vocabulary vectors quantized by the translation model compression method of any one of claims 1-9;
determining, by inverse quantization, the final value of each element in the vocabulary vector according to the vocabulary vector and a locally stored correspondence, wherein the correspondence is pre-stored by the translation model compression method of claim 9, and the bit width of the final value of each element in the vocabulary vector determined by inverse quantization is the same as the bit width of the original value;
and inputting the final values determined by inverse quantization for each vocabulary contained in the target original text into the target translation model for inference prediction, and outputting a translation result.
11. The translation method according to claim 10, wherein determining, by inverse quantization, the final value corresponding to each element in the vocabulary vector according to the vocabulary vector and the locally stored correspondence comprises:
if the vocabulary vector is represented by interval numbers, searching, based on the vocabulary vector, the locally stored correspondence for the quantization segment matching the interval number corresponding to each element in the vocabulary vector, and determining the node value that meets a preset rule within the quantization segment as the final value of the element corresponding to the interval number;
and if the vocabulary vector is marked by the special number, directly determining the original value of each element in the vocabulary vector as the final value of that element.
12. The translation method according to claim 11, wherein the node value meeting the preset rule is the maximum boundary value, the minimum boundary value, or the median of the quantization segment.
13. A translation model compression apparatus comprising:
an extraction module, configured to extract a vocabulary vector set from a trained target translation model, wherein elements of the vocabulary vectors in the vocabulary vector set follow a normal distribution;
a determining module, configured to determine a quantization interval based on the normal distribution result onto which the vocabulary vector set is mapped;
a dividing module, configured to calculate the number of segments to be divided according to the determined quantization precision, divide the quantization interval into that number of quantization segments by equal division, and allocate a unique interval number to each quantization segment;
and a quantization module, configured to replace the original values corresponding to the elements of the vocabulary vectors contained in each quantization segment with the interval number of the corresponding quantization segment, and write the vocabulary vectors represented by interval numbers into the target translation model;
wherein the bit width of the interval number is smaller than the bit width of the original value.
14. A translation device, comprising:
a first determining module, configured to determine a target original text to be translated;
a searching module, configured to search, based on the target original text, the target translation model for a vocabulary vector corresponding to each vocabulary in at least one vocabulary contained in the target original text, wherein the vocabulary vectors stored in the target translation model are vocabulary vectors represented by interval numbers after quantization by the translation model compression method of any one of claims 1-9;
a second determining module, configured to determine, by inverse quantization, the final value corresponding to each element in the vocabulary vector according to the vocabulary vector and a locally stored correspondence, wherein the correspondence is pre-stored by the translation model compression method of claim 9, and the bit width of the final value of each element in the vocabulary vector determined by inverse quantization is the same as the bit width of the original value;
and a translation module, configured to input the final values determined by inverse quantization for each vocabulary contained in the target original text into the target translation model for inference prediction, and to output a translation result.
15. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that when executed cause the processor to perform the translation model compression method of any one of claims 1 to 9, or the translation method of any one of claims 10 to 12.
16. A computer readable storage medium storing one or more programs which, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the translation model compression method of any one of claims 1 to 9 or the translation method of any one of claims 10 to 12.
CN202210344547.8A 2022-03-31 2022-03-31 Translation model compression method, translation method and related device Pending CN114662485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210344547.8A CN114662485A (en) 2022-03-31 2022-03-31 Translation model compression method, translation method and related device

Publications (1)

Publication Number Publication Date
CN114662485A true CN114662485A (en) 2022-06-24

Family

ID=82033547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210344547.8A Pending CN114662485A (en) 2022-03-31 2022-03-31 Translation model compression method, translation method and related device

Country Status (1)

Country Link
CN (1) CN114662485A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117873789A (en) * 2024-03-13 2024-04-12 之江实验室 Checkpoint writing method and device based on segmentation quantization
CN117873789B (en) * 2024-03-13 2024-05-10 之江实验室 Checkpoint writing method and device based on segmentation quantization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination