CN111027619B - Memristor array-based K-means classifier and classification method thereof - Google Patents

Memristor array-based K-means classifier and classification method thereof Download PDF

Info

Publication number
CN111027619B
CN111027619B CN201911248887.5A CN201911248887A
Authority
CN
China
Prior art keywords
data
memristor array
classified
clustering center
memristor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911248887.5A
Other languages
Chinese (zh)
Other versions
CN111027619A (en)
Inventor
李祎
周厚继
陈佳
缪向水
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201911248887.5A priority Critical patent/CN111027619B/en
Publication of CN111027619A publication Critical patent/CN111027619A/en
Application granted granted Critical
Publication of CN111027619B publication Critical patent/CN111027619B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds

Abstract

The invention discloses a memristor array-based K-means classifier and a classification method thereof. The dimension information of the clustering centers of the K-means algorithm is used as the training weights and is mapped and stored in a memristor array, so that the neural network weights represent the dimension information of the clustering centers. The calculation of the Euclidean distance is realized based on the gradual conductance characteristic of the memristor, and the online update of each weight of the clustering centers is realized directly in the hardware circuit. Data clustering of a large amount of non-normalized data is thereby realized on the basis of a hardware circuit, reducing the computational complexity caused by data normalization and the circuit complexity caused by changing the computing weights in an external circuit; at the same time, the data complexity of the distance calculation is reduced, the data storage time and operating power consumption are reduced, the data interaction overhead is saved, and the calculation time is shorter.

Description

Memristor array-based K-means classifier and classification method thereof
Technical Field
The invention belongs to the technical field of artificial neural networks, and particularly relates to a memristor array-based K-means classifier and a classification method thereof.
Background
With the advent of the network age, the emergence of large amounts of data has made it increasingly difficult to classify data and extract the effective features within it. Data classification is the process of algorithmically identifying and grouping together data points that have identical or similar characteristics. The core of classification is to obtain the features of different sample data and calculate their generalized distance (or similarity) so as to distinguish different samples. As the data volume increases, the computation required by classification algorithms grows geometrically, which places higher demands on the data computation and processing capacity of the computing system's CPU. The "von Neumann bottleneck" of the existing computing architecture greatly limits data classification capability in a big-data environment. Memristors, with their efficient compute-in-memory and parallel computing capabilities, are considered one of the best candidates to break the "von Neumann bottleneck" limitation.
The K-means algorithm is a basic unsupervised clustering algorithm with notable advantages such as fast convergence, simple operation and few tunable parameters, and can effectively handle data clustering problems in a big-data environment. In existing applications, the problems in memristor-array-based data classification research are mainly reflected in the following: (1) applications of memristor arrays are built on networked structures; existing classification implemented with memristor arrays is mainly concentrated on complex network algorithms such as BP (back propagation) neural networks and multilayer perceptrons, whose network structures are realized by combining software and hardware, so classification cannot be realized by the memristor hardware structure alone, and research on non-networked clustering algorithms such as K-means is still at a preliminary stage. (2) In the traditional K-means algorithm, the clustering center is updated by a memoryless mean value, with no continuity between successive updates of the clustering center; it cannot be effectively combined with the weight-update mode of a neural network, online updating of the weights and complete expression of the Euclidean distance cannot be realized, and the computational complexity is high. (3) The K-means algorithm relies on calculating the Euclidean distance between the clustering centers and sample data to realize clustering; the traditional hardware-based Euclidean distance calculation method cannot compute the squared terms of the input data, so the error of the data classification result is large and the accuracy is low.
In summary, providing a K-means classifier with low computational complexity and high accuracy, and a classification method thereof, is an urgent problem to be solved.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, the invention provides a memristor array-based K-means classifier and a classification method thereof, and aims to solve the problem of high computational complexity caused by the fact that online updating of weights and complete expression of Euclidean distances cannot be realized on hardware in the prior art.
In order to achieve the above object, in a first aspect, the present invention provides a memristor array-based K-means classifier, including a first control module, a memristor array, a second control module, a data comparison module, and an output module;
the memristor array comprises a first memristor array, a second memristor array, a third memristor array and a fourth memristor array, each bit line of the first memristor array is connected with each bit line of the fourth memristor array, each bit line of the second memristor array is connected with each bit line of the third memristor array, each word line of the first memristor array is connected with each word line of the second memristor array, and each word line of the third memristor array is connected with each word line of the fourth memristor array;
the first control module is used for randomly selecting a clustering center from an input data set to be classified, respectively storing the clustering center into a first memristor array and a second memristor array after being subjected to writing voltage coding, and respectively storing data to be classified in the data set to be classified into a third memristor array and a fourth memristor array after being subjected to writing voltage coding; after reading voltage coding is carried out on the data to be classified and the opposite numbers of the weights of the clustering center, the data to be classified and the opposite numbers of the weights of the clustering center are respectively applied to bit lines of the second memristor array and the first memristor array, wherein the information of each dimension of the clustering center is the weight;
the memristor array is used for realizing dot product operation between the data to be classified after the read voltage coding input by the first control module and the opposite number of each weight and the self-stored data on the clustering center and the row where the data to be classified are located, accumulating the obtained result according to the row and outputting the accumulated result to the second control module;
the second control module is used for subtracting the calculation results of the row where the data to be classified and the clustering center input by the memristor array are located to obtain the Euclidean distance between the clustering center and the data to be classified, and outputting the Euclidean distance to the data comparison module;
the data comparison module is used for dividing the data to be classified into the class where the clustering center closest to the data to be classified is located, and outputting the classification result to the second control module and the output module respectively;
the second control module is also used for determining the row where the clustering center to be updated is located according to the classification result input by the data comparison module, and respectively outputting the data to be classified in the memristor array and the row where the clustering center to be updated is located after reading voltage coding is carried out on the preset learning rate and the opposite number of the preset learning rate;
the memristor array is also used for realizing the dot product operation between the preset learning rate and the inverse number thereof input by the second control module and the self-stored data on the row where the to-be-classified data and the to-be-updated clustering center are respectively located, accumulating the obtained results according to columns to obtain each weight change value, and outputting the weight change value to the first control module;
the first control module is also used for respectively outputting each weight change value input by the memristor array to a memristor array bit line after being subjected to write coding;
the memristor array is also used for updating the weight of the clustering center to be updated based on each weight change value input on the bit line of the first control module;
and the output module is used for outputting the classification result of the data to be classified input by the data comparison module when the weight of the clustering center is not changed any more.
Further preferably, the memristor array is in translational symmetry with reference to a center line.
Further preferably, the memristor array size is (k+1) × 2M, where k is the number of cluster classes and M is the dimension of sample data; the first memristor array and the second memristor array are in translational symmetry with the center line as a reference, and are each formed by k rows and M columns of memristors; the third memristor array and the fourth memristor array are in translational symmetry with the center line as a reference, and are each formed by 1 row and M columns of memristors.
In a second aspect, the invention provides a memristor array-based K-means classification method, which comprises the following steps:
s1, randomly selecting k data from the data set to be classified as an initial clustering center, and respectively storing the k data into a first memristor array and a second memristor array after writing voltage coding, wherein k is the clustering number;
s2, selecting first data in a data set to be classified as data to be classified, and storing the data to be classified into a third memristor array and a fourth memristor array after writing voltage coding;
s3, after reading voltage coding is carried out on the data to be classified and the opposite numbers of the weights of the first clustering center, the data to be classified and the opposite numbers of the weights of the first clustering center are respectively applied to bit lines of a second memristor array and a first memristor array, dot product operation between the data to be classified and the opposite numbers of the weights of the first clustering center, which are input by the first control module, and self-stored data is respectively realized on the rows where the first clustering center and the data to be classified are located, and the obtained results are accumulated according to the rows and then subtracted to obtain the Euclidean distance between the first clustering center and the data to be classified;
s4, sequentially calculating Euclidean distances between the data to be classified and the rest clustering centers according to the method in the step S3;
s5, dividing the data to be classified into the class where the clustering center closest to the data to be classified is located, and determining the row where the clustering center to be updated is located according to the classification result;
s6, respectively inputting the preset learning rate and the inverse number thereof after the reading voltage coding to the row where the data to be classified and the cluster center to be updated are located, realizing the dot product operation between the preset learning rate and the inverse number thereof input by the second control module and the self-stored data, accumulating the obtained results according to columns to obtain the change value of each weight of the cluster center to be updated, writing the obtained change value into the memristor node of the cluster center to be updated, and updating the weight;
s7, sequentially dividing the residual data in the data set to be classified into corresponding categories according to the method of the steps S2-S6;
s8, repeating the steps S2-S7 to iterate until the weight of each cluster center is not changed;
the first memristor array is connected with each bit line of the fourth memristor array, the second memristor array is connected with each bit line of the third memristor array, the first memristor array is connected with each word line of the second memristor array, the third memristor array is connected with each word line of the fourth memristor array, and each dimension information of the clustering center is the weight.
Further preferably, after the data is written into the memristor by the writing voltage coding, the conductance value of the memristor is linearly related to the actual size of the data.
Further preferably, a first cluster center in the first memristor array is selected, a read voltage encoded coefficient-1 is applied to a bit line of the first cluster center, and the opposite number of each weight of the first cluster center is obtained.
Further preferably, the euclidean distance is determined by an amount of charge accumulation due to an output current on the memristor row, and the amount of accumulated charge is proportional to the euclidean distance.
Further preferably, the cluster center to be updated is the cluster center closest to the data to be classified.
Further preferably, the weight change value Δ W is represented by:
ΔW=η(Ui-Wp)
where η denotes the learning rate, Ui is the ith data to be classified, and Wp is the cluster center to be updated.
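For intuition, substituting ΔW into the stored weight gives the incremental update

Wp,new = Wp + ΔW = (1 − η)·Wp + η·Ui

so each assignment moves the cluster center a fraction η of the way toward the assigned sample Ui, and repeated assignments drive the stored center toward the mean of its class without ever recomputing that mean explicitly.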
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
1. the invention provides a memristor array-based K-means classifier, which is characterized in that a memristor array structure is utilized, a K-means clustering center with practical significance is directly mapped and stored into an array node, the structural networking of an algorithm is realized, all dimension information of the clustering center is used as the weight of the network, the practical significance of the network weight is increased, the non-normalized input data clustering is realized, and the calculation complexity caused by data normalization is reduced; by applying the conductance value gradient characteristic of the memristor to calculation of Euclidean distance and weight updating of multi-dimensional data, the problem of high calculation complexity caused by the fact that complete expression of Euclidean distance and online updating of weight cannot be achieved on hardware in the prior art is solved.
2. The invention provides a memristor array-based K-means classification method, which is characterized in that all dimension information of a clustering center of a K-means algorithm is used as a training weight, the conductance value gradient characteristic of a memristor is applied to the calculation of Euclidean distance of multi-dimensional data, the problem of complete expression of the Euclidean distance on the memristor array is solved, the learning rate is directly applied to the memristor array through voltage coding, the online updating of the clustering center on a hardware circuit is further realized, the circuit complexity caused by the calculation weight change of an external circuit is greatly reduced, and the time and energy consumption of data interaction are saved.
3. The memristor gradient characteristic is applied to Euclidean distance simulation calculation, the method can be used for calculating the Euclidean distance between input data and a clustering center to realize K-means clustering, and the problem of similarity calculation of algorithms such as KNN (K nearest neighbor) and RBF (radial basis function) neural networks and the like in other similar algorithms in a hardware circuit can be solved.
4. According to the K-means classifier based on the memristor array, due to the high-density structure of the nanoscale memristor array and the information storage capacity of the memristor resistor, the circuit size is small, the energy consumption is lower than that of a traditional CMOS structure, the overall performance is better than that of an existing computing framework, and the K-means classifier based on the memristor array is more suitable for an edge computing scene.
Drawings
FIG. 1 is a structural schematic diagram of a memristor array-based K-means classifier provided by the invention;
FIG. 2 is a schematic diagram of a memristor array structure provided by the invention;
FIG. 3 is a flow chart of the K-means clustering algorithm provided by the present invention;
FIG. 4 is a diagram of a neural network weight mapping scheme provided by the present invention;
FIG. 5 is a schematic diagram of a memristor array based weight reading method provided by the invention;
FIG. 6 is a schematic diagram of a method for calculating Euclidean distances between data to be classified and a cluster center based on a memristor array according to the present invention;
FIG. 7 is a schematic diagram of an update method for a cluster center based on a memristor array provided by the present invention; wherein, the diagram (a) is a schematic diagram of a method for calculating a weight change value, and the diagram (b) is a process for writing the weight change value into a row to be updated.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In order to achieve the above object, in a first aspect, the present invention provides a memristor array-based K-means classifier, as shown in fig. 1, including a first control module 1, a memristor array 2, a second control module 3, a data comparison module 4, and an output module 5;
the first control module 1 is bidirectionally connected with the memristor array 2, the memristor array 2 is bidirectionally connected with the second control module 3, the second control module 3 is bidirectionally connected with the data comparison module 4, and the data comparison module 4 is connected with the output module 5. As shown in fig. 2, the memristor array 2 includes a first memristor array 21, a second memristor array 22, a third memristor array 23 and a fourth memristor array 24; each bit line of the first memristor array 21 is connected with each bit line of the fourth memristor array 24, each bit line of the second memristor array 22 is connected with each bit line of the third memristor array 23, each word line of the first memristor array 21 is connected with each word line of the second memristor array 22, and each word line of the third memristor array 23 is connected with each word line of the fourth memristor array 24;
the first control module 1 is used for randomly selecting a clustering center from an input data set to be classified, respectively storing the clustering center into the first memristor array 21 and the second memristor array 22 after being subjected to writing voltage coding, and respectively storing data to be classified in the data set to be classified into the third memristor array 23 and the fourth memristor array 24 after being subjected to writing voltage coding; after reading voltage coding is carried out on the data to be classified and the opposite numbers of the weights of the clustering center, the data to be classified and the opposite numbers of the weights of the clustering center are respectively applied to bit lines of the second memristor array 22 and the first memristor array 21, wherein the information of each dimension of the clustering center is the weight;
the memristor array 2 is used for realizing dot product operation between the data to be classified after the read voltage coding input by the first control module 1 and the inverse number of each weight and the self-stored data on the clustering center and the row where the data to be classified are located, accumulating the obtained results according to the rows, and outputting the accumulated results to the second control module 3;
the second control module 3 is used for subtracting the calculation results of the row where the data to be classified and the clustering center input by the memristor array 2 are located to obtain the Euclidean distance between the clustering center and the data to be classified, and outputting the Euclidean distance to the data comparison module 4;
the data comparison module 4 is used for dividing the data to be classified into the class where the clustering center closest to the data to be classified is located, and outputting the classification result to the second control module 3 and the output module 5 respectively;
the second control module 3 is further configured to determine a row where the clustering center to be updated is located according to the classification result input by the data comparison module 4, and output the row where the clustering center to be updated and the data to be classified in the memristor array 2 are located respectively after reading voltage coding is performed on the preset learning rate and the inverse number thereof;
the memristor array 2 is further used for realizing dot product operation between the preset learning rate and the inverse number thereof input by the second control module 3 and self-stored data on the row where the to-be-classified data and the to-be-updated clustering center are located, accumulating the obtained results according to columns to obtain each weight change value, and outputting the weight change value to the first control module 1;
the first control module 1 is further configured to output each weight change value input by the memristor array 2 to a bit line of the memristor array 2 after being subjected to write coding;
the memristor array 2 is further used for updating the weight of the clustering center to be updated based on each weight change value input on the bit line of the first control module 1;
the output module 5 is used for outputting the classification result of the data to be classified input by the data comparison module 4 when the weight of the clustering center is not changed any more.
Specifically, the memristor array 2 has a data storage function and a data calculation function. The data storage function converts the data voltage obtained by write-voltage encoding the input data into a conductance value of a memristor node stored in the array; the data calculation function converts the read-voltage-encoded data voltage of the input data, acting on the conductance of the node, into current and accumulates the charge.
In this embodiment, for the data set to be classified S = {U1, U2, …, Ut}, each data to be classified Ui has M data dimensions, i.e. Ui = {xi1, xi2, …, xiM}. To divide the data in S into k classes, k cluster centers W = {W1, W2, …, Wk} are generated, and each cluster center, like the data to be classified, has M dimensions, i.e. Wj = {yj1, yj2, …, yjM}. As shown in fig. 2, the present embodiment employs a memristor array of (k+1) × 2M size, which is translationally symmetric about a center line: the first memristor array and the second memristor array are translationally symmetric about the center line and are each formed by k rows and M columns of memristors; the third memristor array and the fourth memristor array are translationally symmetric about the center line and are each formed by 1 row and M columns of memristors; here k is the number of cluster classes and M is the dimensionality of the classified data. The first memristor array and the second memristor array are used for storing the dynamically changing cluster centers W = {W1, W2, …, Wk}. In the memristor array, each node of the array represents one dimension of one piece of data. The M dimensions, namely the M weights, of one cluster center are stored sequentially from left to right in one row of memristor cells of the first memristor array, so the M × k memristor cells of the first memristor array can store all the weights of the k cluster centers. Similarly, the k cluster centers are stored into the second memristor array. The third memristor array and the fourth memristor array are used for storing the ith input data to be classified Ui. The first and second memristor arrays store identical data, as do the third and fourth memristor arrays, so the stored contents are translationally symmetric about the center line.
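A minimal software sketch of this storage layout is given below; the function name and variables are illustrative only (the array here is an abstract conductance matrix, not a device model), with W, U, k and M as defined above:

import numpy as np

def build_array_layout(W, U_i):
    """Arrange k cluster centers and one sample into the (k+1) x 2M layout.

    W   : (k, M) array, one cluster center per row
    U_i : (M,)  vector, the data to be classified
    Rows 0..k-1 hold one cluster center twice (first | second sub-array);
    row k holds the sample twice (fourth | third sub-array), so the two
    halves are translationally symmetric about the center line.
    """
    k, M = W.shape
    G = np.zeros((k + 1, 2 * M))
    G[:k, :M] = W      # first memristor array (k x M)
    G[:k, M:] = W      # second memristor array (k x M)
    G[k, :M] = U_i     # fourth memristor array (1 x M), shares bit lines with the first
    G[k, M:] = U_i     # third memristor array (1 x M), shares bit lines with the second
    return G

Calling build_array_layout on a (k, M) matrix of centers and an M-dimensional sample reproduces the storage pattern of fig. 2 in software.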
Specifically, the first control module 1 includes a data input unit 11, a first read-write encoding unit 12, a first buffer unit 13, and a second buffer unit 14; the second control module 3 comprises a third buffer unit 31, a second read-write encoding unit 32 and a subtraction unit 33; the output module 5 comprises an output buffer unit 51 and a result output unit 52;
wherein the output end of the data input unit 11 is connected with one end of the first read-write encoding unit 12; the other end of the first read-write encoding unit 12 is bidirectionally connected with one end of the first buffer unit 13 and one end of the second buffer unit 14 respectively; the bit lines of the first memristor array 21 and the fourth memristor array 24 are bidirectionally connected with the other end of the first buffer unit 13; the bit lines of the second memristor array 22 and the third memristor array 23 are bidirectionally connected with the other end of the second buffer unit 14; the word lines of the memristor array 2 are bidirectionally connected with one end of the third buffer unit 31; the other end of the third buffer unit 31 is bidirectionally connected with the second read-write encoding unit 32, one end of the subtraction unit 33 and one end of the data comparison module 4 respectively; the other end of the data comparison module 4 is connected with the input end of the output buffer unit 51; and the output end of the output buffer unit 51 is connected with the input end of the result output unit 52;
FIG. 3 is a flow chart of the K-means clustering algorithm, which mainly includes the data input stage (S1-S2), the distance calculation stage (S3), and the weight update stage (S4-S5).
Correspondingly, the functions of each module and unit in the K-means classifier in FIG. 1 are as follows:
Data input stage: the data input unit 11 receives the data set to be classified, selects the cluster center data and the data to be classified, and outputs them to the first read-write encoding unit 12. The first read-write encoding unit 12 encodes the cluster center data and the data to be classified input by the data input unit 11 as write voltages of fixed amplitude. The encoded cluster center data are input to the bit lines of the memristor array 2 through the first buffer unit 13 and the second buffer unit 14, so that the cluster centers are stored in the first memristor array 21 and the second memristor array 22 respectively; the write-encoded data to be classified input by the first read-write encoding unit 12 are likewise input to the bit lines of the memristor array 2, so that the data to be classified are stored in the third memristor array 23 and the fourth memristor array 24 respectively. This completes the storage of the data to be classified and of the dimension information of the cluster centers in the memristor array 2.
Distance calculation stage: with each dimension information of the cluster center as a weight, the data to be classified and the opposite numbers of the weights are read-voltage encoded and then applied, through the second buffer unit 14 and the first buffer unit 13 respectively, to the bit lines of the second memristor array 22 and the first memristor array 21. The memristor array 2 performs, on the rows where the data to be classified and the cluster center are located, the dot product operation between the input read-voltage-encoded data to be classified and opposite weight values and the self-stored data, accumulates the results by row, and outputs them through the third buffer unit 31 to the subtraction unit 33 for subtraction, obtaining the Euclidean distance between the data to be classified and each cluster center, which is output to the data comparison module 4 through the third buffer unit 31.
Weight update stage: the data comparison module 4 receives the Euclidean distances between the data to be classified and each cluster center input by the third buffer unit 31, compares them, divides the data to be classified into the class of the nearest cluster center, and outputs the classification result to the third buffer unit 31 and the output buffer unit 51, the temporary classification result being stored in the output buffer unit 51. The third buffer unit 31 determines the row of the cluster center to be updated according to the classification result input by the data comparison module 4, and outputs the read-voltage-encoded preset learning rate and its opposite number to the rows of the memristor array 2 where the data to be classified and the cluster center to be updated are located. The memristor array 2 performs, on these two rows, the dot product operation between the preset learning rate and its opposite number input by the third buffer unit 31 and the self-stored data, accumulates the results by column to obtain each weight change value, and outputs them to the first buffer unit 13 and the second buffer unit 14. The first buffer unit 13 and the second buffer unit 14 pass the weight change values to the first read-write encoding unit 12 for write encoding, after which the encoded values are applied through the first buffer unit 13 and the second buffer unit 14 to the bit lines of the memristor array 2, so that the memristor array updates the weights of the cluster center to be updated.
After each data to be classified in the data set has undergone the above process multiple times, when the category of each data in the data set no longer changes, the classification result is transferred to the result output unit 52, which outputs the final classification result.
In a second aspect, the invention provides a memristor array-based K-means classification method. The invention simplifies the K-means algorithm into a single-layer perceptron model: the inputs are all dimension information of the data to be classified, the output is the class of the data to be classified, and the training weights are all dimension information of the cluster centers. The perceptron model is realized with the memristor array, the data in the data set S are used repeatedly to train the cluster centers online, the update of all dimension information, namely the weights, is completed, and clustering is finally realized. Fig. 4 shows the neural network weight mapping method provided by the invention.
Specifically, the invention provides a memristor array-based K-means classification method, which comprises the following steps:
S1, from the data set to be classified S = {U1, U2, …, Ut}, randomly selecting k data as the initial cluster center weights W = {W1, W2, …, Wk}, which, after write voltage encoding, are stored into the first memristor array and the second memristor array respectively, where k is the number of clusters;
specifically, taking the K-means classifier provided in the first aspect of the invention as an example, the M dimensional data of the jth (j = 1, 2, …, k) cluster center are sequentially encoded by the first read-write encoding unit and, after passing through the first buffer unit and the second buffer unit, the encoded data are written into the jth row of memristor nodes of the first memristor array and of the second memristor array respectively.
S2, selecting the first data U1 in the data set to be classified as the data to be classified, which, after write voltage encoding, is stored into the third memristor array and the fourth memristor array respectively;
specifically, after data is written into a memristor by write voltage encoding, the conductance value of the memristor is linearly related to the actual value of the data, expressed as:

Gx = (X − Xmin)/(Xmax − Xmin) · (Gmax − Gmin) + Gmin

where Gx denotes the memristor conductance value after the data has been write-voltage encoded and written into the memristor, X denotes the data value, Gmax and Gmin denote the maximum and minimum conductance values respectively, and Xmax and Xmin denote the maximum and minimum values of the data. The actual data are thereby mapped to device conductances; since the encoding voltage amplitude is fixed, different numbers of voltage pulses must be applied for the data to reach a given conductance value, which is the write voltage encoding process. The write-encoding result is N = f(Gx), where the function f is the memristor pulse-conductance characteristic.
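A small numerical sketch of this linear mapping is shown below; the ranges used are illustrative values, not device parameters from the patent:

def data_to_conductance(x, x_min, x_max, g_min, g_max):
    """Linearly map a data value to a memristor conductance value."""
    return (x - x_min) / (x_max - x_min) * (g_max - g_min) + g_min

# example: map the value 5.0 from the data range [0, 10] onto an
# assumed conductance window of [1e-6 S, 1e-4 S]
g = data_to_conductance(5.0, 0.0, 10.0, 1e-6, 1e-4)   # -> 5.05e-05 S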
S3, the data to be classified U1 and the opposite numbers of the weights of the first cluster center, −W1, after read voltage encoding, are applied to the bit lines of the second memristor array and the first memristor array respectively; on the rows where the first cluster center W1 and the data to be classified U1 are located, the dot product operation between the read-voltage-encoded data to be classified and opposite weight values input by the first control module and the self-stored data is performed, and the results are accumulated by row and then subtracted, obtaining the Euclidean distance between the first cluster center and the data to be classified;
specifically, each dimension information of the cluster center is the weight. The first cluster center W1 in the first memristor array is selected, a read-voltage-encoded coefficient −1 is applied to its bit lines, and the opposite number of each weight of the first cluster center is obtained based on Ohm's law. Fig. 5 is a schematic diagram of the memristor-array-based weight reading method provided by the invention: a read voltage is applied through the third buffer unit to the row of the first cluster center W1, and by Ohm's law it acts on the conductance values of the memristors on that row to yield the opposite numbers of the weights of W1. Specifically, the read-voltage-encoded coefficient −1 is input through the third buffer unit to the row of the first cluster center W1, the first buffer unit collects the current values on each bit line of the first memristor array, the corresponding conductance values of W1 are obtained, and through the mapping relation between conductance and actual data the opposite numbers of the weights, −y11, −y12, …, −y1M, are obtained and stored in the first buffer unit.
Specifically, as shown in FIG. 6, the opposite numbers of the weights of the first cluster center (−y11, −y12, …, −y1M) and the data to be classified (x11, x12, …, x1M), each of M dimensions, are read-voltage encoded by the first read-write encoding unit; the encoded opposite weight values are input through the first buffer unit to the first memristor array and the fourth memristor array, and the encoded data to be classified are input through the second buffer unit to the second memristor array and the third memristor array. The dot product operation between the read-voltage-encoded data to be classified and opposite weight values input by the first control module and the self-stored data is thus performed, and the results are accumulated by row and then subtracted to obtain the Euclidean distance between the first cluster center and the data to be classified. The essence of this process is that the input read-voltage-encoded data voltages are converted into currents by the conductances of the nodes, and these currents represent the result of the summation after vector dot multiplication; the third buffer unit collects the charge of the row of the first cluster center W1 and of the row of the data to be classified U1, each of which represents the result of the summed vector dot products. The charge of the data row characterizes U1·U1 − U1·W1, and the charge of the row of the first cluster center W1 characterizes U1·W1 − W1·W1. These are output through the third buffer unit to the subtraction unit and subtracted, i.e.

(U1·U1 − U1·W1) − (U1·W1 − W1·W1) = ‖U1 − W1‖²,

which characterizes the Euclidean distance between the data to be classified and the cluster center W1; the obtained Euclidean distance is stored in the third buffer unit. Further, the Euclidean distance is determined by the accumulated charge brought by the current, and the accumulated charge is proportional to the Euclidean distance.
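A software sketch of this charge-based distance readout, following the row and column arrangement described above (the function name and variables are illustrative, not from the patent), is:

import numpy as np

def row_charge_distance(W_j, U):
    """Emulate the charge-difference readout for one cluster center.

    W_j : (M,) stored cluster center (rows of the first/second sub-arrays)
    U   : (M,) stored data to be classified (rows of the third/fourth sub-arrays)
    Read inputs on the bit lines are U (second/third sub-arrays) and -W_j
    (first/fourth sub-arrays); each word line accumulates a charge equal to
    the dot product of its stored row with the applied inputs.
    """
    charge_data_row = U @ U - W_j @ U        # row holding the sample:  U.U - W_j.U
    charge_center_row = U @ W_j - W_j @ W_j  # row holding the center:  U.W_j - W_j.W_j
    return charge_data_row - charge_center_row   # = ||U - W_j||**2

# example: the result equals the squared Euclidean distance
U = np.array([1.0, 2.0, 3.0])
W1 = np.array([0.0, 2.0, 5.0])
assert np.isclose(row_charge_distance(W1, U), np.sum((U - W1) ** 2))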
The method applies the gradual conductance-change characteristic of the memristor to the calculation of the Euclidean distance; it can be used not only to calculate the Euclidean distance between input data and cluster centers to realize K-means clustering, but also to solve the similarity calculation problem of algorithms such as KNN and RBF neural networks in hardware circuits.
S4, according to the method of step S3, sequentially calculating the Euclidean distances between the data to be classified and the remaining cluster centers W2, W3, …, Wk;
S5, comparing the Euclidean distances between the data to be classified and each cluster center W1, W2, …, Wk, dividing the data to be classified into the class of the cluster center closest to it, and determining the row of the cluster center to be updated according to the classification result, the cluster center to be updated being the cluster center closest to the data to be classified.
Specifically, the charges representing the Euclidean distances stored in the third buffer unit are transmitted to the data comparison module; by comparing the magnitudes of the charges, the data comparison module divides the data to be classified into the class of the nearest cluster center, feeds the classification result back to the third buffer unit and outputs it to the output buffer unit.
S6, the preset learning rate η and its opposite number −η, after read voltage encoding, are respectively input to the rows where the data to be classified and the cluster center to be updated are located; the dot product operation between the preset learning rate and its opposite number input by the second control module and the self-stored data is performed, the results are accumulated by column to obtain the change value of each weight of the cluster center to be updated, and the obtained change values are written into the memristor nodes of the cluster center to be updated to update the weights;
specifically, as shown in fig. 7(a), the voltage pulses corresponding to the read-voltage-encoded opposite learning rate −η are input through the third buffer unit to the row of the cluster center Wp closest to the data to be classified, while the voltage pulses corresponding to the read-voltage-encoded learning rate η are input through the third buffer unit to the row of the data to be classified U1. In this embodiment, the learning rate is η = 0.1, and the number of voltage pulses corresponding to the learning rate is set to 1, the minimum number of pulses. Based on Ohm's law, the current values on the row of the nearest cluster center and on the row of the data to be classified are obtained respectively, and after accumulation by column the weight change value of the nearest cluster center is calculated as ΔW = η(Ui − Wp), where η denotes the learning rate, Ui is the ith data to be classified, and Wp is the cluster center closest to the data to be classified. Then, as shown in fig. 7(b), ΔW is encoded by the write encoding unit and written into the row of the memristor array holding the cluster center Wp closest to the data to be classified, thereby realizing the update of the cluster center on the memristor array.
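A minimal sketch of this update step in software (assuming, as above, an abstract weight matrix rather than a device model; names are illustrative) is:

import numpy as np

def update_cluster_center(W, p, U_i, eta=0.1):
    """Move the p-th stored cluster center a fraction eta toward sample U_i.

    Emulates applying +eta pulses to the sample row and -eta pulses to the
    row of the nearest center, accumulating column currents to obtain
    dW = eta * (U_i - W[p]), and write-encoding dW back into row p.
    """
    dW = eta * (U_i - W[p])   # column-wise accumulated change values
    W[p] = W[p] + dW          # write the change into the center's row
    return W

# example: the center drifts toward a repeatedly assigned sample
W = np.array([[0.0, 0.0], [5.0, 5.0]])
for _ in range(50):
    W = update_cluster_center(W, 0, np.array([1.0, 2.0]), eta=0.1)
# W[0] is now close to (1, 2)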
S7, sequentially dividing the residual data in the data set to be classified into corresponding categories according to the method of the steps S2-S6;
s8, repeating the steps S2-S7 to iterate until the weight of each cluster center is not changed;
the first memristor array is connected with each bit line of the fourth memristor array, the second memristor array is connected with each bit line of the third memristor array, the first memristor array is connected with each word line of the second memristor array, and the third memristor array is connected with each word line of the fourth memristor array.
The invention provides a memristor array-based K-means classifier and a classification method thereof, which take all dimension information of the cluster centers of the K-means algorithm as training weights and map them into the memristor array, and creatively provide a memristor-array-based Euclidean distance calculation method, solving the problem of completely expressing the Euclidean distance on the memristor array; the method can be used to realize data clustering of large amounts of data on the basis of a hardware circuit. The invention uses the memristor array to reduce the data complexity of the Euclidean distance calculation and to reduce data storage time and operating power consumption, and can be applied to edge-computing scenarios in the future.
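Tying the steps S1-S8 together, a compact software emulation of the whole classification loop could look like the sketch below; it mirrors the algorithm (charge-difference distances and the incremental center update), not the circuit, and the convergence test on the class labels is one reasonable reading of the stopping condition:

import numpy as np

def memristor_style_kmeans(data, k, eta=0.1, max_iter=100, seed=0):
    """K-means with the incremental, hardware-style center update dW = eta*(U - W_p)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)].copy()   # step S1
    labels = np.full(len(data), -1)
    for _ in range(max_iter):                                        # step S8
        changed = False
        for i, u in enumerate(data):                                 # steps S2-S7
            # squared distances via the charge-difference formulation
            d = np.array([(u @ u - w @ u) - (u @ w - w @ w) for w in centers])
            p = int(np.argmin(d))                                    # step S5
            if p != labels[i]:
                changed = True
            labels[i] = p
            centers[p] += eta * (u - centers[p])                     # step S6
        if not changed:
            break
    return labels, centers

# example on two well-separated blobs
pts = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 6.0])
labels, centers = memristor_style_kmeans(pts, k=2)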
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A memristor array-based K-means classifier, comprising: the memristor array comprises a first control module, a memristor array, a second control module, a data comparison module and an output module;
the memristor array comprises a first memristor array, a second memristor array, a third memristor array and a fourth memristor array, each bit line of the first memristor array is connected with each bit line of the fourth memristor array, each bit line of the second memristor array is connected with each bit line of the third memristor array, each word line of the first memristor array is connected with each word line of the second memristor array, and each word line of the third memristor array is connected with each word line of the fourth memristor array;
the first control module is used for randomly selecting a clustering center from an input data set to be classified, respectively storing the clustering center into a first memristor array and a second memristor array after being subjected to writing voltage coding, and respectively storing data to be classified in the data set to be classified into a third memristor array and a fourth memristor array after being subjected to writing voltage coding; after reading voltage coding is carried out on the data to be classified and the opposite numbers of the weights of the clustering center, the data to be classified and the opposite numbers of the weights of the clustering center are respectively applied to bit lines of the second memristor array and the first memristor array, wherein the information of each dimension of the clustering center is the weight;
the memristor array is used for realizing, on the rows where the clustering center and the data to be classified are respectively located, the dot product operation between the read-voltage-encoded data to be classified and the opposite numbers of the weights input by the first control module and the self-stored data, accumulating the obtained results by row and then outputting them to the second control module;
the second control module is used for subtracting the calculation results of the row where the data to be classified and the clustering center input by the memristor array are located to obtain the Euclidean distance between the clustering center and the data to be classified, and outputting the Euclidean distance to the data comparison module;
the data comparison module is used for dividing the data to be classified into the class where the clustering center closest to the data to be classified is located, and outputting the classification result to the second control module and the output module respectively;
the second control module is further used for determining a row where the clustering center to be updated is located according to the classification result input by the data comparison module, and outputting the data to be classified in the memristor array and the row where the clustering center to be updated is located after reading voltage coding is carried out on the preset learning rate and the opposite number of the preset learning rate;
the memristor array is further used for realizing dot product operation between the preset learning rate input by the second control module and the data to be classified and between the opposite number of the preset learning rate and the clustering center to be updated on the row where the data to be classified and the clustering center to be updated are located respectively, accumulating the obtained results according to columns to obtain each weight change value, and outputting the weight change value to the first control module;
the first control module is further used for outputting each weight change value input by the memristor array to a memristor array bit line after being subjected to write coding;
the memristor array is further used for updating the weight of the clustering center to be updated based on each weight change value input on the bit line of the first control module;
and the output module is used for outputting the classification result of the data to be classified input by the data comparison module when the weight of the clustering center is not changed any more.
2. The memristor array-based K-means classifier of claim 1, wherein the memristor array is translational symmetric with respect to a centerline.
3. The memristor array-based K-means classifier according to claim 1, wherein the memristor array size is (K +1) x 2M, where K is the number of cluster classes and M is the dimension of sample data; the first memristor array and the second memristor array are in translational symmetry by taking a central line as a reference, and are formed by k rows of memristors and M columns of memristors; the third memristor array and the fourth memristor array are in translational symmetry with the center line as a reference, and are formed by 1 row and M columns of memristors.
4. A memristor array-based K-means classification method is characterized by comprising the following steps:
s1, randomly selecting k data from the data set to be classified as an initial clustering center, and respectively storing the k data into a first memristor array and a second memristor array after writing voltage coding, wherein k is the clustering number;
s2, selecting first data in a data set to be classified as data to be classified, and storing the data to be classified into a third memristor array and a fourth memristor array after writing voltage coding;
s3, after read voltage encoding is carried out on the data to be classified and the opposite numbers of the weights of the first clustering center, the data to be classified and the opposite numbers of the weights are respectively applied to the bit lines of the second memristor array and the first memristor array; on the rows where the first clustering center and the data to be classified are respectively located, the dot product operation between the read-voltage-encoded data to be classified and opposite weight values input by the first control module and the self-stored data is realized, and the obtained results are accumulated by row and then subtracted to obtain the Euclidean distance between the first clustering center and the data to be classified;
s4, sequentially calculating Euclidean distances between the data to be classified and the rest clustering centers according to the method in the step S3;
s5, dividing the data to be classified into the class where the clustering center closest to the data to be classified is located, and determining the row where the clustering center to be updated is located according to the classification result;
s6, respectively inputting the preset learning rate and the opposite number of the preset learning rate after the reading voltage coding to the row where the data to be classified and the clustering center to be updated are located, realizing the dot product operation between the preset learning rate and the data to be classified input by the second control module and between the opposite number of the preset learning rate and the clustering center to be updated, accumulating the obtained results according to columns to obtain the change value of each weight of the clustering center to be updated, writing the obtained change value into the memristor node of the clustering center to be updated, and updating the weight;
s7, sequentially dividing the residual data in the data set to be classified into corresponding categories according to the method of the steps S2-S6;
s8, repeating the steps S2-S7 to iterate until the weight of each cluster center is not changed;
the first memristor array is connected with each bit line of the fourth memristor array, the second memristor array is connected with each bit line of the third memristor array, the first memristor array is connected with each word line of the second memristor array, and the third memristor array is connected with each word line of the fourth memristor array; and the information of each dimension of the clustering center is the weight.
5. The classification method as claimed in claim 4, wherein after writing voltage encoding data to the memristor, the memristor conductance value is linearly related to the actual size of the data.
6. The classification method according to claim 4, wherein a first cluster center in the first memristor array is selected, and a read voltage encoded coefficient-1 is applied to its bit line, resulting in the opposite of each weight of the first cluster center.
7. The classification method according to claim 4, wherein the Euclidean distance is determined by an amount of charge accumulation due to an output current on the memristor row, the amount of accumulated charge being proportional to the Euclidean distance.
8. The classification method according to claim 4, wherein the cluster center to be updated is the cluster center closest to the data to be classified.
9. The classification method according to claim 4, wherein the variation Δ W of the weight is represented as:
ΔW=η(Ui-Wp)
where η denotes the learning rate, Ui is the ith data to be classified, and Wp is the clustering center to be updated.
CN201911248887.5A 2019-12-09 2019-12-09 Memristor array-based K-means classifier and classification method thereof Active CN111027619B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911248887.5A CN111027619B (en) 2019-12-09 2019-12-09 Memristor array-based K-means classifier and classification method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911248887.5A CN111027619B (en) 2019-12-09 2019-12-09 Memristor array-based K-means classifier and classification method thereof

Publications (2)

Publication Number Publication Date
CN111027619A CN111027619A (en) 2020-04-17
CN111027619B true CN111027619B (en) 2022-03-15

Family

ID=70207600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911248887.5A Active CN111027619B (en) 2019-12-09 2019-12-09 Memristor array-based K-means classifier and classification method thereof

Country Status (1)

Country Link
CN (1) CN111027619B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553415B (en) * 2020-04-28 2022-11-15 宁波工程学院 Memristor-based ESN neural network image classification processing method
CN111599409B (en) * 2020-05-20 2022-05-20 电子科技大学 circRNA recognition method based on MapReduce parallelism
CN111815640B (en) * 2020-07-21 2022-05-03 江苏经贸职业技术学院 Memristor-based RBF neural network medical image segmentation algorithm
CN111983429B (en) * 2020-08-19 2023-07-18 Oppo广东移动通信有限公司 Chip verification system, chip verification method, terminal and storage medium
CN112819036B (en) * 2021-01-12 2024-03-19 华中科技大学 Spherical data classification device based on memristor array and operation method thereof
CN113191402B (en) * 2021-04-14 2022-05-20 华中科技大学 Memristor-based naive Bayes classifier design method, system and classifier
CN113517007B (en) * 2021-04-29 2023-07-25 西安交通大学 Flowing water processing method and system and memristor array

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105593879A (en) * 2013-05-06 2016-05-18 Knowm Tech LLC Universal machine learning building block
CN109791626A (en) * 2017-12-29 2019-05-21 清华大学 The coding method of neural network weight, computing device and hardware system
CN109800870A (en) * 2019-01-10 2019-05-24 华中科技大学 A kind of Neural Network Online learning system based on memristor
CN110007764A (en) * 2019-04-11 2019-07-12 北京华捷艾米科技有限公司 A kind of gesture skeleton recognition methods, device, system and storage medium
US10380386B1 (en) * 2018-04-30 2019-08-13 Hewlett Packard Enterprise Development Lp Accelerator for k-means clustering with memristor crossbars
CN110163334A (en) * 2018-02-11 2019-08-23 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110443168A (en) * 2019-07-23 2019-11-12 华中科技大学 A kind of Neural Network for Face Recognition system based on memristor

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10049321B2 (en) * 2014-04-04 2018-08-14 Knowmtech, Llc Anti-hebbian and hebbian computing with thermodynamic RAM

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105593879A (en) * 2013-05-06 2016-05-18 Knowm Tech LLC Universal machine learning building block
CN109791626A (en) * 2017-12-29 2019-05-21 清华大学 The coding method of neural network weight, computing device and hardware system
CN110163334A (en) * 2018-02-11 2019-08-23 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
US10380386B1 (en) * 2018-04-30 2019-08-13 Hewlett Packard Enterprise Development Lp Accelerator for k-means clustering with memristor crossbars
CN109800870A (en) * 2019-01-10 2019-05-24 华中科技大学 A kind of Neural Network Online learning system based on memristor
CN110007764A (en) * 2019-04-11 2019-07-12 北京华捷艾米科技有限公司 A kind of gesture skeleton recognition methods, device, system and storage medium
CN110443168A (en) * 2019-07-23 2019-11-12 华中科技大学 A kind of Neural Network for Face Recognition system based on memristor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
K-means clustering in a memristive logic array; Jari Tissari et al.; 2015 IEEE 15th International Conference on Nanotechnology (IEEE-NANO); 2016-01-21; pp. 633-636 *
Review and Prospects of Artificial Neural Network Applications in Smart Grids; Jia Zhentang et al.; Proceedings of the 2015 National Academic Annual Conference on Smart Grid User-Side Energy Management; 2015-12-31; pp. 262-271 *

Also Published As

Publication number Publication date
CN111027619A (en) 2020-04-17

Similar Documents

Publication Publication Date Title
CN111027619B (en) Memristor array-based K-means classifier and classification method thereof
Tang et al. Extreme learning machine for multilayer perceptron
Sohn et al. Improved multimodal deep learning with variation of information
Schulz et al. Deep learning: Layer-wise learning of feature hierarchies
CN109284406B (en) Intention identification method based on difference cyclic neural network
CN109063719B (en) Image classification method combining structure similarity and class information
KR102305568B1 (en) Finding k extreme values in constant processing time
CN109273054B (en) Protein subcellular interval prediction method based on relational graph
JP2015506026A (en) Image classification
KR20220058897A (en) Perform XNOR equivalent operations by adjusting column thresholds of a compute-in-memory array
Bai et al. Coordinate CNNs and LSTMs to categorize scene images with multi-views and multi-levels of abstraction
CN115661550B (en) Graph data category unbalanced classification method and device based on generation of countermeasure network
CN112417289A (en) Information intelligent recommendation method based on deep clustering
Ravi Efficient on-device models using neural projections
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
Shi et al. Dynamic barycenter averaging kernel in RBF networks for time series classification
Tripathi et al. Real time object detection using CNN
CN114741507B (en) Introduction network classification model establishment and classification of graph rolling network based on Transformer
CN115375877A (en) Three-dimensional point cloud classification method and device based on channel attention mechanism
CN115392357A (en) Classification model training and labeled data sample spot inspection method, medium and electronic equipment
JP3043539B2 (en) neural network
Skorpil et al. Back-propagation and k-means algorithms comparison
CN114766024A (en) Method and apparatus for pruning neural networks
Ahmed et al. Branchconnect: Image categorization with learned branch connections
Wang et al. A Covering Algorithm Based on Competition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant