WO2019172837A1 - Method and system for deriving statistical information from encrypted data - Google Patents

Method and system for deriving statistical information from encrypted data

Publication number
WO2019172837A1
WO2019172837A1 PCT/SG2018/050100 SG2018050100W WO2019172837A1 WO 2019172837 A1 WO2019172837 A1 WO 2019172837A1 SG 2018050100 W SG2018050100 W SG 2018050100W WO 2019172837 A1 WO2019172837 A1 WO 2019172837A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
statistical information
discrete
vector
discrete intervals
Prior art date
Application number
PCT/SG2018/050100
Other languages
French (fr)
Inventor
Sze Ling YEO
Le Su
Lilei ZHENG
Chien Eao LEE
Ying Zhang
Vrizlynn Ling Ling Thing
Original Assignee
Agency For Science, Technology And Research
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agency For Science, Technology And Research
Priority to PCT/SG2018/050100
Publication of WO2019172837A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6209 Protecting access to data via a platform, e.g. using keys or access control rules to a single file or object, e.g. in a secure envelope, encrypted and accessed using a key, or with access control rules appended to the object itself
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/30 Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy
    • H04L9/3093 Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy involving Lattices or polynomial equations, e.g. NTRU scheme
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Definitions

  • the present invention generally relates to a method and a system for deriving statistical information from encrypted data, and more particularly, for deriving statistical information from encrypted data without having to decrypt the encrypted data.
  • statistical information may include, but is not limited to, a histogram count on the data.
  • the data analytic task to obtain statistical information may be outsourced to a data server having high computational power, such as a cloud server.
  • data owners i.e., client devices or systems associated with or being used by the data owners
  • data processors e.g., included in data servers
  • data owners are in control of the raw data, which may potentially be sensitive, such as, a collection of fingerprints and face images.
  • a data owner may wish to obtain a useful analytical result (statistical information) from the raw data the data owner controls, but the lack of computational power and/or an accurate analytical model may prevent the data owner from doing so at the client end.
  • the data processors, such as cloud services (cloud servers), may have high computational resources and/or advanced analytical models, but do not have sufficient raw data. However, directly providing raw data to a data processor for analytical purposes could be prohibitive for data owners due to privacy concerns.
  • the conventional data processor may not be able to perform the desired statistical analysis on the encrypted data without first decrypting the encrypted data. In this regard, while privacy of the data may be preserved during transmission to the data processor, the need to decrypt the encrypted data at the data processor reintroduces privacy concerns, especially if the data processor is untrusted.
  • a computer-implemented method for deriving statistical information from encrypted data, the encrypted data being encrypted based on a homomorphic encryption scheme, the method comprising:
  • the statistical information is derived directly from the encrypted data without decrypting the encrypted data at the data server.
  • the number of entries in the vector corresponds to the number of discrete intervals in the set of discrete intervals
  • the above-mentioned generating a vector comprises determining each entry in the vector based on a summation function using the set of data elements, wherein for determining each entry in the vector, the value of each data element is raised to an n-th power that corresponds to the position of the entry in the vector being determined.
  • the summation function is subjected to a modulo operation with respect to a divisor.
  • the above-mentioned generating a matrix comprises determining each entry of each row based on a corresponding discrete interval in the set of discrete intervals, each discrete interval being associated with a discrete value.
  • the number of rows and the number of columns both correspond to the number of discrete intervals in the set of discrete intervals, and each entry of each row is determined based on the value of a corresponding discrete interval in the set of discrete intervals raised to an n-th power that corresponds to the position of the row in the matrix being determined.
  • the raised value of the corresponding discrete interval is subjected to a modulo operation with respect to a divisor.
  • the above-mentioned deriving the statistical information comprises computing an inverse of the matrix, and multiplying the inverse matrix with the vector to produce the statistical information.
  • the statistical information comprises a histogram count on the set of data elements with respect to each discrete interval in the set of discrete intervals.
  • the encrypted data corresponds to one or more images comprising a plurality of pixels, each pixel having a pixel value amongst a range of pixel values;
  • the set of data elements corresponds to a set of pixels of the one or more images based on which the statistical information is to be derived;
  • the set of discrete intervals corresponds to a set of discrete intervals of pixel values associated with the one or more images
  • the statistical information comprises a histogram count on the set of pixels with respect to each discrete interval of pixel value in the set of discrete intervals of pixel values.
  • a system for deriving statistical information from encrypted data, the encrypted data being encrypted based on a homomorphic encryption scheme, the system comprising:
  • at least one processor communicatively coupled to the memory and configured to: receive the encrypted data
  • the statistical information is derived directly from the encrypted data without decrypting the encrypted data at the system.
  • the number of entries in the vector corresponds to the number of discrete intervals in the set of discrete intervals
  • the above-mentioned generate a vector comprises determining each entry in the vector based on a summation function using the set of data elements, wherein for determining each entry in the vector, the value of each data element is raised to an n-th power that corresponds to the position of the entry in the vector being determined.
  • the summation function is subjected to a modulo operation with respect to a divisor.
  • the above-mentioned generate a matrix comprises determining each entry of each row based on a corresponding discrete interval in the set of discrete intervals, each discrete interval being associated with a discrete value.
  • the number of rows and the number of columns both correspond to the number of discrete intervals in the set of discrete intervals, and each entry of each row is determined based on the value of a corresponding discrete interval in the set of discrete intervals raised to an n-th power that corresponds to the position of the row in the matrix being determined.
  • the raised value of the corresponding discrete interval is subjected to a modulo operation with respect to a divisor.
  • the above-mentioned deriving the statistical information comprises computing an inverse of the matrix, and multiplying the inverse matrix with the vector to produce the statistical information.
  • the statistical information comprises a histogram count on the set of data elements with respect to each discrete interval in the set of discrete intervals.
  • the encrypted data corresponds to one or more images comprising a plurality of pixels, each pixel having a pixel value amongst a range of pixel values;
  • the set of data elements corresponds to a set of pixels of the one or more images based on which the statistical information is to be derived;
  • the set of discrete intervals corresponds to a set of discrete intervals of pixel values associated with the one or more images
  • the statistical information comprises a histogram count on the set of pixels with respect to each discrete interval of pixel value in the set of discrete intervals of pixel values.
  • FIG. 1 depicts a schematic flow diagram of a method for deriving statistical information from encrypted data according to various embodiments of the present invention
  • FIG. 2 depicts a schematic block diagram of a system for deriving statistical information from encrypted data according to various embodiments of the present invention
  • FIG. 3 depicts a schematic block diagram of an exemplary computer system which may be used to realize or implement the system for deriving statistical information according to various embodiments of the present invention, such as the system as depicted in FIG. 2;
  • FIG. 4 depicts a schematic drawing illustrating an example overview of a system comprising a data owner (i.e., a client device or system associated with or being used by the data owner) and a data server (a cloud server) for deriving statistical information from data according to various embodiments of the present invention
  • FIG. 5 depicts a schematic drawing illustrating an example overview of a system comprising a data owner (i.e., a client device or system associated with or being used by the data owner) and a data server for deriving histogram on data according to various example embodiments of the present invention
  • FIG. 6A depicts an example fingerprint image for which a histogram is derived in an exemplary implementation according to various example embodiments of the present invention.
  • FIG. 6B depicts a plot of the histogram derived on the fingerprint image shown in FIG. 6A from the exemplary implementation.
  • Various embodiments of the present invention provide a method (computer-implemented method) and a system (e.g., a data server including a data processor) for deriving statistical information from encrypted data.
  • the statistical information may include, but is not limited to, a histogram count on the encrypted data.
  • data owners i.e., client/user devices or systems associated with or being used by the data owners
  • data owners may lack sufficient computational power and/or accurate analytical tools (e.g., analytical software) at the client/user end to obtain the desired analytical result (statistical information) from the raw data.
  • directly providing the raw data to a data server including a data processor
  • analytical purpose may be prohibitive for the data owners due to privacy concerns.
  • the raw data is encrypted by the client device before being sent to the data server for performing the desired statistical analysis on the encrypted data
  • the data server may not be able to perform the desired statistical analysis on the encrypted data without first decrypting the encrypted data.
  • the need to decrypt the encrypted data at the data server reintroduces privacy concerns, especially if the data server is untrusted.
  • the raw data at the client device is encrypted based on a homomorphic encryption scheme, and statistical computational techniques are developed which are compliant with the homomorphic encryption scheme for deriving the statistical information on the encrypted data without the need to decrypt the encrypted data at the data server at all.
  • the data owner may advantageously provide any desired data (via an associated client device or system) to a data server for processing (e.g., statistical analysis/computation) to derive the desired statistical information on the data and then receive the statistical information from the data server, while preserving the privacy of the data throughout the whole process (e.g., since the data is encrypted and no decryption of the encrypted data is required at the data server to derive the statistical information).
  • various embodiments of the present invention are able to assume a minimum system trust model.
  • computing its histogram may be one of the most fundamental statistical tools to acquire some useful information about the data.
  • various example embodiments of the present invention provide techniques for deriving/computing histograms encrypted under modern leveled homomorphic encryption schemes that support limited additions and multiplications. For example, for a set of data encrypted under some homomorphic encryption schemes, the techniques may output the histogram of the encrypted data encrypted under the same scheme and key. As such, the encrypted data and its structure stay protected throughout the whole process of obtaining the histogram count on the encrypted data.
  • the techniques do not require any interaction with the client device, thus advantageously minimizing communication cost and avoiding any requirement for the client device to remain online (for communication with the data server) until the end of the whole process of obtaining the histogram count on the data.
  • FIG. 1 depicts a schematic flow diagram of a method 100 (computer-implemented method) for deriving statistical information from encrypted data according to various embodiments of the present invention, the encrypted data being encrypted based on a homomorphic encryption scheme.
  • the method 100 comprises a step 102 of receiving, at a data server, the encrypted data; a step 104 of obtaining a set of data elements (or data points) of the encrypted data based on which the statistical information is to be derived; a step 106 of obtaining a set of discrete intervals associated with the set of data elements; a step 108 of generating a vector comprising a plurality of entries, each entry being determined based on the set of data elements; a step 110 of generating a matrix comprising a plurality of rows of entries, each row of entries being determined based on the set of discrete intervals; a step 112 of deriving the statistical information on the set of data elements with respect to each discrete interval in the set of discrete intervals based on the matrix and the
  • the data server may be any computer device or system capable of receiving data, processing the data received or the data stored therein based on one or more functional modules (e.g., software programs) as described herein according to various embodiments, and transmitting the processed data.
  • the data server may interchangeably be referred to by various other names, such as but not limited to, a computer server, a storage server, a cloud server, and so on, each of which may also be simply referred to as a "server" herein.
  • the data server is configured to be able to communicate with one or more data owners (i.e., one or more client/user devices or systems associated with or being used by the one or more data owners) via any wired or wireless communication protocol known in the art, such as but not limited to, cellular network (e.g., 3G, 4G, or LTE), Wi-Fi network, Bluetooth, and so on, and thus need not be described in detail herein. It will also be appreciated by a person skilled in the art that a server may be realized by or implemented as one unit or a plurality of units (e.g., located at one location or at different locations), as long as the one unit or the plurality of units are configured to process the data received or data stored therein based on the one or more functional modules as described herein according to various embodiments.
  • the encrypted data may be received at a data server from a data owner (i.e., a client device or system associated with or being used by the data owner to send the encrypted data).
  • the data owner may be outsourcing the task of deriving statistical information on the data to the data server.
  • the raw data is encrypted based on a homomorphic encryption scheme, such as encrypted at the client device or system associated with the data owner.
  • the encrypted data may be received from the data owner via any wired or wireless communication protocol known in the art.
  • the set of data elements of the encrypted data may be a set of data elements desired by the data owner to be statistically analyzed at the data server.
  • the set of data elements may correspond to one or more of the plurality of images desired to be statistically analyzed to obtain the desired statistical information thereon.
  • the set of data elements may be a set of pixel values associated with the set of pixels of the image, respectively.
  • the set of discrete intervals may correspond to a set of discrete intervals with respect to which the statistical information is to be derived.
  • the set of discrete intervals may correspond to a set of discrete intervals (bins) of pixel values (pixel intensity value).
  • the desired statistical information is a histogram count
  • the statistical analysis performed by the data server may thus be a histogram count on the set of pixel values associated with the set of pixels of the image with respect to each discrete interval of pixel value in a set of discrete pixel values associated with the image.
  • the histogram of the image (histogram of the pixel values) indicating the number of pixels in the image at each different/discrete pixel value associated with the image.
  • the set of discrete intervals may be set or may be determined as appropriate. For example, if the set of data corresponds to an image as described above, the set of discrete intervals may correspond to the range of discrete pixel values associated with the image.
  • each pixel may have a pixel value ranging from 0 to 255, and thus the set of discrete intervals of pixel values associated with the image may range from 0 to 255, whereby each discrete interval corresponds to (or is associated with or is represented by) a respective discrete pixel value.
  • the vector in relation to step 108, for example, may be a one-dimensional array comprising a plurality of entries (vector elements), whereby each entry is determined based on the set of data elements.
  • the matrix in relation to step 110, for example, may be a two-dimensional array comprising a plurality of rows of entries (matrix elements), whereby each entry is determined based on the set of discrete intervals.
  • the statistical information on the set of data elements may be derived/computed based on the above-mentioned matrix and vector generated, such as only based on the above-mentioned matrix and vector generated.
  • the statistical information may be derived at the data server and sent to the client device associated with the data owner which requested the statistical analysis to be performed on the set of data elements.
  • the statistical information may be transmitted to the client device associated with the data owner via any wired or wireless communication protocol known in the art.
  • the above-described method 100 is advantageously able to process the encrypted data to derive statistical information thereon without the need to decrypt the encrypted data at the data server at all.
  • the above-mentioned vector comprising a plurality of entries (each entry being determined based on the set of data elements) and generating the above-mentioned matrix comprising a plurality of rows of entries (each row being determined based on the set of discrete intervals)
  • statistical information on encrypted data with respect to each discrete interval in the set of discrete intervals can then be derived/computed based on such matrix and vector generated without having to decrypt the encrypted data at the data server at all.
  • the above-described method 100 advantageously addresses or at least mitigates, for example, the data privacy problem explained in the background, while still allowing statistical information on the data to be derived/computed at a data server.
  • the above-described method 100 advantageously enables a data owner to provide (via an associated client device or system) desired data to a data server for processing (e.g., statistical analysis/computation) to derive the desired statistical information thereon and then receive the statistical information on the data from the data server, while preserving the privacy of the data.
  • the statistical information is derived directly from the encrypted data without decrypting the encrypted data at the data server (at least during the process of deriving/computing the statistical information at the data server).
  • the number of entries in the vector may correspond to the number of discrete intervals in the set of discrete intervals.
  • the number of discrete intervals may be 256 (based on a range of 0 to 255 discrete intervals of pixel values) and thus, the number of entries in the vector may also be the same, that is, 256.
  • the above-mentioned step 108 of generating a vector comprises determining each entry in the vector based on a summation function using the set of data elements.
  • the value of each data element is raised to an n-th power (i.e., raised to the power of n, where n is a particular exponent value) that corresponds to the position of the entry in the vector being determined.
  • an entry in the vector may be determined to be the output of the summation function which sums up all the values of the data elements in the set of data elements. Furthermore, depending on the position of the entry in the vector, the value of each data element may be raised to an n-th power before the summation function is applied to the set of data elements.
  • for determining the first entry in the vector, the value of each data element may be raised to the power of 0; for determining the second entry in the vector immediately adjacent the first entry, the value of each data element may be raised to the power of 1; for determining the third entry in the vector immediately adjacent the second entry, the value of each data element may be raised to the power of 2, and so on.
  • the value of each data element is raised to an n-th power that corresponds to the position of the entry in the vector before the summation function is applied to the set of data elements to produce an output which is the determined/computed value for the entry.
  • the summation function is subjected to a modulo operation with respect to a divisor, that is, the modulo operation x mod p, where x is the output of the summation function for an entry (the determined value of the entry) and p is the divisor.
  • the divisor may be determined based on the number (N) of data elements in the set of data elements, such as the smallest prime number larger than N.
  • the above-mentioned step 110 of generating a matrix comprises determining each entry of each row based on a corresponding discrete interval in the set of discrete intervals, each discrete interval being associated with (or corresponding to or being represented by) a discrete value.
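The vector construction described above can be illustrated on plaintext values. The following is a minimal sketch, not the patented implementation: the function names `smallest_prime_above` and `power_sum_vector` are hypothetical, and the data are ordinary integers standing in for ciphertexts. In the setting described here, the same additions and multiplications would be carried out on encrypted values under the homomorphic encryption scheme.

```python
def smallest_prime_above(n: int) -> int:
    """Return the smallest prime strictly greater than n (simple trial division)."""
    candidate = max(n + 1, 2)
    while True:
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            return candidate
        candidate += 1


def power_sum_vector(values, num_intervals, p):
    """Entry n (0-based) is the sum of x**n over all data elements, reduced mod p."""
    # pow(x, 0, p) evaluates to 1 even when x == 0, so entry 0 equals the
    # number of data elements (mod p).
    return [sum(pow(x, n, p) for x in values) % p for n in range(num_intervals)]


# Toy plaintext stand-in for the set of data elements (assumption: 4 intervals, values 0..3).
data = [0, 1, 1, 3, 3, 3]
p = smallest_prime_above(len(data))   # N = 6, so p = 7
vector = power_sum_vector(data, 4, p)
print(p, vector)                      # 7 [6, 4, 1, 6]
```

Choosing a prime divisor larger than N means that any count between 0 and N is represented by a distinct residue, which is what allows the counts to be read back unambiguously later.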
  • the first entry in a row e.g., the leftmost entry in the row
  • the second entry in the row immediately adjacent the first entry may be determined based on the corresponding second discrete interval in the set of discrete intervals
  • the third entry in the row immediately adjacent the second entry may be determined based on the corresponding third discrete interval in the set of discrete intervals, and so on.
  • each discrete interval may be associated with a respective discrete pixel value amongst a range of discrete pixel values associated with the image.
  • each discrete interval may be associated with only one discrete pixel value, respectively, or in other words, consist of only one data point.
  • the number of rows and the number of columns both correspond to the number of discrete intervals in the set of discrete intervals.
  • the number of discrete intervals may be 256 (based on a range of 0 to 255 discrete intervals of pixel values) and thus, the number of rows and the number of columns may each also be the same, that is, 256 rows x 256 columns.
  • each entry of each row is determined based on the value of a corresponding discrete interval in the set of discrete intervals raised to an n-th power that corresponds to the position of the row in the matrix being determined.
  • each entry of the row may be determined based on the value of a corresponding discrete interval in the set of discrete intervals raised to an n-th power.
  • for determining the entries of the first row, the entries may be determined based on the value of the corresponding discrete intervals, respectively, raised to the power of 0; for determining the entries of the second row immediately adjacent/below the first row, the entries may be determined based on the value of the corresponding discrete intervals, respectively, raised to the power of 1; for determining the entries of the third row immediately adjacent/below the second row, the entries may be determined based on the value of the corresponding discrete intervals, respectively, raised to the power of 2, and so on.
  • the raised value of the corresponding discrete interval is subjected to a modulo operation with respect to a divisor, that is, y mod p, where y is the raised value of the discrete interval and p is the divisor.
  • the divisor may be determined based on the number (N) of data elements in the set of data elements, such as the smallest prime number larger than N.
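The matrix described above has the shape of a Vandermonde-type matrix reduced modulo p: row n holds every interval value raised to the n-th power. A minimal sketch follows, assuming each interval is represented by a single integer value; the function name `interval_power_matrix` is hypothetical.

```python
def interval_power_matrix(interval_values, p):
    """Row n (0-based) holds each interval value raised to the n-th power, mod p.

    The matrix is square: one row and one column per discrete interval.
    """
    size = len(interval_values)
    return [[pow(b, n, p) for b in interval_values] for n in range(size)]


# Four intervals represented by the values 0..3, with divisor p = 7 (toy numbers).
matrix = interval_power_matrix([0, 1, 2, 3], 7)
for row in matrix:
    print(row)
```

Because the matrix depends only on the interval values and the divisor, not on the data elements, it can be built without touching the encrypted data.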
  • the above-mentioned step 112 of deriving the statistical information comprises computing an inverse of the matrix, and then multiplying the inverse matrix with the vector to produce the statistical information.
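The inverse-and-multiply step can be sketched with modular arithmetic as follows. This is an illustrative plaintext version under stated assumptions: the divisor p is prime, the interval values are distinct modulo p (so the matrix is invertible), and the helper names `invert_mod_p` and `mat_vec_mod_p` are hypothetical. The literal matrix and vector are toy values for four intervals (0..3), the data set {0, 1, 1, 3, 3, 3}, and p = 7.

```python
def invert_mod_p(matrix, p):
    """Invert a square matrix modulo a prime p using Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity: [M | I] is reduced to [I | M^-1].
    rows = [[a % p for a in row] + [int(i == j) for j in range(n)]
            for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is not invertible modulo p")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, p)      # modular inverse, Python 3.8+
        rows[col] = [(a * inv) % p for a in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * c) % p for a, c in zip(rows[r], rows[col])]
    return [row[n:] for row in rows]


def mat_vec_mod_p(matrix, vector, p):
    """Multiply a matrix by a vector, reducing each entry mod p."""
    return [sum(a * b for a, b in zip(row, vector)) % p for row in matrix]


p = 7
matrix = [[1, 1, 1, 1], [0, 1, 2, 3], [0, 1, 4, 2], [0, 1, 1, 6]]  # interval powers mod 7
vector = [6, 4, 1, 6]                                              # power sums mod 7
histogram = mat_vec_mod_p(invert_mod_p(matrix, p), vector, p)
print(histogram)   # [1, 2, 0, 3]: one 0, two 1s, no 2s, three 3s
```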
  • the statistical information comprises a histogram count (or may simply be referred to as a "histogram" herein) on the set of data elements with respect to each discrete interval in the set of discrete intervals.
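One way to see why multiplying the inverse matrix with the vector yields the histogram is the following short derivation. It is a reading of the steps above rather than a quotation from the specification; the symbols $x_i$, $b_j$, $h_j$, $k$, $v_n$ and $M$ are introduced here for illustration, and it assumes the values $b_j$ are distinct modulo the prime divisor $p$.

```latex
% x_1, ..., x_N : data elements;  b_1, ..., b_k : values of the k discrete intervals
% h_j : number of data elements equal to b_j (the histogram count)
\begin{align*}
v_n &\equiv \sum_{i=1}^{N} x_i^{\,n}
      \equiv \sum_{j=1}^{k} h_j \, b_j^{\,n} \pmod{p},
      \qquad n = 0, 1, \dots, k-1, \\
\mathbf{v} &\equiv M \mathbf{h} \pmod{p},
      \qquad M_{n,j} = b_j^{\,n} \bmod p, \\
\mathbf{h} &\equiv M^{-1} \mathbf{v} \pmod{p}.
\end{align*}
% Since 0 <= h_j <= N < p, each residue h_j mod p equals the true count h_j.
```

Grouping the data elements by value turns each power sum into a weighted sum of interval powers, with the counts as weights, so the counts are the unique solution of the linear system.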
  • the encrypted data corresponds to one or more images comprising a plurality of pixels, each pixel having a pixel value amongst a range of pixel values;
  • the set of data elements corresponds to a set of pixels of the one or more images based on which the statistical information is to be derived;
  • the set of discrete intervals corresponds to a set of discrete intervals of pixel values associated with the one or more images; and
  • the statistical information comprises a histogram count on the set of pixels with respect to each discrete interval of pixel value in the set of discrete intervals of pixel values.
  • the set of discrete intervals of pixel values may correspond to the range of discrete pixel values associated with the image.
  • each pixel may have a pixel value ranging from 0 to 255, and thus the set of discrete intervals of pixel values associated with the image may range from 0 to 255, whereby each discrete interval corresponds to (or is associated with or is represented by) a respective discrete pixel value.
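For the image case, the whole pipeline can be checked end to end on a toy example. The sketch below makes several assumptions for brevity: a 4x4 image with 2-bit pixel values (0 to 3) instead of 0 to 255, plaintext pixels standing in for encrypted ones, and the linear system solved directly by elimination, which gives the same result as forming the inverse and multiplying. The function names are hypothetical.

```python
from collections import Counter


def next_prime(n):
    """Smallest prime strictly greater than n."""
    c = max(n + 1, 2)
    while any(c % d == 0 for d in range(2, int(c ** 0.5) + 1)):
        c += 1
    return c


def histogram_via_power_sums(pixels, levels):
    """Recover per-level pixel counts from power sums (plaintext sketch only)."""
    p = next_prime(len(pixels))            # divisor: smallest prime larger than N
    if p < levels:
        raise ValueError("pixel levels must be distinct modulo p")
    bins = list(range(levels))
    # Vector of power sums and matrix of interval powers, joined as [M | v].
    vec = [sum(pow(x, n, p) for x in pixels) % p for n in bins]
    aug = [[pow(b, n, p) for b in bins] + [vec[n]] for n in bins]
    # Gauss-Jordan elimination modulo p.
    for col in range(levels):
        piv = next(r for r in range(col, levels) if aug[r][col])
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], -1, p)    # Python 3.8+
        aug[col] = [a * inv % p for a in aug[col]]
        for r in range(levels):
            if r != col:
                f = aug[r][col]
                aug[r] = [(a - f * c) % p for a, c in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


image = [[0, 1, 1, 3],
         [3, 3, 2, 0],
         [1, 1, 1, 3],
         [0, 0, 2, 3]]
pixels = [v for row in image for v in row]
counts = histogram_via_power_sums(pixels, 4)
direct = Counter(pixels)                   # ordinary histogram, for comparison
print(counts)                              # [4, 5, 2, 5]
```

With 256 pixel levels the same structure applies, but the matrix becomes 256 x 256 and the power sums involve exponents up to 255.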
  • FIG. 2 depicts a schematic block diagram of a system 200 for deriving statistical information from encrypted data, the encrypted data being encrypted based on a homomorphic encryption scheme, according to various embodiments of the present invention, such as corresponding to the method 100 for deriving statistical information from encrypted data as described hereinbefore according to various embodiments of the present invention.
  • the system 200 may correspond to, or may be embodied as, a data server, such as described hereinbefore.
  • the system 200 comprises a memory 202, and at least one processor 204 communicatively coupled to the memory 202 and configured to: receive the encrypted data; obtain a set of data elements of the encrypted data based on which the statistical information is to be derived; obtain a set of discrete intervals associated with the set of data elements; generate a vector comprising a plurality of entries, each entry being determined based on the set of data elements; generate a matrix comprising a plurality of rows of entries, each row of entries being determined based on the set of discrete intervals; derive the statistical information on the set of data elements with respect to each discrete interval in the set of discrete intervals based on the matrix and the vector; and send the statistical information.
  • the at least one processor 204 may be configured to perform the required functions or operations through set(s) of instructions (e.g., software modules) executable by the at least one processor 204 to perform the required functions or operations. Accordingly, as shown in FIG.
  • the system 200 may further comprise a receiving module or circuit 206 configured to receive the encrypted data; a data element obtaining module or circuit 208 configured to obtain a set of data elements of the encrypted data based on which the statistical information is to be derived; a discrete interval obtaining module or circuit 210 configured to obtain a set of discrete intervals associated with the set of data elements; a vector generating module or circuit 212 configured to generate a vector comprising a plurality of entries, each entry being determined based on the set of data elements; a matrix generating module or circuit 214 configured to generate a matrix comprising a plurality of rows of entries, each row of entries being determined based on the set of discrete intervals; a statistical information deriving module or circuit 216 configured to derive the statistical information on the set of data elements with respect to each discrete interval in the set of discrete intervals based on the matrix and the vector; and a sending (transmission) module or circuit 218 configured to send/transmit the statistical information.
  • modules are not necessarily separate modules, and one or more modules may be realized by or implemented as one functional module (e.g., a circuit or a software program) as desired or as appropriate without deviating from the scope of the present invention.
  • the receiving module 206 and the sending module 218 may be realized by a transceiver, and the data element obtaining module 208, the discrete interval obtaining module 210, the vector generating module 212, the matrix generating module 214, and the statistical information deriving module 216 may be realized (e.g., compiled together) as one executable software program (e.g., software application or simply referred to as an “app”), which for example may be stored in the memory 202 and executable by the at least one processor 204 to perform the functions/operations as described herein according to various embodiments.
  • the system 200 corresponds to the method 100 as described hereinbefore with reference to FIG. 1; therefore, various functions or operations configured to be performed by the at least one processor 204 may correspond to various steps of the method 100 described hereinbefore according to various embodiments, and thus need not be repeated with respect to the system 200 for clarity and conciseness.
  • various embodiments described herein in context of the methods are analogously valid for the respective systems or devices, and vice versa.
  • the memory 202 may have stored therein the data element obtaining module 208, the discrete interval obtaining module 210, the vector generating module 212, the matrix generating module 214, and/or the statistical information deriving module 216, which respectively correspond to various steps of the method 100 as described hereinbefore, which are executable by the at least one processor 204 to perform the corresponding functions/operations as described herein.
  • a computing system, a controller, a microcontroller or any other system providing a processing capability may be provided according to various embodiments in the present disclosure.
  • Such a system may be taken to include one or more processors and one or more computer-readable storage mediums.
  • the system 200 described hereinbefore may include a processor (or controller) 204 and a computer-readable storage medium (or memory) 202 which are for example used in various processing carried out therein as described herein.
  • a memory or computer-readable storage medium used in various embodiments may be a volatile memory, for example a DRAM (Dynamic Random Access Memory) or a non-volatile memory, for example a PROM (Programmable Read Only Memory), an EPROM (Erasable PROM), EEPROM (Electrically Erasable PROM), or a flash memory, e.g., a floating gate memory, a charge trapping memory, an MRAM (Magnetoresistive Random Access Memory) or a PCRAM (Phase Change Random Access Memory).
  • a “circuit” may be understood as any kind of a logic implementing entity, which may be special purpose circuitry or a processor executing software stored in a memory, firmware, or any combination thereof.
  • a “circuit” may be a hard-wired logic circuit or a programmable logic circuit such as a programmable processor, e.g., a microprocessor (e.g., a Complex Instruction Set Computer (CISC) processor or a Reduced Instruction Set Computer (RISC) processor).
  • A “circuit” may also be a processor executing software, e.g., any kind of computer program, e.g., a computer program using a virtual machine code, e.g., Java.
  • a “module” may be a portion of a system according to various embodiments in the present invention and may encompass a “circuit” as above, or may be understood to be any kind of a logic-implementing entity therefrom.
  • the present specification also at least implicitly discloses a computer program or software/functional module, in that it would be apparent to the person skilled in the art that the individual steps of the methods described herein may be put into effect by computer code.
  • the computer program is not intended to be limited to any particular programming language and implementation thereof. It will be appreciated that a variety of programming languages and coding thereof may be used to implement the teachings of the disclosure contained herein.
  • the computer program is not intended to be limited to any particular control flow. There are many other variants of the computer program, which can use different control flows without departing from the spirit or scope of the invention.
  • modules described herein may be software module(s) realized by computer program(s) or set(s) of instructions executable by a computer processor to perform the required functions, or may be hardware module(s) being functional hardware unit(s) designed to perform the required functions. It will also be appreciated that a combination of hardware and software modules may be implemented.
  • a computer program/module or method described herein may be performed in parallel rather than sequentially.
  • Such a computer program may be stored on any computer readable medium.
  • the computer readable medium may include storage devices such as magnetic or optical disks, memory chips, or other storage devices suitable for interfacing with a general purpose computer.
  • the computer program when loaded and executed on such a general-purpose computer effectively results in an apparatus that implements the steps of the methods described herein.
  • a computer program product embodied in one or more computer-readable storage mediums (non-transitory computer-readable storage medium), comprising instructions (e.g., the data element obtaining module 208, the discrete interval obtaining module 210, the vector generating module 212, the matrix generating module 214, and the statistical information deriving module 216) executable by one or more computer processors to perform a method 100 for deriving statistical information from encrypted data as described hereinbefore with reference to FIG. 1.
  • various computer programs or modules described herein may be stored in a computer program product receivable by a system (e.g., a computer system or an electronic device) therein, such as the system 200 as shown in FIG. 2, for execution by at least one processor 204 of the system 200 to perform the required or desired functions.
  • the software or functional modules described herein may also be implemented as hardware modules. More particularly, in the hardware sense, a module is a functional hardware unit designed for use with other components or modules. For example, a module may be implemented using discrete electronic components, or it can form a portion of an entire electronic circuit such as an Application Specific Integrated Circuit (ASIC). Numerous other possibilities exist. Those skilled in the art will appreciate that the software or functional module(s) described herein can also be implemented as a combination of hardware and software modules.
  • the system 200 may be realized by any computer system (e.g., portable or desktop computer system), such as a computer system 300 as schematically shown in FIG. 3 as an example only and without limitation.
  • Various methods/steps or functional modules (e.g., the data element obtaining module 208, the discrete interval obtaining module 210, the vector generating module 212, the matrix generating module 214, and/or the statistical information deriving module 216) may be implemented as software, such as a computer program being executed within the computer system 300, and instructing the computer system 300 (in particular, one or more processors therein) to conduct the methods/functions of various embodiments described herein.
  • the computer system 300 may comprise a computer module 302, input modules, such as a keyboard 304 and a mouse 306, and a plurality of output devices such as a display 308, and a printer 310.
  • the computer module 302 may be connected to a computer network 312 via a suitable transceiver device 314, to enable access to e.g. the Internet or other network systems such as Local Area Network (LAN) or Wide Area Network (WAN).
  • the computer module 302 in the example may include a processor 318 for executing various instructions, a Random Access Memory (RAM) 320 and a Read Only Memory (ROM) 322.
  • the computer module 302 may also include a number of Input/Output (I/O) interfaces, for example I/O interface 324 to the display 308, and I/O interface 326 to the keyboard 304.
  • the components of the computer module 302 typically communicate via an interconnected bus 328 and in a manner known to the person skilled in the relevant art.
  • FIG. 4 depicts a schematic drawing illustrating an example overview of a system 400 comprising a client device or system 402 associated with or being used by a data owner and a data server (in this example, a cloud server) 404 according to various embodiments of the present invention illustrating an example communication/interaction between the client device 402 and the cloud server 404 for the client device 402 to outsource the statistical analysis/computation on data of the data owner to a cloud server 404 to obtain the desired statistical information on the data.
  • FIG. 4 also shows an example flow diagram of the associated method for deriving the statistical information on the data.
  • the client device 402 first encrypts the raw data (plaintext) 406 of the data owner according to a homomorphic encryption scheme 408 using a public key to protect the raw data before sending the encrypted data 410 to the cloud server 404.
  • the cloud server 404 then performs the requested statistical analysis/computation 412 on the encrypted data 410 (e.g., histogram on the encrypted data) and sends the statistical information (e.g., the histogram) 414, derived in an encrypted form, to the client device 402 without the ability to learn anything about the raw data 406.
  • the client device 402 may then decrypt 416 the encrypted statistical information 414 to obtain the requested statistical information (plaintext) 418.
  • the method for deriving statistical information from the data advantageously does not require any interaction between the client device 402 and the cloud server 404 during the process of statistical analysis/computation. Furthermore, the process of statistical analysis/computation at the cloud server 404 is performed directly on the encrypted data 410 without the need to decrypt the encrypted data 410 at the cloud server 404 at all, thus preserving privacy of the data.
  • the present invention is not limited to homomorphic histogram computation, and various other types of statistical analysis/computations are also within the scope of the present invention, as long as the statistical analysis/computation can be performed on the encrypted data to obtain the desired statistical information based on the method described according to various embodiments described herein without the need to decrypt the encrypted data at the data server, such as but not limited to, probability distribution and cumulative frequency.
  • computing its histogram may be one of the most fundamental statistical tools to acquire some useful information about the data.
  • techniques are provided to compute histograms encrypted under modern leveled homomorphic encryption schemes that support limited additions and multiplications. Specifically, for a set of data encrypted under some homomorphic encryption schemes, the techniques output the histogram of the data encrypted under the same scheme and key. As such, the data and its structure remain protected throughout the whole process of obtaining the histogram on the data.
  • the techniques do not require any interaction with the client device of the data owner, thus advantageously minimizing communication cost and removing the requirement that the data owner be online (e.g., for communication with the data server) until the end of the whole process of obtaining the histogram on the data.
  • a system comprising client devices or systems associated with or being used by the data owners and data processors (data servers).
  • Data owners are in control of raw data, which could potentially be sensitive, for example the collection of fingerprints and face images.
  • a data owner may wish to obtain useful analytical results from the data he/she controls, but the lack of computational power and/or an accurate analytical model at the client side may prevent the data owner from doing so.
  • the data processors e.g., cloud services
  • various example embodiments of the present invention apply a homomorphic computation solution that is helpful for this privacy preserving purpose.
  • a data owner only needs to encrypt the raw data with the public key and send the encrypted data to a data processor for homomorphic computing via a client device associated with the data owner.
  • the returned result can be decrypted by the client device with the secret key associated with the client device to get the desired analytical result.
  • a homomorphic histogram computation technique is provided that can be realized through only the basic operations of addition and multiplication.
  • the operations involved in determining an inverse of a matrix may be implemented with or segregated into basic addition and multiplication operations.
  • Histograms find applications in many different tasks ranging from data visualization to feature representation.
  • a histogram is a popular visual tool for presenting variable data through a specific kind of vertical bar chart.
  • a histogram graph may be a preferred form of presentation over plain text to present various data on paper and slides.
  • a histogram may be an important feature to represent data in the form of vectors.
  • the term “frequency vector” may be used to represent text by counting how often each term (e.g., word) occurs in a document.
  • Histograms are also widely used in digital image processing and photography, for example, as a representation of the distribution of pixel values, including but not limited to, the color histogram, the co-occurrence matrix and the histogram of oriented gradients.
  • computing histograms may be a straightforward comparison problem, i.e., to compare each data point to the edges of all the intervals.
  • various embodiments of the present invention identified that performing comparisons on data encrypted with homomorphic encryption schemes requires interaction between the client and server sides, which requires the client device of the data owner and the data server(s) to be online for instant communication therebetween.
  • various embodiments of the present invention provide techniques that are advantageously non-interactive in the sense that the client device of the data owner simply needs to encrypt the data and send the encrypted data to the data processor(s) (data server(s)), and does not require any further interaction therebetween until the processed result is derived and sent to the client device of the data owner. This removes the need for the client device to be online until the end of the entire process of deriving the statistical information on the data at the data processor.
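For reference, the plaintext, comparison-based histogram computation mentioned above may be sketched as follows. This is only an illustrative baseline on unencrypted data; the function name `histogram_by_comparison`, the half-open interval convention and the sample values are assumptions made for the sketch and are not taken from the specification.

```python
from typing import List, Sequence


def histogram_by_comparison(data: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count the data points in each half-open interval [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(edges) - 1):
            # Each membership test is a pair of comparisons against interval edges.
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Hypothetical 4-bit pixel values grouped into four intervals of width 4.
pixels = [0, 3, 3, 7, 8, 15, 15, 15]
edges = [0, 4, 8, 12, 16]
counts = histogram_by_comparison(pixels, edges)
print(counts)
```

Every step in the inner loop is a comparison, which is the operation that is difficult to evaluate on homomorphically encrypted data without interaction.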
  • FIG. 5 depicts a schematic drawing illustrating an example overview of a system 500 comprising a client device or system 502 associated with or being used by a data owner and a data server (in this example, a cloud server) 504 according to various example embodiments of the present invention illustrating an example communication/interaction between the client device 502 and the cloud server 504 for the data owner to outsource the histogram computation on his/her data to a cloud server 504 to obtain the requested histogram on the data.
  • FIG. 5 also shows an example flow diagram of an example method for deriving the histogram count on the raw data (plaintext) 506 according to the example embodiments.
  • the system 500 may be the same as the system 400 shown in FIG.
  • the system 500 is configured for the particular case of the statistical analysis/computation on the data performed at the cloud server 504 being histogram computation.
  • the data owner may store his/her image data via an associated client device on a cloud server 504.
  • the data owner may encrypt the raw data 506 via the associated client device according to a homomorphic encryption scheme 508 using a public key before sending the encrypted data 510 to the cloud server 504.
  • the data owner may wish to obtain the histogram of his/her data on the cloud server 504 periodically. In this regard, retrieving the encrypted data 510 from the cloud server 504, decrypting it and computing the histogram on the data by the client device 502 would consume too many resources.
  • the example method enables the cloud server 504 to compute the histogram on the encrypted data 510 and send the histogram derived/computed in an encrypted form 514 without the ability to learn anything about the raw data 506.
  • the client device 502 may then decrypt 516 the encrypted histogram result 514 to obtain the desired histogram (plaintext) 518.
  • the client device 502 of the data owner is not required to interact with the cloud server 504 during the process of histogram computing 512 on the encrypted data 510.
  • homomorphic encryption enables one to perform functions on encrypted texts in a way that preserves the function, that is, outputs an encryption of the function on the corresponding plaintexts.
  • the standard textbook RSA scheme is an example of an encryption scheme homomorphic under multiplication.
  • a couple of other partial homomorphic schemes follow which support homomorphism under either addition or multiplication.
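The multiplicative homomorphism of textbook RSA noted above can be checked with a few lines of code. The parameters below are toy values chosen for illustration only (far too small for real use), and the helper names `encrypt` and `decrypt` are assumptions of this sketch.

```python
# Toy textbook RSA parameters (illustrative only, not secure).
p, q = 61, 53
n = p * q
phi = (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)  # modular inverse of e (requires Python 3.8+)


def encrypt(m: int) -> int:
    return pow(m, e, n)


def decrypt(c: int) -> int:
    return pow(c, d, n)


a, b = 7, 12
# Multiplying two ciphertexts yields a ciphertext of the product of the plaintexts.
product_ct = (encrypt(a) * encrypt(b)) % n
recovered = decrypt(product_ct)
print(recovered == (a * b) % n)
```

Textbook RSA supports only this one operation on ciphertexts; there is no corresponding way to add two plaintexts, which is why it is only partially homomorphic.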
  • a modern (existing) FHE scheme may involve or comprise four main steps or components/modules as follows:
  • Key generation: this step or module may be configured to take as an input a security parameter and output a public key, a secret key and an evaluation key;
  • Encryption: this step or module may be configured to take as inputs the public key and a message and output a ciphertext;
  • Decryption: this step or module may be configured to take as inputs the secret key and a ciphertext, and output the corresponding plaintext; and
  • Evaluation: this step or module may be configured to take as inputs the evaluation key, a function f, and a set of ciphertexts, and output the result of the ciphertexts operated under f.
  • For example, to construct an FHE scheme, one may begin with a somewhat homomorphic encryption (SHE) scheme that supports a limited number of additions and multiplications.
  • the SHE scheme typically introduces some noise to the message in the encryption process. This noise tends to increase when evaluating additions and multiplications of the ciphertexts, with the latter increasing more rapidly.
  • Different techniques including bootstrapping and modulus switching are employed to manage the growth of the noise level, thereby turning the SHE scheme into an FHE scheme.
  • the functions to be employed on the data are known in advance. This motivates the notion of a leveled FHE scheme, namely, FHE schemes that support functions with a bounded multiplicative depth.
  • the first leveled FHE scheme was proposed and other improved variants were also proposed.
  • the various FHE schemes in use to date operate on two main plaintext spaces: binary spaces and polynomial rings. As such, the messages may have to be encoded into one of these spaces to exploit the associated schemes.
  • By applying the Chinese Remainder Theorem, one can perform batch/SIMD processing on a group of data, resulting in greater efficiencies.
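The batching idea can be illustrated with the integer form of the Chinese Remainder Theorem. This is a simplified analogue only: FHE libraries apply the theorem to polynomial rings rather than to integers, and the moduli and helper names (`MODULI`, `pack`, `unpack`) are hypothetical choices for this sketch.

```python
from math import prod

MODULI = [5, 7, 11, 13]  # pairwise coprime "slots" (illustrative choice)
M = prod(MODULI)


def pack(values):
    """Combine one residue per modulus into a single integer modulo M."""
    total = 0
    for v, m in zip(values, MODULI):
        partial = M // m
        total += v * partial * pow(partial, -1, m)
    return total % M


def unpack(packed):
    """Recover the per-slot residues from the packed integer."""
    return [packed % m for m in MODULI]


a = pack([1, 2, 3, 4])
b = pack([4, 3, 2, 1])
# One addition or multiplication on the packed values acts on every slot at once.
sum_slots = unpack((a + b) % M)
prod_slots = unpack((a * b) % M)
print(sum_slots, prod_slots)
```

A single operation on the packed representation performs the same operation in all slots, each reduced modulo its own modulus, which is the source of the efficiency gain.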
  • the output is an encrypted vector of histogram counts.
  • In order to extract the histogram count of the element x, one performs a single multiplication of the encrypted result by the vector (0, 0, ..., 1, 0, ..., 0), with 1 in the entry indexed by x.
  • a disadvantage of the above conventional approach identified according to various example embodiments is that the length of the ciphertext is expanded by r, and hence, may not be practical when r is very large or when R is infinite.
  • Another limitation is that it uses a special encoding that destroys the structure of the original value. Indeed, the encoding is designed purely to perform histogram counts.
  • the conventional approach may not be applicable when histogram counting occurs as an intermediate step in certain algorithms, and the intermediate inputs are real values. As such, a method for encoding the intermediate encrypted values into such a form may be needed.
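The conventional encoding discussed above can be sketched on plaintext values as follows. The sketch assumes that r denotes the number of possible values and that each value x is encoded as a length-r unit vector; encryption is omitted, and the function names are hypothetical.

```python
def one_hot(x, r):
    """Encode x in {0, ..., r-1} as a length-r unit vector."""
    return [1 if j == x else 0 for j in range(r)]


def add_vectors(u, v):
    return [a + b for a, b in zip(u, v)]


def select(counts, x):
    """Entry-wise multiplication by the unit vector with 1 in position x."""
    mask = one_hot(x, len(counts))
    return [c * m for c, m in zip(counts, mask)]


r = 4
data = [1, 3, 3, 0, 3]

# Summing the encodings entry-wise gives the vector of histogram counts.
histogram = [0] * r
for x in data:
    histogram = add_vectors(histogram, one_hot(x, r))

masked = select(histogram, 3)
print(histogram, masked)
```

Each data point occupies r entries, which reflects the r-fold expansion noted above, and the unit vector no longer carries the numeric value of x in a form usable for other arithmetic.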
  • to check if $a \ge b$, where a and b are encrypted by some HE scheme E, the conventional technique computes $E(a) - E(b) + E(-w)$ for $w > 0$ and bounded by some suitable bound. Then, $a \ge b$ if and only if $E(0)$ is among the outputs. It follows that one needs to decrypt the outputs to obtain the result of the comparison and hence, it requires interaction among the users at the client and server sides.
  • an equality or comparison test to compare a given search query with the encrypted data.
  • Efficient homomorphic equality and comparison tests have been considered in earlier works. For example, given an efficient equality test or a comparison test, one can compute the histogram counts by testing each data point against the interval boundaries (where equality test is used when the interval comprises a small number of points), and summing the corresponding outputs.
  • p may be required to be at least as large as N (the number of data points), thereby resulting in a multiplicative depth of l°g ⁇ + I°g which may not be practical when N is large.
  • Equation 2 xes (Equation 2) for some predetermined s .
  • E denotes an FHE encryption scheme.
  • various example embodiments of the present invention provide histogram counting techniques that can be realized with additions and multiplications, and are therefore compliant with existing homomorphic encryption schemes.
  • various example embodiments of the present invention tackle the problem of privacy preserving histogram counting using homomorphic encryption techniques. Accordingly, instead of a direct comparison of data, various example embodiments of the present invention construct appropriate systems of equations where the solutions or outputs are the desired histogram counts.
  • various example embodiments of the present invention while allowing the data owners to encrypt their sensitive data (e.g., fingerprint images), are still able to accurately compute the histogram count based on the encrypted values (encrypted data).
  • the histogram count may be an important feature for many applications, such as various machine learning algorithms.
  • the data encrypted under the technique according to various example embodiments still preserves its structure. Accordingly, the encrypted data is not restricted to histogram counting only by the technique, but can also be used for other purposes (e.g., determining the probability distribution or cumulative frequency in relation to the encrypted data).
  • the technique according to various example embodiments is non-interactive (i.e., during the process of deriving/computing the histogram count on the encrypted data).
  • the data owner simply needs to (via an associated client device or system) encrypt the data and send the encrypted data to the data server (e.g. processing center), and does not require any further interaction therebetween until the processed result (histogram count) is derived and sent to the client device of the data owner. This removes the need for the client device to be online until the end of the entire process of deriving the histogram count on the data at the data server.
  • an example method for computing histogram counts without requiring any direct comparison or sorting of data will now be described by way of an example only and without limitation.
  • the example method involves only modulo addition and multiplication of data points (data elements).
  • R is a finite subset of real numbers. Observe that by multiplying these numbers with suitable scalars, these real numbers may be approximated by some integers. Hence, without any loss of generality, it suffices to consider the case where R is a discrete subset of integers.
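The reduction from real values to integers can be illustrated with a short sketch. The scale factor and sample values below are hypothetical; in general the scalar is chosen according to the precision required.

```python
def scale_to_integers(values, scale):
    """Approximate real values by integers after multiplying by a common scalar."""
    return [round(v * scale) for v in values]


reals = [0.25, 1.5, 2.75]
ints = scale_to_integers(reals, 4)
print(ints)
```

Here the scalar 4 maps the values exactly onto integers; for values that are not exact multiples of 1/scale, the rounding introduces an approximation error of at most 0.5/scale in the original units.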
  • the set R (e.g., corresponding to the “set of discrete intervals” as described herein according to various embodiments) is finite
  • $c_i$ is the histogram count.
  • Define $M$ (e.g., corresponding to the “matrix comprising a plurality of rows of entries” as described herein according to various embodiments) to be the matrix which is a Vandermonde matrix for the distinct elements of R. This matrix has full rank so that $M^{-1}$ exists.
  • $c_i$ (e.g., corresponding to the “statistical information” or “histogram count” as described herein according to various embodiments) may be determined as follows: (Equation 7)
  • various example embodiments of the present invention advantageously deduced that the histogram counts (that is, the values of $c_i$) can be computed.
  • various example embodiments of the present invention advantageously provide the technical solution for enabling encrypted histogram counting (i.e., histogram counting on encrypted data without having to decrypt the encrypted data).
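The following is a minimal plaintext sketch of how histogram counts can be recovered from a Vandermonde system without any comparison. It rests on an assumed reading of the description: the vector holds the power sums of the data (the sum of $x^j$ over all data points, for j from 0 to k - 1), row j of the matrix holds the j-th powers of the k distinct values in R, and the counts solve the resulting linear system. The function names are hypothetical, and exact rational arithmetic is used here only to keep the illustration simple.

```python
from fractions import Fraction


def power_sums(data, k):
    """v_j = sum of x**j over all data points, for j = 0..k-1."""
    return [sum(Fraction(x) ** j for x in data) for j in range(k)]


def vandermonde(values):
    """Row j holds r**j for every value r, so that M c = v."""
    return [[Fraction(r) ** j for r in values] for j in range(len(values))]


def solve(matrix, rhs):
    """Gauss-Jordan elimination over the rationals (exact arithmetic)."""
    k = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = 1 / aug[col][col]
        aug[col] = [a * inv for a in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


values = [0, 1, 2, 3]          # the distinct values in R (illustrative)
data = [1, 3, 3, 0, 2, 3]      # the data set S (illustrative)
v = power_sums(data, len(values))
counts = [int(c) for c in solve(vandermonde(values), v)]
print(counts)
```

The data enters only through sums of powers, so no data point is ever compared with an interval boundary; the matrix depends only on the publicly known values in R.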
  • the example method involves only addition and multiplication operations. For example, since the example method involves taking powers of up to k - 1, the multiplicative depth is about
  • For illustration purposes only and without limitation, a first example (Example 1) will now be described to show a method for deriving histogram count on encrypted data according to an example embodiment of the present invention.
  • the above method may involve handling numbers of the form $x^i$ for i up to k - 1, which may be extremely large.
  • the pixel values may be from 0 to 255.
  • the numbers being handled may be as large as $255^{255}$, or approximately 2048 bits long. Dealing with such large numbers may either result in data overflow or require specialized procedures to handle them.
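The size of such numbers can be checked directly. The sketch below assumes the largest quantity is the maximum pixel value 255 raised to the highest power k - 1 = 255 (for k = 256 possible values); this is an interpretation of the passage above rather than a figure taken from it.

```python
# Largest power that arises for 8-bit pixel values under the stated assumption.
largest = 255 ** 255
bits = largest.bit_length()
print(bits)
```

The result is on the order of two thousand bits, far beyond native machine word sizes, which is what motivates reducing every intermediate value modulo a prime.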
  • various example embodiments provide a method for deriving statistical information (e.g., histogram count) without having to deal with such large numbers.
  • the histogram counts are numbers at most N (the number of data points (data elements)).
  • the histogram computation is configured to involve only numbers in this range, and this is achieved through the modulo operations. More specifically, let p denote a prime number that is slightly larger than N (e.g., the smallest prime number larger than N). In this regard, instead of working with integers alone, all the operations are subjected to a modulo operation, that is, modulo p.
  • the above-mentioned system of Equations 4 may become:
  • various example embodiments provide a method or algorithm for computing the histogram counts on encrypted data that requires only addition and multiplication of numbers of size around ⁇ bits.
  • an exemplary method or algorithm may be provided as follows:
  • Algorithm 1 to compute histograms without actual counting
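A minimal plaintext sketch consistent with the surrounding description is given below; it is not a reproduction of Algorithm 1. It assumes that the power sums and the Vandermonde matrix are reduced modulo a prime p and that the matrix is inverted by Gauss-Jordan elimination modulo p. As an additional assumption of this sketch, p is chosen larger than both the number of data points and the largest value in R, so that the values stay distinct modulo p. All function names are hypothetical.

```python
from math import isqrt


def next_prime(n):
    """Smallest prime strictly greater than n (trial division, small n only)."""
    candidate = max(n + 1, 2)
    while any(candidate % d == 0 for d in range(2, isqrt(candidate) + 1)):
        candidate += 1
    return candidate


def inverse_mod_p(matrix, p):
    """Gauss-Jordan inversion with every operation reduced modulo the prime p."""
    k = len(matrix)
    aug = [[v % p for v in row] + [int(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)
        aug[col] = [(a * inv) % p for a in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % p for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def histogram_mod_p(data, values):
    """Histogram counts of data over the distinct values, using arithmetic mod p."""
    k = len(values)
    p = next_prime(max(len(data), max(values)))
    v = [sum(pow(x, j, p) for x in data) % p for j in range(k)]   # power sums
    m = [[pow(r, j, p) for r in values] for j in range(k)]        # Vandermonde
    m_inv = inverse_mod_p(m, p)
    return [sum(m_inv[i][j] * v[j] for j in range(k)) % p for i in range(k)]


values = list(range(4))
data = [1, 3, 3, 0, 2, 3]
counts = histogram_mod_p(data, values)
print(counts)
```

Because every count is at most the number of data points, which is smaller than p, the residues obtained modulo p equal the true counts, and no intermediate value exceeds the size of p squared.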
  • For illustration purposes only and without limitation, a second example (Example 2) will now be described to show a method for deriving histogram count based on the above-mentioned algorithm 1 according to an example embodiment of the present invention.
  • the inverse matrix $M^{-1}$ can be computed to be:
  • the histogram count can be derived as
  • a composite number is used such that gcd(i
  • E supports functions with multiplicative depth of log k,
  • Let p be a prime number larger than N.
  • the plaintext space of E is either $\mathbb{Z}_p$ or the polynomial ring $\mathbb{Z}_p[x]/f(x)$ for some polynomial f(x); and
  • Algorithm 2 Algorithm to compute histograms for encrypted data points
  • the histogram count c i on the encrypted data set E(S) can be obtained by decrypting
  • For illustration purposes only and without limitation, a third example (Example 3) will now be described to show a method for deriving histogram count on encrypted data based on the above-mentioned algorithm 2 according to an example embodiment of the present invention.
  • mod p ff(18 mod p) j s computed.
  • each of mod p ) can b e determined by computing &(x mod p) 2
  • the c can be computed in one SIMD operation as long as there are more than k slots. In this case, one fills up k slots with the same data point and computes the 3 ⁇ 4 and the c concurrently.
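The data flow on the server side can be illustrated with a mock ciphertext type that exposes only addition and multiplication. This is a sketch under stated assumptions and not an implementation of Algorithm 2: `MockCiphertext` is an insecure, hypothetical stand-in for a real FHE ciphertext, the inverse matrix is treated as public plaintext, and the powers are formed by repeated multiplication for brevity (a square-and-multiply schedule would be needed to reach a multiplicative depth of about log k).

```python
class MockCiphertext:
    """Insecure stand-in for a ciphertext over Z_p; supports only + and *."""

    def __init__(self, value, p):
        self._value = value % p
        self._p = p

    def __add__(self, other):
        return MockCiphertext(self._value + other._value, self._p)

    def __mul__(self, other):
        # 'other' is either another ciphertext or a plaintext integer constant.
        rhs = other._value if isinstance(other, MockCiphertext) else other
        return MockCiphertext(self._value * rhs, self._p)

    def decrypt_for_demo(self):
        """Stands in for decryption with the secret key on the client side."""
        return self._value


def inverse_mod_p(matrix, p):
    """Gauss-Jordan inversion modulo the prime p (plaintext, public data only)."""
    k = len(matrix)
    aug = [[v % p for v in row] + [int(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)
        aug[col] = [(a * inv) % p for a in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % p for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def server_histogram(cts, values, p):
    """Return encrypted counts using only ciphertext additions and multiplications."""
    k = len(values)
    m_inv = inverse_mod_p([[pow(r, j, p) for r in values] for j in range(k)], p)
    sums = [MockCiphertext(len(cts), p)]      # v_0 is the (public) number of points
    powers = list(cts)                        # encryptions of x^1
    for _ in range(1, k):
        total = MockCiphertext(0, p)
        for c in powers:
            total = total + c
        sums.append(total)                    # encryption of the next power sum
        powers = [c * x for c, x in zip(powers, cts)]
    result = []
    for i in range(k):
        acc = sums[0] * m_inv[i][0]
        for j in range(1, k):
            acc = acc + sums[j] * m_inv[i][j]
        result.append(acc)
    return result


p = 7                                         # prime larger than N = 6
values = [0, 1, 2, 3]
data = [1, 3, 3, 0, 2, 3]
encrypted = [MockCiphertext(x, p) for x in data]
encrypted_counts = server_histogram(encrypted, values, p)
counts = [ct.decrypt_for_demo() for ct in encrypted_counts]
print(counts)
```

The server function never inspects a plaintext value; only the final list comprehension, standing in for the client, reads the results. With a real scheme, the same sequence of additions and multiplications would be issued through the evaluation interface of the chosen library.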
  • the BGV scheme as implemented in the HElib library and the FV scheme implemented in SEAL V2.0 are examples of FHE schemes that are suitable according to various example embodiments of the present invention (e.g., they satisfy various conditions proposed).
  • the plaintext space may be considered to be the polynomial ring and the ciphertext space may be considered to be the polynomial ring W//C x ) f or some cyclotomic polynomial /C x ) and appropriately chosen with P I*?.
  • the method for deriving histogram on encrypted data was applied on real fingerprint images according to an exemplary implementation to demonstrate the effectiveness of the method in deriving the histogram on the encrypted data without compromising the privacy of the encrypted data.
  • a 4-bit fingerprint image of size 100 x 72 as shown in FIG. 6A was processed, i.e., 7200 pixel values with value range $[0, 2^4 - 1]$, i.e., [0, 15].
  • the expected result of the exemplary implementation is the 16-bin histogram counts of the image which represents the distribution of the pixel values.
  • the exemplary implementation uses the SEAL library to implement FHE functions such as addition and multiplication on the plaintext and ciphertext.
  • the exemplary implementation may be segregated into five high-level steps or stages, namely encode, encrypt, count, decrypt and decode as follows:
  • Encode: a client device associated with the data owner first transforms (encodes) the raw data into a special form of mathematical expressions, namely, polynomials with system-defined properties. This is for the purpose of successful and secure calculation at a later stage.
  • the Chinese Remainder Theorem may be used to perform batch processing on a group of data, i.e., to encode thousands of pixels into a single polynomial.
  • Encrypt: the client device further encrypts the polynomial into a randomized format, to prevent non-authorized personnel from learning the underlying information. For example, this may be achieved through homomorphic encryption on the encoded polynomial.
  • the encrypted data as well as the public key and evaluation key are sent to the data processor (data server).
  • Count: the data processor runs a modulo-based Gaussian elimination algorithm (as described hereinbefore according to various example embodiments of the present invention) to compute the inverse matrix $M^{-1}$, solves the equations and obtains the encrypted histogram count result with respect to the 16 bins.
  • the data processor has no knowledge of the histogram counts as they are encrypted. Accordingly, this achieves both data privacy and process privacy.
  • Decrypt: with the encrypted histogram counts transmitted to the client device of the data owner, the client device is configured to use the secret key to decrypt them, and obtains the counts in mathematical polynomial form.
  • FIG. 6B shows a plot of the histogram (histogram counting plot) derived on the fingerprint image shown in FIG. 6A from the above exemplary implementation.
  • various embodiments of the present invention advantageously provide a method for deriving statistical information from encrypted data.
  • a method for computing histograms on encrypted data is provided using only modulo addition and multiplication, which is compliant with existing homomorphic encryption schemes.
  • example methods or algorithms for homomorphic histogram computation on encrypted data have been described herein.
  • various example embodiments of the present invention avoid the comparison-based histogram count technique as performed conventionally (e.g., comparing an encrypted value with histogram interval boundaries), which leaks the data range, thus protecting the data privacy against range leakage.
  • the exemplary implementation described herein demonstrated that the method according to various embodiments preserves privacy for the data owner; for example, since the secret key is kept at the client device of the data owner, it is difficult for other non-authorized parties, including the data processor, to learn the underlying information from the encrypted data.
  • the final histogram count results derived are also actual or exact values, instead of being estimated values.
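The count stage described above can be illustrated on plaintext integers. The following is a minimal sketch only: it assumes the divisor is a prime p larger than the number of pixels, computes the modular inverse with Fermat's little theorem, and uses hypothetical function names (`mod_inverse_matrix`, `histogram_via_power_sums`) that do not come from the source. A real deployment would carry out the additions and multiplications on ciphertexts (the source uses the SEAL library); no SEAL calls are shown here.

```python
def mod_inverse_matrix(matrix, p):
    """Invert a square matrix modulo a prime p by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity: [M | I].
    aug = [[x % p for x in row] + [int(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1 (Fermat inverse, p prime).
        inv = pow(aug[col][col], p - 2, p)
        aug[col] = [(x * inv) % p for x in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(x - factor * y) % p
                          for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def histogram_via_power_sums(values, bin_values, p):
    """Recover histogram counts using only modular sums and products."""
    size = len(bin_values)
    # Vector: entry k is the sum of value**k over all data elements, mod p.
    s = [sum(pow(v, k, p) for v in values) % p for k in range(size)]
    # Matrix: row k holds each bin value raised to the power k, mod p.
    m = [[pow(b, k, p) for b in bin_values] for k in range(size)]
    m_inv = mod_inverse_matrix(m, p)
    return [sum(a * b for a, b in zip(row, s)) % p for row in m_inv]


# Toy data: six "pixels" over four bins, with the prime 7 > 6.
print(histogram_via_power_sums([0, 1, 1, 3, 3, 3], [0, 1, 2, 3], 7))
# [1, 2, 0, 3]
```

Because every count lies between 0 and the number of pixels, choosing p larger than that number means the residues returned are the exact counts rather than estimates.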

Abstract

There is provided a method for deriving statistical information from encrypted data, the encrypted data being encrypted based on a homomorphic encryption scheme. The method includes receiving, at a data server, the encrypted data; obtaining a set of data elements of the encrypted data based on which the statistical information is to be derived; obtaining a set of discrete intervals associated with the set of data elements, each discrete interval corresponding to a discrete data element; generating a vector including a plurality of entries, each entry being determined based on the set of data elements; generating a matrix including a plurality of rows of entries, each row of entries being determined based on the set of discrete intervals; deriving the statistical information on the set of data elements with respect to each discrete interval in the set of discrete intervals based on the matrix and the vector; and sending, from the data server, the statistical information.

Description

METHOD AND SYSTEM FOR DERIVING STATISTICAL INFORMATION
FROM ENCRYPTED DATA
TECHNICAL FIELD
[0001] The present invention generally relates to a method and a system for deriving statistical information from encrypted data, and more particularly, for deriving statistical information from encrypted data without having to decrypt the encrypted data.
BACKGROUND
[0002] There is a growing need to obtain useful information (e.g., statistical information) from a set of data, such as a large set or a mass amount of data. For example, statistical information may include, but is not limited to, a histogram count on the data.
[0003] For example, to obtain statistical information on data (e.g., document file, image file, audio file, and so on), conventional approaches may directly apply various statistical computational tools or techniques on the raw data which may contain private information, without considering data privacy aspects.
[0004] For example, for a large amount of data, the data analytic task to obtain statistical information may be outsourced to a data server having high computational power, such as a cloud server. As an example, in a system comprising data owners (i.e., client devices or systems associated with or being used by the data owners) and data processors (e.g., included in data servers), data owners are in control of the raw data, which may potentially be sensitive, such as a collection of fingerprints and face images. A data owner may wish to obtain a useful analytical result (statistical information) from the raw data the data owner controls, but the lack of computational power and/or an accurate analytical model may prevent the data owner from doing so at the client end. The data processors (e.g., cloud services or cloud servers), on the other hand, may have high computational resources and/or advanced analytical models, but do not have sufficient raw data. However, directly providing raw data to a data processor for analytical purposes could be prohibitive for data owners due to privacy concerns.
[0005] On the other hand, if the raw data is encrypted by the client device before being sent to the data processor for performing the desired statistical analysis on the encrypted data, the conventional data processor is not able to perform the desired statistical analysis on the encrypted data without first decrypting the encrypted data. In this regard, while privacy of the data may be preserved during transmission to the data processor, the need to decrypt the encrypted data at the data processor reintroduces privacy concerns, especially if the data processor is untrusted.
[0006] A need therefore exists to provide a method and a system for deriving statistical information from encrypted data that seek to overcome, or at least ameliorate, one or more of the deficiencies in conventional methods and systems, such as but not limited to, enabling statistical information on data to be derived/computed at a data server without compromising the privacy of the data. It is against this background that the present invention has been developed.
SUMMARY
[0007] According to a first aspect of the present invention, there is provided a computer-implemented method for deriving statistical information from encrypted data, the encrypted data being encrypted based on a homomorphic encryption scheme, the method comprising:
receiving, at a data server, the encrypted data;
obtaining a set of data elements of the encrypted data based on which the statistical information is to be derived;
obtaining a set of discrete intervals associated with the set of data elements;
generating a vector comprising a plurality of entries, each entry being determined based on the set of data elements;
generating a matrix comprising a plurality of rows of entries, each row of entries being determined based on the set of discrete intervals;
deriving the statistical information on the set of data elements with respect to each discrete interval in the set of discrete intervals based on the matrix and the vector; and
sending, from the data server, the statistical information.
[0008] In various embodiments, the statistical information is derived directly from the encrypted data without decrypting the encrypted data at the data server.
[0009] In various embodiments, the number of entries in the vector corresponds to the number of discrete intervals in the set of discrete intervals, and the above-mentioned generating a vector comprises determining each entry in the vector based on a summation function using the set of data elements, wherein for determining each entry in the vector, the value of each data element is raised to an n-th power that corresponds to the position of the entry in the vector being determined.
[0010] In various embodiments, for determining each entry in the vector, the summation function is subjected to a modulo operation with respect to a divisor.
[0011] In various embodiments, the above-mentioned generating a matrix comprises determining each entry of each row based on a corresponding discrete interval in the set of discrete intervals, each discrete interval being associated with a discrete value.
[0012] In various embodiments, the number of rows and the number of columns both correspond to the number of discrete intervals in the set of discrete intervals, and each entry of each row is determined based on the value of a corresponding discrete interval in the set of discrete intervals raised to an n-th power that corresponds to the position of the row in the matrix being determined.
[0013] In various embodiments, for determining each entry of each row, the raised value of the corresponding discrete interval is subjected to a modulo operation with respect to a divisor.
[0014] In various embodiments, the above-mentioned deriving the statistical information comprises computing an inverse of the matrix, and multiplying the inverse matrix with the vector to produce the statistical information.
[0015] In various embodiments, the statistical information comprises a histogram count on the set of data elements with respect to each discrete interval in the set of discrete intervals.
[0016] In various embodiments:
the encrypted data corresponds to one or more images comprising a plurality of pixels, each pixel having a pixel value amongst a range of pixel values;
the set of data elements corresponds to a set of pixels of the one or more images based on which the statistical information is to be derived;
the set of discrete intervals corresponds to a set of discrete intervals of pixel values associated with the one or more images; and
the statistical information comprises a histogram count on the set of pixels with respect to each discrete interval of pixel value in the set of discrete intervals of pixel values.
[0017] According to a second aspect of the present invention, there is provided a system for deriving statistical information from encrypted data, the encrypted data being encrypted based on a homomorphic encryption scheme, the system comprising:
a memory; and
at least one processor communicatively coupled to the memory and configured to:
receive the encrypted data;
obtain a set of data elements of the encrypted data based on which the statistical information is to be derived;
obtain a set of discrete intervals associated with the set of data elements;
generate a vector comprising a plurality of entries, each entry being determined based on the set of data elements;
generate a matrix comprising a plurality of rows of entries, each row of entries being determined based on the set of discrete intervals;
derive the statistical information on the set of data elements with respect to each discrete interval in the set of discrete intervals based on the matrix and the vector; and
send the statistical information.
[0018] In various embodiments, the statistical information is derived directly from the encrypted data without decrypting the encrypted data at the system.
[0019] In various embodiments, the number of entries in the vector corresponds to the number of discrete intervals in the set of discrete intervals, and the above-mentioned generate a vector comprises determining each entry in the vector based on a summation function using the set of data elements, wherein for determining each entry in the vector, the value of each data element is raised to an n-th power that corresponds to the position of the entry in the vector being determined.
[0020] In various embodiments, for determining each entry in the vector, the summation function is subjected to a modulo operation with respect to a divisor.
[0021] In various embodiments, the above-mentioned generate a matrix comprises determining each entry of each row based on a corresponding discrete interval in the set of discrete intervals, each discrete interval being associated with a discrete value.
[0022] In various embodiments, the number of rows and the number of columns both correspond to the number of discrete intervals in the set of discrete intervals, and each entry of each row is determined based on the value of a corresponding discrete interval in the set of discrete intervals raised to an n-th power that corresponds to the position of the row in the matrix being determined.
[0023] In various embodiments, for determining each entry of each row, the raised value of the corresponding discrete interval is subjected to a modulo operation with respect to a divisor.
[0024] In various embodiments, the above-mentioned deriving the statistical information comprises computing an inverse of the matrix, and multiplying the inverse matrix with the vector to produce the statistical information.
[0025] In various embodiments, the statistical information comprises a histogram count on the set of data elements with respect to each discrete interval in the set of discrete intervals.
[0026] In various embodiments:
the encrypted data corresponds to one or more images comprising a plurality of pixels, each pixel having a pixel value amongst a range of pixel values;
the set of data elements corresponds to a set of pixels of the one or more images based on which the statistical information is to be derived;
the set of discrete intervals corresponds to a set of discrete intervals of pixel values associated with the one or more images; and
the statistical information comprises a histogram count on the set of pixels with respect to each discrete interval of pixel value in the set of discrete intervals of pixel values.
[0027] According to a third aspect of the present invention, there is provided a computer program product, embodied in one or more non-transitory computer-readable storage mediums, comprising instructions executable by at least one processor to perform a method for deriving statistical information from encrypted data, the encrypted data being encrypted based on a homomorphic encryption scheme, the method comprising:
obtaining a set of data elements of the encrypted data based on which the statistical information is to be derived;
obtaining a set of discrete intervals associated with the set of data elements;
generating a vector comprising a plurality of entries, each entry being determined based on the set of data elements;
generating a matrix comprising a plurality of rows of entries, each row of entries being determined based on the set of discrete intervals; and
deriving the statistical information on the set of data elements with respect to each discrete interval in the set of discrete intervals based on the matrix and the vector.
BRIEF DESCRIPTION OF THE DRAWINGS
[0028] Embodiments of the present invention will be better understood and readily apparent to one of ordinary skill in the art from the following written description, by way of example only, and in conjunction with the drawings, in which:
FIG. 1 depicts a schematic flow diagram of a method for deriving statistical information from encrypted data according to various embodiments of the present invention;
FIG. 2 depicts a schematic block diagram of a system for deriving statistical information from encrypted data according to various embodiments of the present invention;
FIG. 3 depicts a schematic block diagram of an exemplary computer system which may be used to realize or implement the system for deriving statistical information according to various embodiments of the present invention, such as the system as depicted in FIG. 2;
FIG. 4 depicts a schematic drawing illustrating an example overview of a system comprising a data owner (i.e., a client device or system associated with or being used by the data owner) and a data server (a cloud server) for deriving statistical information from data according to various embodiments of the present invention;
FIG. 5 depicts a schematic drawing illustrating an example overview of a system comprising a data owner (i.e., a client device or system associated with or being used by the data owner) and a data server for deriving histogram on data according to various example embodiments of the present invention;
FIG. 6A depicts an example fingerprint image for which a histogram is derived in an exemplary implementation according to various example embodiments of the present invention; and
FIG. 6B depicts a plot of the histogram derived on the fingerprint image shown in FIG. 6A from the exemplary implementation.
DETAILED DESCRIPTION
[0029] Various embodiments of the present invention provide a method (computer-implemented method) and a system (e.g., a data server including a data processor) for deriving statistical information from encrypted data. For example, the statistical information may include, but is not limited to, a histogram count on the encrypted data.
[0030] As mentioned in the background, data owners (i.e., client/user devices or systems associated with or being used by the data owners) in control of raw data (which may contain sensitive information) may lack sufficient computational power and/or accurate analytical tools (e.g., analytical software) at the client/user end to obtain the desired analytical result (statistical information) from the raw data. However, directly providing the raw data to a data server (including a data processor) for analytical purposes (statistical analysis) may be prohibitive for the data owners due to privacy concerns. Further still, if the raw data is encrypted by the client device before being sent to the data server for performing the desired statistical analysis on the encrypted data, the data server may not be able to perform the desired statistical analysis on the encrypted data without first decrypting the encrypted data. Therefore, while privacy of the data may be preserved during the transmission of the data to the data server, the need to decrypt the encrypted data at the data server reintroduces privacy concerns, especially if the data server is untrusted.
[0031] According to various embodiments of the present invention, to address or at least mitigate the above-mentioned data privacy problem, while still allowing statistical information on data to be derived/computed at a data server (e.g., statistical analysis/computation outsourced to a data server), the raw data at the client device is encrypted based on a homomorphic encryption scheme, and statistical computational techniques are developed which are compliant with the homomorphic encryption scheme for deriving the statistical information on the encrypted data without the need to decrypt the encrypted data at the data server at all. For example, conventional techniques are not able to (do not provide any technical solution to) obtain/derive statistical information directly on encrypted data (i.e., without having to decrypt the data), but on the other hand, methods according to various embodiments of the present invention are able to (provide the technical solution to) obtain/derive statistical information directly on the encrypted data (e.g., enabling encrypted counting). As a result, the data owner may advantageously provide any desired data (via an associated client device or system) to a data server for processing (e.g., statistical analysis/computation) to derive the desired statistical information on the data and then receive the statistical information from the data server, while preserving the privacy of the data throughout the whole process (e.g., since the data is encrypted and no decryption of the encrypted data is required at the data server to derive the statistical information). Accordingly, various embodiments of the present invention are able to assume a minimum system trust model.
[0032] As a non-limiting example, given a large set of data, computing its histogram may be one of the most fundamental statistical tools to acquire some useful information about the data. In this regard, various example embodiments of the present invention provide techniques for deriving/computing histograms encrypted under modern leveled homomorphic encryption schemes that support limited additions and multiplications. For example, for a set of data encrypted under a homomorphic encryption scheme, the techniques may output the histogram of the encrypted data encrypted under the same scheme and key. As such, the encrypted data and its structure stay protected throughout the whole process of obtaining the histogram count on the encrypted data. In addition, while the data server is deriving/computing the statistical information from the encrypted data, the techniques do not require any interaction with the client device, thus advantageously minimizing communication cost and avoiding any requirement whereby the client device needs to be online (for communication with the data server) until the end of the whole process of obtaining the histogram count on the data.
[0033] FIG. 1 depicts a schematic flow diagram of a method 100 (computer-implemented method) for deriving statistical information from encrypted data according to various embodiments of the present invention, the encrypted data being encrypted based on a homomorphic encryption scheme. The method 100 comprises a step 102 of receiving, at a data server, the encrypted data; a step 104 of obtaining a set of data elements (or data points) of the encrypted data based on which the statistical information is to be derived; a step 106 of obtaining a set of discrete intervals associated with the set of data elements; a step 108 of generating a vector comprising a plurality of entries, each entry being determined based on the set of data elements; a step 110 of generating a matrix comprising a plurality of rows of entries, each row of entries being determined based on the set of discrete intervals; a step 112 of deriving the statistical information on the set of data elements with respect to each discrete interval in the set of discrete intervals based on the matrix and the vector; and a step 114 of sending, from the data server, the statistical information.
[0034] It can be understood by a person skilled in the art that the data server may be any computer device or system capable of receiving data, processing the data received or the data stored therein based on one or more functional modules (e.g., software programs) as described herein according to various embodiments, and transmitting the processed data. The data server may interchangeably be referred to by various other names, such as but not limited to, a computer server, a storage server, a cloud server, and so on, each of which may also be simply referred to as a “server” herein. For example, the data server is configured to be able to communicate with one or more data owners (i.e., one or more client/user devices or systems associated with or being used by the one or more data owners) via any wired or wireless communication protocol known in the art, such as but not limited to, cellular network (e.g., 3G, 4G, or LTE), Wi-Fi network, Bluetooth, and so on, and thus need not be described in detail herein. It will also be appreciated by a person skilled in the art that a server may be realized by or implemented as one unit or a plurality of units (e.g., located at one location or at different locations), as long as the one unit or the plurality of units are configured to process the data received or data stored therein based on the one or more functional modules as described herein according to various embodiments.
[0035] In various embodiments, in relation to step 102, the encrypted data may be received at a data server from a data owner (i.e., a client device or system associated with or being used by the data owner to send the encrypted data). In this regard, for example, the data owner may be outsourcing the task of deriving statistical information on the data to the data server. Furthermore, prior to sending the raw data to the data server, the raw data is encrypted based on a homomorphic encryption scheme, such as encrypted at the client device or system associated with the data owner. For example, the encrypted data may be received from the data owner via any wired or wireless communication protocol known in the art.
[0036] In various embodiments, in relation to step 104, for example, the set of data elements of the encrypted data may be a set of data elements desired by the data owner to be statistically analyzed at the data server. By way of example only and without limitation, if the encrypted data corresponds to a plurality of images, the set of data elements may correspond to one or more of the plurality of images desired to be statistically analyzed to obtain the desired statistical information thereon. As an illustrative example, in an exemplary case of the set of data elements corresponding to one image, the set of data elements may be a set of pixel values associated with the set of pixels of the image, respectively.
[0037] In various embodiments, in relation to step 106, for example, the set of discrete intervals may correspond to a set of discrete intervals with respect to which the statistical information is to be derived. For example, following on from the above example whereby the set of data elements corresponds to one image, the set of discrete intervals may correspond to a set of discrete intervals (bins) of pixel values (pixel intensity values). For example, if the desired statistical information is a histogram count, the statistical analysis performed by the data server may thus be a histogram count on the set of pixel values associated with the set of pixels of the image with respect to each discrete interval of pixel value in a set of discrete pixel values associated with the image. In other words, the histogram of the image (histogram of the pixel values) indicates the number of pixels in the image at each different/discrete pixel value associated with the image. In this regard, it can be understood by a person skilled in the art that the set of discrete intervals may be set or may be determined as appropriate. For example, if the set of data corresponds to an image as described above, the set of discrete intervals may correspond to the range of discrete pixel values associated with the image. As an example, for an 8-bit image, each pixel may have a pixel value ranging from 0 to 255, and thus the set of discrete intervals of pixel values associated with the image may range from 0 to 255, whereby each discrete interval corresponds to (or is associated with or is represented by) a respective discrete pixel value.
[0038] In various embodiments, in relation to step 108, for example, the vector may be a one-dimensional array comprising a plurality of entries (vector elements), whereby each entry is determined based on the set of data elements.
[0039] In various embodiments, in relation to step 110, for example, the matrix may be a two-dimensional array comprising a plurality of rows of entries (matrix elements), whereby each entry is determined based on the set of discrete intervals.
[0040] In various embodiments, in relation to step 112, the statistical information on the set of data elements may be derived/computed based on the above-mentioned matrix and vector generated, such as only based on the above-mentioned matrix and vector generated.
[0041] In various embodiments, in relation to step 114, the statistical information may be derived at the data server and sent to the client device associated with the data owner which requested the statistical analysis to be performed on the set of data elements. For example, the statistical information may be transmitted to the client device associated with the data owner via any wired or wireless communication protocol known in the art.
[0042] Accordingly, the above-described method 100 according to various embodiments is advantageously able to process the encrypted data to derive statistical information thereon without the need to decrypt the encrypted data at the data server at all. In particular, by generating the above-mentioned vector comprising a plurality of entries (each entry being determined based on the set of data elements) and generating the above-mentioned matrix comprising a plurality of rows of entries (each row being determined based on the set of discrete intervals), it has been discovered according to various embodiments that statistical information on encrypted data with respect to each discrete interval in the set of discrete intervals can then be derived/computed based on such matrix and vector generated without having to decrypt the encrypted data at the data server at all. Therefore, the above-described method 100 according to various embodiments advantageously addresses or at least mitigates, for example, the data privacy problem explained in the background, while still allowing statistical information on the data to be derived/computed at a data server. As a result, the above-described method 100 according to various embodiments advantageously enables a data owner to provide (via an associated client device or system) desired data to a data server for processing (e.g., statistical analysis/computation) to derive the desired statistical information thereon and then receive the statistical information on the data from the data server, while preserving the privacy of the data.
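One way to read why the matrix and the vector suffice is the identity sketched below. The symbols d_i (data elements), v_j (values of the discrete intervals), c_j (count of data elements equal to v_j), m (number of discrete intervals), s and M are notation introduced for this illustration and are not taken from the source; N is the number of data elements and p denotes the divisor of the modulo operation. The sketch assumes every data element equals exactly one of the v_j, so that grouping equal terms turns each power sum over the data into a weighted sum over the intervals.

```latex
\sum_{i=1}^{N} d_i^{\,n} \;\equiv\; \sum_{j=1}^{m} c_j\, v_j^{\,n} \pmod{p},
\qquad n = 0, 1, \dots, m-1

% In matrix form, with s_n = \sum_{i} d_i^{\,n} \bmod p and M_{n,j} = v_j^{\,n} \bmod p:
M\,\mathbf{c} \equiv \mathbf{s} \pmod{p}
\quad\Longrightarrow\quad
\mathbf{c} \equiv M^{-1}\,\mathbf{s} \pmod{p}
```

Under these assumptions the left-hand side involves only additions and multiplications of the data elements, which is the kind of operation a homomorphic encryption scheme can evaluate on ciphertexts, while M depends only on the interval values.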
[0043] Accordingly, in various embodiments, the statistical information is derived directly from the encrypted data without decrypting the encrypted data at the data server (at least during the process of deriving/computing the statistical information at the data server).
[0044] In various embodiments, the number of entries in the vector may correspond to the number of discrete intervals in the set of discrete intervals. For example, following on from the above example whereby the set of data elements corresponds to one image, the number of discrete intervals may be 256 (based on a range of 0 to 255 discrete intervals of pixel values) and thus, the number of entries in the vector may also be the same, that is, 256. Furthermore, in various embodiments, the above-mentioned step 108 of generating a vector comprises determining each entry in the vector based on a summation function using the set of data elements. In this regard, for determining each entry in the vector, the value of each data element is raised to an n-th power (i.e., raised to the power of n, where n is a particular exponent value) that corresponds to the position of the entry in the vector being determined.
[0045] For example, an entry in the vector may be determined to be the output of the summation function which sums up all the values of the data elements in the set of data elements. Furthermore, depending on the position of the entry in the vector, the value of each data element may be raised to an n-th power before the summation function is applied to the set of data elements. For example, for determining the first entry in the vector (e.g., the leftmost entry in a row vector or a topmost entry in a column vector), the value of each data element may be raised to the power of 0; for determining the second entry in the vector immediately adjacent the first entry, the value of each data element may be raised to the power of 1; for determining the third entry in the vector immediately adjacent the second entry, the value of each data element may be raised to the power of 2, and so on. Accordingly, in various embodiments, to determine/compute an entry in the vector, the value of each data element is raised to an n-th power that corresponds to the position of the entry in the vector before the summation function is applied to the set of data elements to produce an output which is the determined/computed value for the entry.
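The construction of the vector described above can be sketched on plaintext integers as follows. The function name `power_sum_vector` and the toy data are hypothetical, and the sketch assumes the convention that a value of 0 raised to the power of 0 counts as 1 (as Python does), so that the first entry equals the number of data elements.

```python
def power_sum_vector(values, num_intervals):
    """Entry n is the sum of value**n over all data elements, n = 0, 1, 2, ..."""
    return [sum(v ** n for v in values) for n in range(num_intervals)]


# Six toy "pixel values" and four discrete intervals (0, 1, 2, 3).
pixels = [0, 1, 1, 3, 3, 3]
print(power_sum_vector(pixels, 4))  # [6, 11, 29, 83]
```

The first entry (6) is the number of data elements, the second (11) is their plain sum, and later entries grow quickly, which is one motivation for reducing each entry modulo a divisor.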
[0046] In various embodiments, for determining each entry in the vector, the summation function is subjected to a modulo operation with respect to a divisor, that is, the modulo operation x mod p, where x is the output of the summation function for an entry (the determined value of the entry) and p is the divisor. In various embodiments, the divisor may be determined based on the number (N) of data elements in the set of data elements, such as the smallest prime number larger than N.
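A minimal plaintext sketch of this choice of divisor and of the reduced vector is given below. The helper names `next_prime_above` and `power_sum_vector_mod` are hypothetical, and trial division is used only because the numbers in this illustration are small.

```python
def next_prime_above(n):
    """Smallest prime strictly greater than n (trial division, small n only)."""
    candidate = max(n + 1, 2)
    while True:
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            return candidate
        candidate += 1


def power_sum_vector_mod(values, num_intervals, p):
    """Entry n is (sum of value**n over all data elements) mod p."""
    return [sum(pow(v, n, p) for v in values) % p
            for n in range(num_intervals)]


pixels = [0, 1, 1, 3, 3, 3]
p = next_prime_above(len(pixels))           # N = 6, so p = 7
print(p, power_sum_vector_mod(pixels, 4, p))  # 7 [6, 4, 1, 6]
```

Since no interval can hold more than N data elements, a prime divisor larger than N leaves every count unchanged when it is reduced modulo p.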
[0047] In various embodiments, the above-mentioned step 108 of generating a matrix comprises determining each entry of each row based on a corresponding discrete interval in the set of discrete intervals, each discrete interval being associated with (or corresponding to or being represented by) a discrete value. For example, the first entry in a row (e.g., the leftmost entry in the row) may be determined based on the corresponding first discrete interval in the set of discrete intervals, the second entry in the row immediately adjacent the first entry may be determined based on the corresponding second discrete interval in the set of discrete intervals, the third entry in the row immediately adjacent the second entry may be determined based on the corresponding third discrete interval in the set of discrete intervals, and so on. As an example, in the exemplary case of the set of discrete intervals being a set of discrete intervals of pixel values, each discrete interval may be associated with a respective discrete pixel value amongst a range of discrete pixel values associated with the image. In this regard, each discrete interval may be associated with only one discrete pixel value, respectively, or in other words, consist of only one data point.
[0048] Furthermore, in various embodiments, the number of rows and the number of columns both correspond to the number of discrete intervals in the set of discrete intervals. For example, following on from the above example whereby the set of data elements corresponds to an image, the number of discrete intervals may be 256 (based on a range of 0 to 255 discrete intervals of pixel values) and thus, the number of rows and the number of columns may each also be the same, that is, 256 rows x 256 columns. In addition, each entry of each row is determined based on the value of a corresponding discrete interval in the set of discrete intervals raised to an n-th power that corresponds to the position of the row in the matrix being determined. In other words, depending on the position of the row in the matrix, each entry of the row may be determined based on the value of a corresponding discrete interval in the set of discrete intervals raised to an n-th power. For example, for determining the entries of the first row (e.g., the topmost row in the matrix), the entries may be determined based on the value of the corresponding discrete intervals, respectively, raised to the power of 0; for determining the entries of the second row immediately adjacent/below the first row, the entries may be determined based on the value of the corresponding discrete intervals, respectively, raised to the power of 1, for determining the entries of the third row immediately adjacent/below the second row, the entries may be determined based on the value of the corresponding discrete intervals, respectively, raised to the power of 2, and so on.
[0049] Similarly, in various embodiments, for determining each entry of each row, the raised value of the corresponding discrete interval is subjected to a modulo operation with respect to a divisor, that is, y mod p, where y is the raised value of the discrete interval and p is the divisor. It can be understood by a person skilled in the art that the term “raised value” of the discrete interval simply denotes the resultant value of the discrete interval after it has been raised to an n-th power, which does not necessarily result in a higher value (e.g., when raised to the power of zero or one). Similarly, in various embodiments, the divisor may be determined based on the number (N) of data elements in the set of data elements, such as the smallest prime number larger than N.
[0050] In various embodiments, the above-mentioned step 112 of deriving the statistical information comprises computing an inverse of the matrix, and then multiplying the inverse matrix with the vector to produce the statistical information.
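As a minimal plaintext sketch of the matrix generation and derivation steps described above, and not a definitive implementation, the following builds the matrix of discrete values raised to row-dependent powers modulo p, inverts it modulo p by Gauss-Jordan elimination, and multiplies the inverse with the vector. All function names, the sample data and the divisor 11 are assumptions made for illustration. The sketch further assumes that p is prime and that the discrete values are distinct modulo p, so that the matrix is invertible; the elimination routine is one possible way to obtain the inverse and is not stated in the specification.

```python
def vandermonde_mod(values, p):
    """Row k holds each discrete value raised to the power k, reduced mod p."""
    return [[pow(d, k, p) for d in values] for k in range(len(values))]


def invert_mod(matrix, p):
    """Invert a square matrix modulo a prime p using Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a non-zero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] % p != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1 (Fermat inverse, p prime).
        inv = pow(aug[col][col], p - 2, p)
        aug[col] = [(a * inv) % p for a in aug[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % p for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mat_vec_mod(matrix, vector, p):
    """Multiply a matrix with a vector, reducing each entry mod p."""
    return [sum(a * b for a, b in zip(row, vector)) % p for row in matrix]


# Hypothetical example: 7 data elements, discrete values 0..3, divisor p = 11.
data = [0, 1, 1, 3, 2, 1, 3]
values = [0, 1, 2, 3]
p = 11
vector = [sum(pow(x, k, p) for x in data) % p for k in range(len(values))]
matrix = vandermonde_mod(values, p)
histogram = mat_vec_mod(invert_mod(matrix, p), vector, p)
print(histogram)
```

Each count is recovered exactly in this sketch because no count can exceed the number of data elements, which is smaller than the divisor p.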
[0051] In various embodiments, the statistical information comprises a histogram count (or may simply be referred to as a “histogram” herein) on the set of data elements with respect to each discrete interval in the set of discrete intervals.
[0052] In various example embodiments, the encrypted data corresponds to one or more images comprising a plurality of pixels, each pixel having a pixel value amongst a range of pixel values; the set of data elements corresponds to a set of pixels of the one or more images based on which the statistical information is to be derived; the set of discrete intervals corresponds to a set of discrete intervals of pixel values associated with the one or more images; and the statistical information comprises a histogram count on the set of pixels with respect to each discrete interval of pixel value in the set of discrete intervals of pixel values. For example, if the set of data corresponds to a set of pixels of one image, the set of discrete intervals of pixel values may correspond to the range of discrete pixel values associated with the image. As an example, for an 8-bit image, each pixel may have a pixel value ranging from 0 to 255, and thus the set of discrete intervals of pixel values associated with the image may range from 0 to 255, whereby each discrete interval corresponds to (or is associated with or is represented by) a respective discrete pixel value.
[0053] FIG. 2 depicts a schematic block diagram of a system 200 for deriving statistical information from encrypted data, the encrypted data being encrypted based on a homomorphic encryption scheme, according to various embodiments of the present invention, such as corresponding to the method 100 for deriving statistical information from encrypted data as described hereinbefore according to various embodiments of the present invention. In various embodiments, the system 200 may correspond to, or may be embodied as, a data server, such as described hereinbefore.
[0054] The system 200 comprises a memory 202, and at least one processor 204 communicatively coupled to the memory 202 and configured to: receive the encrypted data; obtain a set of data elements of the encrypted data based on which the statistical information is to be derived; obtain a set of discrete intervals associated with the set of data elements; generate a vector comprising a plurality of entries, each entry being determined based on the set of data elements; generate a matrix comprising a plurality of rows of entries, each row of entries being determined based on the set of discrete intervals; derive the statistical information on the set of data elements with respect to each discrete interval in the set of discrete intervals based on the matrix and the vector; and send the statistical information.
[0055] It will be appreciated by a person skilled in the art that the at least one processor 204 may be configured to perform the required functions or operations through set(s) of instructions (e.g., software modules) executable by the at least one processor 204 to perform the required functions or operations. Accordingly, as shown in FIG. 2, the system 200 may further comprise a receiving module or circuit 206 configured to receive the encrypted data; a data element obtaining module or circuit 208 configured to obtain a set of data elements of the encrypted data based on which the statistical information is to be derived; a discrete interval obtaining module or circuit 210 configured to obtain a set of discrete intervals associated with the set of data elements; a vector generating module or circuit 212 configured to generate a vector comprising a plurality of entries, each entry being determined based on the set of data elements; a matrix generating module or circuit 214 configured to generate a matrix comprising a plurality of rows of entries, each row of entries being determined based on the set of discrete intervals; a statistical information deriving module or circuit 216 configured to derive the statistical information on the set of data elements with respect to each discrete interval in the set of discrete intervals based on the matrix and the vector; and a sending (transmission) module or circuit 218 configured to send/transmit the statistical information.
[0056] It will be appreciated by a person skilled in the art that the above-mentioned modules are not necessarily separate modules, and one or more modules may be realized by or implemented as one functional module (e.g., a circuit or a software program) as desired or as appropriate without deviating from the scope of the present invention. For example, the receiving module 206 and the sending module 218 may be realized by a transceiver, and the data element obtaining module 208, the discrete interval obtaining module 210, the vector generating module 212, the matrix generating module 214, and the statistical information deriving module 216 may be realized (e.g., compiled together) as one executable software program (e.g., software application or simply referred to as an “app”), which for example may be stored in the memory 202 and executable by the at least one processor 204 to perform the functions/operations as described herein according to various embodiments.
[0057] In various embodiments, the system 200 corresponds to the method 100 as described hereinbefore with reference to FIG. 1; therefore, various functions or operations configured to be performed by the at least one processor 204 may correspond to various steps of the method 100 described hereinbefore according to various embodiments, and thus need not be repeated with respect to the system 200 for clarity and conciseness. In other words, various embodiments described herein in context of the methods are analogously valid for the respective systems or devices, and vice versa.
[0058] For example, in various embodiments, the memory 202 may have stored therein the data element obtaining module 208, the discrete interval obtaining module 210, the vector generating module 212, the matrix generating module 214, and/or the statistical information deriving module 216, which respectively correspond to various steps of the method 100 as described hereinbefore, which are executable by the at least one processor 204 to perform the corresponding functions/operations as described herein.
[0059] A computing system, a controller, a microcontroller or any other system providing a processing capability may be provided according to various embodiments in the present disclosure. Such a system may be taken to include one or more processors and one or more computer-readable storage mediums. For example, the system 200 described hereinbefore may include a processor (or controller) 204 and a computer-readable storage medium (or memory) 202 which are for example used in various processing carried out therein as described herein. A memory or computer-readable storage medium used in various embodiments may be a volatile memory, for example a DRAM (Dynamic Random Access Memory) or a non-volatile memory, for example a PROM (Programmable Read Only Memory), an EPROM (Erasable PROM), EEPROM (Electrically Erasable PROM), or a flash memory, e.g., a floating gate memory, a charge trapping memory, an MRAM (Magnetoresistive Random Access Memory) or a PCRAM (Phase Change Random Access Memory).
[0060] In various embodiments, a “circuit” may be understood as any kind of a logic implementing entity, which may be special purpose circuitry or a processor executing software stored in a memory, firmware, or any combination thereof. Thus, in an embodiment, a “circuit” may be a hard-wired logic circuit or a programmable logic circuit such as a programmable processor, e.g., a microprocessor (e.g., a Complex Instruction Set Computer (CISC) processor or a Reduced Instruction Set Computer (RISC) processor). A “circuit” may also be a processor executing software, e.g., any kind of computer program, e.g., a computer program using a virtual machine code, e.g., Java. Any other kind of implementation of the respective functions which will be described in more detail below may also be understood as a “circuit” in accordance with various alternative embodiments. Similarly, a “module” may be a portion of a system according to various embodiments in the present invention and may encompass a “circuit” as above, or may be understood to be any kind of a logic-implementing entity therefrom.
[0061] Some portions of the present disclosure are explicitly or implicitly presented in terms of algorithms and functional or symbolic representations of operations on data within a computer memory. These algorithmic descriptions and functional or symbolic representations are the means used by those skilled in the data processing arts to convey most effectively the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities, such as electrical, magnetic or optical signals capable of being stored, transferred, combined, compared, and otherwise manipulated.
[0062] Unless specifically stated otherwise, and as apparent from the following, it will be appreciated that throughout the present specification, discussions utilizing terms such as “receiving”, “obtaining”, “generating”, “deriving”, “sending” or the like, refer to the actions and processes of a computer system, or similar electronic device, that manipulates and transforms data represented as physical quantities within the computer system into other data similarly represented as physical quantities within the computer system or other information storage, transmission or display devices.
[0063] The present specification also discloses a system, a device or an apparatus for performing the operations/functions of the methods described herein. Such a system, device or apparatus may be specially constructed for the required purposes, or may comprise a general purpose computer or other device selectively activated or reconfigured by a computer program stored in the computer. The algorithms presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose machines may be used with computer programs in accordance with the teachings herein. Alternatively, the construction of more specialized apparatus to perform the required method steps may be appropriate.
[0064] In addition, the present specification also at least implicitly discloses a computer program or software/functional module, in that it would be apparent to the person skilled in the art that the individual steps of the methods described herein may be put into effect by computer code. The computer program is not intended to be limited to any particular programming language and implementation thereof. It will be appreciated that a variety of programming languages and coding thereof may be used to implement the teachings of the disclosure contained herein. Moreover, the computer program is not intended to be limited to any particular control flow. There are many other variants of the computer program, which can use different control flows without departing from the spirit or scope of the invention. It will be appreciated by a person skilled in the art that various modules described herein (e.g., the data element obtaining module 208, the discrete interval obtaining module 210, the vector generating module 212, the matrix generating module 214, and the statistical information deriving module 216) may be software module(s) realized by computer program(s) or set(s) of instructions executable by a computer processor to perform the required functions, or may be hardware module(s) being functional hardware unit(s) designed to perform the required functions. It will also be appreciated that a combination of hardware and software modules may be implemented.
[0065] Furthermore, one or more of the steps of a computer program/module or method described herein may be performed in parallel rather than sequentially. Such a computer program may be stored on any computer readable medium. The computer readable medium may include storage devices such as magnetic or optical disks, memory chips, or other storage devices suitable for interfacing with a general purpose computer. The computer program when loaded and executed on such a general-purpose computer effectively results in an apparatus that implements the steps of the methods described herein.
[0066] In various embodiments, there is provided a computer program product, embodied in one or more computer-readable storage mediums (non-transitory computer-readable storage medium), comprising instructions (e.g., the data element obtaining module 208, the discrete interval obtaining module 210, the vector generating module 212, the matrix generating module 214, and the statistical information deriving module 216) executable by one or more computer processors to perform a method 100 for deriving statistical information from encrypted data as described hereinbefore with reference to FIG. 1. Accordingly, various computer programs or modules described herein may be stored in a computer program product receivable by a system (e.g., a computer system or an electronic device) therein, such as the system 200 as shown in FIG. 2, for execution by at least one processor 204 of the system 200 to perform the required or desired functions.
[0067] The software or functional modules described herein may also be implemented as hardware modules. More particularly, in the hardware sense, a module is a functional hardware unit designed for use with other components or modules. For example, a module may be implemented using discrete electronic components, or it can form a portion of an entire electronic circuit such as an Application Specific Integrated Circuit (ASIC). Numerous other possibilities exist. Those skilled in the art will appreciate that the software or functional module(s) described herein can also be implemented as a combination of hardware and software modules.
[0068] In various embodiments, the system 200 may be realized by any computer system (e.g., portable or desktop computer system), such as a computer system 300 as schematically shown in FIG. 3 as an example only and without limitation. Various methods/steps or functional modules (e.g., the data element obtaining module 208, the discrete interval obtaining module 210, the vector generating module 212, the matrix generating module 214, and/or the statistical information deriving module 216) may be implemented as software, such as a computer program being executed within the computer system 300, and instructing the computer system 300 (in particular, one or more processors therein) to conduct the methods/functions of various embodiments described herein. The computer system 300 may comprise a computer module 302, input modules, such as a keyboard 304 and a mouse 306, and a plurality of output devices such as a display 308, and a printer 310. The computer module 302 may be connected to a computer network 312 via a suitable transceiver device 314, to enable access to e.g. the Internet or other network systems such as Local Area Network (LAN) or Wide Area Network (WAN). The computer module 302 in the example may include a processor 318 for executing various instructions, a Random Access Memory (RAM) 320 and a Read Only Memory (ROM) 322. The computer module 302 may also include a number of Input/Output (I/O) interfaces, for example I/O interface 324 to the display 308, and I/O interface 326 to the keyboard 304. The components of the computer module 302 typically communicate via an interconnected bus 328 and in a manner known to the person skilled in the relevant art.
[0069] It will be appreciated by a person skilled in the art that the terminology used herein is for the purpose of describing various embodiments only and is not intended to be limiting of the present invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[0070] FIG. 4 depicts a schematic drawing illustrating an example overview of a system 400 comprising a client device or system 402 associated with or being used by a data owner and a data server (in this example, a cloud server) 404 according to various embodiments of the present invention illustrating an example communication/interaction between the client device 402 and the cloud server 404 for the client device 402 to outsource the statistical analysis/computation on data of the data owner to a cloud server 404 to obtain the desired statistical information on the data. FIG. 4 also shows an example flow diagram of the associated method for deriving the statistical information on the data. As shown, in the example, the client device 402 first encrypts the raw data (plaintext) 406 of the data owner according to a homomorphic encryption scheme 408 using a public key to protect the raw data before sending the encrypted data 410 to the cloud server 404. The cloud server 404 then performs the requested statistical analysis/computation 412 on the encrypted data 410 (e.g., histogram on the encrypted data) and sends the statistical information (e.g., the histogram) 414 derived in an encrypted form to the client device 402 without the ability to learn anything about the raw data 406. The client device 402 may then decrypt 416 the encrypted statistical information 414 to obtain the requested statistical information (plaintext) 418. Accordingly, it can be observed that the method for deriving statistical information from the data advantageously does not require any interaction between the client device 402 and the cloud server 404 during the process of statistical analysis/computation. Furthermore, the process of statistical analysis/computation at the cloud server 404 is performed directly on the encrypted data 410 without the need to decrypt the encrypted data 410 at the cloud server 404 at all, thus preserving privacy of the data.
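For illustration only, the non-interactive flow of FIG. 4 (encrypt at the client, compute at the server, decrypt at the client) may be sketched with a toy Paillier cryptosystem. Paillier is swapped in here purely as an assumption for the sketch: it is additively homomorphic only, is not named in the specification, and cannot by itself evaluate the multiplications needed for a histogram. The parameters below are far too small to be secure, and all names are hypothetical. The sketch requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random

# Toy key material (assumed for illustration; insecure sizes).
PRIME_A, PRIME_B = 11, 13
MODULUS = PRIME_A * PRIME_B                      # public modulus
MODULUS_SQ = MODULUS * MODULUS
GENERATOR = MODULUS + 1                          # standard simple choice of g
LAMBDA = (PRIME_A - 1) * (PRIME_B - 1) // math.gcd(PRIME_A - 1, PRIME_B - 1)
MU = pow(LAMBDA, -1, MODULUS)                    # secret; Python 3.8+


def encrypt(message):
    """Client side: encrypt an integer 0 <= message < MODULUS with the public key."""
    while True:
        r = random.randrange(1, MODULUS)
        if math.gcd(r, MODULUS) == 1:
            break
    return (pow(GENERATOR, message, MODULUS_SQ) * pow(r, MODULUS, MODULUS_SQ)) % MODULUS_SQ


def decrypt(ciphertext):
    """Client side: recover the plaintext with the secret values LAMBDA and MU."""
    u = pow(ciphertext, LAMBDA, MODULUS_SQ)
    return ((u - 1) // MODULUS * MU) % MODULUS


def server_sum(ciphertexts):
    """Server side: multiply ciphertexts, which adds the hidden plaintexts.

    The server never sees a secret key or any plaintext value.
    """
    acc = 1                                      # a valid encryption of 0
    for c in ciphertexts:
        acc = (acc * c) % MODULUS_SQ
    return acc


raw_data = [3, 5, 9]                             # plaintext held by the data owner
encrypted_data = [encrypt(m) for m in raw_data]  # sent to the server
encrypted_result = server_sum(encrypted_data)    # computed without interaction
result = decrypt(encrypted_result)               # decrypted at the client
print(result)
```

The single round trip in this sketch mirrors the flow described above: one upload of ciphertexts, one download of an encrypted result, and no exchange in between.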
[0071] In order that the present invention may be readily understood and put into practical effect, various example embodiments of the present invention will be described hereinafter by way of examples only and not limitations. It will be appreciated by a person skilled in the art that the present invention may, however, be embodied in various different forms or configurations and should not be construed as limited to the example embodiments set forth hereinafter. Rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the present invention to those skilled in the art.
[0072] In particular, for better understanding of the present invention and without limitation or loss of generality, various example embodiments of the present invention will now be described with respect to the statistical analysis/computation requested to be performed on the encrypted data at the data server being homomorphic histogram computation, and thus the statistical information on the encrypted data derived/computed is a histogram count (or simply referred to as a “histogram”). However, it will be appreciated by a person skilled in the art that the present invention is not limited to homomorphic histogram computation, and various other types of statistical analysis/computations are also within the scope of the present invention, as long as the statistical analysis/computation can be performed on the encrypted data to obtain the desired statistical information based on the method described according to various embodiments described herein without the need to decrypt the encrypted data at the data server, such as but not limited to, probability distribution and cumulative frequency.
[0073] Given a large set of data, computing its histogram may be one of the most fundamental statistical tools to acquire some useful information about the data. According to various example embodiments of the present invention, techniques are provided to compute histograms encrypted under modern leveled homomorphic encryption schemes that support limited additions and multiplications. Specifically, for a set of data encrypted under some homomorphic encryption schemes, the techniques output the histogram of the data encrypted under the same scheme and key. As such, the data and its structure remain protected throughout the whole process of obtaining the histogram on the data. In addition, when the data server is deriving/computing the histogram on the data, the techniques do not require any interaction with the client device of the data owner, thus advantageously minimizing communication cost and removing the requirement that the data owner be online (e.g., for communication with the data server) until the end of the whole process of obtaining the histogram on the data.
[0074] Since the first fully homomorphic encryption scheme (FHE) was proposed in 2009, considerable efforts have been spent on developing high-level homomorphic computation for privacy preserving purpose. With FHE allowing the operations of addition and multiplication on encrypted data, i.e., ciphertexts, various example embodiments of the present invention advantageously decompose a high-level computation process into a set of limited additions and multiplications so that the private data can be homomorphically processed without compromising its privacy.
[0075] According to various example embodiments, a system is provided comprising client devices or systems associated with or being used by the data owners and data processors (data servers). Data owners are in control of raw data, which could potentially be sensitive, for example the collection of fingerprints and face images. A data owner may wish to obtain useful analytical result from the data he/she controls, but the lack of computational power and/or accurate analytical model at the client side may prevent the data owner from doing so. The data processors (e.g., cloud services), on the other hand, may have computational resources and/or advanced analytical models, but may not have sufficient raw data. Directly providing raw data to a data processor for analytical purpose may be prohibitive for data owners due to privacy concerns. In this regard, various example embodiments of the present invention apply a homomorphic computation solution that is helpful for this privacy preserving purpose. As a result, a data owner only needs to encrypt the raw data with the public key and send the encrypted data to a data processor for homomorphic computing via a client device associated with the data owner. The returned result can be decrypted by the client device with the secret key associated with the client device to get the desired analytical result.
[0076] In various example embodiments, a homomorphic histogram computation technique is provided that can be realized through only the basic operations of addition and multiplication. In this regard, for example, it can be appreciated by a person skilled in the art that the operations involved in determining an inverse of a matrix may be implemented with or segregated into basic addition and multiplication operations. Histogram finds applications in many different tasks ranging from data visualization to feature representation. For example, histogram is a popular visual tool for presenting variable data through specific kind of vertical bar charts. In this regard, histogram graph may be a preferred form of presentation over plain text to present various data on paper and slides. In the field of machine learning, histogram may be an important feature to represent data in the form of vectors. For example, as a text document is composed of words and the composition differs with respect to different documents, the term “frequency vector” may be used to represent text by counting how often each term (e.g., word) occurs in a document. Histograms are also widely used in digital image processing and photography, for example, as a representation of the distribution of pixel values, including but not limited to, the color histogram, the co-occurrence matrix and the histogram of oriented gradients.
[0077] When the data points (data elements) and the intervals are known, computing histograms may be a straightforward comparison problem, i.e., to compare each data point to the edges of all the intervals. However, various embodiments of the present invention identified that performing comparisons on data encrypted with homomorphic encryption schemes requires interaction between the client and server sides, which requires the client device of the data owner and the data server(s) to be online for instant communication therebetween. In contrast, various embodiments of the present invention provide techniques that are advantageously non-interactive in the sense that the client device of the data owner simply needs to encrypt the data and send the encrypted data to the data processor(s) (data server(s)), and does not require any further interaction therebetween until the processed result is derived and sent to the client device of the data owner. This removes the need for the client device to be online until the end of the entire process of deriving the statistical information on the data at the data processor.
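For reference, the straightforward comparison-based histogram on plaintext data mentioned above may be sketched as follows. The function name `histogram_by_comparison`, the half-open interval convention and the sample values are assumptions made for this sketch. The comparison in the inner loop is the operation that has no direct non-interactive counterpart on homomorphically encrypted data.

```python
def histogram_by_comparison(data, edges):
    """Count data points per interval [edges[j], edges[j + 1]) by comparison."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for j in range(len(edges) - 1):
            # Compare the data point against both edges of interval j.
            if edges[j] <= x < edges[j + 1]:
                counts[j] += 1
                break
    return counts


# Hypothetical data set and unit-width intervals covering the values 0..3.
data = [0, 1, 1, 3, 2, 1, 3]
edges = [0, 1, 2, 3, 4]
counts = histogram_by_comparison(data, edges)
print(counts)
```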
[0078] FIG. 5 depicts a schematic drawing illustrating an example overview of a system 500 comprising a client device or system 502 associated with or being used by a data owner and a data server (in this example, a cloud server) 504 according to various example embodiments of the present invention illustrating an example communication/interaction between the client device 502 and the cloud server 504 for the data owner to outsource the histogram computation on his/her data to a cloud server 504 to obtain the requested histogram on the data. FIG. 5 also shows an example flow diagram of an example method for deriving the histogram count on the raw data (plaintext) 506 according to the example embodiments. The system 500 may be the same as the system 400 shown in FIG. 4 except that the system 500 is configured for the particular case of the statistical analysis/computation on the data performed at the cloud server 504 being histogram computation. For example, the data owner may store his/her image data via an associated client device on a cloud server 504. In order to protect the privacy of the data, the data owner may encrypt the raw data 506 via the associated client device according to a homomorphic encryption scheme 508 using a public key before sending the encrypted data 510 to the cloud server 504. For example, the data owner may wish to obtain the histogram of his/her data on the cloud server 504 periodically. In this regard, retrieving the encrypted data 510 from the cloud server 504, decrypting it and computing the histogram on the data by the client device 502 would consume too many resources. In contrast, the example method according to various example embodiments enables the cloud server 504 to compute the histogram on the encrypted data 510 and send the histogram derived/computed in an encrypted form 514 without the ability to learn anything about the raw data 506. The client device 502 may then decrypt 516 the encrypted histogram result 514 to obtain the desired histogram (plaintext) 518. As can be observed from FIG. 5, advantageously, the client device 502 of the data owner is not required to interact with the cloud server 504 during the process of histogram computing 512 on the encrypted data 510.
Homomorphic Encryption
[0079] The notion of homomorphic encryption was first proposed by Rivest in 1978. Informally, homomorphic encryption enables one to perform functions on encrypted texts in a way that preserves the function, that is, outputs an encryption of the function on the corresponding plaintexts. The standard textbook RSA scheme is an example of an encryption scheme homomorphic under multiplication. Several other partially homomorphic schemes followed, each supporting homomorphism under either addition or multiplication.
[0080] However, it was only in the year 2009 that an encryption scheme that supports homomorphism under arbitrary functions was conceptualized. In this breakthrough work, Gentry designed a fully homomorphic encryption (FHE) scheme using ideal lattices that can perform homomorphism under arbitrary circuits. This spawned new and improved FHE schemes, exploiting different mathematical problems including the approximate greatest common divisor (GCD) problem, the standard Learning with Errors (LWE) problem, the ring LWE problem and the NTRU problem. The first attempts at implementing FHE schemes were not too promising. Nonetheless, much progress has since taken place from both the underlying design and practical optimizations with recent works reporting more promising results.
[0081] In general, a modern (existing) FHE scheme may involve or comprise four main steps or components/modules as follows:
• Key generation: this step or module may be configured to take as an input a security parameter and output a public key, a secret key and an evaluation key;
• Encryption: this step or module may be configured to take as inputs the public key and a message and output a ciphertext;
• Decryption: this step or module may be configured to take as inputs the secret key and a ciphertext, and output the corresponding plaintext; and
• Evaluation: this step or module may be configured to take as inputs the evaluation key, a function f, and a set of ciphertexts, and output the result of the ciphertexts operated under f.
[0082] For example, to construct an FHE scheme, one may begin with a somewhat homomorphic encryption scheme (SHE) that supports a limited number of additions and multiplications. The SHE scheme typically introduces some noise to the message in the encryption process. This noise tends to increase when evaluating additions and multiplications of the ciphertexts, with the latter increasing more rapidly. Different techniques including bootstrapping and modulus switching are employed to manage the growth of the noise level, thereby turning the SHE scheme into an FHE scheme. In most situations, the functions to be employed on the data are known in advance. This motivates the notion of a leveled FHE scheme, namely, FHE schemes that support functions with a bounded multiplicative depth. In this regard, the first leveled FHE scheme was proposed and other improved variants were also proposed.
[0083] The various FHE schemes in use to date operate on two main plaintext spaces, binary spaces, and polynomial rings. As such, the messages may have to be encoded into one of these spaces to exploit the associated schemes. By applying the Chinese Remainder Theorem, one can perform batch/SIMD processing on a group of data, resulting in greater efficiencies. At present, there are several libraries implementing some of the proposed leveled FHE schemes and their optimizations. Examples include the HElib library for the BGV scheme and the SEAL library by Microsoft for the FV scheme.
Histogram Counting
[0084] According to various example embodiments, the problem of histogram counting may be defined as follows. Let $S = \{x_1, x_2, \ldots, x_N\}$ be a set of N data points. Let R denote the range of the data points, that is, the set of values that the data points may take. Let $b_0 < b_1 < \cdots < b_k$ be real numbers and denote by $H_i$ the half-open interval $[b_i, b_{i+1})$, $i = 0, 1, \ldots, k-1$. Suppose that $R \subseteq \bigcup_{i=0}^{k-1} H_i$. Thus, according to various example embodiments, for each $i = 0, 1, \ldots, k-1$, the problem of histogram counting seeks to determine:
$c_i = \#\{1 \le j \le N \mid x_j \in H_i\}$ (Equation 1)
In other words, it is desired to count the number of data points in each interval $H_i$.
[0085] In a conventional technique, the problem of homomorphic histogram counting was considered among other statistical and categorical functions. Specifically, the conventional technique presented an encoding on the data that facilitates histogram counting. Using the notations above, suppose that $|R| = r$. Then each point $x \in R$ is encoded as an r-integer vector such that each entry is indexed by a value in R, and the entry indexed by x is 1 while the other entries are 0. For example, suppose R = {0, 1, 2}. Then 0, 1, 2 are encoded respectively as (1, 0, 0), (0, 1, 0), (0, 0, 1). These vectors are then encrypted by any suitable SHE scheme and the histogram counts are obtained by summing up all the encrypted vectors. Observe that the output is an encrypted vector of histogram counts. In order to extract the histogram count of the element x, one performs a single multiplication of the encrypted result by the vector (0, 0, ..., 1, 0, ..., 0), with 1 in the entry indexed by x.
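The one-hot encoding described above can be illustrated on plaintext values. The following is only a sketch: no encryption is performed, and the helper names `one_hot` and `histogram_by_one_hot` are hypothetical. It shows that summing the encoded vectors entry by entry, which needs additions only, yields the histogram counts, and that each data point expands into r entries.

```python
# Plaintext sketch of the conventional one-hot encoding; nothing is encrypted here.
def one_hot(x, value_range):
    """Encode x as a 0/1 vector with one entry per value in value_range."""
    return [1 if x == v else 0 for v in value_range]


def histogram_by_one_hot(data, value_range):
    """Sum the one-hot vectors entry-wise; the resulting vector holds the counts."""
    counts = [0] * len(value_range)
    for x in data:
        for idx, bit in enumerate(one_hot(x, value_range)):
            counts[idx] += bit  # additions only, as an SHE scheme would require
    return counts


R = [0, 1, 2]
counts = histogram_by_one_hot([0, 2, 2, 1, 2], R)
```

Each encoded point has length r = |R|, which is the ciphertext expansion discussed in the next paragraph.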
[0086] However, a disadvantage of the above conventional approach identified according to various example embodiments is that the length of the ciphertext is expanded by r, and hence, may not be practical when r is very large or when R is infinite. Another limitation is that it uses a special encoding that destroys the structure of the original value. Indeed, the encoding is designed purely to perform histogram counts. Thus, the conventional approach may not be applicable when histogram counting occurs as an intermediate step in certain algorithms, and the intermediate inputs are real values. As such, a method for encoding the intermediate encrypted values into such a form may be needed.
[0087] For example, a direct way to check if a value x lies in the interval $H_i = [b_i, b_{i+1})$ is to compare whether $x \ge b_i$ and $x < b_{i+1}$. However, performing comparisons securely on data encrypted with homomorphic encryption schemes remains a big challenge. In this regard, a conventional comparison-based technique was disclosed.
In the conventional technique, to check if $a \ge b$ where a and b are encrypted by some HE scheme E, the conventional technique homomorphically computes encryptions of $a - b - w$ from $E(a)$, $E(b)$ and $E(w)$ for $w \ge 0$ bounded by some suitable bound. Then, $a \ge b$ if and only if an encryption of 0 is among the outputs. It follows that one needs to decrypt the outputs to obtain the result of the comparison and hence, it requires interaction among the users at the client and server sides.
[0088] For example, in order to perform private database queries, one may require an equality or comparison test to compare a given search query with the encrypted data. An equality test (or a comparison test) takes in two inputs a and b, and outputs 1 if a = b or 0 otherwise (i.e., if a < b or a > b). Efficient homomorphic equality and comparison tests have been considered in earlier works. For example, given an efficient equality test or a comparison test, one can compute the histogram counts by testing each data point against the interval boundaries (where an equality test is used when the interval comprises a small number of points), and summing the corresponding outputs. For instance, if $H_i$ comprises one point $y_i$, the number of $x \in S$ which are equal to $y_i$ can be counted by summing, over all $x \in S$, the outputs of an equality test on $x$ and $y_i$. Nonetheless, the equality and comparison tests considered in the earlier works may not be desirable for the following reasons. For example, such tests may deal with binary input and output spaces. However, it may be desirable to sum the outputs as integers and not as binary values. Similarly, a conventional technique may consider plaintext spaces of the finite field $\mathbb{F}_{p^l}$ for some prime p and small positive integer l. In order to directly apply such an equality test to histogram counting, p may be required to be at least as large as N (the number of data points), thereby resulting in a multiplicative depth of about $\log p + \log l$, which may not be practical when N is large.
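To make the size constraint on p concrete, the following plaintext sketch uses one well-known equality test over the prime field of order p, based on Fermat's little theorem: $1 - (x - y)^{p-1} \bmod p$ is 1 when x = y and 0 otherwise. This particular test is an assumption chosen for illustration and is not necessarily the test used in the earlier works; the function names are hypothetical. Summing the outputs modulo p gives the count only while the count stays below p, and the exponent p - 1 is what drives the multiplicative depth.

```python
# Plaintext sketch of a Fermat-based equality test over integers modulo a prime p.
def eq_test(x, y, p):
    """Return 1 if x == y (mod p) and 0 otherwise, using (x - y)^(p-1) mod p."""
    return (1 - pow(x - y, p - 1, p)) % p


def count_equal(data, y, p):
    """Sum the equality-test outputs modulo p."""
    total = 0
    for x in data:
        total = (total + eq_test(x, y, p)) % p
    return total


data = [1, 2, 3, 4, 4, 1, 1, 2]
count_of_ones = count_equal(data, 1, 11)    # p = 11 exceeds the number of points
wrapped = count_equal([1, 1, 1], 1, 3)      # p = 3 is too small: the sum wraps to 0
```

The second call illustrates why p needs to exceed the largest possible count when the outputs are summed in the same plaintext space.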
[0089] For example, another conventional approach for computing histogram counts seeks to find functions to approximate the histogram layer in a neural network. It was suggested that the Gaussian function can be used as an approximate function to compute the histogram counts. Specifically, to obtain the number of $x \in S$ for which $x \approx$
Figure imgf000031_0003
one computes:
Figure imgf000031_0004
(Equation 2) for some predetermined $\sigma$. However, such a conventional approach may not be feasible when the values are encrypted as it remains very inefficient to compute
Figure imgf000032_0001
, where E denotes an FHE encryption scheme.
[0090] Accordingly, to overcome, or at least ameliorate, one or more of the deficiencies in the conventional approaches, various example embodiments of the present invention provide histogram counting techniques that can be realized with additions and multiplications, and are therefore compliant with existing homomorphic encryption schemes. In this regard, various example embodiments of the present invention tackle the problem of privacy preserving histogram counting using homomorphic encryption techniques. Accordingly, instead of a direct comparison of data, various example embodiments of the present invention construct appropriate systems of equations where the solutions or outputs are the desired histogram counts.
[0091] Therefore, various example embodiments of the present invention, while allowing the data owners to encrypt their sensitive data (e.g., fingerprint images), are still able to accurately compute the histogram count based on the encrypted values (encrypted data). For example, the histogram count may be an important feature for many applications, such as various machine learning algorithms. In addition, the data encrypted under the technique according to various example embodiments still preserves its structure. Accordingly, the encrypted data is not restricted to histogram counting only by the technique, but can also be used for other purposes (e.g., determining the probability distribution or cumulative frequency in relation to the encrypted data). Furthermore, the technique according to various example embodiments is non-interactive (i.e., during the process of deriving/computing the histogram count on the encrypted data). In other words, the data owner simply needs to (via an associated client device or system) encrypt the data and send the encrypted data to the data server (e.g., processing center), and does not require any further interaction therebetween until the processed result (histogram count) is derived and sent to the client device of the data owner. This removes the need for the client device to be online until the end of the entire process of deriving the histogram count on the data at the data server.
Histogram Computation Compliant with Homomorphic Encryption
[0092] According to various example embodiments of the present invention, an example method for computing histogram counts without requiring any direct comparison or sorting of data will now be described by way of an example only and without limitation. In various example embodiments, the example method only involves modulo addition and multiplication of data points (data elements).
Computation involving only Addition and Multiplication
[0093] Assume that the set R, or the range of values, is a finite subset of real numbers. Observe that by multiplying these numbers with suitable scalars, these real numbers may be approximated by some integers. Hence, without any loss of generality, it suffices to consider the case where R is a discrete subset of integers.
[0094] Since the set R (e.g., corresponding to the “set of discrete intervals” as described herein according to various embodiments) is finite, the set S (e.g., corresponding to the “set of data elements” as described herein according to various embodiments) can be considered as a union of finite intervals, with each interval comprising exactly one point. More concretely, suppose that $H_i \cap R = \{y_i\}$, that is, each interval $H_i$ consists of exactly one point in R. For $i = 0, 1, \ldots, k-1$, various example embodiments count how many points $x \in S$ take the value of $y_i$.
[0095] Let j be such that $0 \le j \le k-1$. Consider the sum (e.g., corresponding to the “summation function using the set of data elements” as described herein according to various embodiments):
$s_j = \sum_{x \in S} x^j$ (Equation 3)
Since addition of integers is commutative, it follows that the following two example techniques of computing $s_j$ (e.g., corresponding to “each entry” of the vector generated as described herein according to various embodiments) would produce identical results.
1) Let $s_j = 0$.
While S is not empty, do:
Randomly pick an $x \in S$. Let $s_j = s_j + x^j$ and remove x from S.
Return $s_j$.
2) Let $s_j = 0$.
For i = 0 to k - 1, do:
Compute $c_i = \#\{x \in S \mid x = y_i\}$.
Let $s_j = s_j + c_i y_i^j$.
Return $s_j$.
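The two techniques for computing the power sum of Equation 3 can be sketched on plaintext integers as follows. The function names are hypothetical, and the sample set is the illustrative set S = {1, 2, 3, 1, 3, 4} used in the next paragraph. The first function adds one data point at a time in random order; the second groups equal values and uses their counts.

```python
import random
from collections import Counter


def power_sum_by_points(data, j):
    """Technique 1: pick the points one by one, in random order, and add x**j."""
    remaining = list(data)
    random.shuffle(remaining)  # the order does not matter: addition is commutative
    s_j = 0
    while remaining:
        x = remaining.pop()
        s_j += x ** j
    return s_j


def power_sum_by_counts(data, values, j):
    """Technique 2: for each value y_i, add c_i * y_i**j, where c_i is its count."""
    counts = Counter(data)
    s_j = 0
    for y in values:
        s_j += counts[y] * y ** j
    return s_j


S = [1, 2, 3, 1, 3, 4]
values = [1, 2, 3, 4]
s1_points = power_sum_by_points(S, 1)
s1_counts = power_sum_by_counts(S, values, 1)
```

The second form is the one that links the power sums to the unknown counts: the sums are computable without knowing the counts (technique 1), while technique 2 expresses the same numbers as linear combinations of the counts.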
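[0096] As described hereinbefore with reference to Equation 1, $c_i$ is the histogram count. As an illustrative example, given a set S = {1, 2, 3, 1, 3, 4}, $s_j$ for j = 1 may be determined as 1 + 2 + 3 + 1 + 3 + 4 = 2(1) + 2 + 2(3) + 4 = 14. Various example embodiments may then construct k linear equations for $j = 0, 1, \ldots, k-1$ as:
$$\begin{aligned} c_0 + c_1 + \cdots + c_{k-1} &= s_0 = N \\ c_0 y_0 + c_1 y_1 + \cdots + c_{k-1} y_{k-1} &= s_1 \\ &\;\;\vdots \\ c_0 y_0^{k-1} + c_1 y_1^{k-1} + \cdots + c_{k-1} y_{k-1}^{k-1} &= s_{k-1} \end{aligned}$$
(Equations 4)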
[0097] For example, based on the above illustrative example, $s_j$ may be determined in the following manner. First, based on the set S, it can be deduced that k = 4 (that is, there are four discrete intervals). In the set S, it can be observed that there are two 1’s, one 2, two 3’s, and one 4. For example, assume j = 1. For the first iteration i = 0, the term is $c_0 y_0^1 = 2 \times (1)^1 = 2$, whereby $c_0$ is the number of counts and $y_0$ is the value or number associated with the interval, and in this example, $c_0 = 2$, $y_0 = 1$, and j = 1. Similarly, for the next iteration i = 1, the term is $c_1 y_1^1 = 1 \times (2)^1 = 2$, and so on, assuming j = 1. As another example, assuming j = 2, it can be deduced that $s_2 = 2 \times (1)^2 + 1 \times (2)^2 + 2 \times (3)^2 + 1 \times (4)^2 = 40$.
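[0098] Define M (e.g., corresponding to the “matrix comprising a plurality of rows of entries” as described herein according to various embodiments) to be the matrix
$$M = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ y_0 & y_1 & \cdots & y_{k-1} \\ y_0^2 & y_1^2 & \cdots & y_{k-1}^2 \\ \vdots & \vdots & & \vdots \\ y_0^{k-1} & y_1^{k-1} & \cdots & y_{k-1}^{k-1} \end{pmatrix}$$
which is a Vandermonde matrix for distinct $y_i$’s. This matrix has full rank so that $M^{-1}$ exists.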
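[0099] Now consider the matrix equation:
$$M \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{k-1} \end{pmatrix} = \begin{pmatrix} s_0 \\ s_1 \\ \vdots \\ s_{k-1} \end{pmatrix}$$
(Equation 5)
Since $M^{-1}$ exists, we obtain:
$$\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{k-1} \end{pmatrix} = M^{-1} \begin{pmatrix} s_0 \\ s_1 \\ \vdots \\ s_{k-1} \end{pmatrix}$$
(Equation 6)
[00100] Accordingly, if $M^{-1} = (m_{i,j})_{i,j = 0, 1, \ldots, k-1}$, then for each $i = 0, 1, \ldots, k-1$, $c_i$ (e.g., corresponding to the “statistical information” or “histogram count” as described herein according to various embodiments) may be determined as follows:
$c_i = \sum_{j=0}^{k-1} m_{i,j} s_j$ (Equation 7)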
[00101] Therefore, various example embodiments of the present invention advantageously deduced that the histogram counts (that is, the values of $c_0, c_1, \ldots, c_{k-1}$) can be computed. Thus, various example embodiments of the present invention advantageously provide the technical solution for enabling encrypted histogram counting (i.e., histogram counting on encrypted data without having to decrypt the encrypted data). Furthermore, the example method involves only addition and multiplication operations. For example, since the example method involves taking powers of up to k - 1, the multiplicative depth is about $\log k$.
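As a rough check on the depth estimate, the following sketch counts the multiplicative depth needed to build each power of a data point when a power is formed as the product of two roughly equal smaller powers. This balanced strategy and the function names are assumptions made for illustration; it only accounts for the powers of the data points, not for any other operation of a particular encryption scheme.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def power_depth(j):
    """Depth of x**j when computed as x**ceil(j/2) * x**floor(j/2)."""
    if j <= 1:
        return 0  # x**0 is a constant and x**1 is the input itself
    return 1 + max(power_depth((j + 1) // 2), power_depth(j // 2))


def histogram_power_depth(k):
    """Largest depth over the powers 0, 1, ..., k - 1 needed for k intervals."""
    return max(power_depth(j) for j in range(k))


depth_16_bins = histogram_power_depth(16)
depth_256_bins = histogram_power_depth(256)
```

Under these assumptions the depth grows with the base-2 logarithm of k, for example 4 levels for 16 intervals and 8 levels for 256 intervals.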
[00102] Various example embodiments note that while M is an integer matrix, $M^{-1}$ is likely not an integer matrix but a matrix over the rational numbers. Nonetheless, each $c_i$ is still an integer.
[00103] For illustration purpose only and without limitation, a first example (Example 1) will now be described to show a method for deriving histogram count on encrypted data according to an example embodiment of the present invention.
[00104] Example 1: Let $y_0 = 1$, $y_1 = 2$, $y_2 = 3$ and $y_3 = 4$ (e.g., the single discrete values respectively associated with intervals (bins) of 0.5 to 1.5, 1.5 to 2.5, 2.5 to 3.5, and 3.5 to 4.5), and let the set of data elements $S = \{1, 2, 3, 4, 4, 1, 1, 2\}$. For this example, it can be seen that $c_0 = 3$, $c_1 = 2$, $c_2 = 1$ and $c_3 = 2$. It will now be described how the $c_i$’s can be obtained using the method according to an example embodiment of the present invention.
[00105] First, computing the sums gives $s_0 = 8$, $s_1 = 18$, $s_2 = 52$ and $s_3 = 174$. This yields the following matrix equation:
$$\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 4 & 9 & 16 \\ 1 & 8 & 27 & 64 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 18 \\ 52 \\ 174 \end{pmatrix}$$
(Equation 8)
Figure imgf000036_0006
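[00107] Therefore, the histogram count ($c_i$’s) on the set of data elements can be derived/computed as:
$$\begin{pmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{pmatrix} = M^{-1} \begin{pmatrix} 8 \\ 18 \\ 52 \\ 174 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \\ 1 \\ 2 \end{pmatrix}$$
(Equation 9)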
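Example 1 can be reproduced on plaintext values with exact rational arithmetic. The following is a minimal sketch, assuming a generic Gauss-Jordan solver (the helper `solve_exact` is hypothetical and not part of the described embodiments); it builds the power sums and the Vandermonde matrix for the values 1, 2, 3, 4 and solves for the counts.

```python
from fractions import Fraction


def solve_exact(matrix, rhs):
    """Solve matrix * c = rhs over the rationals by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = 1 / aug[col][col]
        aug[col] = [v * inv for v in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


values = [1, 2, 3, 4]
data = [1, 2, 3, 4, 4, 1, 1, 2]
k = len(values)
s = [sum(x ** j for x in data) for j in range(k)]   # power sums s_0 .. s_{k-1}
M = [[y ** j for y in values] for j in range(k)]    # row j holds the j-th powers
c = solve_exact(M, s)
```

Although the intermediate values are rational numbers, the solution entries come out as whole numbers, in line with the observation that each count is an integer.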
Modulo Computation
[00108] Various example embodiments observed that in computing $c_i$, for example, the above method may involve handling numbers of the form $x^i$ for i up to k - 1, which may be extremely large. For example, in the context of grey-scale images, the pixel values may be from 0 to 255. Hence, the numbers being handled may be as large as $255^{255}$, or approximately 2048 bits long. Dealing with such large numbers may either result in data overflow or require specialized procedures to handle them.
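The size estimate can be checked directly, assuming the largest term is the pixel value 255 raised to the power k - 1 = 255 (256 grey levels). Python integers have arbitrary precision, so the bit length can be read off without overflow; the variable names are illustrative only.

```python
# Largest single term for 8-bit pixels and 256 intervals (an assumed worst case).
largest_power = 255 ** 255
bits = largest_power.bit_length()

# A fixed-width 64-bit integer would overflow long before reaching this size.
fits_in_64_bits = bits <= 64
```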
[00109] To overcome, or at least mitigate, such a potential issue, various example embodiments provide a method for deriving statistical information (e.g., histogram count) without having to deal with such large numbers. In this regard, various example embodiments identified that the histogram counts are numbers at most N (the number of data points (data elements)). Thus, according to various example embodiments, the histogram computation is configured to involve only numbers in this range, and this is achieved through the modulo operations. More specifically, let p denote a prime number that is slightly larger than N (e.g., the smallest prime number larger than N). In this regard, instead of working with integers alone, all the operations are subjected to a modulo operation, that is, modulo p. Hence, for example, the above-mentioned system of Equations 4 may become:
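$$\begin{aligned} c_0 + c_1 + \cdots + c_{k-1} &\equiv N \pmod p \\ c_0 y_0 + c_1 y_1 + \cdots + c_{k-1} y_{k-1} &\equiv s_1 \pmod p \\ &\;\;\vdots \\ c_0 y_0^{k-1} + c_1 y_1^{k-1} + \cdots + c_{k-1} y_{k-1}^{k-1} &\equiv s_{k-1} \pmod p \end{aligned}$$
(Equations 10)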
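[00110] Let $\tilde{M} = M \bmod p$ and $\tilde{s}_j = s_j \bmod p$. Consider the following matrix equation, in which $\tilde{c}_0, \ldots, \tilde{c}_{k-1}$ denote the unknowns modulo p:
$$\tilde{M} \begin{pmatrix} \tilde{c}_0 \\ \tilde{c}_1 \\ \vdots \\ \tilde{c}_{k-1} \end{pmatrix} \equiv \begin{pmatrix} \tilde{s}_0 \\ \tilde{s}_1 \\ \vdots \\ \tilde{s}_{k-1} \end{pmatrix} \pmod p$$
(Equation 11)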
[00111] First, since M is a Vandermonde matrix and the $y_i$’s are all distinct, the determinant is non-zero. It follows that $\tilde{M} = M \bmod p$ is invertible over the field $\mathbb{Z}_p$ and $\tilde{M}^{-1} \bmod p$ exists. Thus, similar to Example 1, the histogram count $(\tilde{c}_0, \tilde{c}_1, \ldots, \tilde{c}_{k-1})$ modulo p can be derived/computed by multiplying $\tilde{M}^{-1}$ to Equation 11. It can also be noted that $\tilde{c}_i = c_i$, which follows directly from the following two relations:
• $0 \le c_i \le N < p$
• $\tilde{c}_i \equiv c_i \bmod p$
[00112] Thus, various example embodiments provide a method or algorithm for computing the histogram counts on encrypted data that requires only addition and multiplication of numbers of size around $\lceil \log_2 p \rceil$ bits. By way of an example only and without limitation, an exemplary method or algorithm may be provided as follows:
Algorithm 1: to compute histograms without actual counting
- Inputs: The data set S and the intervals $y_0, y_1, \ldots, y_{k-1}$.
- Output: The histogram counts.
Figure imgf000038_0005
mod p.
Figure imgf000038_0004
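Since the individual steps of the algorithm are given only in the figures, the following plaintext sketch is a reconstruction from the surrounding description rather than the exact listing: it picks the smallest prime p larger than N, computes the power sums and the Vandermonde matrix modulo p, and solves the system by Gauss-Jordan elimination modulo p. All function names are hypothetical, the values are assumed to be distinct modulo p, and `pow(a, -1, p)` requires Python 3.8 or later.

```python
def next_prime_above(n):
    """Smallest prime strictly greater than n (trial division, fine for small n)."""
    candidate = max(n + 1, 2)
    while any(candidate % d == 0 for d in range(2, int(candidate ** 0.5) + 1)):
        candidate += 1
    return candidate


def solve_mod_p(matrix, rhs, p):
    """Solve matrix * c = rhs modulo the prime p by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[v % p for v in row] + [b % p] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)  # modular inverse of the pivot
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % p for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def histogram_mod_p(data, values):
    """Histogram counts using only additions and multiplications modulo p."""
    p = next_prime_above(len(data))
    k = len(values)
    s = [sum(pow(x, j, p) for x in data) % p for j in range(k)]
    M = [[pow(y, j, p) for y in values] for j in range(k)]
    return solve_mod_p(M, s, p), p


counts, p = histogram_mod_p([1, 2, 3, 4, 4, 1, 1, 2], [1, 2, 3, 4])
```

Every intermediate number stays below p, so no large integers are handled, and because each count is at most N < p the residues returned are the true counts.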
[00113] For illustration purpose only and without limitation, a second example (Example 2) will now be described to show a method for deriving histogram count based on the above-mentioned algorithm 1 according an example embodiment of the present invention.
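[00114] Example 2: Referring back to Example 1, let p = 11, which is the smallest prime number bigger than 8. Based on the above-described Algorithm 1, the matrix $\tilde{M} = M \bmod p$ may be constructed as:
$$\tilde{M} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 4 & 9 & 5 \\ 1 & 8 & 5 & 9 \end{pmatrix}$$
(Equation 12)
[00115] In addition, based on the above-described Algorithm 1, $\tilde{s}_j = s_j \bmod p$ may be computed as $(\tilde{s}_0, \tilde{s}_1, \tilde{s}_2, \tilde{s}_3) = (8, 7, 8, 9)$.
[00116] Furthermore, the inverse matrix $\tilde{M}^{-1}$ can be computed to be:
$$\tilde{M}^{-1} = \begin{pmatrix} 4 & 3 & 7 & 9 \\ 5 & 4 & 7 & 6 \\ 4 & 4 & 9 & 5 \\ 10 & 0 & 10 & 2 \end{pmatrix} \bmod 11$$
(Equation 13)
[00117] Accordingly, the histogram count can be derived as $\tilde{M}^{-1} (8, 7, 8, 9)^T \bmod 11 = (3, 2, 1, 2)^T$.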
[00118] In various example embodiments, instead of using a prime number p, a composite number q is used such that $\gcd(q, D) = 1$, where $D = \prod_{i<j} (y_j - y_i)$.
This is because D represents the determinant of M and thus, M is invertible modulo q if and only if $\gcd(q, D) = 1$. Various example embodiments identified that it suffices to let $p > c_{\max}$, where $c_{\max}$ is the maximum count of the data set. Since the value of $c_{\max}$ may not be known a priori, choosing p > N will guarantee a correct answer.
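The invertibility condition for a composite modulus can be checked with a few lines of plaintext arithmetic. In this sketch the modulus is called q (an assumed name) and the helper names are hypothetical; the determinant of the Vandermonde matrix is computed as the product of pairwise differences of the values, and a modulus is accepted only if it is coprime to that determinant.

```python
from math import gcd


def vandermonde_det(values):
    """Product of (y_j - y_i) over all pairs i < j."""
    det = 1
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            det *= values[j] - values[i]
    return det


def modulus_is_usable(q, values):
    """The Vandermonde matrix is invertible modulo q exactly when gcd(q, D) = 1."""
    return gcd(q, vandermonde_det(values)) == 1


values = [1, 2, 3, 4]
D = vandermonde_det(values)               # 1 * 2 * 3 * 1 * 2 * 1
ok_25 = modulus_is_usable(25, values)     # 25 = 5 * 5 shares no factor with D
ok_9 = modulus_is_usable(9, values)       # 9 = 3 * 3 shares the factor 3 with D
```

For the values 1 to 4, any modulus divisible by 2 or 3 is therefore unsuitable, whereas a composite such as 25 would work provided it also exceeds the largest count.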
Homomorphic Histogram Computation
[00119] According to various example embodiments of the present invention, techniques of histogram counting that can be combined with modern (existing) homomorphic encryption schemes to provide a secure way to compute histograms without revealing the data will now be described. As there exist various different homomorphic encryption schemes and new efficient schemes are constantly being designed, the following description will mainly focus on the features of the encryption schemes that are required.
[00120] As with previous Examples, assuming there are N data points (data elements) with k histogram intervals, let E be a leveled FHE scheme with corresponding decryption function D. Suppose that E satisfies the following:
• E supports functions with multiplicative depth of log k;
• Let p be a prime number larger than N. The plaintext space of E is either $\mathbb{Z}_p$ or the polynomial ring $\mathbb{Z}_p[x]/f(x)$ for some polynomial f(x); and
• E supports modulo p operations homomorphically. More specifically, for any plaintexts x and y and a constant a:
$D(E(x)) = x \bmod p$
$D(E(x) + E(y)) = x + y \bmod p$
Figure imgf000040_0001
(Equation 14)
[00121] Suppose that the data points encrypted by E are accessible, that is, let $E(S)$ be the set $\{E(x) : x \in S\}$. Various example embodiments provide a method for computing $E(c_i)$, where $c_i$ denotes the histogram counts for the point $y_i$. In this regard, by way of an example only and without limitation, an exemplary method or algorithm may be provided as follows, for example, by modifying Algorithm 1 as described hereinbefore:
Algorithm 2: Algorithm to compute histograms for encrypted data points
- Inputs: The encrypted data set E(S) and the intervals $y_0, y_1, \ldots, y_{k-1}$.
- Output: The encrypted histogram counts $E(c_i)$.
Figure imgf000040_0007
Figure imgf000040_0005
Compute
Compute mod p.
Figure imgf000040_0006
[00122] Accordingly, the histogram count $c_i$ on the encrypted data set E(S) can be obtained by decrypting $E(c_i)$.
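The data flow on the server side can be sketched with a stand-in for the encryption scheme. The class `MockCiphertext` below is purely hypothetical: it stores the plaintext modulo p in the clear and offers no security whatsoever; it only mimics the interface assumed of E (ciphertext addition, ciphertext multiplication, and multiplication by a plaintext constant, all modulo p). The steps are a reconstruction from the description, not the exact listing of the figures, and the powers are built sequentially for brevity, which gives depth k - 1 rather than about log k.

```python
class MockCiphertext:
    """Stand-in for an FHE ciphertext. NOT secure: the value is kept in the clear."""

    def __init__(self, value, p):
        self.p = p
        self._value = value % p

    def __add__(self, other):
        return MockCiphertext(self._value + other._value, self.p)

    def __mul__(self, other):
        return MockCiphertext(self._value * other._value, self.p)

    def scale(self, constant):
        """Multiply by a plaintext constant."""
        return MockCiphertext(self._value * constant, self.p)


def decrypt(ct):
    return ct._value


def inverse_mod_p(matrix, p):
    """Invert a plaintext matrix modulo the prime p (Gauss-Jordan elimination)."""
    n = len(matrix)
    aug = [[v % p for v in row] + [int(r == c) for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % p for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def encrypted_histogram(enc_data, values, p):
    """Return 'encrypted' counts using only +, * and constant scaling."""
    k = len(values)
    powers = [MockCiphertext(1, p) for _ in enc_data]  # encryptions of x**0
    enc_s = []
    for _ in range(k):
        total = MockCiphertext(0, p)
        for ct in powers:
            total = total + ct                         # E(s_j) = sum of E(x)**j
        enc_s.append(total)
        powers = [pw * ct for pw, ct in zip(powers, enc_data)]
    m_inv = inverse_mod_p([[pow(y, j, p) for y in values] for j in range(k)], p)
    enc_counts = []
    for row in m_inv:
        acc = MockCiphertext(0, p)
        for coeff, ct in zip(row, enc_s):
            acc = acc + ct.scale(coeff)                # E(c_i) = sum_j m_ij * E(s_j)
        enc_counts.append(acc)
    return enc_counts


p = 11
enc_data = [MockCiphertext(x, p) for x in [1, 2, 3, 4, 4, 1, 1, 2]]
result = [decrypt(ct) for ct in encrypted_histogram(enc_data, [1, 2, 3, 4], p)]
```

The interval values are public, so the inverse matrix is computed on plaintext numbers; only the data points and the resulting counts are handled as ciphertexts.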
[00123] For illustration purpose only and without limitation, a third example (Example 3) will now be described to show a method for deriving histogram count on encrypted data based on the above-mentioned Algorithm 2 according to an example embodiment of the present invention.
[00124] Example 3: In Examples 1 and 2, the set of data elements was assumed to be S = {1, 2, 3, 4, 4, 1, 1, 2}. In the encrypted form, the set of encrypted data may thus be expressed as {E(1 mod p), E(2 mod p), E(3 mod p), E(4 mod p), E(4 mod p), E(1 mod p), E(1 mod p), E(2 mod p)}. To compute $E(s_1 \bmod p)$, $E(\sum_{x \in S} x \bmod p) = E(18 \bmod p)$ is computed. Since E is homomorphic, it suffices to compute E(1 mod p) + E(2 mod p) + E(3 mod p) + E(4 mod p) + E(4 mod p) + E(1 mod p) + E(1 mod p) + E(2 mod p). Similarly, each of $E(s_j \bmod p)$ can be determined by computing $\sum_{x \in S} E(x \bmod p)^j$.
[00125] Using batch processing, the $c_i$ can be computed in one SIMD operation as long as there are more than k slots. In this case, one fills up k slots with the same data point and computes the $s_j$ and the $c_i$ concurrently.
[00126] By way of examples only and without limitations, the BGV scheme as implemented in the HElib library and the FV scheme implemented in SEAL V2.0 are examples of FHE schemes that are suitable according to various example embodiments of the present invention (e.g., they satisfy various conditions proposed). According to various example embodiments, the plaintext space may be considered to be the polynomial ring
Figure imgf000041_0004
and the ciphertext space may be considered to be the polynomial ring $\mathbb{Z}_q[x]/f(x)$ for some cyclotomic polynomial $f(x)$ and appropriately chosen q with $p \mid q$.
Implementation of Homomorphic Histogram Computation on Fingerprint Images
[00127] For illustration purpose only and without limitation, the method for deriving histogram on encrypted data according to various example embodiments was applied on real fingerprint images according to an exemplary implementation to demonstrate the effectiveness of the method in deriving the histogram on the encrypted data without compromising the privacy of the encrypted data. In the exemplary implementation, a 4-bit fingerprint image of size 100 x 72 as shown in FIG. 6A was processed, i.e., 7200 pixel values with value range $[0, 2^4 - 1]$, i.e., [0, 15]. The expected result of the exemplary implementation is the 16-bin histogram counts of the image, which represent the distribution of the pixel values. The exemplary implementation uses the SEAL library to implement FHE functions such as addition and multiplication on the plaintext and ciphertext. The exemplary implementation may be segregated into five high-level steps or stages, namely encode, encrypt, compute, decrypt and decode as follows:
• Encode: for a data owner with sensitive raw data (assumed to be in numerical representation) such as the fingerprint image, a client device associated with the data owner first transforms (encodes) the raw data into a special form of mathematical expressions, namely, polynomials with system-defined properties. This is for the purpose of successful and secure calculation at a later stage. For example, the Chinese Remainder Theorem may be used to perform batch processing on a group of data, i.e., to encode thousands of pixels into a single polynomial.
• Encrypt: with the raw data encoded into polynomials, the client device further encrypts the polynomials into a randomized format, to prevent non-authorized personnel from learning the underlying information. For example, this may be achieved through homomorphic encryption on the encoded polynomial. After that, the encrypted data as well as the public key and evaluation key are sent to the data processor (data server).
• Compute: with the encrypted data received at the data processor, the data processor runs a modulo-based Gaussian elimination algorithm (as described hereinbefore according to various example embodiments of the present invention) to compute the inverse matrix $M^{-1}$, solve the equations and obtain the encrypted histogram count result with respect to the 16 bins. In this regard, the data processor has no knowledge of the histogram counts as they are encrypted. Accordingly, this achieves both data privacy and process privacy.
• Decrypt: with the encrypted histogram counts transmitted to the client device of the data owner, the client device is configured to use the secret key to decrypt them, and obtains the count in mathematical polynomial form.
• Decode: with the decrypted polynomials that represent the numerical forms of the final count, the client device is configured to further decode them into a numerical format, which then completes the entire process. As an example illustration, FIG. 6B shows a plot of the histogram (histogram counting plot) derived on the fingerprint image shown in FIG. 6A from the above exemplary implementation.
[00128] Accordingly, various embodiments of the present invention advantageously provide a method for deriving statistical information from encrypted data. In various example embodiments, a method for computing histograms on encrypted data is provided using only modulo addition and multiplication, which is compliant with existing homomorphic encryption schemes. In this regard, for illustration purposes only, example methods or algorithms for homomorphic histogram computation on encrypted data have been described herein. For example, various example embodiments of the present invention avoid the comparison-based histogram count technique as performed conventionally (e.g., comparing encrypted values with histogram interval boundaries), which leaks the data range, thus protecting the data privacy against range leakage. Furthermore, the exemplary implementation described herein demonstrated that the method according to various embodiments preserves privacy for the data owner: for example, since the secret key is kept on the client device of the data owner, it is difficult for non-authorized parties, including the data processor, to learn the underlying information from the encrypted data. The final histogram count results derived are also actual or exact values, instead of being estimated values.
[00129] While embodiments of the invention have been particularly shown and described with reference to specific embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced.

Claims

What is claimed is:
1. A computer-implemented method for deriving statistical information from encrypted data, the encrypted data being encrypted based on a homomorphic encryption scheme, the method comprising:
receiving, at a data server, the encrypted data;
obtaining a set of data elements of the encrypted data based on which the statistical information is to be derived;
obtaining a set of discrete intervals associated with the set of data elements;
generating a vector comprising a plurality of entries, each entry being determined based on the set of data elements;
generating a matrix comprising a plurality of rows of entries, each row of entries being determined based on the set of discrete intervals;
deriving the statistical information on the set of data elements with respect to each discrete interval in the set of discrete intervals based on the matrix and the vector; and
sending, from the data server, the statistical information.
2. The method according to claim 1, wherein the statistical information is derived directly from the encrypted data without decrypting the encrypted data at the data server.
3. The method according to claim 2, wherein the number of entries in the vector corresponds to the number of discrete intervals in the set of discrete intervals, and said generating a vector comprises determining each entry in the vector based on a summation function using the set of data elements, wherein for determining each entry in the vector, the value of each data element is raised to an n-th power that corresponds to the position of the entry in the vector being determined.
4. The method according to claim 3, wherein for determining each entry in the vector, the summation function is subjected to a modulo operation with respect to a divisor.
5. The method according to claim 1, wherein said generating a matrix comprises determining each entry of each row based on a corresponding discrete interval in the set of discrete intervals, each discrete interval being associated with a discrete value.
6. The method according to claim 5, wherein the number of rows and the number of columns both correspond to the number of discrete intervals in the set of discrete intervals, and each entry of each row is determined based on the value of a corresponding discrete interval in the set of discrete intervals raised to an n-th power that corresponds to the position of the row in the matrix being determined.
7. The method according to claim 6, wherein for determining each entry of each row, the raised value of the corresponding discrete interval is subjected to a modulo operation with respect to a divisor.
8. The method according to claim 1, wherein said deriving the statistical information comprises computing an inverse of the matrix, and multiplying the inverse matrix with the vector to produce the statistical information.
9. The method according to claim 1, wherein the statistical information comprises a histogram count on the set of data elements with respect to each discrete interval in the set of discrete intervals.
10. The method according to claim 1, wherein:
the encrypted data corresponds to one or more images comprising a plurality of pixels, each pixel having a pixel value amongst a range of pixel values;
the set of data elements corresponds to a set of pixels of the one or more images based on which the statistical information is to be derived;
the set of discrete intervals corresponds to a set of discrete intervals of pixel values associated with the one or more images; and
the statistical information comprises a histogram count on the set of pixels with respect to each discrete interval of pixel value in the set of discrete intervals of pixel values.
11. A system for deriving statistical information from encrypted data, the encrypted data being encrypted based on a homomorphic encryption scheme, the system comprising:
a memory; and
at least one processor communicatively coupled to the memory and configured to:
receive the encrypted data;
obtain a set of data elements of the encrypted data based on which the statistical information is to be derived;
obtain a set of discrete intervals associated with the set of data elements;
generate a vector comprising a plurality of entries, each entry being determined based on the set of data elements;
generate a matrix comprising a plurality of rows of entries, each row of entries being determined based on the set of discrete intervals;
derive the statistical information on the set of data elements with respect to each discrete interval in the set of discrete intervals based on the matrix and the vector; and
send the statistical information.
12. The system according to claim 11, wherein the statistical information is derived directly from the encrypted data without decrypting the encrypted data at the system.
13. The system according to claim 12, wherein the number of entries in the vector corresponds to the number of discrete intervals in the set of discrete intervals, and said generate a vector comprises determining each entry in the vector based on a summation function using the set of data elements, wherein for determining each entry in the vector, the value of each data element is raised to an n-th power that corresponds to the position of the entry in the vector being determined.
14. The system according to claim 13, wherein for determining each entry in the vector, the summation function is subjected to a modulo operation with respect to a divisor.
15. The system according to claim 11, wherein said generate a matrix comprises determining each entry of each row based on a corresponding discrete interval in the set of discrete intervals, each discrete interval being associated with a discrete value.
16. The system according to claim 15, wherein the number of rows and the number of columns both correspond to the number of discrete intervals in the set of discrete intervals, and each entry of each row is determined based on the value of a corresponding discrete interval in the set of discrete intervals raised to an n-th power that corresponds to the position of the row in the matrix being determined.
17. The system according to claim 16, wherein for determining each entry of each row, the raised value of the corresponding discrete interval is subjected to a modulo operation with respect to a divisor.
18. The system according to claim 11, wherein said deriving the statistical information comprises computing an inverse of the matrix, and multiplying the inverse matrix with the vector to produce the statistical information.
19. The system according to claim 11, wherein:
the encrypted data corresponds to one or more images comprising a plurality of pixels, each pixel having a pixel value amongst a range of pixel values;
the set of data elements corresponds to a set of pixels of the one or more images based on which the statistical information is to be derived;
the set of discrete intervals corresponds to a set of discrete intervals of pixel values associated with the one or more images; and
the statistical information comprises a histogram count on the set of pixels with respect to each discrete interval of pixel value in the set of discrete intervals of pixel values.
20. A computer program product, embodied in one or more non-transitory computer-readable storage mediums, comprising instructions executable by at least one processor to perform a method for deriving statistical information from encrypted data, the encrypted data being encrypted based on a homomorphic encryption scheme, the method comprising:
obtaining a set of data elements of the encrypted data based on which the statistical information is to be derived;
obtaining a set of discrete intervals associated with the set of data elements;
generating a vector comprising a plurality of entries, each entry being determined based on the set of data elements;
generating a matrix comprising a plurality of rows of entries, each row of entries being determined based on the set of discrete intervals; and
deriving the statistical information on the set of data elements with respect to each discrete interval in the set of discrete intervals based on the matrix and the vector.
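The vector, matrix and inversion steps recited in claims 3 to 9 can be illustrated with a minimal plaintext sketch. It is not the claimed implementation: the data elements here are ordinary integers standing in for ciphertexts, whereas in the claimed method the power sums would be evaluated homomorphically on encrypted values. The function names, the choice of powers 0 to k-1, and the use of a prime divisor `p` larger than the number of data elements are all assumptions made for illustration. Under these assumptions the matrix is a Vandermonde matrix over the integers modulo `p`, and solving the linear system is equivalent to multiplying the vector by the inverse matrix.

```python
# Plaintext stand-in for the claimed vector/matrix derivation (illustrative only).
# Assumptions: powers run from 0 to k-1, and p is a prime larger than both the
# number of data elements and every discrete value.

def power_sum_vector(data, k, p):
    """Entry j is the sum of x**j over all data elements, reduced modulo p."""
    return [sum(pow(x, j, p) for x in data) % p for j in range(k)]


def interval_matrix(values, p):
    """Row j, column i holds values[i]**j modulo p (a Vandermonde matrix)."""
    k = len(values)
    return [[pow(b, j, p) for b in values] for j in range(k)]


def solve_mod(matrix, vector, p):
    """Solve matrix * c = vector (mod p) by Gauss-Jordan elimination, p prime."""
    k = len(vector)
    # Build the augmented matrix [matrix | vector].
    aug = [row[:] + [rhs] for row, rhs in zip(matrix, vector)]
    for col in range(k):
        # Find a row with a non-zero pivot in this column and swap it up.
        pivot = next(r for r in range(col, k) if aug[r][col] % p != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot becomes 1.
        inv = pow(aug[col][col], -1, p)
        aug[col] = [(a * inv) % p for a in aug[col]]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % p for a, b in zip(aug[r], aug[col])]
    return [aug[r][k] for r in range(k)]


def histogram(data, values, p):
    """Return the count of data elements equal to each discrete value."""
    vector = power_sum_vector(data, len(values), p)
    matrix = interval_matrix(values, p)
    return solve_mod(matrix, vector, p)


if __name__ == "__main__":
    pixels = [0, 1, 1, 3, 3, 3, 2]   # hypothetical pixel values
    levels = [0, 1, 2, 3]            # one discrete value per interval
    print(histogram(pixels, levels, 101))
```

The sketch relies on the identity that the j-th power sum of the data equals the sum, over all intervals, of the interval count times the interval value raised to the j-th power. Because the discrete values are distinct modulo `p`, the system has a unique solution, and the counts are recovered exactly provided `p` exceeds the number of data elements.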
PCT/SG2018/050100 2018-03-05 2018-03-05 Method and system for deriving statistical information from encrypted data WO2019172837A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SG2018/050100 WO2019172837A1 (en) 2018-03-05 2018-03-05 Method and system for deriving statistical information from encrypted data

Publications (1)

Publication Number Publication Date
WO2019172837A1 true WO2019172837A1 (en) 2019-09-12

Family

ID=67847321

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2018/050100 WO2019172837A1 (en) 2018-03-05 2018-03-05 Method and system for deriving statistical information from encrypted data

Country Status (1)

Country Link
WO (1) WO2019172837A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070154089A1 (en) * 2006-01-03 2007-07-05 Chang-Jung Kao Method and apparatus for calculating image histogram with configurable granularity
CN103050076A (en) * 2012-12-10 2013-04-17 华映视讯(吴江)有限公司 Method for improving color image contrast of display system and image processing system
US20130097417A1 (en) * 2011-10-13 2013-04-18 Microsoft Corporation Secure private computation services
US20130339722A1 (en) * 2011-11-07 2013-12-19 Parallels IP Holdings GmbH Method for protecting data used in cloud computing with homomorphic encryption
CN106295656A (en) * 2016-08-03 2017-01-04 徐庆 Image outline characteristic extraction method based on image color lump content and device
CN106952212A (en) * 2017-03-14 2017-07-14 电子科技大学 A kind of HOG image characteristics extraction algorithms based on vectorial homomorphic cryptography

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112995076A (en) * 2019-12-17 2021-06-18 国家电网有限公司大数据中心 Discrete data frequency estimation method, user side, data center and system
CN112995076B (en) * 2019-12-17 2022-09-27 国家电网有限公司大数据中心 Discrete data frequency estimation method, user side, data center and system
EP4149045A4 (en) * 2020-06-15 2024-04-24 Crypto Lab Inc Device and method for performing statistical calculation on homomorphic ciphertext
CN112487466A (en) * 2020-12-16 2021-03-12 厦门市美亚柏科信息股份有限公司 Featureless encrypted file detection method, terminal equipment and storage medium

Similar Documents

Publication Publication Date Title
US11843687B2 (en) Systems, devices, and processes for homomorphic encryption
Al Badawi et al. Implementation and performance evaluation of RNS variants of the BFV homomorphic encryption scheme
Chen et al. Homomorphic lower digits removal and improved FHE bootstrapping
Cheon et al. Homomorphic encryption for arithmetic of approximate numbers
Yavuz et al. A chaos-based image encryption algorithm with simple logical functions
CN107683502B (en) Generating cryptographic function parameters from compact source code
Li et al. An image encryption scheme based on chaotic tent map
Aslett et al. A review of homomorphic encryption and software tools for encrypted statistical machine learning
Janakiraman et al. Lightweight chaotic image encryption algorithm for real-time embedded system: Implementation and analysis on 32-bit microcontroller
Rohith et al. Image encryption and decryption using chaotic key sequence generated by sequence of logistic map and sequence of states of Linear Feedback Shift Register
Lamba et al. S4: A novel & secure method for enforcing privacy in cloud data warehouses
Zou et al. Image encryption algorithm with matrix semi-tensor product
Zhu et al. Image encryption algorithm with an avalanche effect based on a six-dimensional discrete chaotic system
US20170091485A1 (en) Method of obfuscating data
Zhang et al. Cryptanalysis of image scrambling based on chaotic sequences and Vigenère cipher
Hu et al. A uniform chaotic system with extended parameter range for image encryption
Hanchinamani et al. An efficient image encryption scheme based on a Peter De Jong chaotic map and a RC4 stream cipher
Jäschke et al. Accelerating homomorphic computations on rational numbers
Abdeldaym et al. Modified RSA algorithm using two public key and Chinese remainder theorem
WO2019172837A1 (en) Method and system for deriving statistical information from encrypted data
Fenner et al. Privacy-preserving gaussian process regression–A modular approach to the application of homomorphic encryption
Li et al. Fully homomorphic encryption with table lookup for privacy-preserving smart grid
Kumar et al. Privacy preserving, verifiable and efficient outsourcing algorithm for matrix multiplication to a malicious cloud server
US11593516B2 (en) Private information retrieval with sublinear public-key operations
CN112272082B (en) Image encryption/decryption method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18908777

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18908777

Country of ref document: EP

Kind code of ref document: A1