CN114579046B - Cloud storage similar data detection method and system

Info

Publication number
CN114579046B
Authority
CN
China
Prior art keywords
data
training
vector
cloud storage
semantics
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210070755.3A
Other languages
Chinese (zh)
Other versions
CN114579046A (en)
Inventor
田纹龙
何婷婷
叶旭明
薛晓晔
李瑞轩
万亚平
欧阳纯萍
刘永彬
刘征海
刘洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of South China
Original Assignee
University of South China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of South China
Priority to CN202210070755.3A
Publication of CN114579046A
Application granted
Publication of CN114579046B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G06F16/152 File search processing using file content signatures, e.g. hash values
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1748 De-duplication implemented within the file system, e.g. based on file segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a cloud storage similar data detection method and system. In a model training stage, the method preprocesses training data to obtain training data blocks; extracts feature vectors of all training data blocks with the MinHash algorithm to obtain first vectors without embedded semantics; and trains a machine learning model on them, obtaining a trained model and a weight matrix between the first vectors and the vectors with embedded context semantics. In a model prediction stage, it processes the prediction data with the trained model, using the same method as for preprocessing the training data, to obtain prediction data blocks; extracts feature vectors of all prediction data blocks with the MinHash algorithm to obtain vectors of the prediction data without embedded semantics; multiplies these vectors by the weight matrix to obtain semantics-embedded vectors of the prediction data; and finds the most similar data block with the Annoy algorithm. The method reduces computational cost, solves the problem of unstable feature-value extraction, and improves detection accuracy.

Description

Cloud storage similar data detection method and system
Technical Field
The invention relates to the technical field of similar data detection, and in particular to a cloud storage similar data detection method and system.
Background
With the development of networks and storage technologies, cloud storage has become widely used in daily life; owing to its reliability and flexibility, people increasingly prefer to pay cloud storage services to keep their data online. However, cloud storage services are flooded with a large amount of redundant data. Such redundant data not only reduces the storage utilization of the cloud storage service provider but also increases the cost of the user's cloud storage service. For this reason, conventional redundant-data deduplication is one of the important technologies widely used in cloud storage today: by identifying and eliminating redundant data blocks, it effectively improves cloud storage utilization and saves users data storage costs. However, conventional deduplication can only distinguish redundant data blocks from non-redundant ones; it cannot identify and eliminate the redundant portions within similar data blocks. Existing similar data detection techniques therefore use the fingerprint values and distribution of data blocks to determine the redundant portions among similar data blocks. These methods, however, are not robust: they are easily disturbed by other factors, such as modification and deletion of data block content or changes in block length, which makes their feature extraction unstable.
Disclosure of Invention
In view of the above problems, the present invention provides a cloud storage similar data detection method, in particular a cloud storage similar data detection method based on block-level semantics, comprising:
a model training stage, whose training steps are:
preprocessing training data to obtain training data blocks;
extracting feature vectors of all training data blocks using the MinHash algorithm to obtain first vectors without embedded semantics;
training a machine learning model on the first vectors to obtain a trained model and a weight matrix between the first vectors and the vectors with embedded context semantics;
and a model prediction stage, whose prediction steps are:
processing the prediction data with the trained model, using the same method as for preprocessing the training data, to obtain prediction data blocks;
extracting feature vectors of all prediction data blocks using the MinHash algorithm to obtain vectors of the prediction data without embedded semantics;
multiplying the vectors of the prediction data without embedded semantics by the weight matrix to obtain semantics-embedded vectors of the prediction data;
and constructing all semantics-embedded vectors into a binary tree with the Annoy algorithm, where each vector is a node of the binary tree, and determining the other nodes closest to the node corresponding to the current data block, thereby finding the data block most similar to the current data block.
Preferably, the step of extracting feature vectors of all training data blocks using the MinHash algorithm includes:
taking a preset number of hash functions, scanning the content of each training data block, computing the hash value corresponding to each hash function, and then averaging the computed hash values to obtain the initial feature value of the training data block;
and scanning the initial feature value with a sliding window, taking the data in the window as a sub-feature value each time the window moves, generating the feature vector corresponding to each sub-feature value through a mapping function between sub-feature values and feature vectors, and finally averaging the feature vectors of all sub-feature values as the feature vector of the data block.
Preferably, the step of preprocessing the training data to obtain training data blocks comprises:
unifying the input training data types into a bit stream;
and dividing the bit stream into a number of training data blocks.
Preferably, training a machine learning model on the first vectors to obtain the weight matrix between the first vectors and the vectors with embedded context semantics specifically comprises:
inputting the first vectors corresponding to the context of a data block into the input layer of the machine learning model, taking the first vector corresponding to the data block itself as the output layer, using the difference between the input layer and the output layer as the loss, and continuously updating the weight matrix, finally obtaining a weight matrix with embedded context information.
Preferably, the weight matrix specifically comprises an output-layer weight matrix and an input-layer weight matrix.
Preferably, in multiplying the vectors of the prediction data without embedded semantics by the weight matrix to obtain the semantics-embedded vectors of the prediction data, the weight matrix used for the matrix multiplication is the output-layer weight matrix of the machine learning model.
Preferably, after the most similar data blocks are found, differential encoding is further used to remove the redundant portions shared by similar data blocks.
According to another aspect of the present invention, a cloud storage similar data detection system is also disclosed, in particular a cloud storage similar data detection system based on block-level semantics, comprising a memory and a processor, the memory storing a computer program;
the processor is configured, when the computer program runs, to execute a cloud storage similar data detection method as described above.
The invention fully considers the contextual relations among data blocks, i.e., the semantic information between them, and provides a cloud storage similar data detection technique based on block-level semantics. Using machine learning for representation learning, it breaks away from the traditional approach to similar-block recognition that relies on extracted hash values: by combining the context of data blocks and embedding semantics into their feature sets, it reduces computational cost, solves the unstable feature-value extraction of the prior art, improves the accuracy of similar data block detection, and improves storage utilization and user experience.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow chart of a detection method according to an embodiment of the invention.
Detailed Description
Various exemplary embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
Meanwhile, it should be understood that, for convenience of description, the sizes of the respective parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.
The present invention will be further described in detail below with reference to specific embodiments and with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
In the first embodiment, an example of a similar data detection method based on block-level semantics is described in detail below with reference to FIG. 1.
In the model training stage, the training steps include:
1. preprocessing training data to obtain training data blocks;
2. extracting feature vectors of all training data blocks using the MinHash algorithm to obtain vectors without embedded semantics as initial vectors (i.e., the first vectors);
3. training a machine learning model on the first vectors to obtain a trained model and a weight matrix between the first vectors and the vectors with embedded context semantics.
In the model prediction stage, the prediction steps are:
1. processing the prediction data with the trained model, using the same method as for preprocessing the training data, to obtain prediction data blocks;
2. extracting feature vectors of all prediction data blocks using the MinHash algorithm to obtain vectors of the prediction data without embedded semantics;
3. multiplying the vectors of the prediction data without embedded semantics by the weight matrix to obtain semantics-embedded vectors of the prediction data;
4. constructing all semantics-embedded vectors into a binary tree with the Annoy algorithm, where each vector is a node of the binary tree, and determining the other nodes closest to the node corresponding to the current data block, thereby finding the data block most similar to the current data block (a sketch of this step follows).
In some embodiments, the step in the model training phase of extracting feature vectors of all training data blocks using the MinHash algorithm includes:
taking n hash functions (e.g., n = 80, n = 400, or another chosen number), scanning the content of a training data block, computing the hash value corresponding to each hash function, and then averaging the computed hash values to obtain the initial feature value of the training data block, which reduces the feature deviation caused by differing content within the data block;
and scanning the initial feature value with a sliding window, taking the data in the window as a sub-feature value each time the window moves, generating the feature vector corresponding to each sub-feature value through a mapping function between sub-feature values and feature vectors, and finally averaging the feature vectors of all sub-feature values as the feature vector of the data block (a sketch of these two steps follows).
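A minimal sketch of these two steps, under stated assumptions: the patent fixes neither the n hash functions nor the sub-feature-to-vector mapping, so salted SHA-1 digests, a window over the decimal digits of the initial feature value, and a seeded pseudo-random projection are all illustrative stand-ins:

```python
import hashlib
import numpy as np

def initial_feature_value(block: bytes, n: int = 80) -> int:
    # One hash value per hash function; here the i-th "function" is a
    # salted SHA-1 over the block content (an illustrative choice).
    hashes = [
        int.from_bytes(hashlib.sha1(i.to_bytes(4, "big") + block).digest()[:8], "big")
        for i in range(n)
    ]
    return sum(hashes) // n  # sum and average -> initial feature value

def block_feature_vector(block: bytes, dim: int = 128, window: int = 4) -> np.ndarray:
    digits = str(initial_feature_value(block))
    vectors = []
    # Slide a window over the initial feature value; each window content is
    # a sub-feature value, mapped to a vector by a seeded projection
    # (standing in for the patent's unspecified mapping function).
    for i in range(len(digits) - window + 1):
        sub = digits[i : i + window]
        rng = np.random.default_rng(int(sub))
        vectors.append(rng.standard_normal(dim))
    # Average the sub-feature vectors -> feature vector of the data block.
    return np.mean(vectors, axis=0)
```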
In some embodiments, preprocessing the training data to obtain training data blocks includes:
unifying the input training data types into a bit stream;
and dividing the bit stream into N training data blocks (for example, by fixed-size chunking, as sketched below).
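A minimal sketch of this preprocessing; the patent does not specify the chunking scheme, so fixed-size chunking and the block size are illustrative assumptions:

```python
def preprocess(data: bytes, block_size: int = 4096) -> list[bytes]:
    # Treat any input uniformly as a byte/bit stream and cut it into N
    # training data blocks (the last block may be shorter).
    return [data[i : i + block_size] for i in range(0, len(data), block_size)]
```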
In some embodiments, training a machine learning model on the first vectors to obtain the weight matrix between the first vectors and the vectors with embedded context semantics specifically comprises:
inputting the first vectors corresponding to the context of a data block into the input layer of the machine learning model, taking the first vector corresponding to the data block itself as the output layer, using the difference between the input layer and the output layer as the loss, and continuously updating the weight matrix, finally obtaining a weight matrix with embedded context information.
Specifically, the machine learning network consists of an input layer, an intermediate layer, and an output layer, where the intermediate layer is connected to the input layer and to the output layer through two weight matrices W and U, respectively. The input layer X is the context of the data block, i.e., the initial vectors of the first k and last k data blocks around the current data block; the output layer Y is the initial vector of the current data block; and the intermediate layer represents the semantics-embedded vector (with initial value 0). The input layer projected through W gives hidden1 (hidden1 = X·W), and the output layer mapped back through U gives hidden2 (hidden2 = Y·U⁻¹); both can be seen as semantics-embedded vectors. The difference between hidden1 and hidden2 is therefore taken as the loss, the weight matrices W and U are continuously updated, and finally weight matrices W and U with embedded context information are obtained. With the weight matrix U, only the initial feature vector of a data block need be input to obtain its semantics-embedded feature vector, without inputting any context information. (A training sketch under these definitions follows.)
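A minimal numpy sketch of this training loop, under stated assumptions: the patent's hidden2 = Y·U⁻¹ is approximated here by training U so that Y·U directly matches X·W (same intent, while avoiding the inverse of a non-square matrix); the context size k, embedding dimension, learning rate, and epoch count are illustrative, not taken from the patent:

```python
import numpy as np

def train_semantic_model(init_vecs, k=2, dim=64, lr=0.01, epochs=10):
    """init_vecs: (N, d) array of initial (MinHash) vectors, one per block."""
    n, d = init_vecs.shape
    rng = np.random.default_rng(0)
    W = rng.standard_normal((d, dim)) * 0.01  # input-layer weight matrix
    U = rng.standard_normal((d, dim)) * 0.01  # output-layer weight matrix
    for _ in range(epochs):
        for i in range(k, n - k):
            # Context: mean of the initial vectors of the k blocks before
            # and after the current block (input layer X).
            ctx = np.concatenate([init_vecs[i - k : i], init_vecs[i + 1 : i + k + 1]])
            x = ctx.mean(axis=0)
            y = init_vecs[i]          # current block (output layer Y)
            hidden1 = x @ W           # semantics via the context path
            hidden2 = y @ U           # semantics via the current-block path
            diff = hidden1 - hidden2  # this difference is used as the loss
            # Gradient steps that shrink ||hidden1 - hidden2||^2.
            W -= lr * np.outer(x, diff)
            U += lr * np.outer(y, diff)
    return W, U
```

Minimizing the difference between hidden1 and hidden2 drives the two semantic projections together, so after training, either X·W (with context) or Y·U (without context) yields a semantics-embedded vector.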
In some embodiments, multiplying the vectors of the prediction data without embedded semantics by the weight matrix in the prediction stage to obtain the semantics-embedded vectors further includes:
the weight matrix used for the matrix multiplication is the output-layer weight matrix U of the machine learning model. The weight matrix U obtained through the training process can be reused, or training can be continued on top of the existing model with other data; this avoids repeated computation and reduces the time and computational cost of data deduplication (see the snippet below).
In some embodiments, after the most similar data blocks are found in the prediction stage, differential encoding is further used to remove the redundant portions shared by similar data blocks.
Specifically, the data compression steps are:
(1) Acquire the data blocks of the training data and the semantic model corresponding to the data blocks, and set a compression threshold g.
(2) Extract the parameters of the semantic model corresponding to each data block as that block's compression feature matrix.
(3) Traverse all the data blocks and perform the following operations:
Step one: obtain the compression feature matrix of the current data block.
Step two: traverse the compression feature matrices of all basic (Base) blocks and find the basic block whose compression feature matrix has the minimum distance to the current one.
Step three: if the distance between the two compression feature matrices is larger than the set threshold g, the current data block is not suitable for compression; store it as-is and add its compression feature matrix to those of the basic blocks, i.e., treat it as a new Base block.
Step four: if the distance between the two compression feature matrices is smaller than the set threshold g, compress the current data block: generate a Delta data block with a delta compression algorithm, the Delta block containing only the parts in which the current block differs from the Base block, and add the index of the most similar data block found together with the Delta block into a Delta file.
(4) Through the above steps, the data uploaded by the user is compressed into a Base file and a Delta file whose combined size is smaller than that of the originally uploaded data file, thereby achieving the goal of removing the redundant portions among similar data blocks by differential encoding. (A sketch of this traversal follows.)
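A sketch of this traversal, with illustrative choices the patent leaves open: the distance between compression feature matrices is taken as the Frobenius norm, and `difflib` opcodes stand in for a real delta compression algorithm:

```python
import difflib
import numpy as np

def delta_compress(blocks, features, g):
    """blocks: list of bytes; features: list of np arrays (the compression
    feature matrices); g: compression threshold. Illustrative sketch only."""
    base_blocks, base_feats, base_file, delta_file = [], [], [], []
    for block, feat in zip(blocks, features):
        if base_feats:
            # Step two: find the Base block at minimum feature distance.
            dists = [np.linalg.norm(feat - bf) for bf in base_feats]
            j = int(np.argmin(dists))
            if dists[j] < g:
                # Step four: store only the differing parts plus the index
                # of the most similar Base block.
                ops = difflib.SequenceMatcher(None, base_blocks[j], block).get_opcodes()
                delta = [(tag, i1, i2, block[j1:j2])
                         for tag, i1, i2, j1, j2 in ops if tag != "equal"]
                delta_file.append((j, delta))
                continue
        # Step three: not compressible against any Base block; keep it
        # as-is and register it as a new Base block.
        base_blocks.append(block)
        base_feats.append(feat)
        base_file.append(block)
    return base_file, delta_file
```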
According to another embodiment, a cloud storage similar data detection system is disclosed, in particular a cloud storage similar data detection system based on block-level semantic embedding, comprising a memory and a processor, wherein a computer program is stored in the memory;
and the processor is configured, when running the computer program, to perform a cloud storage similar data detection method based on block-level semantic embedding as in any of the embodiments described above.
The cloud storage similar data detection system based on block-level semantic embedding can run on computing devices such as desktop computers, notebook computers, palmtop computers, and cloud servers. The devices on which it runs may include, but are not limited to, a processor and a memory.
Those skilled in the art will appreciate that the above is merely an example of a cloud storage similar data detection system based on block-level semantic embedding and does not limit it; the system may include more or fewer components than in the example, combine certain components, or use different components. For instance, it may further include input/output devices, network access devices, buses, and so on. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), another programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the cloud storage similar data detection system based on block-level semantic embedding and connects the various parts of the whole operable system through various interfaces and lines. The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the system by running or executing the computer program and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The above description covers only the preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalents, and improvements made within the spirit and principles of the present invention shall be included in its scope.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises that element.

Claims (8)

1. A cloud storage similar data detection method, in particular a cloud storage similar data detection method based on block-level semantics, characterized by comprising:
a model training stage, whose training steps are:
preprocessing training data to obtain training data blocks;
extracting feature vectors of all training data blocks using the MinHash algorithm to obtain first vectors without embedded semantics;
training a machine learning model on the first vectors to obtain a trained model and a weight matrix between the first vectors and the vectors with embedded context semantics;
and a model prediction stage, whose prediction steps are:
processing the prediction data with the trained model, using the same method as for preprocessing the training data, to obtain prediction data blocks;
extracting feature vectors of all prediction data blocks using the MinHash algorithm to obtain vectors of the prediction data without embedded semantics;
multiplying the vectors of the prediction data without embedded semantics by the weight matrix to obtain semantics-embedded vectors of the prediction data;
and constructing all semantics-embedded vectors into a binary tree with the Annoy algorithm, where each vector is a node of the binary tree, and determining the other nodes closest to the node corresponding to the current data block, thereby finding the data block most similar to the current data block.
2. The cloud storage similar data detection method of claim 1, wherein the step of extracting feature vectors of all training data blocks using the MinHash algorithm comprises:
taking a preset number of hash functions, scanning the content of each training data block, computing the hash value corresponding to each hash function, and then averaging the computed hash values to obtain the initial feature value of the training data block;
and scanning the initial feature value with a sliding window, taking the data in the window as a sub-feature value each time the window moves, generating the feature vector corresponding to each sub-feature value through a mapping function between sub-feature values and feature vectors, and finally averaging the feature vectors of all sub-feature values as the feature vector of the data block.
3. The cloud storage similar data detection method of claim 1, wherein preprocessing the training data to obtain training data blocks comprises:
unifying the input training data types into a bit stream;
and dividing the bit stream into a number of training data blocks.
4. The cloud storage similar data detection method of claim 1, wherein training a machine learning model on the first vectors to obtain the weight matrix between the first vectors and the vectors with embedded context semantics specifically comprises:
inputting the first vectors corresponding to the context of a data block into the input layer of the machine learning model, taking the first vector corresponding to the data block as the output layer of the machine learning model, using the difference between the input layer and the output layer as the loss, and continuously updating the weight matrix, finally obtaining a weight matrix with embedded context information.
5. The cloud storage similar data detection method of claim 4, wherein the weight matrix comprises an output-layer weight matrix and an input-layer weight matrix.
6. The cloud storage similar data detection method of claim 5, wherein, in multiplying the vectors of the prediction data without embedded semantics by the weight matrix to obtain the semantics-embedded vectors of the prediction data,
the weight matrix used for the matrix multiplication is the output-layer weight matrix of the machine learning model.
7. The cloud storage similar data detection method of claim 1, wherein, after the most similar data blocks are found, differential encoding is further used to remove the redundant portions shared by similar data blocks.
8. A cloud storage similar data detection system, in particular a cloud storage similar data detection system based on block-level semantics, comprising a memory and a processor, wherein a computer program is stored in the memory;
and the processor, when executing the computer program, is configured to perform the cloud storage similar data detection method of any one of claims 1-7.
CN202210070755.3A 2022-01-21 2022-01-21 Cloud storage similar data detection method and system Active CN114579046B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210070755.3A CN114579046B (en) 2022-01-21 2022-01-21 Cloud storage similar data detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210070755.3A CN114579046B (en) 2022-01-21 2022-01-21 Cloud storage similar data detection method and system

Publications (2)

Publication Number Publication Date
CN114579046A CN114579046A (en) 2022-06-03
CN114579046B true CN114579046B (en) 2024-01-02

Family

ID=81771683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210070755.3A Active CN114579046B (en) 2022-01-21 2022-01-21 Cloud storage similar data detection method and system

Country Status (1)

Country Link
CN (1) CN114579046B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI20080534A0 (en) * 2008-09-22 2008-09-22 Envault Corp Oy Safe and selectively contested file storage
CN102158557A (en) * 2011-04-12 2011-08-17 华中科技大学 Security strategy decomposition and verification system in cloud storage environment
CN105338027A (en) * 2014-07-30 2016-02-17 杭州海康威视***技术有限公司 Method, system and device for cloud storage of video data
EP3176717A2 (en) * 2015-12-02 2017-06-07 Panasonic Intellectual Property Management Co., Ltd. Control method, processing apparatus, and non-transitory computer-readable recording medium
CN106776370A (en) * 2016-12-05 2017-05-31 哈尔滨工业大学(威海) Cloud storage method and device based on the assessment of object relevance
CN108287816A (en) * 2017-01-10 2018-07-17 腾讯科技(深圳)有限公司 Point of interest on-line checking, Machine learning classifiers training method and device
CN110472045A (en) * 2019-07-11 2019-11-19 中山大学 A kind of short text falseness Question Classification prediction technique and device based on document insertion
CN111639197A (en) * 2020-05-28 2020-09-08 山东大学 Cross-modal multimedia data retrieval method and system with label embedded online hash
CN112287662A (en) * 2020-10-29 2021-01-29 平安科技(深圳)有限公司 Natural language processing method, device and equipment based on multiple machine learning models
CN112580507A (en) * 2020-12-18 2021-03-30 合肥高维数据技术有限公司 Deep learning text character detection method based on image moment correction

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Image matching method combining the SURF and FLANN algorithms; Zhou Zhiwei; Yuan Fengwei; Zhang Kang; Wu Zhi; Intelligent Computer and Applications (06); full text *
A knowledge subgraph fusion method based on short-text similarity computation; Zheng Zhiyun; Wu Jianping; Li Dun; Liu Yun; Mi Gaoyang; Journal of Chinese Computer Systems (01); full text *
A perceptual-hash target tracking algorithm incorporating dynamic prediction; Chen Youliang; Xiao Gang; Bian Huan; Hu Min; Bulletin of Surveying and Mapping (02); full text *
Cross-modal retrieval algorithm combining hash features and classifier learning; Liu Haoxin; Wu Xiaojun; Yu Jun; Pattern Recognition and Artificial Intelligence (02); full text *

Also Published As

Publication number Publication date
CN114579046A (en) 2022-06-03

Similar Documents

Publication Publication Date Title
CN111461637A (en) Resume screening method and device, computer equipment and storage medium
KR102432600B1 (en) Method and system for detecting duplicated document using vector quantization
CN112328909B (en) Information recommendation method and device, computer equipment and medium
CN111159413A (en) Log clustering method, device, equipment and storage medium
CN113689285B (en) Method, device, equipment and storage medium for detecting user characteristics
CN110825894A (en) Data index establishing method, data index retrieving method, data index establishing device, data index retrieving device, data index establishing equipment and storage medium
CN110969172A (en) Text classification method and related equipment
CN114245896A (en) Vector query method and device, electronic equipment and storage medium
CN111340075B (en) Network data detection method and device for ICS
CN115456043A (en) Classification model processing method, intent recognition method, device and computer equipment
WO2023029350A1 (en) Click behavior prediction-based information pushing method and apparatus
CN109885831B (en) Keyword extraction method, device, equipment and computer readable storage medium
CN107562853A (en) A kind of method that streaming towards magnanimity internet text notebook data is clustered and showed
CN110390011B (en) Data classification method and device
CN113496123A (en) Rumor detection method, rumor detection device, electronic equipment and storage medium
CN114579046B (en) Cloud storage similar data detection method and system
CN116226681A (en) Text similarity judging method and device, computer equipment and storage medium
CN116703659A (en) Data processing method and device applied to engineering consultation and electronic equipment
CN116366603A (en) Method and device for determining active IPv6 address
CN113947185B (en) Task processing network generation method, task processing device, electronic equipment and storage medium
CN115292008A (en) Transaction processing method, device, equipment and medium for distributed system
CN114625315B (en) Cloud storage similar data detection method and system based on meta-semantic embedding
CN112860626A (en) Document sorting method and device and electronic equipment
CN113934842A (en) Text clustering method and device and readable storage medium
CN115686597A (en) Data processing method and device, electronic equipment and storage medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant