CN115048539A - Social media data online retrieval method and system based on dynamic memory - Google Patents


Info

Publication number: CN115048539A
Authority: CN (China)
Prior art keywords: hash, sample data, round, data, social media
Legal status: Granted
Application number: CN202210971339.0A
Other languages: Chinese (zh)
Other versions: CN115048539B (en)
Inventors: 罗昕 (Luo Xin), 王娜 (Wang Na), 丁陈璐 (Ding Chenlu), 许信顺 (Xu Xinshun)
Current assignee: Shandong University
Original assignee: Shandong University

Application filed by Shandong University
Priority to CN202210971339.0A (patent/CN115048539B/en)
Publication of CN115048539A
Application granted
Publication of CN115048539B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/41 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a social media data online retrieval method and system based on dynamic memory, and relates to the technical field of large-scale streaming data retrieval. The method comprises the following steps: acquiring sample data of multiple rounds and the corresponding user tags; starting from the first round, performing hash function learning on the sample data of each round in sequence to obtain hash codes of the sample data, and storing the hash codes in a database; and receiving social media data to be retrieved, mapping it with the optimized hash function to obtain its hash code, and comparing this hash code with the hash codes of the sample data in the database to obtain a retrieval result. The method meets the requirements of online scenarios: the pairwise similarity matrix between the labels of new and old data in different rounds guides the generation of refined pseudo labels, and the hash loss function is determined from the refined pseudo labels, which alleviates the negative influence of user tags and improves the quality of the generated hash codes.

Description

Social media data online retrieval method and system based on dynamic memory
Technical Field
The invention belongs to the technical field of large-scale stream data retrieval, and particularly relates to a social media data online retrieval method and system based on dynamic memory.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art that is already known to a person of ordinary skill in the art.
Over the past decades, social media data such as images, texts and videos have grown explosively, and the demand for retrieving social media data keeps increasing. Hash learning has become a popular approximate nearest neighbor technique owing to its fast retrieval and low storage cost: it maps high-dimensional data into binary codes while preserving the similarity of the data in the original space. Representing data as binary codes enables fast retrieval because computers compare pairs of binary codes very efficiently.
Currently, hash learning can be divided into supervised, weakly supervised and unsupervised learning, which guide the generation of hash codes with expert-annotated labels, user-provided labels and unsupervised information, respectively. Weakly supervised hash learning has attracted increasing attention because user-provided labels are easy to obtain, diverse, and carry information beyond visual features. However, compared with clean expert-annotated labels, user-provided tags are imperfect: they may be wrong, duplicated or missing, which can degrade retrieval performance. To alleviate the negative effects of user tags, some methods exploit the semantic information of the tags to mitigate tag imperfection. Although these approaches perform well, most of them are batch-based: their memory and computational costs grow as streaming data arrive, and collecting data in batches contradicts the streaming nature of social media data. Although online weakly supervised hashing methods for streaming data have improved remarkably in recent years, they still cannot overcome the limitations of missing labels and catastrophic forgetting in online scenarios.
Disclosure of Invention
To solve these problems, the invention provides a social media data online retrieval method and system based on dynamic memory, which use the pairwise similarity matrix between the labels of new and old data in different rounds to construct refined pseudo labels and determine a hash loss function from the refined pseudo labels, thereby alleviating the negative effects of user tags and improving the quality of the generated hash codes.
In order to achieve the above object, the present invention mainly includes the following aspects:
in a first aspect, an embodiment of the present invention provides a method for social media data online retrieval based on dynamic memory, including:
acquiring sample data of multiple rounds and the corresponding user tags;
starting from the first round, performing hash function learning on the sample data of each round in sequence to obtain hash codes of the sample data, and storing the hash codes in a database; for the sample data of the t-th round, constructing refined pseudo labels of the sample data of the t-th round according to the pairwise similarity matrix between the user labels corresponding to the sample data of the t-th round and to the sample data before the t-th round; determining a hash loss function according to the constructed refined pseudo labels, optimizing the relevant parameters of the hash function by minimizing the hash loss function, and obtaining the hash codes of the sample data of the t-th round;
and receiving social media data to be retrieved, mapping according to the optimized hash function to obtain a corresponding hash code, and comparing the hash code of the social media data with the hash code of sample data in a database to obtain a retrieval result.
In one possible embodiment, the sample data include text data, image data and video data; after the sample data of multiple rounds and the corresponding user tags are acquired, and before hash function learning is performed on the training samples of each round in sequence, the method further comprises: extracting features of the sample data, and one-hot encoding the user tags to obtain a label representation.
In one possible implementation, a label matrix is determined from the label representations of the sample data of the t-th round and of the sample data before the t-th round; the transpose of the label matrix is multiplied by the label matrix itself to obtain the pairwise similarity matrix of the labels; and the pairwise similarity matrix is normalized and used to obtain the refined pseudo labels of the t-th round.
In one possible implementation, the method for determining the hash loss function includes:
determining a paired similarity matrix of the sample data in the t-th round according to a paradigm of Hash learning and the constructed refined pseudo labels, and constructing a first objective function for learning Hash codes of the sample data in the t-th round;
constructing a second objective function for learning the hash code of the sample data of the t-th round according to the pairwise similarity matrix between the representative point of the sample data of the t-th round in the memory and the sample data;
capturing the nonlinear characteristics of the sample data and performing hash function learning by linear regression, to obtain a third objective function for learning the hash codes of the sample data of the t-th round;
and integrating the first objective function, the second objective function and the third objective function into a Hash loss function to obtain a final Hash loss function.
In one possible implementation, for each sample point the distance between its refined pseudo label and its user label is calculated, the distances are sorted in ascending order, and a preset number of top-ranked sample points are selected as representative points. During hash learning, the memory stores a fixed number of representative points, and in each round the newly selected representative points replace a preset portion of the representative points in the memory.
In one possible implementation, the sample data is processed by using the radial basis kernel function, and the nonlinear characteristics of the sample data are captured.
In one possible embodiment, the hash loss function is minimized by iterative optimization. Specifically, in each iteration only the designated target variable is optimized while the other variables of the hash loss function are kept fixed; the partial derivative of the hash loss function with respect to the target variable is set to zero and solved to obtain the optimized target variable.
In a possible embodiment, the obtaining a search result by comparing the hash code of the social media data with the hash code of the sample data in the database includes: calculating the Hamming distance between the hash code of the social media data and the hash code of the sample data in the database, and outputting the sample data with preset quantity according to the Hamming distance.
In a second aspect, an embodiment of the present invention provides a social media data online retrieval system based on dynamic memory, including:
the data acquisition module is used for acquiring sample data of multiple rounds and corresponding user tags;
the hash function learning module is used for performing hash function learning on the sample data of each round in sequence, starting from the first round, to obtain hash codes of the sample data, and storing the hash codes in the database; for the sample data of the t-th round, constructing refined pseudo labels of the sample data of the t-th round according to the pairwise similarity matrix between the user labels corresponding to the sample data of the t-th round and to the sample data before the t-th round; determining a hash loss function according to the constructed refined pseudo labels, optimizing the relevant parameters of the hash function by minimizing the hash loss function, and obtaining the hash codes of the sample data of the t-th round;
and the retrieval module is used for receiving the social media data to be retrieved, mapping according to the optimized hash function to obtain a corresponding hash code, and comparing the hash code of the social media data with the hash code of the sample data in the database to obtain a retrieval result.
In one possible implementation, the system further includes:
a preprocessing module, used for extracting features of the sample data and one-hot encoding the user tags to obtain a label representation.
The above one or more technical solutions have the following beneficial effects:
(1) According to the method, the pairwise similarity matrix (i.e. the label co-occurrence relation) between the labels of new and old data in different rounds guides the generation of refined pseudo labels, and the hash loss function is determined from the refined pseudo labels, so the negative influence of user tags is alleviated and the quality of the generated hash codes is improved.
(2) The invention provides a memory-based similarity learning strategy, samples with refined pseudo labels closest to original user labels are selected from old data and taken as representative points and stored in a memory, so that semantic relevance between new data and old data is maintained, and the problem of catastrophic forgetting of an online scene is effectively solved.
(3) The invention provides a method for minimizing the Hash loss function by adopting an iterative optimization mode, which can ensure that the learning efficiency meets the requirement of an online scene.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention, not to limit it.
FIG. 1 is a flowchart illustrating a social media data online retrieval method based on dynamic memory according to an embodiment of the present invention;
FIG. 2 is a block diagram of a social media data online retrieval method based on dynamic memory according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a social media data online retrieval system based on dynamic memory according to a second embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
To solve the problems of missing labels and catastrophic forgetting in online scenarios faced by existing online weakly supervised hashing methods, the invention provides a social media data online retrieval method and system based on dynamic memory, which mainly focus on three aspects: 1) for social media data such as images, texts and videos, fully exploiting the value of user tags while reducing their negative influence, so as to improve the quality of hash learning; 2) solving catastrophic forgetting of streaming data in online scenarios, so as to further improve the quality of online learning; 3) making the method time-efficient enough for online scenarios, i.e. keeping its time complexity as low as possible so that it scales to large datasets.
Example one
The embodiment provides an online social media data retrieval method based on dynamic memory, as shown in fig. 1, including the following steps:
S101: acquiring sample data of multiple rounds and the corresponding user tags;
S102: starting from the first round, performing hash function learning on the sample data of each round in sequence to obtain hash codes of the sample data, and storing the hash codes in a database; for the sample data of the t-th round, constructing refined pseudo labels of the sample data of the t-th round according to the pairwise similarity matrix between the user labels corresponding to the sample data of the t-th round and to the sample data before the t-th round; determining a hash loss function according to the constructed refined pseudo labels, optimizing the relevant parameters of the hash function by minimizing the hash loss function, and obtaining the hash codes of the sample data of the t-th round;
S103: receiving social media data to be retrieved, mapping it with the optimized hash function to obtain its hash code, and comparing this hash code with the hash codes of the sample data in the database to obtain a retrieval result.
As an optional implementation, the sample data include text data, image data and video data; after the sample data of multiple rounds and the corresponding user tags are acquired, and before hash function learning is performed on the training samples of each round in sequence, the method further comprises: extracting features of the sample data, and one-hot encoding the user tags to obtain a label representation.
In a specific implementation, the sample data include text, image and video data, and features are extracted separately for each data type. Taking image data as an example, image features are extracted with the VGG-F deep network, and the 4096-dimensional output of the fully connected layer fc7 is taken as the visual feature X of an image. The user tags are one-hot encoded to obtain the label matrix Y.
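A minimal sketch of the tag-encoding step, assuming each sample arrives with a list of tag strings and a fixed tag vocabulary (both hypothetical inputs; the text does not say how tags are tokenized). Because a sample can carry several tags, each row is really a multi-hot vector; rows are samples and columns are tags. The helper name `encode_tags` is illustrative only.

```python
import numpy as np

def encode_tags(tag_lists, vocabulary):
    """Return an (n_samples, n_tags) 0/1 matrix; row i marks the tags of sample i."""
    index = {tag: j for j, tag in enumerate(vocabulary)}
    Y = np.zeros((len(tag_lists), len(vocabulary)))
    for i, tags in enumerate(tag_lists):
        for tag in tags:
            if tag in index:  # tags outside the vocabulary are ignored
                Y[i, index[tag]] = 1.0
    return Y

# Usage: three samples; the last one only has a tag outside the vocabulary.
vocab = ["beach", "sunset", "dog"]
Y = encode_tags([["beach", "sunset"], ["dog"], ["cat"]], vocab)
print(Y)
```

A sample whose tags are all unknown ends up with an all-zero row, which is one concrete form of the "missing label" problem the method tries to compensate for.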
As an optional implementation, a label matrix is determined from the label representations of the sample data of the t-th round and of the sample data before the t-th round; the transpose of the label matrix is multiplied by the label matrix itself to obtain the pairwise similarity matrix of the labels; and the pairwise similarity matrix is normalized and used to obtain the refined pseudo labels of the t-th round.
In a specific implementation, the pairwise similarity matrix A of the labels is constructed by multiplying the transpose of the label matrix Y by Y itself, i.e. $A = Y^{\top} Y$. A is also the co-occurrence matrix of the tags: the more frequently two tags appear together, the higher their similarity.
Note that the matrix A is not constant: new data keep arriving, so the overall similarity between labels may change accordingly. Specifically, the label matrices of both old and new data are considered, and the label similarity matrix at round t is defined as

$$A^{t} = (Y^{t})^{\top} Y^{t},$$

where $Y^{t}_{o}$ denotes the one-hot label matrix of the data before the t-th round and $Y^{t}_{n}$ denotes the one-hot label matrix of the current t-th round. Here $Y^{t}$ can be written as $Y^{t} = \begin{bmatrix} Y^{t}_{o} \\ Y^{t}_{n} \end{bmatrix}$, so $A^{t}$ can be updated as

$$A^{t} = (Y^{t}_{o})^{\top} Y^{t}_{o} + (Y^{t}_{n})^{\top} Y^{t}_{n} = A^{t-1} + (Y^{t}_{n})^{\top} Y^{t}_{n}.$$

Therefore, each round's update only needs to compute the second term, since the first term was already obtained in the previous round. For convenience of calculation, $A^{t}$ is normalized, and $\hat{A}^{t}$ denotes $A^{t}$ normalized to a similarity matrix with entries in [0, 1]. The refined pseudo-label matrix $\tilde{Y}^{t}$ of the t-th round is defined as:

[equation]

which contains a balance parameter. Here $\tilde{Y}^{t}$ is a real-valued matrix; setting the partial derivative of the above objective with respect to $\tilde{Y}^{t}$ to zero gives:

[equation]
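The incremental update of $A^{t}$ can be sketched as below, assuming each label matrix has one row per sample, as in $A = Y^{\top} Y$. The class name is hypothetical, and because the normalization formula is not recoverable from the text, dividing by the largest entry is used here purely as an assumed way to map $A^{t}$ into [0, 1].

```python
import numpy as np

class LabelCooccurrence:
    """Tracks A^t = A^{t-1} + Y_n^T Y_n, where each Y_n has one row per new sample."""

    def __init__(self, num_labels):
        self.A = np.zeros((num_labels, num_labels))

    def update(self, Y_new):
        # Only the current round's term is computed; A^{t-1} is reused as-is.
        self.A += Y_new.T @ Y_new
        return self.A

    def normalized(self):
        # Assumed normalization: divide by the largest entry so values lie in [0, 1].
        peak = self.A.max()
        return self.A / peak if peak > 0 else self.A.copy()

Y1 = np.array([[1, 1, 0], [0, 1, 0]], dtype=float)  # round 1 labels
Y2 = np.array([[1, 0, 1]], dtype=float)             # round 2 labels
co = LabelCooccurrence(num_labels=3)
co.update(Y1)
co.update(Y2)
Y_all = np.vstack([Y1, Y2])
print(np.allclose(co.A, Y_all.T @ Y_all))  # same result as recomputing from scratch
```

Stacking all rounds and computing $Y^{\top} Y$ in one go gives the same matrix; the recurrence just means each round costs $O(n_t c^2)$ for $n_t$ new samples and $c$ tags, independent of how much old data has been seen.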
as an optional implementation, the method for determining the hash loss function includes:
determining a paired similarity matrix of the sample data in the t-th round according to a paradigm of Hash learning and the constructed refined pseudo labels, and constructing a first objective function for learning Hash codes of the sample data in the t-th round;
constructing a second objective function for learning the hash code of the sample data of the t-th round according to the pairwise similarity between the representative point of the sample data of the t-th round in the memory and the sample data;
capturing the nonlinear characteristics of the sample data and performing hash function learning by linear regression, to obtain a third objective function for learning the hash codes of the sample data of the t-th round;
and integrating the first objective function, the second objective function and the third objective function into a Hash loss function to obtain a final Hash loss function.
In a specific implementation, as shown in fig. 2, the online hash learning stage mainly includes the following steps:
① Learning hash codes based on the similarity of the refined pseudo labels.
Following the paradigm of hash learning,

[equation]

the pairwise similarity matrix $S_{nn}$ is constructed as:

[equation]

where $S^{t}_{nn}$ denotes the pairwise similarity matrix of the new data at round t, and $\hat{Y}^{t}$ is obtained by normalizing the refined pseudo-label matrix $\tilde{Y}^{t}$ column by column, $\hat{y}_{j} = \tilde{y}_{j} / \lVert \tilde{y}_{j} \rVert$, where j denotes the j-th column of the matrix and $\lVert \cdot \rVert$ denotes the norm (modulus) of a vector. The first objective function for learning the hash codes of the sample data of the t-th round can then be written as:

[equation]

where the symbols denote, respectively, a hyper-parameter balancing the terms, the 2-norm of a matrix, and the hash codes of the current t-th round.
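Assuming $S_{nn}$ is the cosine similarity of the column-normalized refined pseudo labels (the exact formula is an image in the original), it can be computed as follows. Columns are samples here, following the "j-th column" wording above; the helper name `pseudo_label_similarity` is hypothetical.

```python
import numpy as np

def pseudo_label_similarity(Y_a, Y_b, eps=1e-12):
    """Cosine similarity between columns of Y_a (c x n_a) and Y_b (c x n_b).

    Each column is one sample's refined pseudo label; columns are scaled to unit
    norm (y_j / ||y_j||) and the result is their (n_a x n_b) inner-product matrix.
    """
    Y_a_hat = Y_a / np.maximum(np.linalg.norm(Y_a, axis=0, keepdims=True), eps)
    Y_b_hat = Y_b / np.maximum(np.linalg.norm(Y_b, axis=0, keepdims=True), eps)
    return Y_a_hat.T @ Y_b_hat

Y_new = np.array([[1.0, 0.0, 0.5],
                  [1.0, 2.0, 0.5]])   # 2 labels x 3 samples
S_nn = pseudo_label_similarity(Y_new, Y_new)
```

Samples 0 and 2 have proportional pseudo labels, so their similarity is 1 even though their magnitudes differ; the `eps` floor only guards against all-zero columns.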
② Memory-based similarity learning.
To solve the catastrophic forgetting problem, this embodiment proposes a new strategy, memory-based similarity learning: the sample points whose refined pseudo labels are closest to their original user labels are taken as representative points. Specifically, for each sample point the distance between its refined pseudo label and its user label is calculated, the distances are sorted in ascending order, and a preset number of top-ranked sample points are selected as representative points. During hash learning, the memory stores a fixed number $n_q$ of representative points, and its content is updated every round: part of the representative points in the memory is replaced by the representative points newly selected in that round.
Specifically, when the first round of data arrives there is no old data, so only the current data block is used to guide hash learning; the memory of the first round is therefore empty. For any other round, without loss of generality, the representative points for the t-th round are selected as follows: after training of the (t-1)-th round finishes, the memory is formed from $n_1$ points randomly selected from the previous memory and $n_2$ representative points selected from the (t-1)-th round, where $n_1$ and $n_2$ are hyper-parameters with $n_1 + n_2 = n_q$. Because the memory is updated after every round of training, the similarity between new and old data is always maintained.
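The representative-point selection and memory refresh can be sketched as follows. The Euclidean distance, the row-per-sample layout and the helper names are assumptions; the text only states that distances are sorted in ascending order and that the memory keeps $n_1$ random old points plus $n_2$ new representatives. A full implementation would also store each representative's hash code next to its pseudo label.

```python
import numpy as np

def select_representatives(refined, user, k):
    """Indices of the k samples whose refined pseudo label is closest to the user label.

    refined, user: (n, c) arrays, one row per sample. Euclidean distance is assumed.
    """
    dist = np.linalg.norm(refined - user, axis=1)
    return np.argsort(dist, kind="stable")[:k]  # ascending distance, first k kept

def update_memory(memory, new_points, n1, rng):
    """Next memory = n1 randomly kept rows of the old memory + the new representatives."""
    if memory is None or len(memory) == 0:  # nothing stored yet (after round 1)
        return new_points.copy()
    keep = rng.choice(len(memory), size=min(n1, len(memory)), replace=False)
    return np.vstack([memory[keep], new_points])

rng = np.random.default_rng(0)
# Round 1: pick n2 = 1 representative.
refined1 = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
user1 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
idx1 = select_representatives(refined1, user1, k=1)
memory = update_memory(None, refined1[idx1], n1=1, rng=rng)
# Round 2: keep n1 = 1 old point and add n2 = 1 new one, so n_q = 2.
refined2 = np.array([[0.5, 0.5], [0.1, 0.9]])
user2 = np.array([[1.0, 0.0], [0.0, 1.0]])
idx2 = select_representatives(refined2, user2, k=1)
memory = update_memory(memory, refined2[idx2], n1=1, rng=rng)
```

Keeping a random subset of the old memory, rather than the oldest or newest entries, spreads coverage across all past rounds at a fixed cost of $n_q$ stored points.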
When a new round of data undergoes hash learning, the information stored in the memory is retrieved first and used to maintain the semantic association between new and old data. Specifically, the following similarity is defined:

[equation]

where the left-hand side denotes the pairwise similarity matrix between the data points in the memory and the new data of the t-th round, and the other matrix in the formula is the normalized refined pseudo-label matrix of the representative points selected from old data. The second objective function, corresponding to the similarity between the representative points in the memory and the new sample points, can therefore be expressed as:

[equation]

where the symbols denote a balance parameter and the hash codes of the representative points, which are known information stored in the memory.
③ Learning the hash function.
The sample data are processed with a radial basis function (RBF) kernel to capture their nonlinear characteristics. Specifically, the visual features X are mapped by the kernel function to capture nonlinear features, i.e.

[equation]

where the anchor points are randomly selected from the first round of training data, m denotes the number of selected anchor points, and the remaining parameter is the kernel width.
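A sketch of the anchor-based RBF mapping, assuming the form $\phi(x)_i = \exp(-\lVert x - a_i \rVert^2 / 2\sigma^2)$ that is common in kernel hashing; the exact formula is not recoverable from the text. The random stand-in features and the choice of σ as the mean sample-to-anchor distance are illustrative heuristics, not values from the patent.

```python
import numpy as np

def rbf_features(X, anchors, sigma):
    """phi(x)_i = exp(-||x - a_i||^2 / (2 sigma^2)). X: (n, d), anchors: (m, d) -> (n, m)."""
    sq_dist = (np.sum(X**2, axis=1)[:, None]
               + np.sum(anchors**2, axis=1)[None, :]
               - 2.0 * X @ anchors.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-sq_dist / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X_first = rng.normal(size=(100, 16))                        # stand-in first-round features
anchors = X_first[rng.choice(100, size=10, replace=False)]  # m = 10 random anchors
dists = np.linalg.norm(X_first[:, None, :] - anchors[None, :, :], axis=2)
sigma = dists.mean()                                        # heuristic kernel width
Phi = rbf_features(X_first, anchors, sigma)                 # shape (100, 10)
```

Because the anchors are fixed after the first round, later rounds map their features into the same m-dimensional space, which keeps the regression in the next step a fixed size.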
Using classical linear regression for hash function learning, the associated third objective function can be written as:

[equation]

where μ is a balance hyper-parameter, the regularization coefficient is a hyper-parameter that prevents over-fitting, and the learned projection defines the hash function used to generate hash codes for test samples.
Integrating the three objective functions gives the final hash loss function:

[equation]

where the real-valued matrix introduced in the formula approximates the discrete hash codes; introducing it simplifies solving for the hash codes. In addition, the uncorrelatedness constraint [equation] and the bit-balance constraint [equation] make the hash codes more discriminative.
As an optional implementation, the hash loss function is minimized by iterative optimization. Specifically, in each iteration only the designated target variable is optimized while the other variables of the hash loss function are kept fixed; the partial derivative of the hash loss function with respect to the target variable is set to zero and solved to obtain the optimized target variable.
The specific optimization strategy is as follows.
Step 1: fix the other variables and update the first target variable. Setting the partial derivative of the objective function with respect to this variable to zero gives its update:

[equation]

where

[equation]

Because

[equation]

it follows that

[equation]

Thus, by storing the intermediate variables $C_1$ and $C_2$, each round's update only computes the terms involving new data, and the terms involving old data need not be recomputed, which speeds up learning.
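The bookkeeping behind $C_1$ and $C_2$ can be illustrated with a ridge-regression projection. This is a simplified stand-in, assuming the updated variable is a projection $W$ minimizing $\lVert \Phi W - B \rVert_F^2 + \lambda \lVert W \rVert_F^2$ over all rounds seen so far, with $C_1 = \sum_t \Phi_t^{\top} \Phi_t$ and $C_2 = \sum_t \Phi_t^{\top} B_t$; the patent's actual update (which also involves μ and the memory term) is not recoverable, and the class name is hypothetical.

```python
import numpy as np

class OnlineRidgeProjection:
    """W = argmin ||Phi W - B||_F^2 + lam ||W||_F^2 over every round seen so far.

    C1 = sum_t Phi_t^T Phi_t and C2 = sum_t Phi_t^T B_t are accumulated, so folding
    in a new round only touches the new rows (plus one m x m linear solve).
    """

    def __init__(self, m, r, lam):
        self.C1 = np.zeros((m, m))
        self.C2 = np.zeros((m, r))
        self.lam = lam
        self.W = np.zeros((m, r))

    def update(self, Phi_new, B_new):
        self.C1 += Phi_new.T @ Phi_new   # old-data contribution is already stored
        self.C2 += Phi_new.T @ B_new
        m = self.C1.shape[0]
        self.W = np.linalg.solve(self.C1 + self.lam * np.eye(m), self.C2)
        return self.W

    def hash(self, Phi):
        return np.where(Phi @ self.W >= 0, 1, -1)  # sign, mapping 0 to +1

rng = np.random.default_rng(1)
Phi1, Phi2 = rng.normal(size=(20, 8)), rng.normal(size=(15, 8))
B1 = np.where(rng.normal(size=(20, 4)) >= 0, 1.0, -1.0)
B2 = np.where(rng.normal(size=(15, 4)) >= 0, 1.0, -1.0)
model = OnlineRidgeProjection(m=8, r=4, lam=0.1)
model.update(Phi1, B1)
W = model.update(Phi2, B2)
```

After the second round, `W` equals the batch ridge solution on both rounds stacked together, even though round 1's rows were never revisited.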
Step 2: fix the other variables and update the second target variable. With the other variables fixed, the objective function can be rewritten as:

[equation]

Expanding the Frobenius norm simplifies the optimization to:

[equation]

where [equation]. The time complexity is reduced by [equation], and therefore

[equation]

which yields a closed-form solution:

[equation]
Step 3: fix the other variables and update the third target variable. With the other variables fixed, its optimal solution can be written as:

[equation]

where [equation]. This embodiment reduces the time complexity by [equation], and therefore

[equation]

The optimization problem can be solved as follows. First, an eigenvalue decomposition of [equation] is performed:

[equation]

where the diagonal factor contains the square roots of the non-zero eigenvalues, and the two eigenvector blocks correspond to the non-zero and zero eigenvalues, respectively. Next,

[equation]

is computed, where the dimension involved is the number of non-zero eigenvalues; the remaining block is initialized as a random matrix and then orthogonalized with the Gram-Schmidt process. Finally, the solution is obtained:

[equation]

where the leading factor is the square root of the number of samples in the current t-th round.
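Step 3's recipe (an eigendecomposition, square roots of the non-zero eigenvalues, a random block completed by Gram-Schmidt, and a $\sqrt{n}$ factor) closely resembles the closed form used in discrete graph hashing for $\max_{\tilde{B}} \operatorname{tr}(\tilde{B} Z^{\top})$ subject to $\tilde{B}\mathbf{1} = 0$ and $\tilde{B}\tilde{B}^{\top} = nI$. The sketch below assumes that problem shape, with $Z$ ($r \times n$, $r < n$) standing in for the matrix the patent decomposes; the function name is hypothetical, and NumPy's QR factorization plays the role of the Gram-Schmidt step.

```python
import numpy as np

def balanced_orthogonal_codes(Z, rng, tol=1e-10):
    """Solve max tr(B Z^T) s.t. B @ 1 = 0 and B @ B.T = n I  (Z, B: r x n, r < n)."""
    r, n = Z.shape
    Zc = Z - Z.mean(axis=1, keepdims=True)            # Z J, J = centering matrix
    evals, evecs = np.linalg.eigh(Zc @ Zc.T)          # small r x r eigendecomposition
    order = np.argsort(evals)[::-1]                   # largest eigenvalues first
    evals, evecs = evals[order], evecs[:, order]
    scale = evals[0] if evals[0] > 0 else 1.0
    k = int(np.sum(evals > tol * scale))              # number of non-zero eigenvalues
    U, U_bar = evecs[:, :k], evecs[:, k:]             # eigenvectors: non-zero / zero
    V = Zc.T @ U / np.sqrt(evals[:k])                 # n x k, orthonormal columns
    B_real = U @ V.T
    if k < r:
        # Complete V with r-k random directions orthogonal to 1 and to V,
        # orthonormalized by QR (numerically a Gram-Schmidt step).
        basis = np.column_stack([np.ones(n) / np.sqrt(n), V])
        R = rng.normal(size=(n, r - k))
        R -= basis @ (basis.T @ R)
        V_bar, _ = np.linalg.qr(R)
        B_real = B_real + U_bar @ V_bar.T
    return np.sqrt(n) * B_real

rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 30))
B_full = balanced_orthogonal_codes(Z, rng)
B_low = balanced_orthogonal_codes(np.vstack([Z[:2], Z[:2]]), rng)  # rank-2 input
```

The random completion only matters when the decomposed matrix is rank-deficient; in the full-rank case the result is simply $\sqrt{n}\,UV^{\top}$, and both outputs satisfy the balance and decorrelation constraints exactly up to round-off.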
During retrieval in the t-th round, when social media data to be retrieved arrive, the hash code of the query can be calculated by:

[equation]

This hash code is used to calculate the Hamming distance between the query and the hash codes of the sample data in the database, which measures their similarity, and a preset number of sample data are returned according to the Hamming distance. For example, the sample data in the database are sorted by Hamming distance, and the required number of samples with the smallest Hamming distances is returned.
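The ranking step can be sketched as follows, assuming codes are stored as ±1 integer vectors (for example the sign of a learned projection), in which case the Hamming distance between two r-bit codes equals $(r - q^{\top} b)/2$. The function name and the stable tie-breaking are illustrative choices, not details from the text.

```python
import numpy as np

def hamming_rank(query_code, db_codes, top_k):
    """Return the indices and Hamming distances of the top_k closest database codes.

    Codes are +-1 integer vectors; for r bits, hamming(q, b) = (r - q . b) / 2.
    """
    r = query_code.shape[0]
    dist = (r - db_codes @ query_code) // 2
    order = np.argsort(dist, kind="stable")[:top_k]   # ties keep database order
    return order, dist[order]

db_codes = np.array([[1, 1, 1, 1],
                     [1, -1, 1, 1],
                     [-1, -1, -1, -1]])
query = np.array([1, 1, 1, -1])
idx, d = hamming_rank(query, db_codes, top_k=2)   # idx -> [0, 1], d -> [1, 2]
```

Production systems usually pack codes into machine words and compare them with XOR and popcount; the inner-product form is used here only because it is short and exact for ±1 codes.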
Example two
The embodiment of the invention also provides a social media data online retrieval system based on dynamic memory, which comprises:
the data acquisition module is used for acquiring sample data of multiple rounds and corresponding user tags;
the hash function learning module is used to perform hash function learning on the sample data of each round in sequence, starting from the first round, to obtain hash codes of the sample data and store them in the database; for the sample data of round t, refined pseudo labels are constructed according to the pairwise similarity matrix between the user labels corresponding to the round-t sample data and those corresponding to the sample data before round t; a hash loss function is determined according to the constructed refined pseudo labels, and the parameters of the hash function are optimized by minimizing the hash loss function to obtain the hash codes of the round-t sample data;
and the retrieval module is used to receive social media data to be retrieved, map it with the optimized hash function to obtain the corresponding hash code, and compare this hash code with the hash codes of the sample data in the database to obtain a retrieval result.
The dynamic-memory-based social media data online retrieval system provided in this embodiment implements the dynamic-memory-based social media data online retrieval method described above; for its specific implementation, refer to the foregoing method embodiment, which is not repeated here.
In a specific implementation, as shown in Fig. 3, the hash function learning module mainly comprises two parts: a refined pseudo-label matrix learning module and an online hash learning module.
In the refined pseudo-label learning module, to reduce the negative influence of the user tags, a refined pseudo-label matrix is constructed from the pairwise similarity matrix between the tags. The refined pseudo-label matrix better reveals the associations between samples and labels and guides the learning of the hash codes.
In the online hash learning module, to address catastrophic forgetting, a memory-based similarity learning strategy is proposed for learning the hash codes. Specifically, in each training round, the most representative data points are selected from the old data to update the memory; the similarity between the data in the memory and the new data is then calculated to preserve the correlation between new and old data, and this relationship is embedded into the objective function to obtain the hash code of each instance. In addition, this embodiment provides an efficient discrete online optimization algorithm whose time complexity is linear in the size of the new data, so the model scales easily to large datasets.
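The refined pseudo-label and memory steps are described only in words (here and in claims 3 and 5). The sketch below is one plausible reading, not the patent's exact procedure. It assumes that the tag similarity matrix is the co-occurrence matrix YᵀY over all labels seen so far, that it is row-normalized, that refined pseudo labels are obtained by multiplying the round-t labels by it, that the pseudo-label/user-label distance is Euclidean, and that the oldest memory slots are replaced first. All function and class names are hypothetical.

```python
import numpy as np


def refined_pseudo_labels(Y_t, Y_seen):
    """Y_t: (n_t, c) one-hot/multi-hot labels of round t; Y_seen: (N, c) labels up to round t."""
    Y_seen = np.asarray(Y_seen, dtype=float)
    S = Y_seen.T @ Y_seen                     # (c, c) pairwise tag similarity (co-occurrence)
    row_sums = S.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0             # unseen tags keep an all-zero row
    return np.asarray(Y_t, dtype=float) @ (S / row_sums)   # assumed row normalization


def select_representatives(Y_pseudo, Y_user, m):
    """Indices of the m samples whose pseudo label is closest to their user label."""
    dist = np.linalg.norm(Y_pseudo - Y_user, axis=1)
    return np.argsort(dist, kind="stable")[:m]    # ascending order, keep the first m


class DynamicMemory:
    """Fixed-capacity memory of representative points; oldest entries are replaced first."""

    def __init__(self, capacity, replace_frac):
        self.capacity = capacity
        self.n_replace = max(1, int(round(replace_frac * capacity)))
        self.X = None      # stored features, oldest rows first
        self.Y = None      # stored user labels

    def update(self, X_new, Y_pseudo, Y_user):
        stored = 0 if self.X is None else len(self.X)
        # Fill free slots first; once full, replace a fixed portion each round.
        m = self.capacity - stored if stored < self.capacity else self.n_replace
        idx = select_representatives(Y_pseudo, Y_user, min(m, len(X_new)))
        X_sel, Y_sel = X_new[idx], Y_user[idx]
        if self.X is None:
            self.X, self.Y = X_sel, Y_sel
        else:
            self.X = np.vstack([self.X, X_sel])[-self.capacity:]
            self.Y = np.vstack([self.Y, Y_sel])[-self.capacity:]
```

Keeping the memory at a fixed size bounds the cost of the memory/new-data similarity term in every round, independent of how many rounds have passed.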
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A social media data online retrieval method based on dynamic memory is characterized by comprising the following steps:
acquiring sample data of a plurality of rounds and the corresponding user labels;
starting from the first round, performing hash function learning on the sample data of each round in sequence to obtain hash codes of the sample data, and storing the hash codes in a database; for the sample data of round t, constructing refined pseudo labels of the round-t sample data according to the pairwise similarity matrix between the user labels corresponding to the round-t sample data and those corresponding to the sample data before round t; determining a hash loss function according to the constructed refined pseudo labels, and optimizing the parameters of the hash function by minimizing the hash loss function to obtain the hash codes of the round-t sample data;
and receiving social media data to be retrieved, mapping it with the optimized hash function to obtain the corresponding hash code, and comparing this hash code with the hash codes of the sample data in the database to obtain a retrieval result.
2. The dynamic memory-based online retrieval method of social media data as claimed in claim 1, wherein the sample data comprises text data, image data and video data; after the sample data of the multiple rounds and the corresponding user labels are acquired, and before hash function learning is performed on the training samples of each round in sequence, the method further comprises: extracting features of the sample data, and one-hot encoding the user labels to obtain label representations.
3. The social media data online retrieval method based on dynamic memory of claim 2, wherein a label matrix is determined from the label representations corresponding to the sample data of round t and the sample data before round t; the label matrix is multiplied by its transpose to obtain the pairwise similarity matrix of the labels; and the pairwise similarity matrix is normalized to obtain the refined pseudo labels of round t.
4. The method for online retrieval of social media data based on dynamic memory as claimed in claim 1, wherein the method for determining the hash loss function comprises:
determining a pairwise similarity matrix of the sample data of round t according to the hash learning paradigm and the constructed refined pseudo labels, and constructing a first objective function for learning the hash codes of the sample data of round t;
constructing a second objective function for learning the hash codes of the sample data of round t according to the pairwise similarity between the representative points in the memory and the sample data of round t;
capturing nonlinear features of the sample data and learning the hash function by linear regression, to obtain a third objective function for learning the hash codes of the sample data of round t;
and integrating the first, second and third objective functions into the final hash loss function.
5. The dynamic memory-based online social media data retrieval method as claimed in claim 4, wherein, for each sample point, the distance between its refined pseudo label and its user label is calculated; the distances are sorted in ascending order, and a preset number of top-ranked sample points are selected as representative points; during hash learning, the memory stores a fixed preset number of representative points, and in each round a preset portion of the representative points in the memory is replaced by the newly selected representative points.
6. The method of claim 4, wherein the sample data is processed using a radial basis function to capture non-linear features of the sample data.
7. The social media data online retrieval method based on dynamic memory as claimed in claim 1, wherein the hash loss function is minimized iteratively, specifically: in each iteration, only the selected target variable is optimized while all other variables in the hash loss function are kept fixed; the partial derivative of the hash loss function with respect to the target variable is set to zero and solved to obtain the optimized target variable.
8. The online social media data searching method based on dynamic memory as claimed in claim 1, wherein obtaining the retrieval result by comparing the hash code of the social media data with the hash codes of the sample data in the database comprises: calculating the Hamming distance between the hash code of the social media data and the hash codes of the sample data in the database, and returning a preset number of sample data according to the Hamming distance.
9. A social media data online retrieval system based on dynamic memory, comprising:
the data acquisition module is used for acquiring sample data of multiple rounds and corresponding user tags;
the hash function learning module is used to perform hash function learning on the sample data of each round in sequence, starting from the first round, to obtain hash codes of the sample data and store them in the database; for the sample data of round t, refined pseudo labels are constructed according to the pairwise similarity matrix between the user labels corresponding to the round-t sample data and those corresponding to the sample data before round t; a hash loss function is determined according to the constructed refined pseudo labels, and the parameters of the hash function are optimized by minimizing the hash loss function to obtain the hash codes of the round-t sample data;
and the retrieval module is used to receive social media data to be retrieved, map it with the optimized hash function to obtain the corresponding hash code, and compare this hash code with the hash codes of the sample data in the database to obtain a retrieval result.
10. The social media data online retrieval system based on dynamic memory of claim 9, further comprising:
a preprocessing module, used for extracting features of the sample data and one-hot encoding the user labels to obtain label representations.
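Claim 7 minimizes the hash loss by alternating updates, setting the partial derivative with respect to one variable to zero while the others are fixed, and the description states that the optimization is linear in the size of the new data. For the linear-regression term of claim 4 (the third objective), a common way to achieve both is to keep running sums of ΦᵀΦ and ΦᵀB so that each round touches only its own samples. The sketch assumes a ridge-regularized least-squares term with a hypothetical weight `lam`; the patent's actual objective is given only as images, and `OnlineRidgeHashProjection` is an illustrative name.

```python
import numpy as np


class OnlineRidgeHashProjection:
    """Hash projection W learned by ridge regression from per-round sufficient statistics.

    Assumed objective (illustrative, not the patent's image formulas):
        min_W  sum_t ||B_t - Phi_t W||_F^2 + lam * ||W||_F^2
    Setting the partial derivative with respect to W to zero gives
        W = (sum_t Phi_t^T Phi_t + lam * I)^(-1) (sum_t Phi_t^T B_t).
    """

    def __init__(self, n_features, n_bits, lam=1.0):
        self.A = lam * np.eye(n_features)            # lam * I + accumulated Phi^T Phi
        self.C = np.zeros((n_features, n_bits))      # accumulated Phi^T B
        self.W = np.zeros((n_features, n_bits))

    def partial_fit(self, Phi_t, B_t):
        """Absorb round t; cost O(n_t m^2 + m^3), linear in the new sample count n_t."""
        self.A += Phi_t.T @ Phi_t
        self.C += Phi_t.T @ B_t
        self.W = np.linalg.solve(self.A, self.C)
        return self
```

Because the statistics are additive, the incremental solution equals the batch ridge solution over all rounds seen so far, provided the hash codes of earlier rounds are kept fixed.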
CN202210971339.0A 2022-08-15 2022-08-15 Social media data online retrieval method and system based on dynamic memory Active CN115048539B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210971339.0A CN115048539B (en) 2022-08-15 2022-08-15 Social media data online retrieval method and system based on dynamic memory

Publications (2)

Publication Number Publication Date
CN115048539A true CN115048539A (en) 2022-09-13
CN115048539B CN115048539B (en) 2022-11-15

Family

ID=83168106

Country Status (1)

Country Link
CN (1) CN115048539B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant