CN116578602A - Time sequence ordering method and device - Google Patents

Publication number
CN116578602A
Authority
CN
China
Prior art keywords
data
attribute
time
determining
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310717116.6A
Other languages
Chinese (zh)
Other versions
CN116578602B (en)
Inventor
王尧舒
谢珉
樊文飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Computing Sciences
Original Assignee
Shenzhen Institute of Computing Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Computing Sciences
Priority to CN202310717116.6A
Publication of CN116578602A
Application granted
Publication of CN116578602B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • G06F16/24578Query processing with adaptation to user needs using ranking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • G06F16/24575Query processing with adaptation to user needs using context
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474Sequence data queries, e.g. querying versioned data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application provide a time sequence ordering method and device. To address the poor timeliness-ordering results of the prior art, the application proposes a solution that combines context-aware vectors over associated data, an encoding mechanism that arranges the vectors in time order, and adaptive-interval ordering. Specifically: an associated data attribute value of target data is determined according to a data set, and a time code of the target data is determined according to the associated data attribute value; the data in the data set is then ordered by timeliness according to the time code. The context-aware vector representation vectorizes the target attribute while integrating the context information of related attributes, so that the timeliness of the target attribute can be judged accurately.

Description

Time sequence ordering method and device
Technical Field
The application relates to the field of data identification, in particular to a time sequence ordering method and device.
Background
Real-world data changes constantly. According to a Royal Mail report, every day in the United Kingdom 9590 users move house, 1496 get married, 810 divorce, 2011 retire, and 1500 die. It is estimated that inaccurate customer data costs companies 6% of their annual revenue. When the data behind a search engine is out of date, a restaurant search may return a restaurant that closed 3 years ago. When data on the condition of infrastructure assets is stale, equipment servicing may be delayed and outages may occur. Furthermore, decisions driven by outdated data may be worse than decisions made without data. Especially in critical industries such as healthcare, retail, and financial services, correct decisions cannot be made from yesterday's data. Unfortunately, 82% of corporate decisions are based on outdated information. These issues underline the need to determine the timeliness of data, that is, whether a piece of data is old or new.
Existing models can be applied to the problem of ordering data by timeliness.
However, because existing models do not adequately consider or optimize for certain characteristics unique to temporal order, applying them directly to timeliness ordering gives poor results.
Disclosure of Invention
In view of the foregoing, the present application provides a time sequence ordering method and device that overcome, or at least partially solve, the above problems, including:
a time sequence ordering method for ordering data by timeliness, comprising:
acquiring a data set, wherein the data set comprises data and attribute values in one-to-one correspondence with the data, and the data set comprises at least two sets of data;
determining an associated data attribute value of target data according to the data set, and determining a time code of the target data according to the associated data attribute value;
and ordering the data in the data set by timeliness according to the time code.
Preferably, the step of determining the associated data attribute value of the target data from the dataset comprises:
determining front data of target data, rear data of the target data and the target data according to the data set;
and determining the associated data attribute value of the target data according to the front data of the target data, the rear data of the target data, the target data and the data set.
Preferably, the step of determining the time code of the target data according to the associated data attribute value includes:
determining an attribute vector of each target data according to the associated data attribute value;
and determining the time code of each target data according to the attribute vector of each target data.
Preferably, the step of determining the attribute vector of each target data according to the associated data attribute value includes:
determining an associated data sequence according to the associated data attribute value;
and determining an attribute vector of each target data according to the associated data sequence.
Preferably, the step of determining the time code of each target data according to the attribute vector of each target data includes:
determining a numerical code according to the attribute vector of each target data;
determining attribute codes according to the attribute vector of each target data;
and determining the time code of each target data according to the attribute codes and the numerical codes.
Preferably, the step of determining the time code of each target data according to the attribute code and the numerical code includes:
obtaining, through a loss function and according to the attribute code and the numerical code, the time code of each target data with an adaptive interval.
Preferably, the step of ordering the data in the data set by timeliness according to the time code comprises:
sorting the data in the data set in time order according to the time code.
The application also comprises a time sequence ordering device for ordering data by timeliness, comprising:
an acquisition module, used for acquiring a data set, wherein the data set comprises data and attribute values in one-to-one correspondence with the data, and the data set comprises at least two sets of data;
a time coding module, used for determining an associated data attribute value of the target data according to the data set and determining a time code of the target data according to the associated data attribute value;
and a timeliness ordering module, used for ordering the data in the data set by timeliness according to the time code.
The application also comprises a computer device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the time sequence ordering method.
The application also comprises a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the time sequence ordering method.
The application has the following advantages:
In the embodiments of the application, to address the poor timeliness-ordering results of the prior art, the application proposes a solution that combines context-aware vectors over associated data, an encoding mechanism that arranges the vectors in time order, and adaptive-interval ordering. Specifically: a data set is acquired, wherein the data set comprises data and attribute values in one-to-one correspondence with the data, and the data set comprises at least two sets of data; an associated data attribute value of target data is determined according to the data set, and a time code of the target data is determined according to the associated data attribute value; and the data in the data set is ordered by timeliness according to the time code. The context-aware vector representation vectorizes the target attribute while integrating the context information of related attributes, so that the timeliness of the target attribute can be judged accurately.
Drawings
In order to illustrate the technical solutions of the present application more clearly, the drawings needed in the description are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart illustrating a method for timing sequencing according to an embodiment of the present application;
FIG. 2 is a block diagram of a timing ordering method according to an embodiment of the present application;
FIG. 3 is a diagram of a data set of a time sequence ordering method according to an embodiment of the present application;
FIG. 4 is a block diagram of a timing sequencing device according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a computer device according to an embodiment of the present application;
Reference numerals: 12, computer device; 14, external device; 16, processing unit; 18, bus; 20, network adapter; 22, I/O interface; 24, display; 28, memory; 30, random access memory; 32, cache memory; 34, storage system; 40, program/utility; 42, program modules.
Detailed Description
In order that the manner in which the above recited objects, features and advantages of the present application are obtained will become more readily apparent, a more particular description of the application briefly described above will be rendered by reference to the appended drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
By analyzing the prior art, the inventors found the following. As shown in FIG. 3, which depicts a data set, consider a set of records belonging to the same entity. Their attribute values may be outdated and inaccurate, and only partially reliable timestamps may be available. In this setting we want to determine how current each attribute value is. That is, given two records $t_1$ and $t_2$ belonging to the same entity, we need to determine whether the $A$ attribute value of $t_1$ is more current than the $A$ attribute value of $t_2$, denoted $t_2 \preceq_A t_1$.
Consider the client records $t_1$ to $t_6$ shown in FIG. 3, which have been identified as referring to the same person, Mary. Each record holds some of her attribute values, such as marital status, work, number of children, and SZ (shoe size). Some of the recorded attribute values are outdated. For example, her work, address, and surname have changed 4 times, 5 times, and 2 times, respectively. However, only certain attribute values may have a reliable timestamp, e.g., $t_5[\text{work}]$ and $t_6[\text{work}]$ may be timestamped 2016 and 2019, respectively, indicating that these attribute values were up to date at those times. Without complete timestamps, it is difficult to know whether $t_2 \preceq_{LN} t_6$, i.e., whether the surname value of $t_6$ is at least as current as that of $t_2$. A timeliness ordering is therefore needed to determine Mary's most current surname.
The prior art typically learns a ranking model so that data can be ranked according to learned relevance, preference, or importance. In our problem, the ranking is by timeliness. Existing methods may employ a pairwise ordering approach, because the semantics of pairwise ordering match those of temporal order: given a pair of records $t_1$ and $t_2$ belonging to the same entity, both decide whether the $A$ attribute value of $t_1$ is more current than the $A$ attribute value of $t_2$. In addition, temporal order is transitive, i.e., if $t_2 \preceq_A t_1$ and $t_3 \preceq_A t_2$, then we can deduce $t_3 \preceq_A t_1$. Pairwise ordering can therefore yield a total order over all values of an attribute, from which the most current value of each attribute is obtained.
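The transitivity argument above can be illustrated with a minimal Python sketch. It assumes the pairwise decisions are already available as (older, newer) pairs; the function name `total_order` and the record identifiers are hypothetical and only serve the illustration. A topological sort is used here as one convenient way to turn consistent pairwise decisions into an ordering; it is not stated in the source to be the method used.

```python
from graphlib import TopologicalSorter


def total_order(pairs):
    """Derive an oldest-to-newest ordering from pairwise decisions.

    pairs: iterable of (older, newer) record ids for one attribute,
    i.e. each pair states that `newer` holds the more current value.
    """
    sorter = TopologicalSorter()
    for older, newer in pairs:
        # `newer` must come after `older` in the final ordering.
        sorter.add(newer, older)
    return list(sorter.static_order())


# Hypothetical pairwise results: t1 is newer than t2, t2 is newer than t3.
pairs = [("t2", "t1"), ("t3", "t2")]
order = total_order(pairs)
latest = order[-1]  # the record holding the most current value
```

If the pairwise decisions contradict each other (a cycle), `TopologicalSorter` raises `graphlib.CycleError`, which is one way to detect inconsistent comparisons.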
It is natural to want to use existing models to solve the problem of ordering data by timeliness. However, because existing models do not adequately consider or optimize for certain characteristics unique to temporal order, applying them directly gives poor results, for three specific reasons:
Attribute correlation. When judging the temporal order of data, we often need to refer to other attributes to determine the order for a given attribute. Furthermore, attribute values may change back and forth. For example, marital status may change from "married" to "divorced" and back from "divorced" to "married". It is therefore difficult to determine the latest value of an attribute from the information on that attribute alone.
Limitations of embedding models. To determine the timeliness of data, special care is needed for attribute values that are lexically different but semantically similar. For example, a person's marital status may be recorded with different words that have the same meaning. Although existing embedding models (e.g., ELMo or BERT) are widely used to capture semantic information, they cannot be used directly for timeliness problems because they are not trained on chronologically ordered data.
Adaptive interval. Existing ordering strategies do not take into account the timeliness characteristics of real-life data attributes. For example, a person's status typically takes longer to go from "birth" to "engagement" than from "engagement" to "wedding". Most existing strategies order with a fixed interval. What is needed, however, is an adaptive interval method that reflects the timeliness characteristics of the data, so that the ranking results agree with real-life behavior and their validity can be demonstrated.
It should be noted that, in any embodiment of the present application, the method is used for time-based ordering of data.
Referring to FIG. 1, a flowchart of the steps of a time sequence ordering method according to an embodiment of the present application is shown, which specifically includes the following steps:
S110, acquiring a data set, wherein the data set comprises data and attribute values in one-to-one correspondence with the data, and the data set comprises at least two sets of data;
S120, determining an associated data attribute value of target data according to the data set, and determining a time code of the target data according to the associated data attribute value;
S130, ordering the data in the data set by timeliness according to the time code.
In this embodiment, to address the poor timeliness-ordering results of the prior art, the application proposes a solution that combines context-aware vectors over associated data, an encoding mechanism that arranges the vectors in time order, and adaptive-interval ordering. Specifically: a data set is acquired, wherein the data set comprises data and attribute values in one-to-one correspondence with the data, and the data set comprises at least two sets of data; an associated data attribute value of target data is determined according to the data set, and a time code of the target data is determined according to the associated data attribute value; and the data in the data set is ordered by timeliness according to the time code. The context-aware vector representation vectorizes the target attribute while integrating the context information of related attributes, so that the timeliness of the target attribute can be judged accurately.
Next, the timing sorting method in the present exemplary embodiment will be further described.
As described in step S110 above, a data set is acquired, wherein the data set comprises data and attribute values in one-to-one correspondence with the data, and the data set comprises at least two sets of data.
In one embodiment of the present application, the specific process of step S110, "acquiring a data set, wherein the data set comprises data and attribute values in one-to-one correspondence with the data, and the data set comprises at least two sets of data", may be further described in conjunction with the following description.
In a specific embodiment, as shown in FIG. 3, given two records $t_1$ and $t_2$ belonging to the same entity, we need to determine whether the $A$ attribute value of $t_1$ is more current than the $A$ attribute value of $t_2$, denoted $t_2 \preceq_A t_1$.
As described in step S120, an associated data attribute value of the target data is determined according to the data set, and a time code of the target data is determined according to the associated data attribute value.
In one embodiment of the present application, the specific process of step S120, "determining the associated data attribute value of the target data according to the data set and determining the time code of the target data according to the associated data attribute value", may be further described in conjunction with the following description.
As described in the following steps, the front data of the target data, the rear data of the target data, and the target data are determined according to the data set; and the associated data attribute value of the target data is determined according to the front data of the target data, the rear data of the target data, the target data, and the data set.
In a specific embodiment, the front data is the data preceding the target data (the context above it), and the rear data is the data following the target data (the context below it).
As described in the following steps, an attribute vector of each target data is determined according to the associated data attribute value; and the time code of each target data is determined according to the attribute vector of each target data.
In one embodiment of the present application, the specific process of "determining the front data of the target data, the rear data of the target data, and the target data according to the data set; and determining the associated data attribute value of the target data according to the front data of the target data, the rear data of the target data, the target data, and the data set" may be further described in conjunction with the following description.
In one embodiment, the data has attribute correlation. When judging the temporal order of data, we often need to refer to other attributes to determine the order for a given attribute. Furthermore, attribute values may change back and forth. For example, marital status may change from "married" to "divorced" and back from "divorced" to "married". It is therefore difficult to determine the latest value of an attribute from the information on that attribute alone.
As described in the following steps, front data and rear data are established for each target data; the front data is matched with its corresponding front data attribute value, the rear data with its corresponding rear data attribute value, and the target data with its corresponding target data attribute value in the data set. The associated data attribute values comprise the front data attribute value, the rear data attribute value, and the target data attribute value.
In one embodiment of the present application, the specific process of "determining the time code of the target data according to the associated data attribute value" may be further described in conjunction with the following description.
As described in the following steps, an attribute vector of each target data is determined according to the associated data attribute values; and the time code of each target data is determined according to the attribute vector of each target data.
In one embodiment of the present application, the specific process of "determining the attribute vector of each target data according to the associated data attribute value" may be further described in conjunction with the following description.
As one example, the ranking model Mrank first uses a pre-trained language model (e.g., ELMo or BERT) to build a context-aware vector representation for each attribute value, so that both the attribute value itself and the information of the attribute values associated with it are embedded.
As described in the following steps, an associated data sequence is determined according to the associated data attribute value; and an attribute vector of each target data is determined according to the associated data sequence.
In a specific embodiment, in order to take into account the associated data attribute values related to the target data, we treat each record as a sequence and apply serialization, so that the information in the record can be effectively digested by the model and embedded in a vector. Specifically, given a record $t$ with attributes $A_1, \dots, A_n$, we serialize its values as:
$$\mathrm{serialize}(t) = \langle\mathrm{COL}\rangle\ A_1\ \langle\mathrm{VAL}\rangle\ t[A_1]\ \dots\ \langle\mathrm{COL}\rangle\ A_n\ \langle\mathrm{VAL}\rangle\ t[A_n],$$
where $\langle\mathrm{COL}\rangle$ and $\langle\mathrm{VAL}\rangle$ are special tokens that mark the beginning of an attribute name and of an attribute value, respectively. The serialization of $t$ is passed as input to a pre-trained language model $\mathrm{emb}(\cdot)$, which computes a $d$-dimensional embedding vector for each $A$ attribute value of $t$, denoted $\mathrm{emb}(t[A])$. In addition, we average all attribute embedding vectors to obtain a contextual representation of record $t$:
$$\mathrm{emb}(t) = \frac{1}{n}\sum_{i=1}^{n}\mathrm{emb}(t[A_i]).$$
Finally, for each $A$ attribute value of record $t$, we obtain a context-aware vector representation:
$$E_{t[A]} = [\mathrm{emb}(t[A]);\ \mathrm{emb}(t)],$$
where $[\cdot\,;\cdot]$ denotes vector concatenation. With this strategy, the numerical vector $E_{t[A]}$ represents not only the value information on attribute $A$ but also the context information of other related attributes, which helps order the values of attribute $A$ by timeliness. Similarly to the numerical vector, attribute names can also be vectorized. Specifically, we pass the attribute name $A$ (e.g., "marital status") into the pre-trained language model to obtain an attribute vector $E_A = \mathrm{emb}(A)$.
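The serialization and context-aware vector described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `toy_emb` is a hypothetical, hash-based stand-in for the pre-trained language model emb(), it embeds each value independently rather than from the serialized record, and the dimension `D`, the example record, and all function names are invented for the example.

```python
import hashlib

D = 4  # toy embedding dimension (assumption; real models use far larger d)


def toy_emb(text, d=D):
    """Hypothetical stand-in for emb(): a deterministic pseudo-embedding in [0, 1]^d."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:d]]


def serialize(record):
    """Build '<COL> A1 <VAL> t[A1] ... <COL> An <VAL> t[An]' for a record (dict)."""
    return " ".join(f"<COL> {attr} <VAL> {value}" for attr, value in record.items())


def record_context(record):
    """emb(t): element-wise average of the embeddings of all attribute values."""
    vectors = [toy_emb(str(value)) for value in record.values()]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def context_aware_vector(record, attr):
    """E_{t[A]} = [emb(t[A]); emb(t)], the concatenation of value and record context."""
    return toy_emb(str(record[attr])) + record_context(record)


def attribute_vector(attr):
    """E_A = emb(A), the embedding of the attribute name itself."""
    return toy_emb(attr)


record = {"status": "married", "job": "nurse", "kids": 2}
sequence = serialize(record)
E_status = context_aware_vector(record, "status")  # length 2 * D
```

In a real setting the serialized sequence would be fed to the language model so that each value embedding already reflects its neighbours; the toy function skips that step to stay self-contained.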
In an embodiment of the present application, the specific process of "determining the time code of each target data according to the attribute vector of each target data" may be further described in conjunction with the following description.
As described in the following steps, a numerical code is determined according to the attribute vector of each target data; an attribute code is determined according to the attribute vector of each target data; and the time code of each target data is determined according to the attribute code and the numerical code.
As an example, based on the constructed vector representations, Mrank encodes each attribute name $A$ to obtain a target code. Similarly, it encodes the vector representation of each attribute value $t[A]$ to obtain a numerical code. The closer the numerical code of $t[A]$ is to the target code of attribute $A$, the more current the value $t[A]$ is considered to be. At the same time, the distance between the numerical code of $t[A]$ and the target code of attribute $A$ reflects the time span in the timeliness ordering, as shown in FIG. 2.
In a particular embodiment, although vectorized representations based on pre-trained language models are widely used to capture semantic information, they are not designed for timeliness ranking. We therefore use a temporal encoding approach that reorganizes the vectorized representations obtained from the language model so that timeliness is preserved. The idea is to use the attribute vector as a target vector, so that the numerical vector of a more current value lies closer to the target vector. Furthermore, unlike most existing orderings, which use a fixed interval, the target code and the numerical codes are separated by an adaptive interval that matches the time span in the timeliness ordering.
Specifically, given the numerical vector $E_{t[A]}$ of the $A$ attribute of a record $t$, we encode it using the context encoder $\mathrm{ENC}_{cxtx}(\cdot)$ as follows:
$$\varphi_{t[A]} = \mathrm{ENC}_{cxtx}(E_{t[A]}) = \sigma(W_2 \cdot \sigma(W_1 \cdot E_{t[A]})),$$
where $W_1$ and $W_2$ are the learnable parameters of the encoder and $\sigma$ is the sigmoid activation function, i.e., $\sigma(x) = 1/(1 + e^{-x})$.
Similarly, the attribute vector $E_A$ of attribute $A$ is encoded by an attribute encoder $\mathrm{ENC}_{attr}(\cdot)$ as follows:
$$\varphi_A = \mathrm{ENC}_{attr}(E_A).$$
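The two-layer sigmoid encoder can be sketched in plain Python as below. The matrix shapes, the random initialisation, and the helper names (`matvec`, `encode`, `random_matrix`) are assumptions made for the illustration; the source only specifies the form σ(W2·σ(W1·E)).

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid, sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def encode(E, W1, W2):
    """phi = sigma(W2 * sigma(W1 * E)), applied element-wise after each product."""
    hidden = [sigmoid(h) for h in matvec(W1, E)]
    return [sigmoid(o) for o in matvec(W2, hidden)]


def random_matrix(rows, cols, rng):
    """Hypothetical initialisation; in practice W1 and W2 are learned."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
W1 = random_matrix(3, 4, rng)  # maps a 4-dimensional input to 3 hidden units
W2 = random_matrix(2, 3, rng)  # maps 3 hidden units to a 2-dimensional code
phi = encode([0.1, 0.2, 0.3, 0.4], W1, W2)
```

Because the outer activation is a sigmoid, every component of the resulting code lies strictly between 0 and 1.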
To train an encoder with the temporal encoding property and adaptive intervals, we use an attribute-centric loss function with adaptive intervals. Given an attribute $A$, its loss function is defined in terms of the following quantities (the formula itself is not reproduced in this text): $\langle\cdot,\cdot\rangle$ denotes the vector dot product, and $\gamma_{t_1,t_2}$ is the adaptive interval between records $t_1$ and $t_2$. In general, $\gamma_{t_1,t_2}$ is set by a formula (also not reproduced in this text) based on the frequency with which the two values $t_1[A]$ and $t_2[A]$ co-occur.
Intuitively, for each training sample $t_1 \preceq_A t_2$, minimizing this loss function makes the encoder ensure that (a) the encoding of the relatively more current value $t_2[A]$ is closer to the target code, and (b) the encodings of the attribute values $t_1[A]$ and $t_2[A]$ are separated by an adaptive interval.
In other words, in the encoding space the attribute values are not only arranged in time order by their "distance" to the target code, so that the temporal order of the attribute can be deduced easily, but the intervals between the corresponding encodings are also determined adaptively (not fixed), so as to reflect the semantics of the temporal ordering.
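Since the exact loss formula is not available in this text, the following Python sketch shows only one plausible shape for such an objective: a hinge (margin) loss over dot-product similarities in which the margin is the adaptive interval. The hinge form, the function names, and the way `gamma` is supplied are all assumptions for illustration, not the formula of the application.

```python
def dot(u, v):
    """Vector dot product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def pair_loss(phi_A, phi_old, phi_new, gamma):
    """Assumed hinge form: penalise the pair unless the newer value's code is
    more similar to the target code phi_A than the older one by at least gamma."""
    return max(0.0, gamma - (dot(phi_A, phi_new) - dot(phi_A, phi_old)))


def attribute_loss(phi_A, samples):
    """Sum the pair losses over training samples (phi_old, phi_new, gamma)."""
    return sum(pair_loss(phi_A, old, new, gamma) for old, new, gamma in samples)


phi_A = [1.0, 0.0]    # hypothetical target code of attribute A
phi_old = [0.2, 0.0]  # code of the older value t1[A]
phi_new = [0.9, 0.0]  # code of the newer value t2[A]

loss_small_gap = pair_loss(phi_A, phi_old, phi_new, gamma=0.5)  # margin already met
loss_large_gap = pair_loss(phi_A, phi_old, phi_new, gamma=1.0)  # margin not yet met
```

The point of the sketch is that a larger interval `gamma` for a pair demands a larger separation between the two codes, which is what makes the spacing adaptive rather than fixed.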
In an embodiment of the present application, the specific process of "determining the time code of each target data according to the attribute code and the numerical code" may be further described in conjunction with the following description.
As described in the following step, the time code of each target data with an adaptive interval is obtained through a loss function according to the attribute code and the numerical code; the adaptive interval refers to the distance, arranged in time order, from the code of each target data attribute value to the code of the preset data attribute value.
As described in step S130 above, the data in the data set is ordered by timeliness according to the time code.
In one embodiment of the present application, the specific process of step S130, "ordering the data in the data set by timeliness according to the time code", may be further described in conjunction with the following description.
As described in the following step, the data in the data set is sorted in time order according to the time code.
As an example, from the distances of all numerical codes to the target code, we can derive the temporal ordering of the values under this attribute.
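A minimal Python sketch of this last step follows. It assumes Euclidean distance as the notion of "distance" (the source does not fix the metric), and the codes and record identifiers are made-up values for illustration.

```python
import math


def rank_by_timeliness(target_code, value_codes):
    """Return record ids from most current to least current.

    value_codes: dict mapping record id -> numerical code of its attribute value.
    A smaller distance to the target code is read as a more current value.
    """
    return sorted(value_codes, key=lambda rid: math.dist(target_code, value_codes[rid]))


target = [1.0, 1.0]  # hypothetical target code of the attribute
codes = {
    "t1": [0.9, 0.9],
    "t2": [0.2, 0.1],
    "t3": [0.5, 0.6],
}
ranking = rank_by_timeliness(target, codes)
```

The first element of `ranking` is then the record whose value is taken as the most current one for that attribute.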
In a specific embodiment, the application provides a novel data timeliness ranking model that addresses the shortcomings of the prior art. Its key novel points are: a context-aware vector representation method that vectorizes the target attribute value together with related attribute values; an encoding mechanism that arranges vectors in time order; and an attribute-centric adaptive-interval ordering strategy.
In one embodiment, first, when determining the temporal order of data, we often need to refer to other attributes to determine the order for a given attribute. The vector representations produced by existing embedding models are usually based on a single attribute and therefore cannot incorporate related information from other attributes. In contrast, the application provides a new context-aware vector representation method that not only vectorizes the target attribute but also integrates the context information of related attributes, so that the timeliness of the target attribute can be judged accurately.
Second, although existing embedding models are widely used to capture semantic information and can handle attribute values that are lexically different but semantically similar, they do not arrange the attribute values in time order. The application therefore proposes a new encoding mechanism that arranges the "distance" from the code of each attribute value to the target code in time order, so that the temporal order of the attribute can be deduced easily.
Finally, existing ordering strategies do not consider the timeliness characteristics of real-life data attributes and order with fixed intervals. In timeliness ordering, however, the time spans between attribute values differ. We therefore propose an adaptive interval method that reflects the timeliness characteristics of the data. Different attribute values are separated by different intervals, so that the ordering result is consistent with the real behavior of the data attributes.
In a specific embodiment, we verify the validity of the application through experiments on a real dataset. We implement the ordering model Mrank and compare it with three variants: one without the context-aware vector representation method, one without the chronologically ordered coding mechanism, and one without the adaptive interval ordering strategy.
The experimental results show that Mrank is more accurate than the other three schemes. Specifically, the average accuracy of Mrank is 0.722, while the other three schemes reach 0.641, 0.613, and 0.714, respectively, so Mrank improves on them by 8.1, 10.9, and roughly 1 percentage point (0.8 by the figures above), respectively. Without the context-aware vector representation method, the valid information of other relevant attributes cannot be used in the ranking. Without a coding mechanism that orders chronologically, existing embedding models alone do not yield a good chronological order. Without the adaptive interval ordering strategy, the semantics of the time span cannot be reflected in the ordering result.
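The reported gains can be recomputed from the four accuracies given above. The short check below uses only those figures; the dictionary keys are descriptive labels chosen here, assigned in the order in which the three variants are listed.

```python
mrank_accuracy = 0.722
ablation_accuracy = {
    "without context-aware representation": 0.641,
    "without chronologically ordered coding": 0.613,
    "without adaptive interval strategy": 0.714,
}

# Gain of Mrank over each variant, in percentage points, rounded to one decimal.
gains = {
    name: round((mrank_accuracy - accuracy) * 100, 1)
    for name, accuracy in ablation_accuracy.items()
}

for name, gain in gains.items():
    print(f"{name}: +{gain} percentage points")
```

The third difference evaluates to 0.8 percentage points, which rounds to the figure of 1 percent quoted in the text.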
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
Referring to fig. 4, a time sequence ordering device according to an embodiment of the present application is shown, which specifically includes the following modules:
an obtaining module 410, configured to obtain a dataset, where the dataset includes data and attribute values corresponding to the data one to one; wherein the dataset comprises at least two sets of data;
a time encoding module 420, configured to determine an associated data attribute value of the target data according to the data set, and determine a time encoding of the target data according to the associated data attribute value;
and a timeliness ordering module 430, configured to order the data in the dataset by timeliness according to the time code.
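The three modules can be read as a simple pipeline: obtain the dataset, compute a time code per record, and sort by that code. The sketch below wires them together under stated assumptions: the class name `TimeOrderingDevice`, the scalar time code, and the placeholder encoder `toy_time_code` are all hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class TimeOrderingDevice:
    """Hypothetical wiring of modules 410, 420 and 430."""
    obtain: Callable[[], List[Record]]                   # obtaining module 410
    time_code: Callable[[Record, List[Record]], float]   # time encoding module 420

    def order(self) -> List[Record]:                     # timeliness ordering module 430
        dataset = self.obtain()
        if len(dataset) < 2:
            raise ValueError("the dataset must contain at least two sets of data")
        # sorted() is stable, so records with equal codes keep their input order.
        return sorted(dataset, key=lambda record: self.time_code(record, dataset))


def toy_time_code(record: Record, dataset: List[Record]) -> float:
    """Placeholder encoder: treats a numeric salary as a proxy for recency."""
    return float(record["salary"])


device = TimeOrderingDevice(
    obtain=lambda: [
        {"title": "engineer", "salary": "70000"},
        {"title": "intern", "salary": "30000"},
        {"title": "director", "salary": "120000"},
    ],
    time_code=toy_time_code,
)
ordered_titles = [record["title"] for record in device.order()]
```

Passing the whole dataset to the encoder reflects the statement that the time code of a target record is derived from associated attribute values found in the dataset, not from the record alone.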
In one embodiment of the present application, the time encoding module 420 includes:
the front data and rear data sub-module is used for determining front data of target data, rear data of target data and target data according to the data set;
the associated data attribute value sub-module is used for determining an associated data attribute value of the target data according to the front data of the target data, the rear data of the target data, the target data and the data set;
the attribute vector sub-module is used for determining an attribute vector of each target data according to the associated data attribute value;
and the time coding sub-module is used for determining the time coding of each target data according to the attribute vector of each target data.
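One possible reading of the first two sub-modules is sketched below. It assumes that "front data" and "rear data" mean the records immediately before and after the target record in the dataset, and that the associated attribute values are the values of a chosen list of related attributes in those records; both readings are assumptions, as are the function names.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]


def front_and_rear(dataset: List[Record], index: int) -> Tuple[Optional[Record], Optional[Record]]:
    """Return the records just before and just after position `index`, if any."""
    front = dataset[index - 1] if index > 0 else None
    rear = dataset[index + 1] if index + 1 < len(dataset) else None
    return front, rear


def associated_values(dataset: List[Record], index: int, related: List[str]) -> List[str]:
    """Collect related-attribute values from the front, target and rear records."""
    front, rear = front_and_rear(dataset, index)
    values: List[str] = []
    for record in (front, dataset[index], rear):
        if record is None:
            continue  # the first and last records have only one neighbour
        values.extend(record[name] for name in related if name in record)
    return values


dataset = [
    {"title": "intern", "city": "Wuhan"},
    {"title": "engineer", "city": "Shenzhen"},
    {"title": "director", "city": "Beijing"},
]
middle_context = associated_values(dataset, 1, ["city"])
first_context = associated_values(dataset, 0, ["city"])
```

The resulting list is the raw material from which an attribute vector, and then a time code, would be computed by the remaining sub-modules.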
In one embodiment of the present application, the attribute vector submodule includes:
the associated data sequence sub-module is used for determining an associated data sequence according to the associated data attribute value;
and the attribute vector sub-module is used for determining the attribute vector of each target data according to the associated data sequence.
In an embodiment of the present application, the time encoding submodule includes:
the numerical coding submodule is used for determining numerical codes according to the attribute vector of each target data;
the attribute coding submodule is used for determining attribute codes according to the attribute vector of each target data;
and the time coding sub-module is used for determining the time coding of each target data according to the attribute codes and the numerical codes.
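The text does not state how the numerical code and the attribute code are computed or combined. The sketch below is therefore purely illustrative: it assumes a min-max scaling for the numerical code, a linear projection with hypothetical `weights` for the attribute code, and plain concatenation for the time code.

```python
from typing import List, Sequence


def numerical_code(vector: Sequence[float]) -> List[float]:
    """Toy numerical code: min-max scale the attribute vector into [0, 1]."""
    low, high = min(vector), max(vector)
    span = high - low
    return [0.0 if span == 0 else (value - low) / span for value in vector]


def attribute_code(vector: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Toy attribute code: one weighted sum per weight row (a linear projection)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def time_code(vector: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Combine both codes; concatenation is an assumption, not the stated method."""
    return numerical_code(vector) + attribute_code(vector, weights)


attribute_vector = [2.0, 4.0, 6.0]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # hypothetical learned parameters
code = time_code(attribute_vector, weights)
```

In a trained model the projection weights would be learned together with the ordering objective, so that the combined code carries chronological information.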
In an embodiment of the present application, the time encoding submodule of each target data includes:
an adaptive interval sub-module, configured to obtain a time code of each target data with an adaptive interval according to the attribute code and the numerical code through a loss function; the adaptive interval refers to a distance from the code of each target data attribute value to the code of the preset data attribute value in time sequence.
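A common way to train ordered codes with a required gap is a pairwise hinge loss, and an adaptive interval can be modeled by letting the required gap depend on the attribute. The sketch below is an assumption-laden illustration of that idea, not the loss function of the application: it uses scalar time codes, and the margin table and attribute names are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str, str]  # (attribute name, newer item id, older item id)


def hinge_rank_loss(newer_code: float, older_code: float, margin: float) -> float:
    """Zero once the newer code exceeds the older code by at least `margin`."""
    return max(0.0, margin - (newer_code - older_code))


def total_loss(pairs: List[Pair], codes: Dict[str, float],
               margins: Dict[str, float], default_margin: float = 1.0) -> float:
    """Sum the hinge losses, using a per-attribute margin where one is given."""
    loss = 0.0
    for attribute, newer, older in pairs:
        margin = margins.get(attribute, default_margin)
        loss += hinge_rank_loss(codes[newer], codes[older], margin)
    return loss


codes = {"a": 0.0, "b": 0.5, "c": 3.0}             # current scalar time codes
pairs = [("job_title", "b", "a"), ("address", "c", "b")]
adaptive_margins = {"job_title": 2.0, "address": 0.25}

adaptive = total_loss(pairs, codes, adaptive_margins)  # per-attribute intervals
fixed = total_loss(pairs, codes, {})                   # same interval (1.0) everywhere
```

With the fixed interval the `job_title` pair is penalized by 0.5, while the adaptive margin of 2.0 raises the penalty to 1.5, pushing the two codes further apart for an attribute whose values are assumed to change rarely.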
In one embodiment of the present application, the timeliness ordering module 430 includes:
a timeliness sorting sub-module, configured to sort the data in the dataset by timeliness according to the time code and the chronological order.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the application.
Where operation steps in this embodiment repeat those of the above embodiments, they are described only briefly here; for the remaining details, refer to the description of the above embodiments.
Referring to fig. 5, a computer device for carrying out the time sequence ordering method of the present application is shown, which may specifically include the following:
The computer device 12 is embodied in the form of a general-purpose computing device, and the components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a memory 28, and a bus 18 that connects the various system components, including the memory 28 and the processing unit 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile memory, such as random access memory 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (commonly referred to as a "hard disk drive"). Although not shown in fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk such as a CD-ROM, DVD-ROM, or other optical media may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The memory may include at least one program product having a set (e.g., at least one) of program modules 42, the program modules 42 being configured to carry out the functions of embodiments of the application.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, a memory, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules 42, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, camera, etc.), one or more devices that enable an operator to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through the I/O interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet, through network adapter 20. As shown in fig. 5, the network adapter 20 communicates with other modules of the computer device 12 via the bus 18. It should be appreciated that although not shown in fig. 5, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units 16, external disk drive arrays, RAID systems, tape drives, data backup storage systems 34, and the like.
The processing unit 16 executes programs stored in the memory 28 to perform various functional applications and data processing, such as implementing a time sequencing method provided by embodiments of the present application.
That is, when executing the program, the processing unit 16 implements: acquiring a data set, wherein the data set comprises data and attribute values corresponding one-to-one to the data, and wherein the data set comprises at least two sets of data; determining an associated data attribute value of target data according to the data set, and determining a time code of the target data according to the associated data attribute value; and ordering the data in the data set by timeliness according to the time code.
In an embodiment of the present application, the present application further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a time sequencing method as provided in all embodiments of the present application.
That is, when executed by a processor, the program implements: acquiring a data set, wherein the data set comprises data and attribute values corresponding one-to-one to the data, and wherein the data set comprises at least two sets of data; determining an associated data attribute value of target data according to the data set, and determining a time code of the target data according to the associated data attribute value; and ordering the data in the data set by timeliness according to the time code.
Any combination of one or more computer readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the operator's computer, partly on the operator's computer, as a stand-alone software package, partly on the operator's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the operator's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider). In this specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the application.
Finally, it is further noted that relational terms such as first and second, and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or terminal device comprising the element.
The time sequence ordering method and device provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the description of the above embodiments is only intended to help in understanding the method and core idea of the present application. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims (10)

1. A method for time-sequential ordering of data, the method comprising:
acquiring a data set, wherein the data set comprises data and attribute values corresponding one-to-one to the data; wherein the data set comprises at least two sets of data;
determining an associated data attribute value of target data according to the data set, and determining a time code of the target data according to the associated data attribute value;
and time-based ordering the data in the data set according to the time code.
2. The method of claim 1, wherein the step of determining the associated data attribute value of the target data from the data set comprises:
determining front data of target data, rear data of the target data and the target data according to the data set;
and determining the associated data attribute value of the target data according to the front data of the target data, the rear data of the target data, the target data and the data set.
3. The method of claim 1, wherein said step of determining a time code of said target data based on said associated data attribute values comprises:
determining an attribute vector of each target data according to the associated data attribute value;
and determining the time code of each target data according to the attribute vector of each target data.
4. A method according to claim 3, wherein the step of determining an attribute vector for each target data in dependence upon the associated data attribute values comprises:
determining an associated data sequence according to the associated data attribute value;
and determining an attribute vector of each target data according to the associated data sequence.
5. A method according to claim 3, wherein said step of determining the time code of each of said target data from said attribute vector of each of said target data comprises:
determining a numerical code according to the attribute vector of each target data;
determining attribute codes according to the attribute vector of each target data;
and determining the time code of each target data according to the attribute codes and the numerical codes.
6. The method of time sequencing of claim 5 wherein said step of determining a time code for each of said target data based on said attribute codes and said numerical codes comprises:
and obtaining the time code of each target data with the adaptive interval through a loss function according to the attribute code and the numerical code.
7. The method of time sequencing of claim 1 wherein said step of time-sequentially sequencing data in said dataset according to said time code comprises:
and time-sequentially sorting the data in the data set according to the time code.
8. A time sequential ordering apparatus for time-efficient ordering of data, comprising:
the acquisition module is used for acquiring a data set, wherein the data set comprises data and attribute values corresponding one-to-one to the data; wherein the data set comprises at least two sets of data;
the time coding module is used for determining an associated data attribute value of the target data according to the data set and determining time coding of the target data according to the associated data attribute value;
and the timeliness ordering module is used for timeliness ordering the data in the data set according to the time code.
9. A computer electronic device comprising a processor, a memory and a computer program stored on the memory and capable of running on the processor, which when executed by the processor performs the steps of the time sequencing method of any of claims 1 to 7.
10. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the time sequencing method of any of claims 1 to 7.
CN202310717116.6A 2023-06-15 2023-06-15 Time sequence ordering method and device Active CN116578602B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310717116.6A CN116578602B (en) 2023-06-15 2023-06-15 Time sequence ordering method and device

Publications (2)

Publication Number Publication Date
CN116578602A (en) 2023-08-11
CN116578602B (en) 2024-03-12

Family

ID=87545372

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant