CN115599838B - Data processing method, device, equipment and storage medium based on artificial intelligence - Google Patents


Info

Publication number
CN115599838B
Authority
CN
China
Prior art keywords
data
processed
database
unit space
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211262585.5A
Other languages
Chinese (zh)
Other versions
CN115599838A (en)
Inventor
阴智辉
曹彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211262585.5A
Publication of CN115599838A
Application granted
Publication of CN115599838B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides an artificial-intelligence-based data processing method, apparatus, device and storage medium. It relates to the field of artificial intelligence, in particular to cloud computing, cloud storage, cloud networking and cloud databases, and can be applied to intelligent cloud scenarios. The specific implementation is as follows: acquiring data to be processed, and determining the scanning unit spaces across which the stored data corresponding to the data to be processed is distributed; dispersing, within the data to be processed, the data corresponding to the same scanning unit space, and determining a dispersed processing order for the data to be processed; and calling, according to the dispersed processing order, the database to which the scanning unit space corresponding to the data to be processed belongs, so that the database asynchronously processes the stored data corresponding to the data to be processed. The method and apparatus can improve the availability of the database during data processing.

Description

Data processing method, device, equipment and storage medium based on artificial intelligence
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to cloud computing, cloud storage, cloud networking and cloud databases, and can be applied to intelligent cloud scenarios. More specifically, it relates to an artificial-intelligence-based data processing method, apparatus, device and storage medium.
Background
With the development of databases, the volume of data to be processed keeps growing and the associations between data keep increasing, so conventional data processing approaches can no longer meet usage requirements.
At present, data can be processed asynchronously to handle complex business scenarios.
Disclosure of Invention
The present disclosure provides a data processing method, apparatus, device and storage medium based on artificial intelligence.
According to an aspect of the present disclosure, there is provided an artificial intelligence based data processing method, including:
acquiring data to be processed, and determining the scanning unit spaces across which the stored data corresponding to the data to be processed is distributed;
dispersing, within the data to be processed, the data corresponding to the same scanning unit space, and determining a dispersed processing order for the data to be processed;
and calling, according to the dispersed processing order, the database to which the scanning unit space corresponding to the data to be processed belongs, so that the database asynchronously processes the stored data corresponding to the data to be processed.
According to an aspect of the present disclosure, there is provided an artificial intelligence based data processing apparatus including:
a unit space determining module, configured to acquire data to be processed and determine the scanning unit spaces across which the stored data corresponding to the data to be processed is distributed;
a processing order determining module, configured to disperse, within the data to be processed, the data corresponding to the same scanning unit space and determine a dispersed processing order for the data to be processed;
and a database calling module, configured to call, according to the dispersed processing order, the database to which the scanning unit space corresponding to the data to be processed belongs, so that the database asynchronously processes the stored data corresponding to the data to be processed.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable it to perform the artificial-intelligence-based data processing method according to any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the artificial-intelligence-based data processing method according to any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the artificial-intelligence-based data processing method according to any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a data processing system comprising a first node and a second node, the first node being configured to perform the data processing method according to any embodiment of the present disclosure;
the first node is configured to acquire data to be processed and determine the scanning unit spaces across which the stored data corresponding to the data to be processed is distributed;
to disperse, within the data to be processed, the data corresponding to the same scanning unit space, and determine a dispersed processing order for the data to be processed;
and to call, according to the dispersed processing order, the second node to which the scanning unit space corresponding to the data to be processed belongs;
the second node is configured to asynchronously process the stored data corresponding to the data to be processed.
The method and apparatus can improve the availability of the database during data processing.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are provided for a better understanding of the present solution and do not limit the present disclosure. In the drawings:
FIG. 1 is a flow chart of an artificial intelligence based data processing method disclosed in accordance with an embodiment of the present disclosure;
FIG. 2 is a flow chart of another artificial intelligence based data processing method disclosed in accordance with an embodiment of the present disclosure;
FIG. 3 is a flow chart of another artificial intelligence based data processing method disclosed in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an artificial intelligence based data processing system according to an embodiment of the present disclosure;
FIG. 5 is a scenario diagram of an asynchronous data deletion method according to an embodiment of the present disclosure;
FIG. 6 is a scenario diagram of an asynchronous data deletion method according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of an artificial intelligence based data processing apparatus according to an embodiment of the disclosure;
FIG. 8 is a block diagram of an electronic device for implementing an artificial intelligence based data processing method of an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a flow chart of an artificial-intelligence-based data processing method according to an embodiment of the present disclosure; the embodiment is applicable to asynchronous processing of data. The method may be performed by an artificial-intelligence-based data processing apparatus, which may be implemented in software and/or hardware and configured in an electronic device with data computing capability. The electronic device may be a client device or a server device; the client device may be, for example, a mobile phone, a tablet computer, a vehicle-mounted terminal or a desktop computer.
S101, acquiring data to be processed, and determining the scanning unit spaces across which the stored data corresponding to the data to be processed is distributed.
The data to be processed identifies the stored data that is to be processed: it may be index data or pointer data for the corresponding stored data, and the corresponding stored data can be queried according to it. The stored data is the data actually held in the storage space. The stored data corresponding to the data to be processed may have the same content as the data to be processed, or it may contain the content of the data to be processed together with other content. For example, the data to be processed may be a key, and the stored data may be the value corresponding to the key, or the key together with its value; the data to be processed may then include key=1, key=2, key=10, and so on. The storage space comprises at least one scanning unit space, the stored data is stored in the scanning unit spaces, and which scanning unit space a given piece of stored data is placed in can be configured as required. For example, a data processing request sent by a querying party may be received, the pointing information (such as an address or an index value) of the stored data to be processed may be extracted from the request, and the data to be processed may be determined from it. Because the amount of stored data to be processed is usually large and there are many pieces of pointing information, the amount of data to be processed is also large, for example millions of pieces.
A scanning unit space may be the storage space between the scan start position and the scan end position of a single scan. Within a single scan, the scan proceeds in a fixed direction. For example, a scanning unit space is scanned from front to back: the scan starts at the start address of the scanning unit space (the scan start position) and proceeds in order toward the end address of the scanning unit space (the scan end position). A scanning unit space may be one data slice; alternatively, it may span at least one data slice or only part of a data slice. For example, if a single scan starts at the head of a data slice and ends at the tail of that slice, the scanning unit space is one data slice. As another example, if a single scan of the database starts at the head of a data slice and ends at some position between the head and the tail of that slice, the scanning unit space is part of a data slice.
Specifically, each piece of data included in the data to be processed is obtained, and the stored data corresponding to each piece is determined. The scanning unit space to which the stored data of each piece belongs is then obtained, and the union of these scanning unit spaces gives the scanning unit spaces across which the stored data corresponding to the data to be processed is distributed.
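The mapping step above can be sketched as follows, under the assumption (not stated in the disclosure) that each scanning unit space covers a contiguous, half-open key range identified by its start key, so that the owning space of a key is found by binary search. The names `SPACE_STARTS`, `SPACE_NAMES`, `locate_space` and `distribute` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical layout (an assumption, not from the disclosure): each scan unit
# space covers a half-open key range [start, next_start), keyed by its start.
SPACE_STARTS = [0, 100, 200]   # must be sorted ascending
SPACE_NAMES = ["A", "B", "C"]  # A: [0, 100), B: [100, 200), C: [200, ...)


def locate_space(key: int) -> str:
    """Return the name of the scan unit space whose key range contains `key`."""
    idx = bisect_right(SPACE_STARTS, key) - 1
    if idx < 0:
        raise KeyError(f"key {key} lies before the first scan unit space")
    return SPACE_NAMES[idx]


def distribute(keys: list[int]) -> dict[str, list[int]]:
    """Group the data to be processed by the scan unit space holding its stored
    data; the dict keys are the scan unit spaces the data is distributed over."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key in keys:
        groups[locate_space(key)].append(key)
    return dict(groups)


if __name__ == "__main__":
    print(distribute([1, 2, 10, 150, 250, 120]))
    # {'A': [1, 2, 10], 'B': [150, 120], 'C': [250]}
```

A real database would obtain the slice boundaries from its own metadata; the fixed list here only stands in for that lookup.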
S102, dispersing, within the data to be processed, the data corresponding to the same scanning unit space, and determining the dispersed processing order of the data to be processed.
A processing order is the order in which the pieces of data in the data to be processed are processed. In practice the pieces are processed one by one, and they are usually processed space by space according to how they are distributed over the scanning unit spaces. Accordingly, a processing order, including the dispersed processing order, may comprise an order over the scanning unit spaces together with, for each scanning unit space, an order over the data to be processed that corresponds to it.
For example, suppose the scanning unit spaces are A, B and C, processed in the order A, B, C. The data corresponding to scanning unit space A is processed in the order $D_{A1}$, $D_{A2}$, $D_{A3}$, $D_{A4}$, $D_{A5}$; the data corresponding to scanning unit space B in the order $D_{B1}$, $D_{B2}$, $D_{B3}$, $D_{B4}$, $D_{B5}$; and the data corresponding to scanning unit space C in the order $D_{C1}$, $D_{C2}$, $D_{C3}$, $D_{C4}$, $D_{C5}$. The processing order may then be $D_{A1}$, $D_{A2}$, $D_{A3}$, $D_{A4}$, $D_{A5}$, $D_{B1}$, $D_{B2}$, $D_{B3}$, $D_{B4}$, $D_{B5}$, $D_{C1}$, $D_{C2}$, $D_{C3}$, $D_{C4}$, $D_{C5}$.
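For contrast with dispersion, the concentrated order in the example above can be reproduced with a short sketch; string labels such as `"D_A1"` and the function name `concentrated_order` are illustrative assumptions.

```python
def concentrated_order(groups: dict[str, list[str]], space_order: list[str]) -> list[str]:
    """Process every piece of one scan unit space before moving to the next space."""
    order: list[str] = []
    for space in space_order:
        order.extend(groups[space])
    return order


# The example from the text: spaces A, B, C, five pieces each.
groups = {s: [f"D_{s}{i}" for i in range(1, 6)] for s in "ABC"}
order = concentrated_order(groups, ["A", "B", "C"])
print(order[:6])  # ['D_A1', 'D_A2', 'D_A3', 'D_A4', 'D_A5', 'D_B1']
```

All five pieces of space A are processed back to back, which is exactly the concentration that the dispersed processing order is meant to avoid.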
In practice, the pieces of data included in the data to be processed may be classified and arranged by scanning unit space. Dispersing the data corresponding to the same scanning unit space means rearranging the data to be processed so that the pieces belonging to the same scanning unit space are not all adjacent to one another, i.e. not all in one contiguous run. The dispersed processing order is then determined by the front-to-back order of the dispersed data: it is the order in which the dispersed data to be processed is processed, and it may include an order over the scanning unit spaces and, for each scanning unit space, an order over its corresponding data. Dispersing interleaves the data of one scanning unit space among the data of other scanning unit spaces, so that data that would originally have been processed as one concentrated block is instead processed in a discontinuous, non-concentrated order. Processing the data to be processed in the dispersed processing order therefore ensures that the data corresponding to the same scanning unit space is processed discontinuously.
For example, the pieces of the data to be processed may be arranged in processing order and divided by scanning unit space, and the data corresponding to the same scanning unit space may then be dispersed so that, in the resulting sorted sequence, the data of each scanning unit space is spread out rather than contiguous; this sorted sequence is taken as the dispersed processing order. As another example, after the data is arranged in processing order and divided by scanning unit space, the scanning unit spaces may be randomly sampled one at a time, and the extraction sequence is used as the dispersed processing order, with the constraint that two consecutively extracted scanning unit spaces are different, so that adjacent entries in the resulting order belong to different scanning unit spaces. During processing, part of the data corresponding to each extracted scanning unit space may be taken at random and processed piece by piece in the order of the scanning unit spaces.
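The "consecutive draws differ" constraint can be sketched as below. This is a swapped-in variant, not the disclosed sampling rule: instead of a purely random draw it prefers, among spaces other than the previous one, those with the most remaining pieces and breaks ties at random. That greedy choice is the standard way to guarantee no two adjacent pieces from the same space whenever such an order exists; a purely random draw can leave one space's pieces clustered at the tail. All names are hypothetical.

```python
import random
from typing import Optional


def dispersed_order(groups: dict[str, list[str]],
                    rng: Optional[random.Random] = None) -> list[str]:
    """Draw one scan unit space at a time so that consecutive draws differ.

    Among spaces other than the previous draw, prefer those with the most
    remaining items and break ties at random (a greedy variant, assumed here)."""
    rng = rng or random.Random()
    remaining = {s: list(items) for s, items in groups.items() if items}
    order: list[str] = []
    last = None
    while remaining:
        # Fall back to the previous space only if it is the only one left.
        candidates = [s for s in remaining if s != last] or list(remaining)
        most = max(len(remaining[s]) for s in candidates)
        space = rng.choice([s for s in candidates if len(remaining[s]) == most])
        order.append(remaining[space].pop(0))  # keep each space's own order
        if not remaining[space]:
            del remaining[space]
        last = space
    return order


groups = {s: [f"D_{s}{i}" for i in range(1, 6)] for s in "ABC"}
print(dispersed_order(groups, random.Random(0)))
```

Passing a seeded `random.Random` makes the order reproducible, which is convenient when testing.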
In essence, dispersed processing avoids concentrated processing of the data corresponding to the same scanning unit space; that is, it keeps the number of pieces from the same scanning unit space that are processed consecutively from exceeding a processing-number threshold. The processing-number threshold is a preset upper limit on the number of consecutive pieces processed from the same scanning unit space, and may be set and adjusted according to the experience of those skilled in the art.
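One way to check an order against the processing-number threshold is to measure the longest run of consecutive pieces from the same scanning unit space. The sketch below assumes a caller-supplied `space_of` function mapping a piece to its space; the function names are hypothetical.

```python
from collections.abc import Callable, Hashable
from itertools import groupby


def longest_run(order: list, space_of: Callable[[object], Hashable]) -> int:
    """Longest stretch of consecutive items that belong to one scan unit space."""
    return max((sum(1 for _ in run) for _, run in groupby(order, key=space_of)), default=0)


def within_threshold(order: list, space_of: Callable[[object], Hashable], threshold: int) -> bool:
    """True if no space is processed more than `threshold` times in a row."""
    return longest_run(order, space_of) <= threshold


space_of = lambda d: d[2]  # "D_A1" -> "A" (label format is an assumption)
concentrated = [f"D_{s}{i}" for s in "ABC" for i in range(1, 6)]
print(longest_run(concentrated, space_of))  # 5
```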
Optionally, the data to be processed corresponding to the same scanning unit space may be dispersed by extraction. Specifically, the scanning unit spaces are randomly sampled with replacement to obtain a sampling sequence; for each extracted scanning unit space, its corresponding data to be processed is sampled without replacement, and the number of pieces extracted at a time does not exceed the processing-number threshold. The extraction may be random or may follow a preset extraction rule, which is not limited herein. Once all of the data corresponding to a scanning unit space has been extracted, that scanning unit space is no longer extracted. These steps are repeated until the data corresponding to every scanning unit space has been extracted.
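A minimal sketch of the extraction scheme, assuming random extraction: spaces are drawn with replacement, pieces within a space without replacement, and each draw takes between 1 and `threshold` pieces. One rule is an added assumption not in the text: a space is not drawn twice in a row while another space still has data, so that two consecutive batches cannot together exceed the threshold.

```python
import random
from typing import Optional


def extract_order(groups: dict[str, list[str]], threshold: int,
                  rng: Optional[random.Random] = None) -> list[str]:
    """Spaces drawn with replacement, pieces within a space drawn without
    replacement, at most `threshold` pieces per draw (hypothetical sketch)."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    rng = rng or random.Random()
    pools = {s: list(items) for s, items in groups.items() if items}
    order: list[str] = []
    last = None
    while pools:
        # Assumption beyond the text: avoid repeating the previous space if possible.
        candidates = [s for s in pools if s != last] or list(pools)
        space = rng.choice(candidates)
        k = rng.randint(1, min(threshold, len(pools[space])))
        batch = rng.sample(pools[space], k)  # without replacement within the space
        for item in batch:
            pools[space].remove(item)
        order.extend(batch)
        if not pools[space]:
            del pools[space]  # an exhausted space is never drawn again
        last = space
    return order
```

Choosing `k` at random mirrors the text's note that the number of pieces extracted each time may differ; a fixed batch size would work equally well.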
Optionally, the data to be processed corresponding to each scanning unit space may first be divided into data groups. The set of scanning unit spaces is then randomly sampled with replacement to obtain a sampling sequence, and for each extracted scanning unit space, one of its data groups is drawn at random without replacement. Once all the data groups of a scanning unit space have been drawn, that scanning unit space is no longer extracted. These steps are repeated until all data groups of all scanning unit spaces have been drawn.
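The pre-grouping variant can be sketched as follows, assuming groups are formed from consecutive pieces of at most `group_size` each (the grouping rule is not specified in the text). Each draw returns a `(space, group)` pair; names are hypothetical.

```python
import random
from typing import Optional


def pregroup(items: list, group_size: int) -> list[list]:
    """Split one space's data into consecutive groups of at most `group_size`."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [items[i:i + group_size] for i in range(0, len(items), group_size)]


def grouped_order(groups: dict, group_size: int,
                  rng: Optional[random.Random] = None) -> list[tuple]:
    """Draw a space (with replacement across rounds), then one of its remaining
    data groups without replacement; a space with no groups left is dropped."""
    rng = rng or random.Random()
    pools = {s: pregroup(list(items), group_size) for s, items in groups.items() if items}
    schedule = []
    while pools:
        space = rng.choice(list(pools))
        chunk = pools[space].pop(rng.randrange(len(pools[space])))
        schedule.append((space, chunk))
        if not pools[space]:
            del pools[space]
    return schedule
```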
In a specific example, the dispersed processing order cycles through A, B and C. The data corresponding to scanning unit space A is ordered $D_{A1}$, $D_{A2}$, $D_{A3}$, $D_{A4}$, $D_{A5}$; the data corresponding to scanning unit space B is ordered $D_{B1}$, $D_{B2}$, $D_{B3}$, $D_{B4}$, $D_{B5}$; and the data corresponding to scanning unit space C is ordered $D_{C1}$, $D_{C2}$, $D_{C3}$, $D_{C4}$, $D_{C5}$. The dispersed processing order may then be $D_{A1}$, $D_{B1}$, $D_{C1}$, $D_{A2}$, $D_{B2}$, $D_{C2}$, $D_{A3}$, $D_{B3}$, $D_{C3}$, $D_{A4}$, $D_{B4}$, $D_{C4}$, $D_{A5}$, $D_{B5}$, $D_{C5}$.
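The cyclic order in this example is a round-robin interleave, which can be sketched with the standard library; the sentinel `_GAP` and the name `round_robin` are illustrative.

```python
from itertools import chain, zip_longest

_GAP = object()  # sentinel marking a space that has run out of data


def round_robin(groups: dict[str, list], space_order: list[str]) -> list:
    """Take the next piece from each space in turn: A, B, C, A, B, C, ..."""
    columns = zip_longest(*(groups[s] for s in space_order), fillvalue=_GAP)
    return [item for item in chain.from_iterable(columns) if item is not _GAP]


groups = {s: [f"D_{s}{i}" for i in range(1, 6)] for s in "ABC"}
print(round_robin(groups, ["A", "B", "C"])[:6])
# ['D_A1', 'D_B1', 'D_C1', 'D_A2', 'D_B2', 'D_C2']
```

With equal group sizes every adjacent pair comes from different spaces; with unequal sizes the tail of the largest space is processed consecutively (for example `{"A": [1, 2, 3], "B": [4]}` gives `[1, 4, 2, 3]`), which is where a threshold check becomes useful.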
Optionally, the scanning unit spaces may have different numbers of corresponding pieces of data to be processed, the number of pieces extracted each time may differ, and the number of pieces per scanning unit space after dispersion may also differ. In all cases, the dispersed processing order ensures that the dispersed data of the same scanning unit space is processed discontinuously.
S103, calling, according to the dispersed processing order, the database to which the scanning unit space corresponding to the data to be processed belongs, so that the database asynchronously processes the stored data corresponding to the data to be processed.
A scanning unit space is storage space belonging to a particular database, and each database processes the stored data distributed in the scanning unit spaces that belong to it.
Asynchronous processing means that, on receiving a processing instruction, the database does not process the data immediately but executes the instruction later once a condition is met; the processing instruction is the instruction that the database processes asynchronously. For example, asynchronous processing may be implemented with a mark-then-process approach, in which the data is first marked and processed later. Asynchronous processing may include, for example, asynchronous deletion or asynchronous modification. In the case of asynchronous deletion, data H is first marked, for example with a "tombstone" marker, so that it is deleted asynchronously; marked data can then be deleted in bulk during data compaction. Once the owning database has been called for every piece of the data to be processed and each call has returned a result, the result can be fed back to the querying party. At this point the data has not actually been processed by the database; the database has only been notified of the stored data to be processed.
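The mark-then-process deletion described above can be illustrated with a toy in-memory store. This is a hypothetical sketch, not any real database's API: `delete_async` only records a tombstone and acknowledges at once, reads treat marked keys as absent, and `compact` physically removes all marked rows in one pass.

```python
class TombstoneStore:
    """Toy key-value store (hypothetical) with mark-then-process deletion."""

    def __init__(self) -> None:
        self._rows: dict = {}          # physically stored data
        self._tombstones: set = set()  # keys marked for deletion, not yet removed

    def put(self, key, value) -> None:
        self._rows[key] = value
        self._tombstones.discard(key)

    def delete_async(self, key) -> str:
        """Only mark the key; acknowledge immediately without removing anything."""
        if key in self._rows:
            self._tombstones.add(key)
        return "accepted"

    def get(self, key):
        """Marked keys read as absent even though their rows still exist."""
        return None if key in self._tombstones else self._rows.get(key)

    def physical_size(self) -> int:
        return len(self._rows)

    def compact(self) -> int:
        """Remove all marked rows in one pass; return how many were removed."""
        for key in self._tombstones:
            self._rows.pop(key, None)
        removed = len(self._tombstones)
        self._tombstones.clear()
        return removed
```

Between `delete_async` and `compact`, the marked rows still occupy space and still have to be stepped over by any scan, which is the cost discussed next.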
If data were processed immediately, the database could process the next piece only after finishing the previous one, which leads to long waits and low processing efficiency. With non-immediate, i.e. asynchronous, processing, a received processing instruction is executed later once a condition is met, and in the meantime the system can accept the next processing instruction without waiting for the previous piece to finish, which saves waiting time and improves the system's data processing efficiency.
Because asynchronous processing defers execution of a processing instruction until a condition is met, processing the stored data of one scanning unit space in a concentrated burst leaves that scanning unit space holding a large amount of data that has received a processing instruction but has not yet been processed. In the asynchronous scenario, data within one scanning unit space is scanned continuously in a fixed direction, for example from front to back, so reading stored data from that space requires a front-to-back scan, and a read instruction must step over all of this received-but-unprocessed data. That data is effectively invalid, so scanning it wastes the database's system resources. Moreover, when there is a large amount of such data, the scan takes so long that the read instruction times out before the data to be read has been reached; the database may then retry the read instruction repeatedly, wasting considerable database resources and, in severe cases, crashing the database system, which further slows read responses and degrades the database's read and write performance.
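The cost of a concentrated backlog can be made concrete by counting how many entries a front-to-back scan must visit before reaching live data. The figures below are hypothetical: 3,000 pending marks all piled into one 10,000-entry space, versus the same marks dispersed over three spaces, assuming compaction keeps pace so that only about a third of them are pending in this space at a time.

```python
def entries_scanned(keys_in_order, marked, wanted: int = 1) -> int:
    """Scan a scan unit space front to back, stepping over marked entries
    (received a processing instruction but not yet processed), until `wanted`
    live entries are found. Returns how many entries the scan visited."""
    visited = live = 0
    for key in keys_in_order:
        visited += 1
        if key not in marked:
            live += 1
            if live == wanted:
                break
    return visited


space = list(range(10_000))                       # one scan unit space (hypothetical size)
print(entries_scanned(space, set(range(3_000))))  # 3001: all marks concentrated here
print(entries_scanned(space, set(range(1_000))))  # 1001: marks dispersed across spaces
```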
With the dispersed processing order, data from the same scanning unit space is no longer processed as one continuous, concentrated block but in a discontinuous, non-concentrated order. This greatly reduces the amount of data awaiting asynchronous processing in any one scanning unit space and prevents a large backlog of received-but-unprocessed data from building up there. When a read instruction scans the stored data of a scanning unit space from front to back, less time is spent scanning received-but-unprocessed data, so reads no longer time out before reaching the requested data, the system resources wasted by repeated read retries are saved, the stability of the database is improved, and its normal read and write performance is maintained.
Specifically, the pieces of data included in the data to be processed may be selected in turn according to the dispersed processing order; for each selected piece, the database to which the corresponding scanning unit space belongs is called, and that database directly performs the asynchronous processing operation on the stored data corresponding to the selected piece.
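The dispatch loop can be sketched as below. `Database` is a hypothetical stand-in that only queues the instruction and acknowledges, modelling the asynchronous call; `space_of` and `owner_of` are assumed lookups from a piece to its scanning unit space and from a space to its owning database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Database:
    """Hypothetical database node; records instructions instead of executing them."""
    name: str
    pending: List[str] = field(default_factory=list)

    def submit(self, key: str) -> str:
        self.pending.append(key)  # queued for later, asynchronous processing
        return "accepted"


def dispatch(order: List[str], space_of: Callable[[str], str],
             owner_of: Dict[str, Database]) -> List[str]:
    """Walk the dispersed order and call the database owning each piece's space."""
    return [owner_of[space_of(key)].submit(key) for key in order]


db1, db2 = Database("db1"), Database("db2")
acks = dispatch(["D_A1", "D_B1", "D_C1", "D_A2"], lambda k: k[2],
                {"A": db1, "B": db2, "C": db2})
print(acks, db1.pending, db2.pending)
```

Collecting the acknowledgements corresponds to the point at which, per the text, the result can be fed back to the querying party.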
According to the technical solution of this embodiment, data to be processed is acquired and the scanning unit spaces across which its stored data is distributed are determined; within the data to be processed, the data corresponding to the same scanning unit space is dispersed and a dispersed processing order is determined; and, according to the dispersed processing order, the database to which the corresponding scanning unit space belongs is called so that it processes the stored data asynchronously. The processing order within a scanning unit space is thereby changed from a continuous, concentrated order to a discontinuous, non-concentrated one. This prevents a large amount of received-but-unprocessed data from accumulating in the same scanning unit space, reduces the time spent scanning such data and hence the amount of invalid scanning, improves the stability and availability of the database during data processing, and preserves its read and write performance. Processing the data asynchronously also saves waiting time and improves data processing efficiency.
FIG. 2 is a flow chart of another artificial-intelligence-based data processing method according to an embodiment of the present disclosure, which further optimizes and extends the technical solution above and may be combined with each of the optional implementations described above. Here, calling the database to which the scanning unit space corresponding to the data to be processed belongs according to the dispersed processing order is embodied as: selecting a current unit space from the scanning unit spaces according to the dispersed processing order; for the current unit space, acquiring the target data corresponding to it in the data to be processed; selecting current data from the target data, the data to be processed comprising the current data; and calling the database to which the current unit space belongs according to the current data, so as to asynchronously process the stored data corresponding to the current data.
S201, acquiring data to be processed, and determining the scanning unit spaces across which the stored data corresponding to the data to be processed is distributed.
S202, dispersing, within the data to be processed, the data corresponding to the same scanning unit space, and determining the dispersed processing order of the data to be processed.
S203, selecting a current unit space from the scanning unit spaces according to the dispersed processing order.
The current unit space is the scanning unit space selected at the current moment; it is one of the scanning unit spaces across which the stored data corresponding to the data to be processed is distributed.
Specifically, the dispersed processing order includes an order over the scanning unit spaces; following it, one scanning unit space is selected at a time, and the scanning unit space selected at the current moment is taken as the current unit space.
S204, for the current unit space, acquiring, from the data to be processed, the target data corresponding to the current unit space.
The target data is the subset of the data to be processed that corresponds to the current unit space; the stored data corresponding to the target data is stored in the current unit space. Every piece of the target data therefore corresponds to the same scan unit space, namely the current unit space.
Specifically, all data corresponding to the current unit space may be taken from the data to be processed as the target data.
S205, selecting current data from the target data.
The current data is part of the target data and includes at least one piece of data. If the target data contains only one piece of data, the target data is the current data. For example, if the target data has three million pieces of data, the current data may be ten thousand of them. The current data corresponds to the current unit space.
Optionally, if the dispersed processing order includes only the processing order among the scan unit spaces, a small amount of data may be randomly selected from the target data as the current data. For example, the data corresponding to current unit space E may be divided into three groups, each containing a small amount of data, and one group may be randomly selected as the current data.
Optionally, if the dispersed processing order includes both the processing order among the scan unit spaces and the processing order among the pieces of data within each scan unit space, an appropriate amount of data may be selected from the target data in the order given for those pieces and taken as the current data. Continuing the example above, suppose scan unit space E is the current unit space and the processing order of its target data is $D_{E2}$, $D_{E1}$, $D_{E3}$. The first time scan unit space E serves as the current unit space, $D_{E2}$ and $D_{E1}$ are selected as the current data; the second time, $D_{E3}$ is selected.
Optionally, the target data may be divided into data groups in advance, and one data group may be randomly selected, with its data used as the current data. Alternatively, if the dispersed processing order also orders the data groups, the data group whose turn it is may be selected and its data used as the current data.
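Both grouping options above (random pick, or pick in order) can be sketched as follows. This is a minimal illustration, assuming fixed-size groups; `split_into_groups`, `pick_group` and `group_size` are hypothetical names, not taken from the patent:

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

def split_into_groups(target: Sequence[T], group_size: int) -> List[List[T]]:
    """Divide the target data of one scan unit space into groups in advance."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [list(target[i:i + group_size]) for i in range(0, len(target), group_size)]

def pick_group(groups: List[List[T]], rng: Optional[random.Random] = None) -> List[T]:
    """Remove and return the next current data.

    With an rng, a group is chosen at random; without one, groups are
    consumed in their stored order (the ordered variant).
    """
    if not groups:
        return []
    index = rng.randrange(len(groups)) if rng is not None else 0
    return groups.pop(index)
```

Removing the chosen group from the list means a later visit to the same scan unit space never reprocesses it.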
S206, calling, according to the current data, the database to which the current unit space belongs, so as to asynchronously process the stored data corresponding to the current data.
Specifically, the database to which the current unit space belongs may be called for each piece of data in the current data in turn, and that database asynchronously processes the stored data corresponding to each piece one by one. Once the stored data corresponding to the current data has been handled, the method returns to selecting the next scan unit space according to the dispersed processing order, and repeats until all the data to be processed has been processed.
In an optional embodiment of the present disclosure, the amount of current data is less than or equal to a preset number threshold.
The preset number threshold is a preset upper limit on the amount of current data and may be set and adjusted based on the technician's experience.
Keeping the amount of current data at or below the preset number threshold bounds it. The current data corresponds to the stored data processed in one pass in the current unit space, so bounding it ensures that not too much stored data is processed consecutively in that space. Consequently, when a processing instruction is executed on the scan unit space, the amount of scanned data that has received a processing instruction but has not yet actually been processed stays small. This prevents excessive scanning of such data from consuming too many system resources and taking too long, makes effective use of system resources, and keeps the database stable during data processing.
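Steps S203 to S206 together with the preset number threshold can be sketched as a loop that revisits scan unit spaces in turn and hands each database at most `threshold` items per visit. This is a minimal sketch under stated assumptions: `run_dispersed` and the `call_db` callback are hypothetical, and the callback stands in for the asynchronous database call, which the patent does not specify:

```python
from collections import deque
from typing import Callable, Dict, List, Sequence

def run_dispersed(
    space_order: Sequence[str],
    target_by_space: Dict[str, List[str]],
    threshold: int,
    call_db: Callable[[str, List[str]], None],
) -> None:
    """Visit scan unit spaces round-robin; each visit sends at most `threshold` items."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Deduplicate while keeping order; skip spaces with no target data.
    pending = deque(s for s in dict.fromkeys(space_order) if target_by_space.get(s))
    cursors = {s: 0 for s in pending}
    while pending:
        space = pending.popleft()                   # S203: current unit space
        target = target_by_space[space]             # S204: its target data
        start = cursors[space]
        current = target[start:start + threshold]   # S205: bounded current data
        cursors[space] = start + len(current)
        call_db(space, current)                     # S206: hand off (stand-in for an async call)
        if cursors[space] < len(target):
            pending.append(space)                   # revisit after the other spaces
```

With a threshold of 2, space E holding three items and space F holding one, the calls go to E (two items), then F, then E again, so E is never processed in one concentrated burst.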
In an optional embodiment of the present disclosure, calling, according to the current data, the database to which the current unit space belongs specifically includes: (1) generating new version data corresponding to the current data; (2) writing the new version data into the database to which the current unit space belongs; and (3) calling the database to which the current unit space belongs to asynchronously process the stored data corresponding to the current data and the new version data.
Step (1): generating new version data corresponding to the current data.
The database may process the stored data corresponding to the current data in a multi-version manner. Multi-version processing keeps both the processed new version data and the old version data, that is, the stored data as it was before processing. New and old versions are distinguished by version numbers, which may be represented by timestamps. For example, the new version data may be $p = 11$, $t_s = 10$: the value of $p$ in the new version is 11, and its timestamp $t_s$, i.e. its version number, is 10. The old version data may be $p = 9$, $t_s = 8$: the value of $p$ in the old version is 9, and its version number is 8.
The new version data corresponding to the current data represents the result of processing the stored data corresponding to the current data. Processing that stored data in the database means generating the new version data, storing it in the database as the new version of the stored data, and at the same time turning the existing stored data into old version data. The data to be deleted is then deleted when a trigger condition is met, which makes the processing of the stored data corresponding to the current data asynchronous.
In the multi-version manner, several versions of the stored data are kept, and the value carried by the latest version number is taken as the value of the stored data, which ensures the stored data is correct. Without multiple versions, a system delay can cause operations on the stored data to take effect out of order, leaving the stored data incorrect.
For example, suppose the stored data X is assigned 8 at time $t_1$, assigned 9 at time $t_2$, and deleted at time $t_3$, where $t_1 < t_2 < t_3$. If X is stored in a non-multi-version manner and the system is delayed while executing the assignment at $t_2$, the deletion at $t_3$ may run first: X is deleted and then assigned 9. After the operations at $t_1$, $t_2$ and $t_3$ the value of X is therefore 9, which differs from the true result and is an error. If X is stored in a multi-version manner, $t_1$, $t_2$ and $t_3$ serve as version numbers of X, and the value with the latest version number, $t_3$, is taken as the value of X. X is therefore deleted and its value is empty, which ensures that the stored data is processed correctly.
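The example above can be reproduced with a toy multi-version cell. This is a minimal sketch, not the patent's storage format: `VersionedCell` and the `TOMBSTONE` sentinel are hypothetical, and version numbers are plain integers standing in for timestamps:

```python
from typing import Dict, Optional

TOMBSTONE = object()  # hypothetical sentinel meaning "deleted at this version"

class VersionedCell:
    """Keeps every version of one stored value, keyed by version number (timestamp)."""

    def __init__(self) -> None:
        self.versions: Dict[int, object] = {}

    def write(self, ts: int, value: object) -> None:
        self.versions[ts] = value

    def delete(self, ts: int) -> None:
        self.versions[ts] = TOMBSTONE  # a deletion is itself a new version

    def read(self) -> Optional[object]:
        """Return the value of the latest version number; None if absent or deleted."""
        if not self.versions:
            return None
        latest = self.versions[max(self.versions)]
        return None if latest is TOMBSTONE else latest

# Operations arrive out of order: the delayed assignment at t2 lands after the delete at t3.
x = VersionedCell()
x.write(1, 8)    # t1
x.delete(3)      # t3
x.write(2, 9)    # t2, delayed
```

Because the read picks the highest version number rather than the last write applied, `x.read()` returns `None`, matching the intended order $t_1 < t_2 < t_3$.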
Step (2): writing the new version data into the database to which the current unit space belongs.
Because the stored data corresponding to the current data is distributed in the current unit space, when that stored data is processed, the corresponding new version data may be written into the database to which the current unit space belongs.
Step (3): calling the database to which the current unit space belongs to asynchronously process the stored data corresponding to the current data and the new version data.
The database to which the current unit space belongs is called, and it processes the stored data corresponding to the current data and the new version data asynchronously: it first marks them, and actually processes the marked data only when a trigger condition is met. The trigger condition causes the database to carry out the real processing of asynchronously processed data; for example, it may be reaching a periodic time point, or the execution of a data compression process.
When the trigger condition is met, the database typically deletes the stored data (that is, the old version data). Because this deletion happens later than the moment the database receives the processing instruction, the processing of the stored data is asynchronous. Optionally, the trigger condition is detecting whether the database executes a file compression instruction.
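The mark-now, delete-later behaviour can be sketched as below. This is an illustration under assumptions: `MarkedStore` is hypothetical, rows are keyed by `(key, version)`, and `compact()` stands in for whatever trigger the database uses (a periodic time point or a file compression run, per the text above):

```python
from typing import Dict, Set, Tuple

Row = Tuple[str, int]  # (key, version number)

class MarkedStore:
    """Marks superseded versions on write; physically removes them only when triggered."""

    def __init__(self) -> None:
        self.rows: Dict[Row, str] = {}
        self.marked: Set[Row] = set()  # versions awaiting real deletion

    def write_new_version(self, key: str, ts: int, value: str) -> None:
        # Mark every older version of this key and return immediately:
        # the caller does not wait for the old version to be deleted.
        self.marked |= {(k, t) for (k, t) in self.rows if k == key and t < ts}
        self.rows[(key, ts)] = value

    def compact(self) -> int:
        """Trigger condition met: purge marked versions, return how many were removed."""
        purged = 0
        for row in self.marked:
            if self.rows.pop(row, None) is not None:
                purged += 1
        self.marked.clear()
        return purged
```

Until `compact()` runs, the marked old versions still occupy the scan unit space; this is the backlog of not-yet-processed data that the dispersed processing order is meant to keep small.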
Thus, new version data corresponding to the current data is generated and written into the database to which the current unit space belongs, and that database is called to asynchronously process the stored data corresponding to the current data and the new version data. Storing the stored data in a multi-version manner and taking the value with the latest version number as its value prevents errors caused by system delay and similar factors, improving the correctness of the stored data. Because the stored data (the old version data) and the new version data are processed asynchronously, being actually processed only when the trigger condition is met, the system can accept the next processing instruction before that processing takes place instead of waiting for it to finish. This saves waiting time and improves both the data processing efficiency and the read and write performance of the system.
Meanwhile, because only the stored data and new version data corresponding to a small amount of current data are processed asynchronously at a time, the database processes the data of any one scan unit space in a dispersed manner. During asynchronous processing, no scan unit space accumulates excessive data that has received a processing instruction but has not yet actually been processed. This reduces the time spent on invalid scans, avoids timeouts in which a processing instruction runs out of time before the data it must act on has even been scanned, uses system resources effectively, and improves the stability and availability of the database during data processing.
In an optional embodiment of the present disclosure, the data to be processed includes data to be deleted and/or data to be modified.
The data to be modified is used to modify the corresponding stored data, and the data to be deleted is used to delete the corresponding stored data.
Optionally, if the data to be processed is data to be modified, that is, the corresponding stored data is to be modified, new version data corresponding to the current data may be generated and written into the database to which the current unit space belongs, and that database is called to store the new version data. The database may mark the stored data corresponding to the current data for deletion, for example by flagging it as a "tombstone", and delete that stored data (the old version data) when the trigger condition is met.
Optionally, if the data to be processed is data to be deleted, that is, the corresponding stored data is to be deleted, new version data corresponding to the current data may be generated whose content records the deletion of the corresponding stored data; more specifically, its content may be that stored data marked as a tombstone. The new version data is written into the database to which the current unit space belongs, and that database is called to store it. The database marks both the new version data and the stored data (the old version data), for example as "tombstones", and deletes both when the trigger condition is met.
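The two request kinds can be contrasted in a short sketch. It assumes hypothetical names throughout (`Store`, `apply`, the `"modify"`/`"delete"` kind strings and the `TOMBSTONE` marker value); the patent only states that the database marks data, for example as tombstones, and deletes it once the trigger condition is met:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Row = Tuple[str, int]          # (key, version number)
TOMBSTONE = "<tombstone>"      # hypothetical marker value for a deletion version

@dataclass
class Store:
    rows: Dict[Row, str] = field(default_factory=dict)
    marked: Set[Row] = field(default_factory=set)  # rows to delete on trigger

    def _older(self, key: str, ts: int) -> Set[Row]:
        return {(k, t) for (k, t) in self.rows if k == key and t < ts}

    def apply(self, kind: str, key: str, ts: int, value: Optional[str] = None) -> None:
        if kind == "modify":
            if value is None:
                raise ValueError("modify needs a value")
            self.rows[(key, ts)] = value                    # store the new version
            self.marked |= self._older(key, ts)             # old versions: mark only
        elif kind == "delete":
            self.rows[(key, ts)] = TOMBSTONE                # new version records the deletion
            self.marked |= self._older(key, ts) | {(key, ts)}  # mark old and new
        else:
            raise ValueError(f"unknown request kind: {kind}")

    def compact(self) -> None:
        """Trigger condition met: actually delete everything marked."""
        for row in self.marked:
            self.rows.pop(row, None)
        self.marked.clear()
```

After a modify, only the newest version survives compaction; after a delete, both the old data and the tombstone version are removed, so the key disappears entirely.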
The data to be processed is specified as data to be modified or data to be deleted. When it is data to be modified, the new version data is stored, the stored data (the old version data) is marked for deletion, and the stored data is actually deleted when the trigger condition is met, so that storing the new version and deleting the old one happen asynchronously. When it is data to be deleted, the deletion of the corresponding stored data is first stored in the current unit space as new version data, both the stored data and the new version data are marked for deletion, and both are actually deleted when the trigger condition is met, so that marking and actual deletion happen asynchronously. Asynchronous processing saves the time otherwise spent waiting for the old stored data to be deleted during a modification, or waiting for immediate deletion during a deletion, which improves the data processing efficiency and the read and write performance of the system.
In an optional embodiment of the present disclosure, calling, according to the data to be processed and in the dispersed processing order, the database to which the scan unit space corresponding to the dispersed data belongs includes executing the following steps in parallel on multiple threads: selecting a current unit space from the scan unit spaces according to the dispersed processing order; for the current unit space, acquiring, from the data to be processed, the target data corresponding to the current unit space; selecting current data from the target data, the dispersed data including the current data; and calling, according to the current data, the database to which the current unit space belongs, so as to asynchronously process the stored data corresponding to the current data.
In the multithreaded parallel manner, the steps above (selecting the current unit space, acquiring its target data, selecting the current data, and calling the database to which the current unit space belongs to asynchronously process the corresponding stored data) together form one data processing task. One thread executes one data processing task, different threads execute different tasks, and multiple threads execute multiple tasks in parallel.
Two data processing tasks differ either because they select different current unit spaces according to the dispersed processing order, or because they select the same current unit space but different current data.
Optionally, one scan unit space is selected as the current unit space according to the dispersed processing order, generating data processing task 1; the next scan unit space in the order is then selected, generating data processing task 2; and so on, up to data processing task n, with adjacent tasks corresponding to different current unit spaces. The threads available to execute data processing tasks are obtained, and tasks 1 to n are assigned to them in order. For example, with k available threads, where k < n, the k threads first execute tasks 1 to k in parallel; when any of them finishes, it is assigned task k+1; when the next one finishes, it is assigned task k+2; and so on until all n tasks have been executed. Task assignment therefore depends on whether a thread is available, and tasks are handed to available threads in turn.
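The "k threads, n tasks, next task to whichever thread frees up first" scheme matches the behaviour of a fixed-size pool with a FIFO work queue. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` for this; the task shape `(space, current_data)` and the `handle` callback are assumptions for illustration, not the patent's interfaces:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Task = Tuple[str, List[str]]  # (current unit space, current data), hypothetical shape

def run_tasks_in_parallel(
    tasks: Sequence[Task], k: int, handle: Callable[[Task], str]
) -> List[str]:
    """Submit tasks 1..n in dispersed order to a pool of k worker threads.

    Tasks wait in a FIFO queue, so an idle worker always picks up the
    earliest unstarted task, as in the k < n example above.
    """
    with ThreadPoolExecutor(max_workers=k) as pool:
        futures = [pool.submit(handle, task) for task in tasks]
        return [f.result() for f in futures]  # results in task order
```

Because the task list itself follows the dispersed processing order, adjacent tasks target different scan unit spaces, so workers running side by side tend to hit different spaces.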
Generating data processing tasks from the dispersed processing order ensures that adjacent tasks correspond to different scan unit spaces. Assigning threads to the tasks in that order and executing them in parallel on the database's available threads raises the utilization of the database's thread resources, and multithreaded parallel execution improves the data processing efficiency of the database.
According to the technical solution of this embodiment, selecting the current unit space from the scan unit spaces according to the dispersed processing order makes the processing of data in the same scan unit space discontinuous. For the current unit space, the target data is acquired from the data to be processed, the current data is selected from it, and the database to which the current unit space belongs is called to asynchronously process the stored data corresponding to the current data. Only a small amount of stored data is therefore processed asynchronously at a time, and the database processes the data of each scan unit space in a dispersed rather than concentrated manner. As a result, during asynchronous processing no scan unit space accumulates excessive invalid data, that is, data that has received a processing instruction but has not yet actually been processed. This reduces the amount of invalid data scanned and the time spent on invalid scans, reduces request-response timeouts, uses system resources effectively, and improves the stability and availability of the database during data processing.
FIG. 3 is a flow chart of another artificial intelligence based data processing method according to an embodiment of the present disclosure. It further optimizes and extends the above technical solution and may be combined with the optional implementations described above. The method further includes: acquiring the databases to which the scan unit spaces holding the stored data belong; acquiring asynchronous processing pressure information of these databases; screening a target database from these databases according to their asynchronous processing pressure information; and moving the stored data in the scan unit spaces of the target database that correspond to the data to be processed to other databases for storage.
S301, acquiring data to be processed, and determining the scan unit spaces across which the stored data corresponding to the data to be processed is distributed.
S302, acquiring the databases to which those scan unit spaces belong.
The stored data corresponding to the data to be processed is distributed across at least one scan unit space. Each database contains at least one scan unit space, and usually several. There is at least one database, and the stored data may reside in a single database or be spread across several.
S303, acquiring asynchronous processing pressure information of those databases.
The asynchronous processing pressure information describes, and is used to determine, the pressure a database is under from processing data asynchronously. It may be characterized by the database's asynchronous processing number, i.e. the amount of data being processed asynchronously, or by its asynchronous processing speed, i.e. the amount of data processed asynchronously per second.
S304, screening a target database from those databases according to their asynchronous processing pressure information.
The stored data corresponding to the current data is distributed across at least one scan unit space. Different scan unit spaces may belong to different databases or to the same one, so at least one database is involved, and the target database is screened from among them. The target database is at least one of these databases: one under high asynchronous processing pressure, from which stored data awaiting asynchronous processing is moved out. Moving stored data out of the target database lowers its asynchronous processing pressure.
Specifically, if the asynchronous processing pressure information shows that a database's asynchronous processing pressure is too high, for example exceeding the database's processing capacity, that database is taken as the target database.
Optionally, a preset asynchronous processing number threshold or a preset asynchronous processing speed threshold may be set; these are the preset upper limits of the database's asynchronous processing number and asynchronous processing speed, respectively. For example, if the asynchronous processing number in the pressure information is greater than the preset asynchronous processing number threshold, the database's asynchronous processing pressure is determined to be too high; if it is less than or equal to the threshold, the pressure is determined to be moderate. Determining the pressure with the preset asynchronous processing speed threshold works the same way as with the preset asynchronous processing number threshold and is not repeated here. The asynchronous processing pressure information also includes a specific value of the database's asynchronous processing pressure, which may likewise be characterized by the asynchronous processing number or speed. From this value, one can determine the database's current processing pressure, the amount by which it exceeds the database's processing capacity, or the gap between the database's peak and its processing capacity.
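Screening by either threshold can be sketched as a simple filter. The `Pressure` record, `screen_targets`, and the way the two optional thresholds combine (either one exceeded marks the database) are assumptions made for illustration; the patent only describes each threshold as an upper limit:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Pressure:
    db: str
    async_count: int     # asynchronous processing number: items being processed asynchronously
    async_speed: float   # asynchronous processing speed: items processed per second

def screen_targets(
    infos: List[Pressure],
    count_threshold: Optional[int] = None,
    speed_threshold: Optional[float] = None,
) -> List[str]:
    """Return the databases whose asynchronous processing pressure is 'too high'.

    A value strictly above its threshold counts as too high; at or below
    it, the pressure is treated as moderate.
    """
    targets = []
    for info in infos:
        too_high = (
            (count_threshold is not None and info.async_count > count_threshold)
            or (speed_threshold is not None and info.async_speed > speed_threshold)
        )
        if too_high:
            targets.append(info.db)
    return targets
```

A database sitting exactly at the threshold is not selected, matching the "less than or equal to means moderate" rule above.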
S305, moving the stored data in the scan unit spaces of the target database that correspond to the data to be processed to other databases for storage.
The other databases are the databases that receive the moved stored data and take over part of the asynchronous processing pressure of the database to which the scan unit space belongs. They need sufficient capacity for asynchronous data processing so that they can still process data normally after receiving the stored data from the target database. They may be databases to which other scan unit spaces holding stored data for the data to be processed belong, or databases unrelated to the data to be processed. As in the previous example, all the stored data in a scan unit space whose amount of data to be processed falls within the preset number range may be moved to other databases for storage.
Specifically, the stored data in the scan unit spaces of the target database that correspond to the data to be processed may be moved to other databases, and the pointing information for that stored data in the data to be processed may be updated so that requests are routed to its new storage location.
S306, dispersing, among the data to be processed, the data corresponding to the same scan unit space, and determining a dispersed processing order of the data to be processed.
S307, calling, according to the data to be processed and in the dispersed processing order, the database to which the scan unit space corresponding to the dispersed data belongs, so that the database asynchronously processes the stored data corresponding to the dispersed data.
In addition, when screening the target database, it is also necessary to check whether the amount of data to be processed corresponding to a scan unit space in the same database is suitable for moving. On the one hand, if a scan unit space has too much such data, moving it would put excessive asynchronous processing pressure on the receiving database, which could then no longer process the corresponding stored data normally. On the other hand, if it has too little, moving it relieves little of the pressure on the database it belongs to. Therefore, when the amount of data to be processed corresponding to a scan unit space is within the preset number range and the asynchronous processing pressure of its database is too high, that database is determined to be the target database and that scan unit space is determined to be the scan unit space to be moved out.
In an optional embodiment of the present disclosure, moving the stored data in the scan unit space corresponding to the data to be processed to other databases for storage includes: acquiring the storage number of stored data corresponding to the data to be processed in that scan unit space; and, when the storage number is within a preset number range, moving that stored data to other databases for storage.
The storage number is the number of stored data items in a scan unit space that correspond to the data to be processed; it also reflects the asynchronous processing pressure of that scan unit space. The preset number range is a preset range of storage numbers used to decide whether stored data may be moved, so that every database performs well after the move. If the storage number is within the preset number range, the stored data can be moved; otherwise it does not need to be moved.
Specifically, for each scan unit space corresponding to the data to be processed, the storage number of the corresponding stored data is obtained and checked against the preset number range. If it is within the range, the stored data in that scan unit space that corresponds to the data to be processed is moved to other databases for storage, and the pointing information in the data to be processed is updated. If it is not within the range, that stored data is not moved.
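The range check and the routing update can be sketched together. This is a simplified illustration: `move_spaces`, the per-space `owner` map (standing in for the pointing information), the single destination database, and the inclusive range are all assumptions, and the sketch relocates a whole space's matching data at once without copying any bytes:

```python
from typing import Dict, List, Tuple

def move_spaces(
    counts: Dict[str, int],        # scan unit space -> storage number for the data to be processed
    owner: Dict[str, str],         # scan unit space -> database currently holding it (routing info)
    target_dbs: List[str],         # databases screened as under too much pressure
    destination: str,              # receiving database, assumed to have spare capacity
    count_range: Tuple[int, int],  # preset number range, inclusive
) -> List[str]:
    """Reassign qualifying scan unit spaces to `destination` and return which moved."""
    low, high = count_range
    moved = []
    for space, n in counts.items():
        if owner.get(space) in target_dbs and low <= n <= high:
            owner[space] = destination  # update pointing info so requests route to the new home
            moved.append(space)
    return moved
```

A space below the range is left in place because moving it would relieve little pressure; one above the range is left in place because it could overload the receiving database.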
By setting the preset quantity range, a proper amount of stored data is moved, the asynchronous processing pressure of the space of the scanning unit to which the stored data belongs can be relieved by moving the stored data, the normal asynchronous processing capacity of other databases transferred into the stored data is considered, and the system resources can be normally utilized by the databases of both sides after movement and the availability of the databases of both sides after movement can be realized.
According to the technical scheme of the present disclosure, a database to which a distributed scanning unit space belongs is obtained; acquiring asynchronous processing pressure information of a database to which the asynchronous processing pressure information belongs; screening a target database in the affiliated database according to the asynchronous processing pressure information of the affiliated database; in the target database, the stored data in the scanning unit space corresponding to the data to be processed are moved to other databases for storage, and the stored data corresponding to the data to be processed in the target database is moved to other databases for storage by screening the target database, so that the asynchronous processing pressure of the databases of the scanning unit space is reduced, the databases have the asynchronous processing capability of normally carrying out data, and the availability of the databases of the data processing is improved.
FIG. 4 is a schematic diagram of an artificial intelligence based data processing system according to an embodiment of the present disclosure. The system may perform the artificial intelligence based data processing method of any embodiment of the present disclosure. As shown in FIG. 4, the data processing system includes a first node 401, which performs the data processing method of any embodiment of the present disclosure, and a second node 402. The first node is configured to acquire the data to be processed and determine the scanning unit spaces over which the stored data corresponding to the data to be processed is distributed; to disperse, within the data to be processed, the data that corresponds to the same scanning unit space and determine a dispersed processing order of the data to be processed; and to call, according to the dispersed processing order, the second node to which the scanning unit space corresponding to the data to be processed belongs. The second node is configured to asynchronously process the stored data corresponding to the data to be processed.
The first node receives the data to be processed, disperses the data that corresponds to the same scanning unit space, determines the dispersed processing order, and calls the second node, which asynchronously processes the stored data corresponding to the data to be processed. The artificial intelligence based data processing system thus implements the data processing method of the embodiments of the present disclosure through the first node. There is at least one second node. Each second node is a database that the first node can call, so that data is processed asynchronously. The first node may be a distributed database that supports transactions and stores data as multiple versions. The second node is typically a database that processes data asynchronously, and it adds, deletes, and queries the stored data. The first node stores routing information for the stored data held in the second nodes; the routing information is used to locate the second node that stores a given piece of data and the data shard within that second node.
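One way the first node could use its routing information is a sorted range lookup from a record key to the owning second node and shard. The sketch below assumes range partitioning by key, with each shard covering keys from its start key up to the next shard's start key. `Route` and `RoutingTable` are hypothetical names, not part of the disclosure.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    node: str   # second node that stores the data
    shard: str  # data shard inside that second node


class RoutingTable:
    """Range-partitioned routing: a shard covers [its start key, next start key)."""

    def __init__(self, entries):
        entries = sorted(entries, key=lambda entry: entry[0])
        self._starts = [start for start, _ in entries]
        self._routes = [route for _, route in entries]

    def locate(self, key):
        # Rightmost shard whose start key is <= key.
        index = bisect.bisect_right(self._starts, key) - 1
        if index < 0:
            raise KeyError(f"no shard covers key {key!r}")
        return self._routes[index]


table = RoutingTable([
    ("m", Route("node-2", "shard-7")),
    ("a", Route("node-1", "shard-3")),
])
print(table.locate("c"))  # prints Route(node='node-1', shard='shard-3')
```

Binary search keeps each lookup logarithmic in the number of shards. Hash partitioning would need a different lookup.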
FIG. 5 is a scene diagram of an asynchronous data deletion method according to an embodiment of the present disclosure. As shown in FIG. 5, the data processing system includes a first node and a second node. The first node acquires the data to be deleted, performs garbage collection on it, and calls the second node, which deletes the data asynchronously. During asynchronous deletion, the second node first marks the stored data corresponding to the data to be deleted as a tombstone. Only when a trigger condition is met does it perform the asynchronous deletion that physically removes the stored data. Most current distributed databases that support transactions store data as multiple versions, which lets them implement transactions while keeping good write performance. Many of these systems, including the first node and the second node here, also use a storage engine based on an LSM-Tree (Log-Structured Merge-Tree), and such engines likewise rely on asynchronous deletion to preserve write performance. The first node first garbage-collects data by marking it as tombstones, and the second node then physically deletes that data when it compacts its files. When a user deletes a large amount of data in a scanning unit space, asynchronous deletion can leave a large contiguous tombstone region (that is, a region of data marked as "tombstone") in the system.
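The two-phase deletion described above (mark as tombstone, physically remove later) can be illustrated with a toy key-value store. This is only a sketch of the general LSM-Tree idea under simplifying assumptions: a single in-memory dict stands in for the sorted files, and "compaction" is triggered by a hypothetical tombstone count rather than by file merges.

```python
TOMBSTONE = object()  # sentinel: the key is logically deleted


class TinyTombstoneStore:
    """Toy store: delete() only marks a tombstone; compact() removes it later."""

    def __init__(self, compaction_trigger=4):
        self.data = {}  # key -> value, or TOMBSTONE
        self.tombstones = 0
        self.compaction_trigger = compaction_trigger  # hypothetical trigger

    def put(self, key, value):
        if self.data.get(key) is TOMBSTONE:
            self.tombstones -= 1
        self.data[key] = value

    def delete(self, key):
        # Phase 1: a cheap write that marks the key; nothing is removed yet.
        if key in self.data and self.data[key] is not TOMBSTONE:
            self.data[key] = TOMBSTONE
            self.tombstones += 1
            if self.tombstones >= self.compaction_trigger:
                self.compact()

    def get(self, key):
        value = self.data.get(key)
        return None if value is TOMBSTONE else value

    def compact(self):
        # Phase 2: physically drop every tombstoned entry.
        self.data = {k: v for k, v in self.data.items() if v is not TOMBSTONE}
        self.tombstones = 0


store = TinyTombstoneStore(compaction_trigger=3)
for key in "abcd":
    store.put(key, key.upper())
store.delete("a")
store.delete("b")
print(store.get("a"), "a" in store.data)  # prints None True
store.delete("c")  # third tombstone reaches the trigger
print("a" in store.data, store.get("d"))  # prints False D
```

Between the two phases, readers already see the key as deleted, but the entry still occupies space and still has to be stepped over by scans.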
If a scan is then performed over the subsequent data, it proceeds from front to back and must traverse this region of invalid data. This can cause system requests to time out, consume a large amount of computing resources, disrupt system operation, and, in severe cases, crash the system. In existing systems, when tombstones in the system or storage engine severely affect data scanning, the problem is usually detected promptly through statistical indicators such as scan time or the amount of data scanned. The affected scans are then stopped to limit the damage, and scanning resumes only after asynchronous deletion has completed. However, the system cannot provide read service while it waits to recover, which reduces its availability.
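The cost of scanning through a contiguous tombstone region can be shown with a simplified cost model in which every entry visited costs one unit. The model and the data below are illustrative assumptions, not measurements. The same 1000 tombstones are placed either in one contiguous block or spread between live entries, and the scan stops at the first live entry.

```python
def scan_for_live(entries, start, limit):
    """Scan from `start`, skipping tombstones, until `limit` live keys are found.

    entries: key-ordered list of (key, is_tombstone).
    Returns (live_keys, visited); `visited` models the scan cost.
    """
    live, visited = [], 0
    for key, is_tombstone in entries[start:]:
        visited += 1
        if not is_tombstone:
            live.append(key)
            if len(live) == limit:
                break
    return live, visited


# 1010 entries with 1000 tombstones and 10 live keys in both layouts.
contiguous = [(i, i < 1000) for i in range(1010)]     # one large tombstone run
dispersed = [(i, i % 101 != 0) for i in range(1010)]  # live key every 101 entries

print(scan_for_live(contiguous, 0, 1))  # prints ([1000], 1001)
print(scan_for_live(dispersed, 1, 1))   # prints ([101], 101)
```

In this model the worst single-row scan drops from 1001 visited entries to 101, even though the total number of tombstones is unchanged.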
FIG. 6 is a scene diagram of an asynchronous data deletion method according to an embodiment of the present disclosure. To keep read service available while the system deletes a large amount of data, a new method for deleting multi-version data in a distributed database is designed and implemented. As shown in FIG. 6, concentrated data to be deleted is turned into dispersed data. The system perceives user range-deletion or range-modification operations and spreads them across multiple partitions, which relieves the garbage collection pressure on any single concentrated region. Tombstones at both layers (the first node and the second node) are thus effectively dispersed. This prevents long runs of contiguous garbage data from affecting data scan operations, and concurrent processing increases the overall speed at which the system handles garbage data. In addition, at the level of the second nodes, the system can perceive the garbage-data pressure on each second node (that is, the asynchronous processing pressure of the stored data corresponding to the data to be deleted that the first node has garbage-collected). When this pressure is excessive, part of the stored data is moved to other idle databases, so that second nodes with insufficient file compaction capacity do not become a bottleneck.
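A simple way to turn concentrated to-be-deleted data into a dispersed processing order is a round-robin interleave across scanning unit spaces. The disclosure does not specify the dispersal policy, so round-robin is an assumption chosen here for illustration. It interleaves the spaces while keeping the original order of items within each space.

```python
from itertools import zip_longest

_GAP = object()  # filler for spaces that have run out of items


def disperse_order(groups):
    """Interleave items across scanning unit spaces in round-robin fashion.

    groups: {space_id: [items in their original order]}
    Returns [(space_id, item), ...] taking one item from each space per round.
    """
    tagged = [[(space_id, item) for item in items] for space_id, items in groups.items()]
    return [
        pair
        for round_items in zip_longest(*tagged, fillvalue=_GAP)
        for pair in round_items
        if pair is not _GAP
    ]


# A range deletion concentrated in s1 is spread out between s2 and s3.
order = disperse_order({"s1": [1, 2, 3], "s2": [4], "s3": [5, 6]})
print(order)
# prints [('s1', 1), ('s2', 4), ('s3', 5), ('s1', 2), ('s3', 6), ('s1', 3)]
```

The resulting list covers both parts of the processing order in the claims: which space comes next, and which item within that space.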
According to the technical solution of the present disclosure, the data processing system includes a first node, which performs the data processing method of any embodiment of the present disclosure, and a second node. The first node acquires the data to be processed and determines the scanning unit spaces over which the corresponding stored data is distributed. It then disperses the data that corresponds to the same scanning unit space, determines the dispersed processing order of the data to be processed, and, according to that order, calls the second node to which the corresponding scanning unit space belongs. The second node asynchronously processes the stored data corresponding to the data to be processed. Through the first node, the processing order for a given scanning unit space changes from continuous and concentrated to discontinuous and dispersed. This prevents a large amount of data that has received a processing instruction but has not yet been processed from accumulating in the same scanning unit space. It therefore shortens the time spent scanning such data and, correspondingly, the time spent on invalid scans, and keeps the second node from timing out on requests or repeatedly failing to respond. The cooperation of the first node and the second node improves the stability and availability of the database during data processing. Processing the data asynchronously also saves waiting time, improves data processing efficiency, and takes the read and write performance of the database into account.
FIG. 7 is a block diagram of an artificial intelligence based data processing apparatus according to an embodiment of the present disclosure, applicable to performing the artificial intelligence based data processing method. The apparatus is implemented in software and/or hardware and is configured in an electronic device with data computing capability.
The artificial intelligence based data processing apparatus 700 shown in FIG. 7 includes a unit space determination module 701, a processing order determination module 702, and a database call module 703. The unit space determination module 701 is configured to acquire the data to be processed and determine the scanning unit spaces over which the stored data corresponding to the data to be processed is distributed. The processing order determination module 702 is configured to disperse, within the data to be processed, the data that corresponds to the same scanning unit space and determine the dispersed processing order of the data to be processed. The database call module 703 is configured to call, according to the dispersed processing order, the database to which the scanning unit space corresponding to the dispersed data belongs, so that this database asynchronously processes the stored data corresponding to the dispersed data.
According to the technical solution of this embodiment, the data to be processed is acquired, and the scanning unit spaces over which its corresponding stored data is distributed are determined. The data that corresponds to the same scanning unit space is dispersed and the dispersed processing order is determined. Then, according to that order, the database to which the corresponding scanning unit space belongs is called to asynchronously process the stored data corresponding to the data to be processed. The processing order for a given scanning unit space thus changes from continuous and concentrated to discontinuous and dispersed. This prevents a large amount of data that has received a processing instruction but has not yet been processed from accumulating in the same scanning unit space. It shortens the time spent scanning such data and, correspondingly, the time spent on invalid scans, and it improves the stability and availability of the database during data processing. Processing the data asynchronously also saves waiting time, improves data processing efficiency, and takes the read and write performance of the database into account.
In an alternative embodiment of the present disclosure, the database call module 703 includes: a current space selection unit, configured to select a current unit space from the scanning unit spaces according to the dispersed processing order; a target data acquisition unit, configured to acquire, for the current unit space, the target data in the data to be processed that corresponds to the current unit space; a current data selection unit, configured to select current data from the target data; and a database calling unit, configured to call, according to the current data, the database to which the current unit space belongs, so as to asynchronously process the stored data corresponding to the current data.
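The four units can be read as a loop over the dispersed order that hands out at most a threshold-sized batch of current data per call. The sketch below is a hypothetical generator in which `batch_limit` stands in for the preset quantity threshold on the amount of current data. The caller would route each yielded batch to the database that owns the space.

```python
def iter_current_batches(space_order, target_by_space, batch_limit):
    """Yield (current_unit_space, current_data) pairs.

    space_order:     scanning unit spaces in the dispersed processing order
    target_by_space: {space_id: target data for that space}
    batch_limit:     hypothetical preset threshold on the amount of current data
    Each space contributes at most batch_limit items per round, so no single
    space is processed in one long contiguous run.
    """
    if batch_limit < 1:
        raise ValueError("batch_limit must be at least 1")
    cursors = {space_id: 0 for space_id in space_order}
    remaining = list(space_order)
    while remaining:
        next_round = []
        for space_id in remaining:  # select the current unit space
            target = target_by_space.get(space_id, [])  # its target data
            start = cursors[space_id]
            current = target[start:start + batch_limit]  # select current data
            if current:
                yield space_id, current
                cursors[space_id] = start + len(current)
            if cursors[space_id] < len(target):
                next_round.append(space_id)
        remaining = next_round


batches = list(iter_current_batches(["s1", "s2"], {"s1": [1, 2, 3, 4, 5], "s2": [6, 7]}, 2))
print(batches)
# prints [('s1', [1, 2]), ('s2', [6, 7]), ('s1', [3, 4]), ('s1', [5])]
```

A generator lets the caller stop or pause between batches, for example when the owning database reports high pressure.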
In an alternative embodiment of the present disclosure, the amount of current data is less than or equal to a preset amount threshold.
In an alternative embodiment of the present disclosure, the apparatus further includes: a database acquisition module, configured to acquire the databases to which the distributed scanning unit spaces belong; a pressure information acquisition module, configured to acquire the asynchronous processing pressure information of these databases; a target database screening module, configured to screen a target database from these databases according to their asynchronous processing pressure information; and a stored data moving module, configured to move, in the target database, the stored data in the scanning unit space corresponding to the data to be processed to other databases for storage.
In an alternative embodiment of the present disclosure, the stored data moving module includes: a storage quantity acquisition unit, configured to acquire, in the scanning unit space corresponding to the data to be processed, the storage quantity of the stored data corresponding to the data to be processed; and a stored data moving unit, configured to move the stored data in that scanning unit space to other databases for storage when the storage quantity is within the preset quantity range.
In an alternative embodiment of the present disclosure, the database calling unit includes: a new version data generation subunit, configured to generate new version data corresponding to the current data; a new version data writing subunit, configured to write the new version data into the database to which the current unit space belongs; and a database calling subunit, configured to call the database to which the current unit space belongs so as to asynchronously process the stored data corresponding to the current data and the new version data.
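A new version can be thought of as a timestamped write that hides older versions from readers until the database later cleans them up asynchronously. The sketch below assumes a simple logical clock and uses `None` as a hypothetical delete marker. The text does not define the version format, so this only illustrates multi-version writes in general.

```python
from dataclasses import dataclass, field

DELETED = None  # hypothetical marker: a version with this value is a delete


@dataclass
class VersionedStore:
    versions: dict = field(default_factory=dict)  # key -> [(ts, value)], ts ascending
    clock: int = 0  # hypothetical logical clock

    def write_new_version(self, key, value):
        """Append a new version instead of overwriting or removing in place."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, as_of=None):
        """Return the newest value visible at timestamp as_of (default: now)."""
        as_of = self.clock if as_of is None else as_of
        visible = [value for ts, value in self.versions.get(key, []) if ts <= as_of]
        return visible[-1] if visible else None

    def compact(self, safe_ts):
        """Asynchronous cleanup: keep only the newest version at or before safe_ts
        (dropped entirely if it is a delete) plus all versions newer than safe_ts."""
        for key in list(self.versions):
            old = [(ts, v) for ts, v in self.versions[key] if ts <= safe_ts]
            new = [(ts, v) for ts, v in self.versions[key] if ts > safe_ts]
            kept = old[-1:] if old and old[-1][1] is not DELETED else []
            if kept or new:
                self.versions[key] = kept + new
            else:
                del self.versions[key]


store = VersionedStore()
t1 = store.write_new_version("k", "v1")
t2 = store.write_new_version("k", DELETED)  # deletion as a new version
print(store.read("k"), store.read("k", as_of=t1))  # prints None v1
store.compact(safe_ts=t2)
print("k" in store.versions)  # prints False
```

Writing the new version is fast, while the cost of removing superseded versions moves into `compact`. That is the asynchronous part.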
In an alternative embodiment of the present disclosure, the data to be processed includes data to be deleted and/or data to be modified.
In an alternative embodiment of the present disclosure, the database call module 703 includes the following units, which are executed in parallel by multiple threads: a current space selection unit, configured to select a current unit space from the scanning unit spaces according to the dispersed processing order; a target data acquisition unit, configured to acquire, for the current unit space, the target data in the data to be processed that corresponds to the current unit space; a current data selection unit, configured to select current data from the target data; and a database calling unit, configured to call, according to the current data, the database to which the current unit space belongs, so as to asynchronously process the stored data corresponding to the current data.
The artificial intelligence based data processing apparatus can execute the artificial intelligence based data processing method provided by any embodiment of the present disclosure, and it has the functional modules and beneficial effects corresponding to that method.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information comply with the relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.
FIG. 8 is a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
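Parallel execution could be sketched with a thread pool that dispatches batches concurrently. `call_database` is a hypothetical stand-in for the request to the owning database. Dispatching whole batches to pool workers is one possible reading of the multithreaded embodiment, not the disclosed design.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def process_in_parallel(batches, call_database, workers=4):
    """Dispatch (space_id, current_data) batches on a thread pool.

    Results are returned in submission order, whatever order the threads
    finish in. Exceptions raised by call_database propagate via result().
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(call_database, space_id, data) for space_id, data in batches]
        return [future.result() for future in futures]


calls = []
calls_lock = threading.Lock()


def fake_call_database(space_id, data):
    # Hypothetical stand-in: records the call and reports how many items it sent.
    with calls_lock:
        calls.append((space_id, tuple(data)))
    return len(data)


results = process_in_parallel([("s1", [1, 2]), ("s2", [3])], fake_call_database)
print(results)  # prints [2, 1]
```

The order of entries in `calls` can vary from run to run because the threads run concurrently. The returned `results` list always follows submission order.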
As shown in FIG. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. The RAM 803 can also store the various programs and data required for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the various methods and processes described above, such as artificial intelligence based data processing methods. For example, in some embodiments, the artificial intelligence based data processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When a computer program is loaded into RAM 803 and executed by computing unit 801, one or more of the steps of the artificial intelligence based data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the artificial intelligence based data processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. The processor may be a special purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (19)

1. A data processing method based on artificial intelligence, comprising:
acquiring data to be processed, and determining the scanning unit spaces over which the stored data corresponding to the data to be processed is distributed; wherein a scanning unit space is the storage space between the scan start position and the scan end position of a single scan, and comprises at least one data shard or part of a data shard;
dispersing, within the data to be processed, the data corresponding to the same scanning unit space, and determining a dispersed processing order of the data to be processed; wherein the processing order comprises the processing order of the scanning unit spaces and the processing order of the data, in the data to be processed, that corresponds to each scanning unit space;
and calling, according to the dispersed processing order, the database to which the scanning unit space corresponding to the data to be processed belongs, so that this database asynchronously processes the stored data corresponding to the data to be processed.
2. The method of claim 1, wherein calling, according to the dispersed processing order, the database to which the scanning unit space corresponding to the data to be processed belongs comprises:
selecting a current unit space from the scanning unit spaces according to the dispersed processing order;
for the current unit space, acquiring the target data, in the data to be processed, that corresponds to the current unit space;
selecting current data from the target data;
and calling, according to the current data, the database to which the current unit space belongs, so as to asynchronously process the stored data corresponding to the current data.
3. The method of claim 2, wherein the amount of current data is less than or equal to a preset quantity threshold.
4. The method of claim 1, further comprising:
acquiring the databases to which the distributed scanning unit spaces belong;
acquiring asynchronous processing pressure information of these databases;
screening a target database from these databases according to their asynchronous processing pressure information;
and, in the target database, moving the stored data in the scanning unit space corresponding to the data to be processed to other databases for storage.
5. The method of claim 4, wherein moving the stored data in the scanning unit space corresponding to the data to be processed to other databases for storage comprises:
acquiring, in the scanning unit space corresponding to the data to be processed, the storage quantity of the stored data corresponding to the data to be processed;
and, when the storage quantity is within a preset quantity range, moving the stored data in the scanning unit space corresponding to the data to be processed to other databases for storage.
6. The method of claim 2, wherein calling, according to the current data, the database to which the current unit space belongs comprises:
generating new version data corresponding to the current data;
writing the new version data into the database to which the current unit space belongs;
and calling the database to which the current unit space belongs to asynchronously process the stored data corresponding to the current data and the new version data.
7. The method of claim 6, wherein the data to be processed comprises data to be deleted and/or data to be modified.
8. The method of claim 1, wherein calling, according to the dispersed processing order, the database to which the scanning unit space corresponding to the data to be processed belongs comprises:
executing the following steps in parallel by multiple threads:
selecting a current unit space from the scanning unit spaces according to the dispersed processing order;
for the current unit space, acquiring the target data, in the data to be processed, that corresponds to the current unit space;
selecting current data from the target data;
and calling, according to the current data, the database to which the current unit space belongs, so as to asynchronously process the stored data corresponding to the current data.
9. An artificial intelligence based data processing apparatus comprising:
a unit space determination module, configured to acquire data to be processed and determine the scanning unit spaces over which the stored data corresponding to the data to be processed is distributed; wherein a scanning unit space is the storage space between the scan start position and the scan end position of a single scan, and comprises at least one data shard or part of a data shard;
a processing order determination module, configured to disperse, within the data to be processed, the data corresponding to the same scanning unit space, and to determine a dispersed processing order of the data to be processed; wherein the processing order comprises the processing order of the scanning unit spaces and the processing order of the data, in the data to be processed, that corresponds to each scanning unit space;
and a database call module, configured to call, according to the dispersed processing order, the database to which the scanning unit space corresponding to the data to be processed belongs, so that this database asynchronously processes the stored data corresponding to the data to be processed.
10. The apparatus of claim 9, wherein the database call module comprises:
a current space selection unit, configured to select a current unit space from the scanning unit spaces according to the dispersed processing order;
a target data acquisition unit, configured to acquire, for the current unit space, the target data, in the data to be processed, that corresponds to the current unit space;
a current data selection unit, configured to select current data from the target data;
and a database calling unit, configured to call, according to the current data, the database to which the current unit space belongs, so as to asynchronously process the stored data corresponding to the current data.
11. The apparatus of claim 10, wherein the amount of current data is less than or equal to a preset quantity threshold.
12. The apparatus of claim 9, further comprising:
the database acquisition module is used for acquiring the database to which each distributed scanning unit space belongs;
the pressure information acquisition module is used for acquiring the asynchronous processing pressure information of each such database;
the target database screening module is used for selecting target databases from these databases according to their asynchronous processing pressure information;
and the stored data moving module is used for moving, in a target database, the stored data in the scanning unit space corresponding to the data to be processed to other databases for storage.
13. The apparatus of claim 12, wherein the stored data movement module comprises:
a storage quantity obtaining unit, configured to obtain the storage quantity of the stored data corresponding to the data to be processed in the scanning unit space corresponding to the data to be processed;
and the stored data moving unit is used for moving the stored data in that scanning unit space to other databases for storage when the storage quantity is within a preset quantity range.
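Claims 12 and 13 can be read as a rebalancing pass. The sketch below is one possible interpretation, under these assumptions:

- The number of queued asynchronous tasks stands in for the "asynchronous processing pressure information".
- A database whose queue reaches `pressure_limit` becomes a target database.
- A unit space on a target database is moved to the currently least-loaded other database when its stored count lies within `count_range`.

All names and thresholds are hypothetical.

```python
def plan_moves(space_owner, pending_tasks, stored_count,
               pressure_limit, count_range):
    """Return a list of (space, source_db, destination_db) moves.

    space_owner:   unit-space id -> owning database id
    pending_tasks: database id -> queued asynchronous tasks (pressure proxy)
    stored_count:  unit-space id -> stored records matching the data to be processed
    """
    low, high = count_range
    targets = {db for db, load in pending_tasks.items() if load >= pressure_limit}
    others = [db for db in pending_tasks if db not in targets]
    if not others:
        return []                      # nowhere to move data
    load = dict(pending_tasks)
    moves = []
    for space, owner in space_owner.items():
        if owner in targets and low <= stored_count.get(space, 0) <= high:
            dest = min(others, key=lambda db: load[db])
            moves.append((space, owner, dest))
            load[dest] += 1            # crude: count each move as one extra task
    return moves


print(plan_moves({0: "db1", 1: "db1", 2: "db2"},
                 {"db1": 10, "db2": 1, "db3": 0},
                 {0: 50, 1: 5000, 2: 20},
                 pressure_limit=5, count_range=(10, 1000)))
```

Only `db1` exceeds the pressure limit. Of its two spaces, only space 0 has a stored count inside the range, so the call prints `[(0, 'db1', 'db3')]`.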
14. The apparatus of claim 10, wherein the database call unit comprises:
a new version data generation subunit, configured to generate new version data corresponding to the current data;
the new version data writing subunit is used for writing the new version data into a database to which the current unit space belongs;
and the database calling subunit is used for calling the database to which the current unit space belongs so as to asynchronously process the stored data corresponding to the current data and the new version data.
15. The apparatus of claim 14, wherein the data to be processed comprises data to be deleted and/or data to be modified.
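In claims 14 and 15, new version data is written before the database asynchronously processes the old stored data together with the new version. The patent does not spell out the versioning scheme. The sketch below assumes a multi-version key-value layout:

- A modification appends a new value.
- A deletion appends a tombstone.
- A later background `compact` step reclaims superseded versions.

`VersionedStore`, `TOMBSTONE` and `compact` are hypothetical names.

```python
from dataclasses import dataclass, field

TOMBSTONE = object()  # hypothetical marker written for a deletion


@dataclass
class VersionedStore:
    versions: dict = field(default_factory=dict)  # key -> [(version, value), ...]
    next_version: int = 1

    def write_new_version(self, key, value):
        """Append new version data; pass TOMBSTONE to delete the key."""
        self.versions.setdefault(key, []).append((self.next_version, value))
        self.next_version += 1

    def read(self, key):
        """Readers see only the newest version."""
        entries = self.versions.get(key)
        if not entries or entries[-1][1] is TOMBSTONE:
            return None
        return entries[-1][1]

    def compact(self, key):
        """Background step: drop superseded versions, and the key if deleted."""
        entries = self.versions.get(key)
        if not entries:
            return
        if entries[-1][1] is TOMBSTONE:
            del self.versions[key]
        else:
            self.versions[key] = [entries[-1]]


store = VersionedStore()
store.write_new_version("k", "v1")
store.write_new_version("k", "v2")       # modification
store.compact("k")
print(store.versions["k"])               # [(2, 'v2')]
store.write_new_version("k", TOMBSTONE)  # deletion
store.compact("k")
print(store.read("k"), "k" in store.versions)  # None False
```

Because readers always see the newest version, writes and deletions take effect immediately. The costlier cleanup of old versions can be deferred and run asynchronously.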
16. The apparatus of claim 9, wherein the database call module comprises:
the following units are executed in a multithreaded parallel manner:
a current space selecting unit for selecting a current unit space from the scanning unit spaces according to the dispersed processing order;
the target data acquisition unit is used for acquiring, for the current unit space, the target data in the data to be processed that corresponds to the current unit space;
a current data selecting unit for selecting current data from the target data;
and the database calling unit is used for calling the database to which the current unit space belongs according to the current data so as to asynchronously process the storage data corresponding to the current data.
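Claim 16 runs the same four units on several threads. The sketch below shows one way to do this. It assumes each thread repeatedly takes the next unit space from a shared queue filled in dispersed order. It also assumes `call_database` is a hypothetical thread-safe callable.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_parallel(groups, max_batch, call_database, workers=4):
    """Process unit spaces on `workers` threads.

    groups: unit-space id -> target keys, in dispersed order (hypothetical).
    call_database: hypothetical thread-safe callable (space_id, batch).
    """
    pending = queue.Queue()
    for space in groups:
        pending.put(space)

    def worker():
        while True:
            try:
                space = pending.get_nowait()          # current unit space
            except queue.Empty:
                return
            target = groups[space]                    # its target data
            for start in range(0, len(target), max_batch):
                call_database(space, target[start:start + max_batch])  # current data

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for future in [pool.submit(worker) for _ in range(workers)]:
            future.result()                           # surface worker errors


calls = []
run_parallel({0: ["a", "b", "c"], 1: ["h"]}, 2,
             lambda space, batch: calls.append((space, batch)), workers=2)
print(sorted(calls))
```

Each unit space is handled by a single thread, so batches within one space keep their order while different spaces proceed in parallel. The demo relies on `list.append` being atomic in CPython. A real database client would need its own synchronization.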
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the artificial intelligence based data processing method of any one of claims 1 to 8.
18. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the artificial intelligence based data processing method according to any one of claims 1-8.
19. An artificial intelligence based data processing system comprising: a first node performing the data processing method of any of claims 1-8, and a second node;
the first node is used for acquiring data to be processed and determining the scanning unit spaces across which the stored data corresponding to the data to be processed is distributed;
dispersing the data, in the data to be processed, that corresponds to the same scanning unit space, and determining a dispersed processing order of the data to be processed;
calling, according to the dispersed processing order, the second node to which the scanning unit space corresponding to the data to be processed belongs;
the second node is used for asynchronously processing the storage data corresponding to the data to be processed.
CN202211262585.5A 2022-10-14 2022-10-14 Data processing method, device, equipment and storage medium based on artificial intelligence Active CN115599838B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211262585.5A CN115599838B (en) 2022-10-14 2022-10-14 Data processing method, device, equipment and storage medium based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211262585.5A CN115599838B (en) 2022-10-14 2022-10-14 Data processing method, device, equipment and storage medium based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN115599838A CN115599838A (en) 2023-01-13
CN115599838B CN115599838B (en) 2023-09-29

Family

ID=84846614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211262585.5A Active CN115599838B (en) 2022-10-14 2022-10-14 Data processing method, device, equipment and storage medium based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN115599838B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102253821A (en) * 2011-04-12 2011-11-23 深圳市蓝韵实业有限公司 Data transmission processing method in ultrasonic diagnostic equipment
JP2015060285A (en) * 2013-09-17 2015-03-30 株式会社日立システムズ Global transaction processing method in cloud computing
CN106454003A (en) * 2016-09-28 2017-02-22 理光图像技术(上海)有限公司 Scanning processing apparatus
CN107894997A (en) * 2017-10-19 2018-04-10 苏州工业大数据创新中心有限公司 The inquiry processing method and system of industrial time series data
US11023457B1 (en) * 2018-10-19 2021-06-01 Palantir Technologies Inc. Targeted sweep method for key-value data storage
CN114490662A (en) * 2022-02-14 2022-05-13 广东南方数码科技股份有限公司 Workflow data storage method and device

Non-Patent Citations (3)

Title
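Won Gi Choi et al. OurRocks: Offloading Disk Scan Directly to GPU in Write-Optimized Database System. IEEE Transactions on Computers, 2021, 1831-1844. *
Wen Mingbo (文明波); Ding Zhiming (丁治明). A query-oriented database data distribution strategy for cloud computing. Computer Science (计算机科学), 2010, (09), 174-178. *
Li Liang (李梁). Research on key technologies of in-memory data management and analysis. China Doctoral Dissertations Full-text Database, 2022, I138-70. *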

Also Published As

Publication number Publication date
CN115599838A (en) 2023-01-13

Similar Documents

Publication Publication Date Title
CN109271343B (en) Data merging method and device applied to key value storage system
CN110019873B (en) Face data processing method, device and equipment
US8751446B2 (en) Transference control method, transference control apparatus and recording medium of transference control program
US20180150536A1 (en) Instance-based distributed data recovery method and apparatus
CN111722918A (en) Service identification code generation method and device, storage medium and electronic equipment
CN113806300A (en) Data storage method, system, device, equipment and storage medium
CN112613964A (en) Account checking method, account checking device, account checking equipment and storage medium
CN110222046B (en) List data processing method, device, server and storage medium
CN111062634A (en) Approval task allocation method and device, computer equipment and storage medium
CN112650449B (en) Method and system for releasing cache space, electronic device and storage medium
CN113553216A (en) Data recovery method and device, electronic equipment and storage medium
CN115599838B (en) Data processing method, device, equipment and storage medium based on artificial intelligence
CN115904240A (en) Data processing method and device, electronic equipment and storage medium
CN115422231A (en) Data page processing method and device, electronic equipment and medium
CN115408547A (en) Dictionary tree construction method, device, equipment and storage medium
CN111625500B (en) File snapshot method and device, electronic equipment and storage medium
CN111061719B (en) Data collection method, device, equipment and storage medium
CN114968950A (en) Task processing method and device, electronic equipment and medium
CN114691781A (en) Data synchronization method, system, device, equipment and medium
CN114416885A (en) Data synchronization method and device based on DRBD, computer equipment and storage medium
CN113961641A (en) Database synchronization method, device, equipment and storage medium
CN111061712A (en) Data connection operation processing method and device
CN113326890B (en) Labeling data processing method, related device and computer program product
CN116431561B (en) Data synchronization method, device, equipment and medium based on heterogeneous many-core accelerator card
WO2023246654A1 (en) Data management method and apparatus, system, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant