CN117971137B - Multithreading-based large-scale vector data consistency assessment method and system - Google Patents

Info

Publication number
CN117971137B
CN117971137B CN202410391753.3A CN202410391753A CN117971137B CN 117971137 B CN117971137 B CN 117971137B CN 202410391753 A CN202410391753 A CN 202410391753A CN 117971137 B CN117971137 B CN 117971137B
Authority
CN
China
Prior art keywords
vector data
mode
data
thread
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410391753.3A
Other languages
Chinese (zh)
Other versions
CN117971137A (en)
Inventor
顾雪平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Hairun Shuju Technology Co ltd
Original Assignee
Shandong Hairun Shuju Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Hairun Shuju Technology Co ltd
Priority to CN202410391753.3A
Publication of CN117971137A
Application granted
Publication of CN117971137B
Legal status: Active
Anticipated expiration

Links

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of vector data processing and discloses a multithreading-based method and system for evaluating the consistency of large-scale vector data. The method comprises: determining the access mode of shared vector data from how the vector data is used in an application program; selecting a suitable concurrency control mode; designing a data structure; implementing locking and unlocking with read-write locks; waiting and wake-up processing using condition variables; testing and optimization; handling abnormal conditions; and monitoring and analyzing performance.

Description

Multithreading-based large-scale vector data consistency assessment method and system
Technical Field
The invention relates to the technical field of vector data processing, in particular to a method and a system for evaluating large-scale vector data consistency based on multithreading.
Background
Vector data refers to data organized in the form of vectors. In mathematics and computer science, a vector is an ordered collection of data in which each element has a corresponding position or index. Vector data is widely used in fields including linear algebra, statistics and machine learning.
In the prior art, in the process of processing a large amount of vector data, a multithreading parallel operation mode is often adopted, so that the operation efficiency of the vector data is improved.
However, in actual operation, for example in the schemes provided in Chinese patent applications CN114138808A and CN102033948A, part of a large body of vector data may be updated during parallel operation, leaving the data inconsistent or incomplete. For example, one thread may be modifying the data while another thread simultaneously reads the partially modified data. This affects the consistency of the data during operation and therefore the accuracy of the operation on the whole data set. In view of this, a multithreading-based method and system for evaluating the consistency of large-scale vector data are proposed.
Disclosure of Invention
In order to make up for the defects, the invention provides a method and a system for evaluating the consistency of large-scale vector data based on multithreading, which aim to solve the problem that a large amount of vector data in the prior art may have inconsistent data in the operation process.
In order to achieve the above object, the present invention adopts a method for evaluating the consistency of large-scale vector data based on multithreading, comprising the following steps:
S1, determining an access mode of vector data: determining an access mode of the vector data according to the use mode of the vector data in an application program, wherein the access mode comprises a reading mode and a writing mode, and the level of the reading mode comprises write-once and frequent updating;
S2, defining a concurrency control mode: defining a concurrency control mode of the vector data according to the access mode determined in the step S1, and adding a timestamp and a validity tag, wherein the concurrency control mode comprises a mutual exclusion lock and a read-write lock;
S3, initializing a storage structure and an interface of vector data: designing a data storage structure, an operation interface and a data interface of the corresponding vector data according to the concurrency control mode defined in the step S2, and initializing and storing the vector data based on the data interface according to the data storage structure;
S4, locking processing of the read-write lock: based on the vector data operation request of the thread, the access mode and the concurrency control mode corresponding to the thread are identified, and a critical area of the vector data operation is identified, and locking processing is carried out on the critical area based on the access mode and the concurrency control mode;
S5, using a condition variable: returning a processing result to the thread according to the locking processing result, evaluating the execution performance and vector data consistency of the thread according to the return time and the result consistency of the processing result, and optimizing the thread according to the evaluation result;
In the process of executing the locking and returning results in the step S4 and the step S5, the performance and the state of the vector data operation are monitored in real time, and when an abnormal situation occurs, the abnormal situation is processed according to a preset error processing mechanism, wherein the abnormal situation comprises a deadlock and a race condition, the preset error processing mechanism is preset according to different errors, and in the vector data processing, the preset error processing mechanism is dynamically updated according to the generated new errors.
Correspondingly, the invention also provides a multithreading-based large-scale vector data consistency evaluation system, which comprises a data access module, a concurrency control module, a storage initialization module, a locking processing module and an evaluation and optimization module;
The data access module is used for determining an access mode of vector data: determining an access mode of the vector data according to the use mode of the vector data in an application program, wherein the access mode comprises a reading mode and a writing mode, and the level of the reading mode comprises write once and frequent updating;
The concurrency control module is used for defining a concurrency control mode: defining a concurrency control mode of the vector data according to the access mode, and adding a time stamp and a validity tag, wherein the concurrency control mode comprises a mutual exclusion lock and a read-write lock;
The storage initialization module is used for initializing a storage structure and an interface of vector data: designing a data storage structure, an operation interface and a data interface of the corresponding vector data according to the concurrency control mode, and initializing and storing the vector data based on the data interface according to the data storage structure;
The locking processing module is used for locking processing of the read-write lock: based on the vector data operation request of the thread, the access mode and the concurrency control mode corresponding to the thread are identified, and a critical area of the vector data operation is identified, and locking processing is carried out on the critical area based on the access mode and the concurrency control mode;
The evaluation and optimization module is used for returning a processing result to the thread according to the locking processing result, evaluating the execution performance and vector data consistency of the thread according to the return time and the result consistency of the processing result, and optimizing the thread according to the evaluation result.
Compared with the prior art, the invention has the following beneficial effects:
1. Before the vector operation is carried out, the invention assigns a version number to the current vector data according to its operation state, and updates the version number whenever the vector data changes during subsequent operation. The system can therefore continuously verify the data against the version number of the group of vector data; when verification finds an abnormal version number, initialization of the vector data is executed again, which prevents continued operation on abnormal data from affecting the accuracy of the whole data set.
2. Before the vector calculation is carried out, the method checks the operation condition of each vector and screens the vector data according to whether the condition is met. Vectors that can be operated on directly proceed normally, while vector data that depends on a precondition is operated on only after the program has checked that condition. This reduces the number of read-write locks added during vector data operation and improves the efficiency with which the program operates on the vector data.
3. Through the concurrency control module, the invention can allocate different threads to multiple groups of vector data that are operated on simultaneously. When no spare computing power is temporarily available, a waiting sequence is formed, and the subsequent vector data are operated on in the order of the waiting sequence once a preceding thread finishes, which improves the efficiency with which the system operates on large amounts of vector data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for evaluating the consistency of large-scale vector data based on multiple threads;
FIG. 2 is a schematic diagram of a system for evaluating the consistency of large-scale vector data based on multiple threads.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that all directional indicators (such as up, down, left, right, front, rear, and so on) in the embodiments of the present invention are merely used to explain the relative positional relationship, movement, etc. between the components in a particular posture (as shown in the drawings), and if the particular posture is changed, the directional indicator is changed accordingly.
Furthermore, the description of "first," "second," etc. in this disclosure is for descriptive purposes only and is not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but it is necessary to base that the technical solutions can be realized by those skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be absent and not within the scope of protection claimed in the present invention.
Referring to fig. 1, the present invention provides an embodiment: a method for evaluating the consistency of large-scale vector data based on multithreading comprises the following steps:
S1, determining an access mode of vector data: determining an access mode of the vector data according to the use mode of the vector data in an application program, wherein the access mode comprises a reading mode and a writing mode, and the level of the reading mode comprises write-once and frequent updating.
In this embodiment, in step S1, the step of determining the access mode of the vector data according to the usage mode of the vector data in the application program further includes:
S101, extracting vector data categories, vector node names and application targets from the application program;
S102, determining an access right control type and an access level of the vector data according to the vector data category, the vector node name and the application target;
S103, determining a using mode of the vector data according to the access right control type of the vector data, wherein if the access right control type of the vector data is read-only, the access mode of the vector data is the reading mode, and if the access right control type of the vector data is writing, the access mode of the vector data is the writing mode;
S104, determining the level of the reading mode according to the access level of the vector data, wherein if the access level of the vector data is temporary, the level of the reading mode is write-once, and if the access level of the vector data is durable, the level of the reading mode is frequently updated;
the access level of the vector data is set by the application program according to different users.
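Steps S103 and S104 amount to two lookups, which can be sketched as follows. This is a minimal illustration, not the patented implementation: the class `AccessMode`, the function `determine_access_mode` and the string values (`"read_only"`, `"temporary"`, and so on) are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessMode:
    mode: str   # "read" or "write"
    level: str  # "write_once" or "frequent_update"


def determine_access_mode(control_type: str, access_level: str) -> AccessMode:
    """Map the access right control type and access level to an access mode."""
    # S103: read-only data uses the reading mode, writable data the writing mode.
    modes = {"read_only": "read", "write": "write"}
    # S104: temporary data is write-once, durable data is frequently updated.
    levels = {"temporary": "write_once", "durable": "frequent_update"}
    if control_type not in modes:
        raise ValueError(f"unknown access right control type: {control_type}")
    if access_level not in levels:
        raise ValueError(f"unknown access level: {access_level}")
    return AccessMode(modes[control_type], levels[access_level])
```

Rejecting unknown values early keeps a misconfigured application from silently falling into a default mode.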
S2, defining a concurrency control mode: defining a concurrency control mode of the vector data according to the access mode determined in the step S1, and adding a time stamp and a validity tag, wherein the concurrency control mode comprises a mutual exclusion lock and a read-write lock.
In this embodiment, in step S2, the step of defining the concurrency control mode of the vector data according to the access mode determined in step S1, and adding a timestamp and a validity tag further includes:
S201, algorithm operation: importing the access mode determined in the step S1 to a concurrency control algorithm program to perform concurrency control operation;
S202, thread allocation: invoking a currently idle operation thread, allocating the operation thread, computing resources and a buffer space to the concurrency control operation in step S201, and recording an intermediate calculation result and a final calculation result of the concurrency control operation;
If the intermediate calculation result is the reading mode, the concurrency control operation is stopped and the final calculation result is -1; if the intermediate calculation result is write-once, the final calculation result is 0; and if the intermediate calculation result is frequent updating, the final calculation result is 1;
S203, determining a concurrency control mode of the vector data according to the final calculation result, wherein the concurrency control mode is the read-write lock when the final calculation result is -1, and is the mutual exclusion lock when the final calculation result is 0 or 1;
S204, adding a time stamp and a validity tag for the concurrency control mode, and initializing the initial value of the time stamp of the concurrency control mode to be 0 and the validity tag to be invalid;
Wherein the validity tag includes invalid and valid.
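The mapping in steps S202 to S204 can be sketched as below. The names `RESULT_CODES`, `ConcurrencyControl` and `define_concurrency_control` are assumptions made for illustration; only the codes -1, 0 and 1, the two lock kinds, and the initial timestamp and validity values come from the text.

```python
from dataclasses import dataclass

# S202: final calculation result for each intermediate calculation result.
RESULT_CODES = {"read": -1, "write_once": 0, "frequent_update": 1}


@dataclass
class ConcurrencyControl:
    lock_kind: str        # "rw_lock" or "mutex"
    timestamp: float = 0  # S204: the timestamp starts at 0
    valid: bool = False   # S204: the validity tag starts as invalid


def define_concurrency_control(intermediate_result: str) -> ConcurrencyControl:
    """Choose the lock kind from the final calculation result (S203)."""
    final_result = RESULT_CODES[intermediate_result]
    lock_kind = "rw_lock" if final_result == -1 else "mutex"
    return ConcurrencyControl(lock_kind)
```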
S3, initializing a storage structure and an interface of vector data: designing a data storage structure, an operation interface and a data interface of the corresponding vector data according to the concurrency control mode defined in the step S2, and initializing and storing the vector data based on the data interface according to the data storage structure.
In this embodiment, in step S3, the step of designing the data storage structure, the operation interface, and the data interface of the corresponding vector data according to the concurrency control mode defined in step S2, and initializing and storing the vector data according to the data storage structure, further includes:
S301, defining a data structure: designing a data storage structure of the corresponding vector data according to the concurrency control mode defined in the step S2, wherein the data storage structure comprises a vector node name, a storage format, a vector length, a vector dimension, an index structure, a storage space and a memory resource;
S302, interface definition: defining an operation interface and a data interface for an addition operation, a deletion operation and a query operation of the vector data respectively based on a data storage structure of the vector data;
S303, initializing data: through the data interface, calling the memory resource defined in step S301 and storing the initial vector data into the storage space defined in step S301 according to the data storage structure of the vector data;
When the vector data is initialized according to the data storage structure of the vector data, the vector data is encrypted through an encryption algorithm, and the vector data is converted into encrypted data to be stored in the storage space.
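A storage structure with add, delete and query interfaces, as described in steps S301 to S303, might look like the following sketch. The class `VectorStore`, its field names and the helper `initialize` are hypothetical. The encryption step is deliberately left out, because the text does not name the encryption algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VectorStore:
    node_name: str
    storage_format: str
    dimension: int
    index: Dict[str, int] = field(default_factory=dict)               # key -> slot number
    slots: List[Optional[List[float]]] = field(default_factory=list)  # storage space

    def add(self, key: str, vector: List[float]) -> None:
        if key in self.index:
            raise KeyError(f"{key} is already stored")
        if len(vector) != self.dimension:
            raise ValueError("vector dimension does not match the store")
        self.index[key] = len(self.slots)
        self.slots.append(list(vector))

    def delete(self, key: str) -> None:
        slot = self.index.pop(key)
        self.slots[slot] = None  # leave a tombstone so other slot numbers stay valid

    def query(self, key: str) -> List[float]:
        return list(self.slots[self.index[key]])


def initialize(node_name: str, storage_format: str, dimension: int,
               initial: Dict[str, List[float]]) -> VectorStore:
    """S303: store the initial vector data through the data interface."""
    store = VectorStore(node_name, storage_format, dimension)
    for key, vector in initial.items():
        store.add(key, vector)
    return store
```

Returning copies from `query` keeps callers from mutating stored vectors without going through the interface.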
S4, locking processing of the read-write lock: based on the vector data operation request of the thread received in real time by the operation interface, identifying the access mode and the concurrency control mode corresponding to the thread, and locking the critical area of the vector data operation based on the access mode and the concurrency control mode.
In this embodiment, in step S4, the step of locking the critical area based on the access mode and the concurrency control mode includes:
S4011, determining the access mode, the concurrency control mode and the critical area of the thread on the vector data operation corresponding to the thread according to the analysis result of the vector data operation request of the thread;
S4012, locking the critical area according to the access mode and the concurrency control mode:
If the access mode is the reading mode, adding the read-write lock to the critical area according to the concurrency control mode, acquiring a timestamp when the read-write lock is added, and setting the validity tag of the lock to valid;
If the access mode is write-once, adding the mutual exclusion lock to the critical area according to the concurrency control mode, setting the value of the timestamp of the mutual exclusion lock to 1, and setting the validity tag of the lock to valid;
If the access mode is frequent updating, adding the mutual exclusion lock to the critical area according to the concurrency control mode, acquiring a timestamp when the mutual exclusion lock is added, and setting the validity tag of the lock to valid;
When the final calculation result is -1, the concurrency control mode is the read-write lock, and when the final calculation result is 0 or 1, the concurrency control mode is the mutual exclusion lock;
S4013, obtaining vector data corresponding to the vector data operation request of the thread, performing read-write processing on the vector data according to the vector data operation request of the thread, and returning a processing result to the thread;
wherein the vector data operation request corresponds to the access pattern, including a read operation request and a write operation request.
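Steps S4011 to S4013 can be sketched as below, under several assumptions. Python's standard library has no read-write lock, so `ReadWriteLock` is a small hand-written stand-in built on `threading.Condition`; `GuardedVector`, its fields and the mode strings are hypothetical names for this example.

```python
import threading
import time
from contextlib import contextmanager


class ReadWriteLock:
    """Minimal read-write lock: many readers or one writer (assumed helper)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()

    @contextmanager
    def write(self):  # not used below, shown to complete the lock
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


class GuardedVector:
    def __init__(self, data, access_mode):
        self.data = list(data)
        self.access_mode = access_mode  # "read", "write_once" or "frequent_update"
        # Reading mode uses the read-write lock; both write levels use a mutex.
        self.lock = ReadWriteLock() if access_mode == "read" else threading.Lock()
        self.timestamp = 0
        self.valid = False
        self._meta = threading.Lock()   # protects timestamp and valid

    def _tag(self, timestamp):
        with self._meta:
            self.timestamp = timestamp
            self.valid = True

    def process(self, index, value=None):
        """Lock the critical area, tag the lock, then read or write (S4012, S4013)."""
        if self.access_mode == "read":
            with self.lock.read():
                self._tag(time.time())
                return self.data[index]
        with self.lock:
            # Write-once fixes the timestamp at 1; frequent updating records the time.
            self._tag(1 if self.access_mode == "write_once" else time.time())
            self.data[index] = value
            return value
```

For example, `GuardedVector([1, 2, 3], "read").process(1)` returns the element at index 1 while holding the read side of the lock, so other readers are not blocked.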
It should be noted that the reading mode of vector data generally depends on the source of the data and the requirements of the application program. For example, if vector data is stored in a database, an application may need to read the data through an SQL query. If the data is stored in a file, it may be necessary to read the data using file I/O operations. Knowing the read mode facilitates selection of an appropriate data storage and access method, thereby improving the performance of the application.
Furthermore, it is also very important to know whether vector data is written once or updated frequently. If the vector data is write-once, then the data may be updated little or no at all during the lifecycle of the application. In this case, an efficient storage solution can be selected to reduce costs. In contrast, if vector data needs to be updated frequently, a storage solution supporting random writing needs to be used to ensure that the application can update the data quickly and efficiently.
The characteristics of the access pattern need to be taken into account when selecting the appropriate concurrency control pattern. If both read and write operations on the shared resource need to be locked, the mutual exclusion lock is a proper choice; if the read operation is far more than the write operation, the read-write lock can improve concurrency performance; atomic operations can reduce lock contention and improve concurrency performance if access to shared resources is very frequent and simple to operate.
A mutual exclusion lock allows only one thread at a time to hold the lock, so while one thread is reading or writing the shared resource, all other threads are blocked from both reading and writing it. A read-write lock is more flexible: it allows multiple threads to read the shared resource at the same time, but a write requires exclusive access, so while one thread is writing, other threads can neither read nor write.
After the critical area to be protected is identified in the code, the locking and unlocking processes can be performed by using a mutual exclusion lock or a read-write lock according to actual requirements. The locking process may prevent other threads from entering the critical area and the unlocking process may release the lock and allow other threads to enter the critical area.
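As a small illustration of locking and unlocking around a critical area, the sketch below has several threads update the same vector under one mutual exclusion lock. The names `vector`, `mutex` and `add_one_to_all` are made up for the example.

```python
import threading

vector = [0] * 4
mutex = threading.Lock()


def add_one_to_all(repeats):
    for _ in range(repeats):
        mutex.acquire()              # locking: other threads block here
        try:
            for i in range(len(vector)):
                vector[i] += 1       # critical area: read-modify-write on shared data
        finally:
            mutex.release()          # unlocking: the next waiting thread may enter


threads = [threading.Thread(target=add_one_to_all, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The `try`/`finally` pair guarantees the lock is released even if the critical area raises; `with mutex:` is the more idiomatic spelling of the same pattern.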
In this embodiment, in step S4013, when performing read-write processing on the vector data according to the vector data operation request of the thread and returning a processing result to the thread, the method further includes a step of vector data status processing, specifically:
S4013-1, judging conditions: before the vector data is read or written, judging whether the corresponding read operation or write operation can be executed according to the use state of the vector data, wherein the use state is either idle or in processing;
S4013-2, wait or wake-up processing: if the use state of the vector data is in processing, setting the execution state of the thread to waiting, waking up the thread when the use state of the vector data is updated to idle, and executing the corresponding read operation or write operation;
S4013-3, executing the operation: if the use state of the vector data is idle, setting the use state of the vector data to in processing, obtaining the vector data corresponding to the vector data operation request of the thread, and executing the corresponding read operation or write operation on the vector data according to the vector data operation request of the thread;
S4013-4, returning a result: the thread obtains a processing result of the read operation or the write operation.
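The wait and wake-up handling in steps S4013-1 to S4013-4 corresponds closely to a condition variable. The following sketch uses `threading.Condition`; the class `StatefulVector` and the state strings are hypothetical. One simplification is assumed: the state returns to idle as soon as the operation finishes, whereas the text resets it after the application's receipt in step S5012.

```python
import threading


class StatefulVector:
    def __init__(self, data):
        self.data = list(data)
        self.state = "idle"                  # "idle" or "processing"
        self._cond = threading.Condition()

    def execute(self, operation):
        with self._cond:
            # S4013-1 and S4013-2: wait while another thread is processing the data.
            while self.state != "idle":
                self._cond.wait()
            # S4013-3: mark the data as in processing before operating on it.
            self.state = "processing"
        try:
            # S4013-4: the calling thread obtains the result of the operation.
            return operation(self.data)
        finally:
            with self._cond:
                self.state = "idle"
                self._cond.notify_all()      # wake up the waiting threads
```

The `while` loop rechecks the state after every wake-up, which is the standard guard against spurious wake-ups and against another thread taking the data first.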
S5, using a condition variable: returning a processing result to the thread according to the locking processing result, evaluating the execution performance and vector data consistency of the thread according to the return time and the result consistency of the processing result, and optimizing the thread according to the evaluation result.
In this embodiment, the step S5 includes a step of performing an unlocking process in the critical area, specifically:
S5011, the thread sends the returned processing result to the application program and receives a receipt of the application program;
S5012, adding a processing tag corresponding to the thread to the vector data corresponding to the thread according to the receipt, adding 1 to the value of the processing tag, and setting the use state of the vector data to idle;
wherein the initial value of the processing tag is 0.
In this embodiment, the step of S5 further includes:
S5021, performance monitoring: monitoring the read operation or the write operation of the thread in real time and returning a monitoring result, wherein the monitoring result comprises the return time and the result consistency;
S5022, result consistency analysis: comparing the data format of the processing result with the storage format of the vector data; if they are the same, jumping to step S5023; otherwise, returning an inconsistent result and jumping to step S4;
S5023, performance analysis: comparing the return time with a preset return time; if the return time is greater than the preset return time, adjusting the performance optimization strategy; if the return time is less than or equal to the preset return time, the thread sends the returned processing result to the application program;
The preset return time is set according to the actual corresponding requirements of the application program;
The performance optimization strategy is a strategy for scheduling threads for executing the read operation and the write operation of the vector data and optimizing the locking and unlocking processes of the vector data.
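The decision logic of steps S5022 and S5023 can be sketched as a single function. How a "data format" is represented is not stated in the text, so this example assumes a format is the pair (element type name, vector length); `describe_format`, `assess` and the returned status strings are hypothetical.

```python
from typing import Sequence, Tuple

Format = Tuple[str, int]  # assumed representation: (element type name, length)


def describe_format(vector: Sequence) -> Format:
    element_type = type(vector[0]).__name__ if vector else "empty"
    return (element_type, len(vector))


def assess(result: Sequence, storage_format: Format,
           return_time: float, preset_return_time: float) -> str:
    """Return "inconsistent", "tune" or "ok" for one processing result."""
    # S5022: the result format must match the storage format of the vector data.
    if describe_format(result) != storage_format:
        return "inconsistent"   # the caller would go back to step S4
    # S5023: a slow result triggers adjustment of the optimization strategy.
    if return_time > preset_return_time:
        return "tune"
    return "ok"                 # the result can be sent to the application
```

In practice `return_time` could be measured with `time.perf_counter()` around the read or write operation; it is passed in here so the decision logic stays deterministic.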
In the process of executing the locking and returning results in the step S4 and the step S5, the performance and the state of the vector data operation are monitored in real time, and when an abnormal situation occurs, the abnormal situation is processed according to a preset error processing mechanism, wherein the abnormal situation comprises a deadlock and a race condition, the preset error processing mechanism is preset according to different errors, and in the vector data processing, the preset error processing mechanism is dynamically updated according to the generated new errors.
Possible abnormal conditions, including deadlock and race conditions, are considered and handled, and an error handling mechanism is implemented to ensure normal operation of the system. Deadlock refers to the situation in which two or more processes or threads each wait for the other to release resources, so that none of them can continue to execute. To avoid deadlock, precautions are needed, including requesting resources in a fixed order and setting timeouts; in addition, deadlock detection and recovery mechanisms can be used to detect and resolve deadlocks.
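The two precautions named above, requesting resources in a fixed order and setting a timeout, can be combined as in this sketch. The helper `acquire_in_order` is hypothetical, and ordering locks by `id()` is one assumed way of obtaining a consistent order within a single process.

```python
import threading
from contextlib import contextmanager


@contextmanager
def acquire_in_order(locks, timeout=1.0):
    """Acquire all locks in one global order, giving up after a timeout."""
    held = []
    try:
        for lock in sorted(locks, key=id):       # same order in every thread
            if not lock.acquire(timeout=timeout):
                raise TimeoutError("could not acquire a lock in time")
            held.append(lock)
        yield
    finally:
        for lock in reversed(held):              # release whatever was acquired
            lock.release()


lock_a, lock_b = threading.Lock(), threading.Lock()
completed = []


def work(first, second):
    for _ in range(200):
        with acquire_in_order([first, second]):
            completed.append(1)


# The two threads name the locks in opposite orders, the classic deadlock setup.
t1 = threading.Thread(target=work, args=(lock_a, lock_b))
t2 = threading.Thread(target=work, args=(lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()
```

Because both threads end up taking the locks in the same sorted order, neither can hold one lock while waiting for the other; the timeout is a second line of defense that turns an unexpected stall into an error that can be handled.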
A race condition refers to unpredictable behavior that occurs when multiple processes or threads access a shared resource at the same time. To avoid race conditions, synchronization mechanisms, including mutual exclusion locks, semaphores and atomic operations, may be used to ensure that operations are indivisible. The error handling mechanism includes using exception handlers to capture and handle exceptions, logging error information for subsequent analysis and debugging, and providing appropriate user feedback and system recovery mechanisms. For example, when an abnormal condition is detected, the system may display an error message to the user and provide the option to retry or skip the step. In addition, logging and monitoring tools may be used to track the operating condition of the system so that potential problems are discovered and resolved in time.
By applying the method provided by the invention, before the vector operation is carried out, a version number is assigned to the current vector data according to its operation state, and the version number is updated whenever the vector data changes during subsequent operation. The system can therefore continuously verify the data against the version number of the group of vector data; when verification finds an abnormal version number, initialization of the vector data is executed again, which prevents continued operation on abnormal data from affecting the accuracy of the whole data set.
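The version-number check can be illustrated with the sketch below. The text does not define what makes a version number abnormal, so this example assumes the simplest reading: a version is abnormal when it differs from the number of updates the verifier expects. `VersionedVector` and its methods are hypothetical names.

```python
class VersionedVector:
    def __init__(self, initial):
        self._initial = list(initial)   # kept so the data can be re-initialized
        self.data = list(initial)
        self.version = 0

    def update(self, index, value):
        self.data[index] = value
        self.version += 1               # every change advances the version number

    def verify(self, expected_version):
        """Return True if the version is as expected; otherwise re-initialize."""
        if self.version == expected_version:
            return True
        # Assumed meaning of "abnormal": the version does not match expectations.
        self.data = list(self._initial)
        self.version = 0
        return False
```

A caller that tracks its own update count can call `verify` before each operation and restart its computation whenever the check fails.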
Meanwhile, before the vector calculation is carried out, the operation condition of each vector is checked and the vector data is screened according to whether the condition is met. Vectors that can be operated on directly proceed normally, while vector data that depends on a precondition is operated on only after the program has checked that condition. This reduces the number of read-write locks added during vector data operation and improves the efficiency with which the program operates on the vector data.
In addition, through the concurrency control module, different threads can be allocated to multiple groups of vector data that are operated on simultaneously. When no spare computing power is temporarily available, a waiting sequence is formed, and the subsequent vector data are operated on in the order of the waiting sequence once a preceding thread finishes, which improves the efficiency with which the system operates on large amounts of vector data.
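A fixed-size thread pool behaves much like the waiting sequence described here: when all workers are busy, further tasks queue up and start as workers become free. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor` as a stand-in; the function `norm_squared` and the sample groups are invented for illustration and are not part of the described system.

```python
from concurrent.futures import ThreadPoolExecutor


def norm_squared(vector):
    """Example operation on one group of vector data."""
    return sum(x * x for x in vector)


groups = [[1, 2], [3, 4], [0, 5], [6, 8]]

# Only two worker threads exist, so the third and fourth groups wait in the
# pool's internal queue until an earlier operation finishes.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(norm_squared, groups))
```

`pool.map` returns results in the order of the input groups, regardless of which worker finished first.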
Referring to fig. 2, the present invention provides an embodiment: a multithreading-based large-scale vector data consistency evaluation system, which comprises a data access module, a concurrency control module, a storage initialization module, a locking processing module and an evaluation and optimization module;
Wherein,
The data access module is used for determining an access mode of vector data: the access mode of the vector data is determined according to the use mode of the vector data in an application program, wherein the access mode comprises a reading mode and a writing mode, and the level of the reading mode comprises write-once and frequently updated;
The concurrency control module is used for defining a concurrency control mode: the concurrency control mode of the vector data is defined according to the access mode, and a time stamp and a validity tag are added, wherein the concurrency control mode comprises a mutual exclusion lock and a read-write lock;
The storage initialization module is used for initializing a storage structure and an interface of vector data: a data storage structure, an operation interface and a data interface of the corresponding vector data are designed according to the concurrency control mode, and the vector data is initialized and stored through the data interface according to the data storage structure;
The locking processing module is used for locking processing of the read-write lock: based on the vector data operation request of a thread, the access mode and the concurrency control mode corresponding to the thread and the critical area of the vector data operation are identified, and the critical area is locked based on the access mode and the concurrency control mode;
The evaluation tuning module is used for returning a processing result to the thread according to the locking processing result, evaluating the execution performance of the thread and the consistency of the vector data according to the return time and the result consistency of the processing result, and optimizing the thread according to the evaluation result.
In this embodiment, the system further includes an exception handling module;
Wherein,
The exception handling module is used for monitoring the performance and the state of the vector data operation in real time and, when an abnormal condition occurs, handling it according to a preset error handling mechanism. The abnormal conditions comprise deadlocks and race conditions; the error handling mechanism is preset for different errors and is dynamically updated during vector data processing as new errors arise.
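A dynamically updatable error handling mechanism of this kind might be sketched as a handler registry. The Python sketch below is illustrative only: the names `ErrorHandlers`, `SuspectedDeadlock` and `acquire_or_report` are hypothetical, and the timed lock acquisition is an assumed heuristic (a timeout only suggests a deadlock, it does not prove one). The description does not specify how deadlocks are detected.

```python
import threading


class SuspectedDeadlock(Exception):
    """Raised when a lock cannot be acquired within the allowed time."""


class ErrorHandlers:
    def __init__(self):
        self._handlers = {}
        self._lock = threading.Lock()

    def register(self, error_type, handler):
        # Handlers can be added or replaced while the system is running,
        # which models the dynamic updating for newly observed errors.
        with self._lock:
            self._handlers[error_type] = handler

    def handle(self, error):
        with self._lock:
            handler = self._handlers.get(type(error))
        if handler is None:
            raise error  # no preset mechanism for this error
        return handler(error)


def acquire_or_report(lock, timeout):
    # Assumed heuristic: treat a timed-out acquisition as a suspected deadlock.
    if not lock.acquire(timeout=timeout):
        raise SuspectedDeadlock("lock not acquired within timeout")
```

A caller would wrap `acquire_or_report` in a `try` block and pass any caught exception to `ErrorHandlers.handle`.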
Finally, it should be noted that the foregoing describes only preferred embodiments of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the described embodiments or substitute equivalents for some of their elements, and any such modifications, equivalents, improvements or changes may be made without departing from the spirit and principles of the present invention.

Claims (10)

1. A method for evaluating large-scale vector data consistency based on multithreading, the method comprising:
S1, determining an access mode of vector data: determining an access mode of the vector data according to the use mode of the vector data in an application program, wherein the access mode comprises a reading mode and a writing mode, and the level of the reading mode comprises write once and frequent updating;
S2, defining a concurrency control mode: defining a concurrency control mode of the vector data according to the access mode determined in the step S1, and adding a time stamp and a validity tag, wherein the concurrency control mode comprises a mutual exclusion lock and a read-write lock;
S3, initializing a storage structure and an interface of vector data: designing a data storage structure, an operation interface and a data interface of the corresponding vector data according to the concurrency control mode defined in the step S2, and initializing and storing the vector data based on the data interface according to the data storage structure;
S4, locking processing of the read-write lock: based on the vector data operation request of the thread, the access mode and the concurrency control mode corresponding to the thread are identified, and a critical area of the vector data operation is identified, and locking processing is carried out on the critical area based on the access mode and the concurrency control mode;
S5, using a condition variable: returning a processing result to the thread according to the locking processing result, evaluating the execution performance and vector data consistency of the thread according to the return time and the result consistency of the processing result, and optimizing the thread according to the evaluation result;
In the process of executing the locking and returning results in the step S4 and the step S5, the performance and the state of the vector data operation are monitored in real time, and when an abnormal situation occurs, the abnormal situation is processed according to a preset error processing mechanism, wherein the abnormal situation comprises a deadlock and a race condition, the preset error processing mechanism is preset according to different errors, and in the vector data processing, the preset error processing mechanism is dynamically updated according to the generated new errors.
2. The method for evaluating consistency of large-scale vector data based on multithreading according to claim 1, wherein in step S1, the step of determining the access pattern of the vector data according to the usage of the vector data in the application program further comprises:
S101, extracting vector data categories, vector node names and application targets from the application program;
S102, determining an access right control type and an access level of the vector data according to the vector data category, the vector node name and the application target;
S103, determining a using mode of the vector data according to the access right control type of the vector data, wherein if the access right control type of the vector data is read-only, the access mode of the vector data is the reading mode, and if the access right control type of the vector data is writing, the access mode of the vector data is the writing mode;
S104, determining the level of the reading mode according to the access level of the vector data, wherein if the access level of the vector data is temporary, the level of the reading mode is write-once, and if the access level of the vector data is durable, the level of the reading mode is frequently updated;
the access level of the vector data is set by the application program according to different users.
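Steps S103 and S104 amount to two lookups, which can be sketched as follows. This is a minimal Python illustration, not the claimed implementation: the string encodings "read-only", "write", "temporary" and "durable" and the names `AccessMode`, `ModeLevel` and `determine_access` are assumptions introduced for the example.

```python
from enum import Enum


class AccessMode(Enum):
    READ = "reading mode"
    WRITE = "writing mode"


class ModeLevel(Enum):
    WRITE_ONCE = "write-once"
    FREQUENT_UPDATE = "frequently updated"


def determine_access(access_control_type, access_level):
    # S103: the access right control type selects the access mode.
    if access_control_type == "read-only":
        mode = AccessMode.READ
    elif access_control_type == "write":
        mode = AccessMode.WRITE
    else:
        raise ValueError(f"unknown access control type: {access_control_type}")
    # S104: the access level selects the level.
    if access_level == "temporary":
        level = ModeLevel.WRITE_ONCE
    elif access_level == "durable":
        level = ModeLevel.FREQUENT_UPDATE
    else:
        raise ValueError(f"unknown access level: {access_level}")
    return mode, level
```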
3. The method for evaluating the consistency of large-scale vector data based on multiple threads according to claim 2, wherein in step S2, the step of defining the concurrency control mode of the vector data according to the access mode determined in step S1, and adding a time stamp and a validity tag further comprises:
S201, algorithm operation: importing the access mode determined in the step S1 to a concurrency control algorithm program to perform concurrency control operation;
S202, thread allocation: invoking a currently idle operation thread, allocating the operation thread, computing resources and buffer space to the concurrency control operation of step S201, and recording an intermediate calculation result and a final calculation result of the concurrency control operation;
wherein if the intermediate calculation result is the reading mode, the concurrency control operation is stopped and the final calculation result is -1; if the intermediate calculation result is write-once, the final calculation result is 0; and if the intermediate calculation result is frequently updated, the final calculation result is 1;
S203, determining a concurrency control mode of the vector data according to the final calculation result, wherein the concurrency control mode is the read-write lock when the final calculation result is -1, and is the mutual exclusion lock when the final calculation result is 0 or 1;
S204, adding a time stamp and a validity tag for the concurrency control mode, and initializing the initial value of the time stamp of the concurrency control mode to be 0 and the validity tag to be invalid.
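The mapping in steps S202 to S204 can be summarised in a short Python sketch. The names `RESULT_CODES`, `ConcurrencyControl` and `select_control`, and the use of plain strings for the access kinds and lock kinds, are assumptions made for illustration; only the codes -1, 0 and 1 and the initial tag values come from the claim.

```python
from dataclasses import dataclass

# Final calculation results as stated in S202.
RESULT_CODES = {"reading mode": -1, "write-once": 0, "frequently updated": 1}


@dataclass
class ConcurrencyControl:
    lock_kind: str          # "read-write lock" or "mutual exclusion lock"
    timestamp: float = 0    # S204: the time stamp is initialized to 0
    valid: bool = False     # S204: the validity tag is initialized to invalid


def select_control(access_kind):
    code = RESULT_CODES[access_kind]
    # S203: -1 selects the read-write lock, 0 or 1 the mutual exclusion lock.
    lock_kind = "read-write lock" if code == -1 else "mutual exclusion lock"
    return code, ConcurrencyControl(lock_kind)
```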
4. A method of evaluating the consistency of large-scale vector data based on multithreading according to claim 3, wherein in step S3, the step of designing the data storage structure, the operation interface, and the data interface of the corresponding vector data according to the concurrency control mode defined in step S2, and according to the data storage structure further comprises:
S301, defining a data structure: designing a data storage structure of the corresponding vector data according to the concurrency control mode defined in the step S2, wherein the data storage structure comprises a vector node name, a storage format, a vector length, a vector dimension, an index structure, a storage space and a memory resource;
S302, interface definition: defining an operation interface and a data interface for an addition operation, a deletion operation and a query operation of the vector data respectively based on a data storage structure of the vector data;
S303, initializing data: through the data interface, and call the memory resource defined in the step S301, the initial vector data is stored into the storage space defined in the step S301 according to the data storage structure of the vector data;
When the vector data is initialized according to the data storage structure of the vector data, the vector data is encrypted through an encryption algorithm, and the vector data is converted into encrypted data to be stored in the storage space.
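A reduced form of the data storage structure and operation interface of steps S301 to S303 might look like the Python sketch below. It is illustrative only: `VectorStore` and its fields are hypothetical, only a subset of the listed fields is modelled, an in-memory dictionary stands in for the storage space, and the encryption step is omitted because the claim does not name an algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class VectorStore:
    node_name: str            # vector node name
    storage_format: str       # storage format label, e.g. an element type
    dimension: int            # vector dimension
    vectors: dict = field(default_factory=dict)  # stand-in for the storage space

    def add(self, key, vector):
        # Addition operation: reject vectors that do not match the structure.
        if len(vector) != self.dimension:
            raise ValueError("dimension mismatch")
        self.vectors[key] = list(vector)

    def delete(self, key):
        # Deletion operation: removing a missing key is a no-op.
        self.vectors.pop(key, None)

    def query(self, key):
        # Query operation: returns None when the key is absent.
        return self.vectors.get(key)
```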
5. The method for evaluating the consistency of large-scale vector data based on multithreading according to claim 4, wherein in step S4, the step of locking the critical area based on the access mode and the concurrency control mode comprises:
S4011, determining the access mode, the concurrency control mode and the critical area of the thread on the vector data operation corresponding to the thread according to the analysis result of the vector data operation request of the thread;
S4012, locking the critical area according to the access mode and the concurrency control mode,
If the access mode is the reading mode, adding the read-write lock to the critical area according to the concurrency control mode, acquiring a time stamp when the read-write lock is added, and setting the validity tag of the read-write lock to valid;
If the access mode is write-once, adding the mutual exclusion lock to the critical area according to the concurrency control mode, setting the value of the time stamp of the mutual exclusion lock to 1, and setting the validity tag of the read-write lock to valid;
If the access mode is frequently updated, adding the mutual exclusion lock to the critical area according to the concurrency control mode, acquiring a time stamp when the mutual exclusion lock is added, and setting the validity tag of the read-write lock to valid;
S4013, obtaining vector data corresponding to the vector data operation request of the thread, performing read-write processing on the vector data according to the vector data operation request of the thread, and returning a processing result to the thread;
wherein the vector data operation request corresponds to the access pattern, including a read operation request and a write operation request.
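The locking of step S4012 can be illustrated with a small read-write lock that carries a time stamp and a validity tag. Python's standard library has no read-write lock, so the sketch below builds one on `threading.Condition`. It is a simplification under stated assumptions: the class `ReadWriteLock` is hypothetical, readers are given preference, and the mutual exclusion lock is modelled as the exclusive (write) side of the same lock, which the claim does not state.

```python
import threading
import time


class ReadWriteLock:
    """Minimal reader-preference read-write lock with a time stamp and validity tag."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self.timestamp = 0    # initial value 0
        self.valid = False    # validity tag starts as invalid

    def acquire_read(self):
        with self._cond:
            while self._writer:          # readers wait only for a writer
                self._cond.wait()
            self._readers += 1
            self.timestamp = time.time()
            self.valid = True

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may proceed

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()        # writers need exclusive access
            self._writer = True
            self.timestamp = time.time()
            self.valid = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

Several readers may hold the lock at once, while a writer excludes both readers and other writers. Reader preference can starve writers under heavy read load, so a production design would likely need a fairness policy.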
6. The method for evaluating the consistency of large-scale vector data based on multithreading according to claim 5, wherein in step S4013, when the vector data is read and written according to the vector data operation request of the thread and the processing result is returned to the thread, the method further comprises the step of vector data status processing, specifically comprising:
S4013-1, judging conditions: before the vector data is read or written, judging whether the corresponding read operation or write operation can be executed according to the use state of the vector data, wherein the use state comprises idle and in process;
S4013-2, wait or wake-up processing: if the use state of the vector data is in process, setting the execution state of the thread to waiting, and when the use state of the vector data is updated to idle, waking up the thread and executing the corresponding read operation or write operation;
S4013-3, executing the operation: if the use state of the vector data is idle, setting the use state of the vector data to in process, obtaining the vector data corresponding to the vector data operation request of the thread, and executing the corresponding read operation or write operation on the vector data according to the vector data operation request of the thread;
S4013-4, returning a result: the thread obtains a processing result of the read operation or the write operation.
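The wait and wake-up handling of steps S4013-1 to S4013-4 corresponds closely to the standard condition-variable pattern. The following Python sketch is a minimal illustration: the class `VectorSlot` and its `run` method are hypothetical, and the use state is reset to idle as soon as the operation returns, whereas the claims reset it later, after the application's receipt.

```python
import threading


class VectorSlot:
    def __init__(self, data):
        self.data = list(data)
        self.in_use = False                 # use state: False = idle, True = in process
        self._cond = threading.Condition()

    def run(self, operation):
        with self._cond:
            # S4013-1 / S4013-2: wait while another thread is processing.
            while self.in_use:
                self._cond.wait()
            self.in_use = True              # S4013-3: mark the data as in process
        try:
            return operation(self.data)     # S4013-4: the result goes back to the caller
        finally:
            with self._cond:
                self.in_use = False         # back to idle
                self._cond.notify_all()     # wake up waiting threads
```

The `while` loop, rather than a single `if`, re-checks the state after every wake-up, which guards against spurious wake-ups and against another thread taking the data first.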
7. The method for evaluating the consistency of large-scale vector data based on multithreading according to claim 6, wherein the step S5 further comprises the step of performing an unlocking process in a critical area, specifically:
S5011, the thread sends the returned processing result to the application program and receives a receipt of the application program;
S5012, adding a processing tag corresponding to the thread to the vector data corresponding to the thread according to the receipt, adding 1 to the value of the processing tag, and setting the use state of the vector data to idle;
wherein the initial value of the processing tag is 0.
8. The method for evaluating the consistency of large-scale vector data based on multiple threads according to claim 7, wherein said step of S5 further comprises:
S5021, performance monitoring: performing real-time monitoring on the reading operation or the writing operation of the process, and returning a monitoring result, wherein the monitoring result comprises the return time and the result consistency;
S5022, result consistency analysis: comparing the data format of the processing result with the storage format of the vector data, if the data format of the processing result is the same as the storage format of the vector data, jumping to step S5023, otherwise, returning inconsistent results, and jumping to step S4;
S5023, performance analysis: comparing the return time with a preset return time; if the return time is larger than the preset return time, adjusting the performance optimization strategy, and if the return time is smaller than or equal to the preset return time, sending the returned processing result to the application program by the thread;
The preset return time is set according to the actual corresponding requirements of the application program;
The performance optimization strategy is a strategy for scheduling threads for executing the read operation and the write operation of the vector data and optimizing the locking and unlocking processes of the vector data.
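The checks in steps S5021 to S5023 can be sketched as a small evaluation function. This Python sketch rests on assumptions: the function `evaluate` is hypothetical, the "data format" comparison is modelled as a Python type check, and the tuning step is reduced to a boolean flag because the claim does not specify how the strategy is adjusted.

```python
import time


def evaluate(operation, expected_format, preset_return_time):
    # S5021: measure the return time of the operation.
    start = time.perf_counter()
    result = operation()
    return_time = time.perf_counter() - start
    # S5022: the format check is modelled here as a type check.
    consistent = isinstance(result, expected_format)
    # S5023: the time comparison is reached only for consistent results.
    needs_tuning = consistent and return_time > preset_return_time
    return {
        "result": result,
        "return_time": return_time,
        "consistent": consistent,
        "needs_tuning": needs_tuning,
    }
```

An inconsistent result would send control back to step S4, and a result flagged `needs_tuning` would trigger an adjustment of the performance optimization strategy.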
9. A multithreading-based large-scale vector data consistency assessment system, characterized in that the system comprises a data access module, a concurrency control module, a storage initialization module, a locking processing module and an evaluation tuning module;
Wherein,
The data access module is used for determining an access mode of vector data: determining an access mode of the vector data according to the use mode of the vector data in an application program, wherein the access mode comprises a reading mode and a writing mode, and the level of the reading mode comprises write once and frequent updating;
the concurrency control module is used for defining a concurrency control mode: defining a concurrency control mode of the vector data according to the access mode, and adding a time stamp and a validity tag, wherein the concurrency control mode comprises a mutual exclusion lock and a read-write lock;
the storage initialization module is used for initializing a storage structure and an interface of vector data: designing a data storage structure, an operation interface and a data interface of the corresponding vector data according to the concurrency control mode, and initializing and storing the vector data based on the data interface according to the data storage structure;
The locking processing module is used for locking processing of the read-write lock: based on the vector data operation request of the thread, the access mode and the concurrency control mode corresponding to the thread are identified, and a critical area of the vector data operation is identified, and locking processing is carried out on the critical area based on the access mode and the concurrency control mode;
The evaluation tuning module is used for returning a processing result to the thread according to the locking processing result, evaluating the execution performance and vector data consistency of the thread according to the return time and the result consistency of the processing result, and optimizing the thread according to the evaluation result.
10. The multithreading-based large-scale vector data consistency assessment system of claim 9, wherein the system further comprises an exception handling module;
Wherein,
The exception handling module is used for monitoring the performance and the state of the vector data operation in real time, and handling the exception condition according to a preset error handling mechanism when the exception condition occurs, wherein the exception condition comprises deadlock and race conditions, the preset error handling mechanism is preset according to different errors, and dynamic updating is carried out in vector data processing according to the generated new errors.
CN202410391753.3A 2024-04-02 2024-04-02 Multithreading-based large-scale vector data consistency assessment method and system Active CN117971137B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410391753.3A CN117971137B (en) 2024-04-02 2024-04-02 Multithreading-based large-scale vector data consistency assessment method and system

Publications (2)

Publication Number Publication Date
CN117971137A CN117971137A (en) 2024-05-03
CN117971137B true CN117971137B (en) 2024-06-04

Family

ID=90851843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410391753.3A Active CN117971137B (en) 2024-04-02 2024-04-02 Multithreading-based large-scale vector data consistency assessment method and system

Country Status (1)

Country Link
CN (1) CN117971137B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2231597A1 (en) * 1997-03-10 1998-09-10 Patrick G. Sobalvarro Detecting concurrency errors in multi-threaded programs
CN102681937A (en) * 2012-05-15 2012-09-19 浪潮电子信息产业股份有限公司 Correctness verifying method of cache consistency protocol
CN105574585A (en) * 2015-12-14 2016-05-11 四川长虹电器股份有限公司 Sample training method of neural network model on the basis of multithreading mutual exclusion
CN112416556A (en) * 2020-11-23 2021-02-26 西安西热电站信息技术有限公司 Data read-write priority balancing method, system, device and storage medium
CN117389755A (en) * 2023-09-06 2024-01-12 北京恺望数据科技有限公司 Multithreading memory sharing method and device
CN117667932A (en) * 2023-12-01 2024-03-08 北京柏睿数据技术股份有限公司 Method and system for customizing and converting storage format of vector database
CN117707771A (en) * 2023-12-14 2024-03-15 中证鹏元资信评估股份有限公司 Distributed object read-write performance improving method based on multithreading technology

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7380073B2 (en) * 2003-11-26 2008-05-27 Sas Institute Inc. Computer-implemented system and method for lock handling

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于采样技术的动态混合数据竞争检测算法;李梦珂;郑秋生;王磊;;计算机科学;20201015(第10期);第315-321页 *

Also Published As

Publication number Publication date
CN117971137A (en) 2024-05-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant