GB2284079A - Sorting or merging lists - Google Patents

Sorting or merging lists

Info

Publication number
GB2284079A
Authority
GB
United Kingdom
Prior art keywords
list
sublists
memory
sorted
sequential
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB9423239A
Other versions
GB9423239D0 (en)
Inventor
Mehdi Jazayeri
Meng Lee
Alexander A Stepanov
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HP Inc
Original Assignee
Hewlett Packard Co
Application filed by Hewlett Packard Co
Publication of GB9423239D0
Publication of GB2284079A
Withdrawn (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/22 Arrangements for sorting or merging computer data on continuous record carriers, e.g. tape, drum, disc
    • G06F7/36 Combined merging and sorting

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Description

METHOD AND APPARATUS FOR SORTING OR MERGING LISTS
The present invention relates to a method for adapting a processing operation to memory availability, for example to sorting or merging sequential lists in a space-adaptive manner.
Sequential lists are commonly sorted for use in many aspects of data processing. The sorting is based on a key which is mappable to natural numbers. For example, the key can represent a wide range of things such as employee number, employee age, colors, geographical locations, etc. The sequential lists can represent a wide variety of applications, including database records or other data lists. Sorting reorders the elements of the list according to the relative order of their keys. The sorting is called stable if the relative order of equal elements in the sorted list is the same as it was in the input list.
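A minimal illustration of the stability property just defined, using Python's built-in `sorted()`, which the Python documentation guarantees to be stable. The records and keys are hypothetical.

```python
# Hypothetical records: (name, key). Only the key takes part in the sort.
records = [("b", 2), ("a", 1), ("c", 2), ("d", 1)]

# sorted() is stable: records with equal keys keep their input order,
# so "a" stays ahead of "d" and "b" stays ahead of "c".
by_key = sorted(records, key=lambda record: record[1])
print(by_key)  # [('a', 1), ('d', 1), ('b', 2), ('c', 2)]
```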
Known procedures for stably sorting lists were developed many years ago. The known stable sorting techniques are of two types. The first type is an in-place approach which requires no temporary memory storage or buffering. The second type requires that the available memory be at least as large as the sequential list being sorted.
Sequential lists which have previously been sorted are commonly merged for use in many aspects of data processing. Known stable merging procedures were also developed many years ago. As is the case with sorting, the known merging procedures include both the first and second types. Examples of the known stable sorting and merging techniques are described in USL C++ Standard Components Library, published by AT&T and UNIX System Laboratories (Release 3.0). In addition, an example of a known stable sort is merge-sort as described in D.E. Knuth, The Art of Computer Programming, vol. 3: Sorting and Searching (Addison-Wesley, 1973). Further, an example of a known stable merging procedure is described in Dudzinski and Dydek, Information Processing Letters, vol. 12, no. 1, Feb. 13, 1981.
The problem with these known stable sorting and merging techniques is that their performance is not optimized in most cases. In particular, the in-place approach, which does not require any additional memory, performs poorly because it does not make use of the memory that is available. The second type of known stable sorting and merging does not operate efficiently when the available memory is smaller than the sequential list.
The present invention seeks to provide an improved method of adapting a processing operation to memory availability.
According to an aspect of the present invention, there is provided a method of adapting a processing operation to memory availability in a data processing system, comprising the steps of: a) receiving at least one sequential list having a list size; b) determining an amount of memory which is available for use; c) comparing the size of the at least one sequential list with the amount of memory which is available for use; d) dividing the at least one sequential list into sublists based on said comparison; and e) performing the processing operation on each of the sublists.
It is possible to provide both a stable sorting technique and a stable merging technique which are able to operate with optimized performance regardless of the size of the available memory.
Optimization is a major concern for stable sorting and stable merging techniques. For example, when working with databases, the relative ordering must in most cases be preserved; hence, only stable sorting and stable merging are useful.
A preferred embodiment functions to adapt a stable processing operation (e.g., merge, sort, partition, and the like) to the memory which is available in an apparatus or system. Upon receiving at least one sequential list (with a list size) to be processed, the amount of memory which is available for use is determined. Next, the size of the at least one sequential list is compared with the amount of memory which is available for use. The at least one sequential list is then divided into sublists until the size of the sublists does not exceed the amount of memory available for use.
Thereafter, the processing operation is performed on each of the sublists. This embodiment can carry out either a stable sorting technique or a stable merging technique.
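The generic adapt-to-memory scheme can be sketched as follows, assuming memory is counted in elements and that lists are divided by repeated halving, as in the embodiments below. The names `split_to_fit` and `process_adaptively` are hypothetical, and the operation applied to each sublist is left as a parameter.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_to_fit(lo: int, hi: int, available: int) -> List[Tuple[int, int]]:
    """Halve the index range [lo, hi) until every piece holds at most `available` elements."""
    if available < 1:
        raise ValueError("at least one element of memory must be available")
    if hi - lo <= available:          # compare list size with available memory
        return [(lo, hi)]
    mid = lo + (hi - lo) // 2         # too big: divide into two sublists
    return split_to_fit(lo, mid, available) + split_to_fit(mid, hi, available)


def process_adaptively(items: Sequence[T], available: int,
                       operation: Callable[[Sequence[T]], R]) -> List[R]:
    """Divide `items` into sublists that fit, then apply `operation` to each in order."""
    return [operation(items[lo:hi]) for lo, hi in split_to_fit(0, len(items), available)]


print(split_to_fit(0, 11, 3))  # [(0, 2), (2, 5), (5, 8), (8, 11)]
```

With room for three elements, an eleven-element list is halved twice, giving four sublists of at most three elements each.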
The stable sorting technique is preferably space adaptive. As a consequence, it is possible to utilize as much of the memory as is available even if the amount available is less than the size of a sequential list (one-dimensional array of objects) to be sorted. As a result, stable sorting can be provided in a data processing system with optimized processing efficiency.
The stable sorting technique according to the invention can be provided as an apparatus or a method. Further, the sorting technique can use either a merge-sort approach or a discrete sort approach. An embodiment of a stable sorting apparatus includes a space-adaptive sorting device and a memory connected thereto. The space-adaptive sorting device receives a sequential list to be sorted, stably sorts the sequential list, and outputs a sorted list. The memory temporarily stores elements of the sequential list while the sorting device sorts the sequential list.
To operate in a space-adaptive manner using the merge-sort approach, the apparatus preferably includes a division unit for dividing the sequential list into first and second sublists, a comparison unit for comparing the size of the sublists with the amount of the memory which is available for use by the sorting device, another division unit for subdividing the sublists into further sublists when it is determined that the size of the sublists exceeds the amount of available memory, a sort unit for sorting the sublists to obtain sorted sublists, and a merge unit for stably merging the sorted sublists using the memory, thereby producing the sorted list.
Alternatively, to operate in a space-adaptive manner using the discrete sort approach, the apparatus may include a comparison unit for comparing the size of the list with an amount of the memory which is available for use by the sorting device, a partition unit for stably partitioning the list, using one of the elements of the list as a predicate, to produce three sublists when the comparison unit determines that the size of the list exceeds a predetermined size, and a sort unit for sorting each of the reduced-size sublists (i.e., those smaller than the predetermined size) to obtain sorted sublists.
As a method, the sorting technique may also use either the merge-sort approach or the discrete sort approach. According to the merge-sort approach, the method may be performed by a data processing system having memory, and the method may include the steps of: receiving a sequential list; dividing the sequential list into first and second sublists; determining an amount of the memory which is available for use; comparing the sizes of the sublists with the amount of the memory which is available for use; when the size of the sublists exceeds the amount of the memory which is available for use, dividing the sublists into further sublists; sorting each of the sublists to obtain sorted sublists; and stably merging the sorted sublists to obtain a sorted list.
Alternatively, the sorting technique may use the discrete sort approach. In this case, the method performed by the data processing system may include the steps of: receiving the sequential list; determining an amount of the memory which is available for use; comparing the size of the list with the amount of the memory which is available for use; when the size of the list exceeds the amount of the memory which is available for use, selecting one of the elements of the list as a predicate; stably partitioning the list using the predicate to produce three sublists; and stably sorting those sublists which have a size that does not exceed the amount of the memory which is available for use.
Likewise, the stable merging technique may be space adaptive. As a consequence, it is possible to utilize as much of the memory as is available even if the amount is less than the size of the sequential sorted lists being merged. As a result, stable merging can be provided in a data processing system with optimized processing efficiency.
The stable merging technique according to the invention can be provided as an apparatus or a method. An embodiment of a space-adaptive stable merging apparatus includes a space-adaptive merging device and a memory connected thereto. The space-adaptive merging device may receive first and second sorted sequential lists, stably merge them, and output a single sorted list. The memory can temporarily store some elements of at least one of the sequential lists while the merging device merges the first and second sorted sequential lists. To operate in a space-adaptive manner, the apparatus preferably includes a comparison unit for comparing the list sizes with the amount of the memory which is available for use by the merging device, a division unit for dividing one of the sequential sorted lists into first and second sublists when it is determined that both of the list sizes exceed the amount of available memory, a segment unit for segmenting the other of the sequential sorted lists in accordance with the first element of the second sublist to produce third and fourth sublists, and a merge unit for stably merging the first and third sublists using the memory and for stably merging the second and fourth sublists using the memory, thereby producing the combined sorted list. Preferably, a swapping or rotate unit is also provided to swap or rotate the second and third sublists so that the merge unit operates on two contiguous lists.
As a method, the stable merging technique preferably merges first and second sorted sequential lists. The method may be performed by a data processing system having memory, and may include the steps of: receiving the first and second sequential sorted lists; determining an amount of the memory which is available for use; comparing the list sizes with the amount of the memory which is available for use; when it is determined that the size of both of the sequential sorted lists exceeds the amount of the memory which is available for use, dividing one of the sequential sorted lists into first and second sublists; segmenting the other of the sequential sorted lists in accordance with a first element of the second sublist to produce third and fourth sublists; and stably merging the first and third sublists using the memory and stably merging the second and fourth sublists using the memory.
Thus, both the space-adaptive sorting and the space-adaptive merging techniques may yield optimal performance regardless of the amount of available memory.
An embodiment of the present invention is described below, by way of example only, with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram of an embodiment of a space-adaptive merging apparatus;
FIGS. 2A and 2B are schematic diagrams of a sequential list and available memory;
FIGS. 3A and 3B are flow charts of an embodiment of a space-adaptive stable merging technique according to the invention;
FIG. 4 is a block diagram of an embodiment of a space-adaptive sorting apparatus;
FIGS. 5A, 5B and 5C are flow charts of a first embodiment of a space-adaptive stable sorting technique according to the invention;
FIG. 6 is a flow chart of a second embodiment of a space-adaptive stable sorting technique according to the invention; and
FIGS. 7A, 7B and 7C are flow charts of a space-adaptive partitioning technique utilized by the second embodiment of the space-adaptive sorting technique.
Embodiments of techniques described below function to adapt a stable processing operation (e.g., merge, sort, partition, and the like) to the memory which is available in the apparatus or system. Upon receiving at least one sequential list (with a list size) to be processed, the amount of memory which is available for use is determined. Next, the size of the at least one sequential list is compared with the amount of the memory which is available for use. The at least one sequential list is then divided into sublists until the size of the sublists does not exceed the amount of memory available for use. Thereafter, the processing operation is performed on each of the sublists.
Space-Adaptive Merging
FIG. 1 is a block diagram of an embodiment of a space-adaptive merging apparatus 1. The apparatus 1 includes a space-adaptive merging device 2 and memory 4. The space-adaptive merging device 2 receives a first sorted sequential list 6 and a second sorted sequential list 8 as inputs, and outputs a sorted list 10. The space-adaptive merging device 2 is also operatively connected to the memory 4.
The space-adaptive merging device 2 can be embodied in a wide variety of apparatus or systems, including electrical circuits, robots, computer software, computer firmware, and computer hardware. The amount of memory 4 available varies with the application. The sequential lists 6, 8 are one-dimensional arrays of objects (or elements). In addition, the sequential lists can be of any size and width. As an example, the list can vary from a few one-bit elements to many database records having multiple multi-bit fields. Since the sequential lists 6, 8 are defined as one-dimensional arrays of objects, they do not include linked lists.
FIGS. 2A and 2B are schematic diagrams illustrating an input list which includes a first sorted sequential list 6' and a second sorted sequential list 8'. The sequential list 6' has five elements which are numerically ordered, and the sequential list 8' has six elements which are also numerically ordered. The available memory 4' is capable of storing up to three elements. For simplicity, the remainder of the discussion of the space-adaptive merging technique concentrates on the simplified working example illustrated in FIGS. 2A and 2B.
The general operation of the space-adaptive merging apparatus 1 is as follows.
An important aspect of the operation of the space-adaptive merging apparatus 1 is that it varies in accordance with the amount of memory 4 that is available. Typically, the amount of memory 4 (data storage) available in an apparatus or computer system is limited and allocated to numerous operations or tasks. Here, the space-adaptive merging device 2 determines or is told the amount of memory (e.g., buffering) that the memory 4 can offer the device 2. The space-adaptive merging device 2 is then able to adapt its operation so as to make best use of the memory 4 that is available.
When the amount of available memory 4 is at least as large as the size of one of the first and second sorted sequential lists 6, 8, the list which fits into the available memory 4 is moved into the available memory 4. Next, the elements of the two lists are compared with one another. Initially, the first elements of the two lists are compared, and the smaller of the two is placed into the first position of the list. Next, the element that was not placed is compared with the next element of the list from which the placed element was taken, and the smaller of these two is placed into the second position of the list. After all the elements in the first and second sorted sequential lists 6, 8 have been processed, all the elements of the entire input list are sorted. It should be noted that the operation of the apparatus 1 preserves the relative order of the objects in the original sorted lists, making the merging technique not only space-adaptive but also stable.
When the amount of available memory 4 is smaller than both of the first and second sorted sequential lists 6, 8, the processing differs. Namely, prior to merging the lists 6, 8, the larger of the first and second sorted sequential lists 6, 8 is divided in half, forming first and second subsegments. If the lists are equal in size, either can be divided in half. Next, the smaller of the first and second sorted sequential lists 6, 8 is segmented into first and second subsegments in accordance with the first element of the second subsegment of the larger list. A subsegment of the larger list is then rotated or swapped with a subsegment of the smaller list. Namely, if the subsegment of the larger list is its first subsegment, it is rotated or swapped with the second subsegment of the smaller list, and vice versa. Next, the first subsegments of the larger and smaller lists are merged, and the second subsegments of the larger and smaller lists are merged. Typically, the input list consists of the two sorted lists to be merged, in which case the resulting list is the input list, now sorted. Note that if, after the dividing, segmenting and swapping, the amount of available memory 4 is still smaller than both subsegments of a pair to be merged, the halving or subdividing continues recursively until the condition is met. For every division of the lists or subsegments there is a subsequent combining operation.
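The recursion just described can be sketched in Python as follows. This is a hedged sketch under stated assumptions, not the patented implementation: memory is counted in elements, the two sorted lists sit back to back in one array `a` (`a[lo:mid]` followed by `a[mid:hi]`), all function names are hypothetical, and the swap step uses an in-place three-reversal rotation rather than the buffer-assisted swap the description prefers. Splitting the other list with `bisect_left` or `bisect_right` is an assumption chosen so that equal elements from the earlier list stay in front, and at least one element of memory is assumed so that the recursion always terminates.

```python
from bisect import bisect_left, bisect_right


def _reverse(a, lo, hi):
    """Reverse a[lo:hi] in place."""
    hi -= 1
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo, hi = lo + 1, hi - 1


def _merge_with_buffer(a, lo, mid, hi):
    """Stably merge sorted runs a[lo:mid] and a[mid:hi], copying the shorter
    run into a temporary buffer (the caller guarantees that it fits)."""
    if mid - lo <= hi - mid:
        buf = a[lo:mid]                      # first run moved to spare memory
        i, j, k = 0, mid, lo
        while i < len(buf) and j < hi:
            if a[j] < buf[i]:                # strictly smaller: take from second run
                a[k] = a[j]
                j += 1
            else:                            # ties favour the first run (stability)
                a[k] = buf[i]
                i += 1
            k += 1
        a[k:k + len(buf) - i] = buf[i:]      # leftovers of the first run
    else:
        buf = a[mid:hi]                      # second run moved to spare memory
        i, j, k = mid - 1, len(buf) - 1, hi - 1
        while i >= lo and j >= 0:            # fill the output from the back
            if buf[j] < a[i]:                # first-run element is larger: place it last
                a[k] = a[i]
                i -= 1
            else:                            # ties: second-run element goes last
                a[k] = buf[j]
                j -= 1
            k -= 1
        a[lo:lo + j + 1] = buf[:j + 1]       # leftovers of the second run


def adaptive_merge(a, lo, mid, hi, available):
    """Stably merge adjacent sorted runs a[lo:mid] and a[mid:hi] while
    buffering at most `available` elements at a time."""
    available = max(available, 1)            # assumption: at least one slot
    len1, len2 = mid - lo, hi - mid
    if len1 == 0 or len2 == 0:
        return
    if min(len1, len2) <= available:         # one run fits: buffered merge
        _merge_with_buffer(a, lo, mid, hi)
        return
    if len1 >= len2:                         # divide the larger run in half ...
        cut1 = lo + len1 // 2
        cut2 = bisect_left(a, a[cut1], mid, hi)   # ... and segment the other run
    else:
        cut2 = mid + len2 // 2
        cut1 = bisect_right(a, a[cut2], lo, mid)
    _reverse(a, cut1, mid)                   # rotate the two middle subsegments
    _reverse(a, mid, cut2)
    _reverse(a, cut1, cut2)
    new_mid = cut1 + (cut2 - mid)
    adaptive_merge(a, lo, cut1, new_mid, available)   # merge the first subsegments
    adaptive_merge(a, new_mid, cut2, hi, available)   # merge the second subsegments


data = [1, 4, 5, 9, 12, 2, 3, 4, 8, 9, 10]   # five-element list, then six-element list
adaptive_merge(data, 0, 5, 11, 3)            # memory for three elements
print(data)  # [1, 2, 3, 4, 4, 5, 8, 9, 9, 10, 12]
```

With room for three elements, one division is enough here: the recursive calls each receive a pair in which one subsegment holds at most three elements.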
The operation of the space-adaptive merging device 2 is described in more detail below. FIGS. 3A and 3B are flow charts of the space-adaptive stable merging technique according to an embodiment of the present invention.
The space-adaptive stable merging technique or method begins with a decision 12. The decision 12 turns on whether or not the available memory 4 is greater than or equal to the size of one of the lists 6, 8.
In the first case, namely when the decision 12 determines that the amount of available memory 4 is at least as large as the size of one of the lists 6, 8, the sorted lists 6, 8 can be merged without any adaptation to the amount of available memory 4. The processing steps for this case are discussed in detail below with reference to FIG. 3B.
In the second case, namely when the decision 12 determines that the amount of available memory 4 is less than the size of both of the sorted lists 6, 8, the processing differs. In this case, the larger of the lists 6, 8 is divided 14 in half to form first and second subsegments. Next, the smaller of the lists 6, 8 is segmented 16 into first and second subsegments in accordance with the first element of the second subsegment of the larger list. Table I below illustrates the dividing 14 and segmenting 16 for the working example.
TABLE I
Input sequential list (smaller list | larger list):
1 4 5 9 12 | 2 3 4 8 9 10
After dividing 14 (larger list split into L-1st and L-2nd):
1 4 5 9 12 | 2 3 4 || 8 9 10
After segmenting 16 (smaller list split into S-1st and S-2nd):
1 4 5 || 9 12 | 2 3 4 || 8 9 10
The single bar separates the two input lists, and the double bars (||) indicate where the dividing or segmenting has occurred. Namely, subsegments L-1st and L-2nd result from the dividing, and subsegments S-1st and S-2nd result from the segmenting. In the working example, it is assumed that the available memory can store up to three elements, as illustrated in FIG. 2B.
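A short sketch of the dividing 14 and segmenting 16 steps on the working example. Python's `bisect_right` locates the segmenting point in the smaller list. Using `bisect_right` (so that elements of the earlier list equal to the split value stay in S-1st) is an assumption made to keep the merge stable when the smaller list comes first; the text only says that segmenting follows the first element of the second subsegment. If the larger list came first, `bisect_left` would play the same role.

```python
from bisect import bisect_right

smaller = [1, 4, 5, 9, 12]      # comes first in the input list
larger = [2, 3, 4, 8, 9, 10]    # comes second

half = len(larger) // 2         # dividing 14: L-1st = larger[:3], L-2nd = larger[3:]
pivot = larger[half]            # first element of L-2nd (8)

# Segmenting 16: elements of the earlier list that are <= pivot stay in S-1st,
# so equal keys from the earlier list remain ahead of those from the later one.
cut = bisect_right(smaller, pivot)
s1, s2 = smaller[:cut], smaller[cut:]
l1, l2 = larger[:half], larger[half:]
print(s1, s2, l1, l2)  # [1, 4, 5] [9, 12] [2, 3, 4] [8, 9, 10]
```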
Next, a subsegment of the larger list is swapped 18 (or rotated) with a subsegment of the smaller list. If the smaller list precedes the larger list, the first subsegment of the larger list is swapped 18 with the second subsegment of the smaller list; if the larger list precedes the smaller list, the second subsegment of the larger list is swapped with the first subsegment of the smaller list. In either case, the two first subsegments become adjacent, as do the two second subsegments. Table II below illustrates the swapping operation for the working example.
TABLE II
S-1st: 1 4 5 || L-1st: 2 3 4 || S-2nd: 9 12 || L-2nd: 8 9 10
The examples illustrated in Tables I and II correspond to the situation where only one division 14 is necessary. In general, however, blocks 14-18 are repeated until, in each pair of subsegments to be merged, the size of one of the subsegments is less than or equal to the amount of available memory 4.
Once the swapping 18 has occurred, the first subsegments, which are now adjacent to one another, are merged 20 to produce a first portion of the sorted list. Next, the second subsegments, which are now adjacent to one another, are merged 22 to produce the second portion of the sorted list. For example, with reference to Tables I and II, subsegments L-1st and S-1st are merged, and subsegments L-2nd and S-2nd are merged. Table III below illustrates the merging 20, 22 that produces the sorted list for the working example of Tables I and II.
TABLE III
1st merged list: 1 2 3 4 4 5
2nd merged list: 8 9 9 10 12
Sorted list: 1 2 3 4 4 5 8 9 9 10 12
The swapping 18 can be achieved in a number of ways known to those skilled in the art. For example, an in-place rotate scheme may be used. It is preferable, however, to make use of the available memory 4 in performing the swap operation because doing so is usually faster. For example, if one of the subsegments to be moved fits into the available memory 4, that subsegment is moved there; the other subsegment is then moved into the vacated slots in the list; and finally, the subsegment held in the available memory 4 is moved into the other slots that have now been vacated. In general, when multiple subdividing is required, each pair of adjacent subsegments produced by the dividing, segmenting and swapping operations is merged in reverse order.
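The buffer-assisted swap described above, with an in-place fallback, can be sketched as follows. The function name `swap_adjacent_blocks` is hypothetical; memory is assumed to be counted in elements, the shorter block is the one parked in spare memory, and when neither block fits the sketch falls back to the classic three-reversal rotation, one of the in-place schemes the text alludes to.

```python
def _reverse(a, lo, hi):
    """Reverse a[lo:hi] in place."""
    hi -= 1
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo, hi = lo + 1, hi - 1


def swap_adjacent_blocks(a, lo, mid, hi, available):
    """Exchange a[lo:mid] and a[mid:hi]; return the index where the old
    a[lo:mid] now starts. Parks the shorter block in a temporary copy when it
    fits in `available` slots; otherwise rotates in place."""
    left, right = mid - lo, hi - mid
    if min(left, right) <= available:
        if left <= right:
            saved = a[lo:mid]                    # move the left block to spare memory
            for k in range(right):               # slide the right block into the vacated slots
                a[lo + k] = a[mid + k]
            for k in range(left):                # move the parked block into the freed slots
                a[lo + right + k] = saved[k]
        else:
            saved = a[mid:hi]                    # move the right block to spare memory
            for k in reversed(range(left)):      # slide the left block up, back to front
                a[lo + right + k] = a[lo + k]
            for k in range(right):
                a[lo + k] = saved[k]
    else:                                        # neither block fits: three reversals
        _reverse(a, lo, mid)
        _reverse(a, mid, hi)
        _reverse(a, lo, hi)
    return lo + right


data = [1, 4, 5, 9, 12, 2, 3, 4, 8, 9, 10]
print(swap_adjacent_blocks(data, 3, 5, 8, 3), data)
# 6 [1, 4, 5, 2, 3, 4, 9, 12, 8, 9, 10]
```

Swapping S-2nd (9 12) with L-1st (2 3 4) in the working example produces the layout of Table II.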
Although the above describes the space-adaptive operation of the merging technique, the embodiment also operates in the case where the available memory is at least equal to the size of one of the lists 6, 8. In this case, the lists 6, 8 need not be divided 14 or segmented 16 to adapt to the size of the available memory 4. Here, the method is illustrated in FIG. 3B and operates as follows. Once the decision 12 directs control to block 24 because the amount of available memory 4 is at least equal to the size of one of the lists 6, 8, non-adaptive merge processing can be used. This processing is referred to as non-adaptive merging because the memory is guaranteed to be of sufficient size.
Initially, the space-adaptive merging device 2 receives 26 the first and second sorted lists which are to be merged. The first sorted list is then moved 28 into the available memory 4. Next, pointer P1 is set 30 to the address in the available memory 4 of the first element of the first sorted list. Similarly, pointer P2 is set 32 to the address of the first element of the second sorted list. A current location pointer (CLP) is also set 34 to the address of the first element of the merged list.
Next, a decision 36 is made based on whether the element at P1 in the available memory 4 is greater than the element at P2 in the second sorted list. If it is not greater, the element at P1 in the available memory 4 is copied 38 to the next available location in the merged list, which is identified by the CLP, and the pointer P1 is then incremented 40. On the other hand, if the element at P1 in the available memory 4 is greater than the element at P2 in the second sorted list, the element at P2 in the second sorted list is copied 42 to the next available location in the merged list, which is identified by the CLP, and the pointer P2 is then incremented 44. Regardless of which element is copied, the CLP is incremented 46 so as to point to the next available location in the list. Because ties are resolved in favor of the element from the first list, the merge is stable.
Next, a decision 48 is made based on whether all the elements of the sorted lists 6, 8 have been processed. If not, blocks 36-46 are repeated until this condition is satisfied (i.e., until all the elements in the sorted lists have been processed). Once all the elements have been processed, the merge processing is complete.
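The FIG. 3B loop can be sketched as follows, assuming both sorted lists are stored back to back in one Python list and that memory is counted in elements. The function name is hypothetical; the variables `p1`, `p2` and `clp` mirror P1, P2 and the CLP. As a design choice, the loop stops once the first list is exhausted, because any remaining elements of the second list already sit in their final positions.

```python
def nonadaptive_merge(merged, first_len):
    """FIG. 3B sketch: merged[:first_len] and merged[first_len:] are sorted runs
    stored back to back; the first run is moved to spare memory and the two
    runs are merged back into `merged`."""
    memory = merged[:first_len]      # block 28: move the first list into available memory
    p1 = 0                           # block 30: P1 -> first element held in memory
    p2 = first_len                   # block 32: P2 -> first element of the second list
    clp = 0                          # block 34: CLP -> first slot of the merged list
    while p1 < len(memory):          # decision 48: stop once the first list is used up
        if p2 < len(merged) and memory[p1] > merged[p2]:   # decision 36
            merged[clp] = merged[p2] # block 42: copy the element at P2
            p2 += 1                  # block 44
        else:
            merged[clp] = memory[p1] # block 38: copy the element at P1 (ties included)
            p1 += 1                  # block 40
        clp += 1                     # block 46
    return merged


print(nonadaptive_merge([1, 4, 5, 9, 12, 2, 3, 4, 8, 9, 10], 5))
# [1, 2, 3, 4, 4, 5, 8, 9, 9, 10, 12]
```

Since the CLP never overtakes P2, writing into the merged list never overwrites an element of the second list that has not yet been read.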
Space-Adaptive Sorting
FIG. 4 is a block diagram of an embodiment of a space-adaptive sorting apparatus 50 according to the invention. The apparatus 50 includes a space-adaptive sorting device 52 and memory 54. The space-adaptive sorting device 52 receives a sequential list 56 of elements and outputs a sorted list 58. The sorting can be either merge-sort or discrete sort. The space-adaptive sorting device 52 is also operatively connected to the memory 54.
The space-adaptive sorting device 52 can be embodied in a wide variety of apparatus or systems, including electrical circuits, robots, computer software, computer firmware, and computer hardware. The sequential list 56 received by the space-adaptive sorting apparatus 50 is a one-dimensional array of objects (or elements) which need to be sorted. The sequential list 56 can be of any size and width. As an example, the list 56 can vary from a few one-bit elements to many database records having multiple multi-bit fields. Since the sequential list 56 is defined as a one-dimensional array of objects, it does not include linked lists.
The operation of the space-adaptive sorting apparatus 50 is described in detail below; its general operation is as follows. Note, however, that the processing differs between the merge-sort embodiment and the discrete sort embodiment.
An important aspect of the operation of the space-adaptive sorting apparatus 50 is that it varies in accordance with the amount of memory 54 that is available. Typically, the amount of memory 54 (data storage) available in an apparatus or computer system is limited and allocated to numerous operations or tasks. Here, the space-adaptive sorting device 52 determines or is told the amount of memory (e.g., buffering) that the memory 54 can offer the device 52. The space-adaptive sorting device 52 is then able to adapt its operation so as to make best use of the memory 54 that is available.
Basically, the operation of the space-adaptive sorting apparatus 50 for a merge-sort embodiment is as follows. Initially, the sequential list 56 is divided into first and second sublists by the sorting device 52. If the sequential list 56 has an even number of elements, the first and second sublists will have equal sizes; otherwise, one of the sublists will have one more element than the other. The remaining description assumes that the sublists are of equal size. If the amount of available memory 54 is less than the size of the sublists, the sublists are again divided in half. The dividing repeats until the amount of available memory 54 equals or exceeds the size of the sublists. Once the amount of available memory 54 equals or exceeds the size of the sublists, each of the sublists is stably sorted by the sorting device 52 to obtain sorted sublists. The sorted sublists are then merged a pair at a time to obtain the sorted list 58.
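The merge-sort embodiment can be sketched recursively: each division is later undone by exactly one merge. This is a minimal sketch under stated assumptions. Memory is counted in elements, `sorted()` stands in for the stable sort applied to sublists that fit, and `stable_merge` is a plain out-of-place merge standing in for the space-adaptive merge of FIGS. 3A and 3B. Both function names are hypothetical.

```python
def stable_merge(left, right):
    """Plain stable merge; on ties the element from `left` comes first."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def space_adaptive_merge_sort(items, available):
    """Halve the list until each sublist fits in `available` elements, stably
    sort those sublists, then merge them back a pair at a time."""
    if len(items) <= max(available, 1):
        return sorted(items)                     # sublist fits: stable sort
    mid = len(items) // 2                        # divide in half
    left = space_adaptive_merge_sort(items[:mid], available)
    right = space_adaptive_merge_sort(items[mid:], available)
    return stable_merge(left, right)             # one merge per division


print(space_adaptive_merge_sort([1, 4, 8, 5, 7, 2, 3, 4, 8, 10, 9, 2], 6))
# [1, 2, 2, 3, 4, 4, 5, 7, 8, 8, 9, 10]
```

With twelve elements and room for six, the list is divided once, mirroring the working example discussed below.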
On the other hand, the basic operation of the space-adaptive sorting apparatus 50 for a discrete sort embodiment is as follows. In this case, the sorting device 52 receives the sequential list 56 to be sorted. The sorting device 52 then determines the amount of the memory 54 which is available for use by the sorting device 52. If the amount of the memory 54 which is available for use is less than the size of the list, one of the elements of the list is selected as a predicate, and the list is stably partitioned using the predicate to produce three sublists. Thereafter, those sublists which have a size that does not exceed the amount of the memory which is available for use are stably sorted.
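The discrete sort outline can be sketched as follows, with each assumption stated. The "three sublists" are taken to be the elements less than, equal to, and greater than the predicate element. The middle element is used as the predicate (a hypothetical choice). The partition here uses list comprehensions with O(n) extra memory, so unlike the space-adaptive partitioning of FIGS. 7A-7C it is not bounded by the available memory. Function names are hypothetical.

```python
def stable_partition3(items, pivot):
    """Stably split `items` into (< pivot, == pivot, > pivot), keeping input order
    inside each group. Only `<` is used, so "equal" means neither is less."""
    less = [x for x in items if x < pivot]
    equal = [x for x in items if not (x < pivot) and not (pivot < x)]
    greater = [x for x in items if pivot < x]
    return less, equal, greater


def discrete_sort(items, available):
    """Sort directly when the list fits; otherwise partition on a predicate
    element and recurse into the 'less' and 'greater' sublists."""
    if len(items) <= max(available, 1):
        return sorted(items)                       # fits: stable sort in memory
    pivot = items[len(items) // 2]                 # hypothetical choice of predicate
    less, equal, greater = stable_partition3(items, pivot)
    # The 'equal' sublist needs no sorting; its input order is already stable.
    return discrete_sort(less, available) + equal + discrete_sort(greater, available)


print(discrete_sort([5, 3, 9, 3, 1, 7, 5, 2], 3))  # [1, 2, 3, 3, 5, 5, 7, 9]
```

The recursion always terminates because the "equal" sublist contains at least the predicate itself, so the other two sublists are strictly shorter than the input.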
The operation of the space-adaptive sorting device 52 is described in more detail below. FIGS. 5A, 5B and 5C are flow charts of the space-adaptive stable sorting technique according to a first embodiment of the invention. According to this embodiment, the space-adaptive stable sorting technique begins by dividing 60 the sequential list 56 in half to produce first and second sublists. Table IV below illustrates a sequential list 56 which is used as the working example for this embodiment.
TABLE IV
Sequential list: 1 4 8 5 7 2 3 4 8 10 9 2
After dividing 60: Sublist-A: 1 4 8 5 7 2 || Sublist-B: 3 4 8 10 9 2
Note that the division of the sequential list has produced Sublist-A and Sublist-B.
Next, a decision 62 is made based on whether the amount of available memory 54 is at least as great as the size of the sublists. In the first case, namely when the decision 62 determines that the amount of available memory 54 is at least equal to the size of the sublists, the sublists can be sorted individually without any adaptation to the amount of available memory 54. In the second case, namely when the amount of available memory 54 is smaller than the size of the sublists, the sublists are repeatedly divided 64 in half until the condition is satisfied. For the working example in Table IV, the sequential list need only be divided once because the working example assumes that the available memory 54 can hold at least six elements.
In either case, the sorting 66 is performed using a non-adaptive merge-sort technique which produces a sorted sublist for each of the sublists. The non-adaptive merge-sort procedure is described with reference to FIGS. 5B and 5C. Although this embodiment (FIG. 5A) is described as using a merge-sort technique to stably sort the sublists, more generally any stable sorting technique, such as insertion sort, may be substituted therefor. After the sorted sublists are obtained, each pair of sublists is merged 68 to obtain the sorted list.
FIG. 5B is a flow chart illustrating the sorting 66 performed by the non-adaptive merge-sort technique of the first embodiment. The processing begins with a decision 70 based on whether all the sublists have been sorted. Initially, in the working example, two sublists need to be sorted, namely Sublist-A and Sublist-B.
When there are sublists to be processed, one of the sublists to be processed is selected 72. A decision 74 is then made based on whether the list size of the sublist is greater than a predetermined chunk size. If the list size of the sublist is less than or equal to a predetermined chunk size, then an insertion sort procedure is performed 76. The insertion sort procedure is discussed in detail below with reference to FIG. 5C. Again, it should be recognized that other stable sorting procedures besides the insertion sort procedure can be used.
On the other hand, when the list size of the sublist is greater than the predetermined chunk size, the sublist is divided 78 into chunks of the predetermined chunk size (the last chunk may be smaller). Next, the insertion sort procedure (FIG. 5C) is performed 80 on each chunk. The resulting sorted chunks of the sublist are then stably merged 82 to re-form the sublist. Because the chunks are eventually merged 82, this technique is classified as a merge-sort.
Preferably, the chunk size is seven elements. Therefore, with the working example, blocks 78-82 are not performed because the size of the sublists is six elements, which is less than seven. However, if the sequential list contained one-hundred elements, then each sublist would contain fifty elements, and block 78 would divide each of the sublists into eight chunks (seven chunks of seven elements and one chunk of one element).
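Blocks 72-82 amount to a chunked merge-sort of each sublist. In the minimal Python sketch below, the function names are hypothetical and the chunk size of seven is the preferred value given in the text. The sketch confirms the chunk arithmetic for a fifty-element sublist. Merging the sorted chunks with an ordinary stable pairwise merge is an assumption, because the text states only that the chunks are stably merged 82.

```python
CHUNK_SIZE = 7  # preferred chunk size stated in the text


def insertion_sort(items):
    """Stable insertion sort returning a new list (stands in for FIG. 5C)."""
    a = list(items)
    for cp in range(1, len(a)):
        current = a[cp]
        j = cp
        # strict comparison: equal elements are never moved past each other
        while j > 0 and current < a[j - 1]:
            a[j] = a[j - 1]
            j -= 1
        a[j] = current
    return a


def merge(left, right):
    """Stable merge of two sorted lists; ties are taken from `left` first."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    return out + list(left[i:]) + list(right[j:])


def split_into_chunks(seq, size=CHUNK_SIZE):
    """Dividing 78: consecutive chunks of `size`; the last may be shorter."""
    return [seq[k:k + size] for k in range(0, len(seq), size)]


def chunked_merge_sort(seq, size=CHUNK_SIZE):
    if len(seq) <= size:                    # decision 74 -> insertion sort 76
        return insertion_sort(seq)
    runs = [insertion_sort(c) for c in split_into_chunks(seq, size)]  # 78, 80
    while len(runs) > 1:                    # stable merging 82 of adjacent runs
        merged = [merge(runs[k], runs[k + 1]) for k in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])         # odd run out waits for the next pass
        runs = merged
    return runs[0]


print([len(c) for c in split_into_chunks(list(range(50)))])
# [7, 7, 7, 7, 7, 7, 7, 1]
```

Stability holds because only adjacent runs are merged and ties always go to the earlier run.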
Following blocks 76 and 82, decision block 70 is repeated to determine again whether all the sublists have been sorted. If not, blocks 72-82 are repeated until all the sublists have been sorted. Once all the sublists have been sorted, an adaptive merge technique is performed 68. An embodiment of the adaptive merge technique has been described above with reference to FIGS. 3A and 3B. In general, for every division 60, 64 of the sequential list needed to adapt the list to the amount of available memory 54, a subsequent merge 68 is used to recombine the sublists.
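Because every division is matched by a later merge, the flow of FIG. 5A can be sketched recursively in Python. This sketch rests on several assumptions and is not the patented procedure:

- The function names are hypothetical.
- Python's built-in `sorted`, which is guaranteed stable, stands in for the sorting 66.
- The list is halved only when it does not fit, whereas the flow chart always halves once.
- `stable_merge` is an ordinary merge into a new output list, not the space-adaptive merge of FIGS. 3A and 3B.

```python
def stable_merge(left, right, key):
    """Ordinary (buffered) stable merge: on equal keys, take from `left` first."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if key(right[j]) < key(left[i]):
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def space_adaptive_sort(seq, available, key=lambda x: x):
    """Halve until each piece fits in `available` elements (decision 62,
    dividing 64), stably sort each piece (sorting 66), then merge 68."""
    seq = list(seq)
    if len(seq) <= max(available, 1):
        return sorted(seq, key=key)  # stable; stands in for sorting 66
    mid = len(seq) // 2
    return stable_merge(space_adaptive_sort(seq[:mid], available, key),
                        space_adaptive_sort(seq[mid:], available, key),
                        key)


data = [1, 4, 8, 5, 7, 2, 3, 4, 8, 10, 9, 2]
tagged = [(v, pos) for pos, v in enumerate(data, start=1)]
result = space_adaptive_sort(tagged, 3, key=lambda t: t[0])
```

Each recursive halving corresponds to one division 60 or 64, and each `stable_merge` on the way back corresponds to one merge 68. Tagging the elements with their original positions makes it possible to check that equal values keep their order.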
The insertion sort procedure 76, 80 of the first embodiment of the adaptive merge-sort technique is now described in detail with reference to FIG. 5C. This procedure is called either by block 76 (for a sublist) or by block 80 (for chunks). The insertion sort procedure 76, 80 begins by setting 86 a pointer POINTER_FIRST (PF) to point to the first element in the sublist or chunk. A pointer POINTER_LAST (PL) is also set 88 to the last element in the sublist or chunk. Next, a decision 90 is made based on whether the pointer PL equals the pointer PF or equals PF increased by one (PF + 1). This decision 90 tests whether the sublist or chunk consists of only zero or one elements. If so, the insertion sort technique is completed; otherwise, processing continues.
Following block 90, a pointer CURRENT_POINTER (CP) is set 92 to PF + 1. A decision 94 is then made to determine whether the entire sublist or chunk has been processed. If so, the insertion sort technique is completed. If not, an insertion operation is performed: the element currently at location CP is inserted 96 into its proper position in the portion of the sublist or chunk that starts at PF and ends at CP. The insertion sort technique is well known to those skilled in the art. Briefly, the current element is first compared with the first element in the sublist or chunk. If the current element is smaller, it is inserted there by moving each element from that position up to the one just before the current position down one location and placing the current element in the first location. If it is not smaller, the comparison is repeated with the second element in the sublist or chunk, and so on. Eventually, the current element is either inserted or left at its present position in the sublist or chunk. CP is then advanced to the next element, and decision 94 is repeated.
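The pointer-based procedure of FIG. 5C can be sketched in Python as follows. The function name and the `key` parameter are hypothetical. Decision 90 treats PL == PF + 1 as a one-element list, so the sketch assumes that PL points one position past the last element; in other words, it sorts the half-open range `a[pf:pl]`. As in the description above, the insertion point is found by scanning forward from PF. Equal elements are skipped during the scan, which is what keeps the sort stable.

```python
def insertion_sort_range(a, pf, pl, key=lambda x: x):
    """Stably sort a[pf:pl] in place (pl is one past the last element)."""
    if pl == pf or pl == pf + 1:       # decision 90: zero or one elements
        return
    for cp in range(pf + 1, pl):       # set 92, decision 94
        current = a[cp]
        k = key(current)
        # scan from PF for the first element strictly greater than `current`
        pos = pf
        while pos < cp and not (k < key(a[pos])):
            pos += 1
        # insertion 96: shift a[pos:cp] down one location, then place `current`
        a[pos + 1:cp + 1] = a[pos:cp]
        a[pos] = current


chunk = [9, 1, 4, 8, 5, 7, 2, 0]
insertion_sort_range(chunk, 1, 7)      # sort only the middle six elements
print(chunk)  # [9, 1, 2, 4, 5, 7, 8, 0]
```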
Once a given sublist or chunk has been sorted, the insertion sort processing for it is completed, and control returns to the appropriate place in the procedure shown in FIG. 5B (namely, block 76 or 80). After all the sublists have been sorted, control returns to FIG. 5A so that the sublists can be stably merged 68. Preferably, the merging 68 is performed using the space-adaptive merging technique of the invention described above with reference to FIGS. 3A and 3B. In any case, Table V below shows the sorted sublists and the desired sorted list obtained by merging them.
TABLE V

Sorted Sublist-A: 1 | 2 | 4 | 5 | 7 | 8
Sorted Sublist-B: 2 | 3 | 4 | 8 | 9 | 10
Sorted List: 1 | 2 | 2 | 3 | 4 | 4 | 5 | 7 | 8 | 8 | 9 | 10

Note that because the sorting is stable, the relative order of equal elements is maintained. For example, the first "2" in the sorted list is element #6 in the original sequential list, and the second "2" in the sorted list is element #12 in the original sequential list.
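Table V can be checked with a short Python sketch. The sketch tags each element of the working example with its 1-based position in the original sequential list, stably sorts Sublist-A and Sublist-B, and then merges them, taking from Sublist-A first whenever two values are equal. The tagging and the helper name `stable_merge` are illustrative assumptions. Python's `sorted` is guaranteed to be stable.

```python
from operator import itemgetter

sequential_list = [1, 4, 8, 5, 7, 2, 3, 4, 8, 10, 9, 2]  # Table IV

# Tag each element with its 1-based position in the original list.
tagged = [(value, pos) for pos, value in enumerate(sequential_list, start=1)]
value_of = itemgetter(0)

sorted_a = sorted(tagged[:6], key=value_of)   # Sorted Sublist-A
sorted_b = sorted(tagged[6:], key=value_of)   # Sorted Sublist-B


def stable_merge(left, right):
    """Merge by value; on ties the element from `left` (Sublist-A) goes first."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if value_of(right[j]) < value_of(left[i]):
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    return out + left[i:] + right[j:]


merged = stable_merge(sorted_a, sorted_b)
print([v for v, _ in merged])                # [1, 2, 2, 3, 4, 4, 5, 7, 8, 8, 9, 10]
print([pos for v, pos in merged if v == 2])  # [6, 12]
```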
Although the space-adaptive stable sorting technique of the first embodiment can sort the sequential list rapidly, a discrete sort is a specialized sorting technique that is more efficient for lists containing many elements of equal value. Hence, a second embodiment of the space-adaptive stable sorting technique according to the invention relates to a discrete sort which, like the first embodiment, is stable. The space-adaptive sorting device 52 can use either of these embodiments.
FIG. 6 is a flow chart of the second embodiment of a space-adaptive stable sorting technique according to the invention.
The space-adaptive discrete sort technique begins with a decision 104 based on whether the list size is less than a predetermined size. Preferably, the predetermined size is thirty-two (32) elements. When the list size is small (i.e., less than the predetermined size), the insertion sort procedure is called 106. The insertion sort procedure, discussed above with reference to FIG. 5C, sorts the list.
On the other hand, when the list size is not small (i.e., greater than or equal to the predetermined size), then an element in the list is randomly selected 108. The list is then partitioned 110 using a space-adaptive stable partitioning technique. The partitioning technique is a three-way partitioning technique which partitions the list into three sublists based on the random element. The space-adaptive stable three-way partitioning herein utilized is described in detail in our co-pending British patent application no.915111,5 (RJ/N2631) filed the same day as this application. Nevertheless, the space-adaptive stable three-way partitioning technique is described in detail below with reference to Figs. 7A-7C.
Next, a decision 112 is made based on whether the sizes of the first and third sublists are less than the predetermined size. If not, blocks 108-112 are repeated until the first and third sublists of the partitioned list, or of its sublists, are smaller than the predetermined size. For example, suppose the predetermined size is twenty elements and, after the first partitioning, the first sublist has thirty elements and the third sublist has fifteen. The first sublist is then partitioned again, whereas the third sublist is not, because its size is already below the predetermined size. The second sublist holds only elements equal to the selected element, so it is already sorted. Once all the first and third sublists (after one or more partitions) are smaller than the predetermined size, the insertion sort procedure (FIG. 5C) is called 114 to sort them. This completes the sorting of the sequential list.
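The control flow of FIG. 6 can be sketched in Python as follows. This is an illustration only. The names are hypothetical, and the three-way partition uses list comprehensions that need extra memory proportional to the list, instead of the space-adaptive partitioning 110 of FIGS. 7A-7C. The sketch shows why the approach suits lists with many equal values: every element equal to the selected element is placed in the middle part once and is never examined again.

```python
import random

SMALL = 32  # preferred predetermined size from the text


def insertion_sort(items, key):
    """Stable insertion sort returning a new list (stands in for FIG. 5C)."""
    a = list(items)
    for cp in range(1, len(a)):
        current, j = a[cp], cp
        while j > 0 and key(current) < key(a[j - 1]):
            a[j] = a[j - 1]
            j -= 1
        a[j] = current
    return a


def discrete_sort(items, key=lambda x: x, small=SMALL, rng=random):
    items = list(items)
    if len(items) < max(small, 2):               # decision 104
        return insertion_sort(items, key)        # insertion sort 106 / 114
    pk = key(rng.choice(items))                  # random selection 108
    # stable three-way partition 110 (plain list version, not space-adaptive)
    less = [x for x in items if key(x) < pk]
    equal = [x for x in items if not (key(x) < pk or pk < key(x))]
    greater = [x for x in items if pk < key(x)]
    # decision 112: only the first and third parts may need further work
    return (discrete_sort(less, key, small, rng) + equal
            + discrete_sort(greater, key, small, rng))
```

The selected element always lands in `equal`, so the first and third parts are strictly smaller than the input, and the recursion terminates.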
FIGS. 7A-7C are flow charts illustrating the space-adaptive stable three-way partitioning 110.
Initially, a decision 116 is made upon comparing the size of the list 56 with the amount of available memory 54. If the amount of available memory 54 is at least as large as the list size, then the stable partitioning technique need not adapt itself to the space available in the memory. In this case, the processing follows the operations discussed below with reference to FIG. 7C.
On the other hand, when the amount of memory 54 available is smaller than the list size, the stable partitioning operates differently so that it adapts itself to the amount of memory 54 which is available. First, the list 56 is divided 118 in half. As with the first embodiment, in the general case, the dividing would be repeated until the condition of block 116 is satisfied. However, for simplicity, it is assumed that only one halving is needed.
Next, several pointers are initialized. Pointer P1 is set 120 to the address of the first element in the first sublist. A current location pointer (CLP) is set 122 to the value of P1. In addition, a pointer MEM-FRONT is set 122 to the first location in available memory 54, and a pointer MEM-REAR is set 122 to the last location in available memory 54.
Once the pointers are initialized, the elements are compared one at a time with the predicate, which is the random element selected 108. A comparison has three possible results: -1, 0 or +1, or equivalently <, = or >. The latter notation is used hereinafter. If decision 124 determines that the element at P1 is less than the predicate, the element at P1 is moved 126 to the sublist at the location indicated by CLP, and CLP is incremented 128. If decision 130 determines that the element at P1 is equal to the predicate, the element at P1 is moved 132 to the available memory 54 at the location of MEM-FRONT, and MEM-FRONT is incremented 134. If decision 136 determines that the element at P1 is greater than the predicate, the element at P1 is moved 138 to the available memory 54 at the location of MEM-REAR, and MEM-REAR is decremented 140. Whichever of these three branches is taken for a given element, the pointer P1 is then incremented 142 so that it points to the next element to be processed.
Next, a decision 144 is made based on a comparison of pointer P1 with the number of elements in the sublist. If not all the elements in the sublist have been processed, blocks 124-142 are repeated until every element has been processed.
Once all the elements have been processed, the elements held in available memory starting from the front of the available memory 54 and ending with location MEM-FRONT - 1 are copied 146 in order to the sublist beginning at the location of CLP. Then, the elements held in available memory starting from the end of the available memory 54 and ending with location MEM-REAR + 1 are copied 148 in reverse order to the sublist.
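Blocks 120-148 can be sketched in Python for a single sublist `a[start:end]`. Several details are assumptions: the function name, the `key` parameter, the half-open range convention, and the use of a Python list `mem` to model the available memory 54. MEM-REAR moves backwards as the greater elements fill the memory from the rear. They are later read back from the end down to MEM-REAR + 1, and reading them in that direction restores their original order, which keeps the partition stable.

```python
def partition_with_buffer(a, start, end, pivot, mem, key=lambda x: x):
    """Stable three-way partition of a[start:end] using scratch list `mem`.
    Requires len(mem) >= end - start. Returns (less_end, equal_end)."""
    if len(mem) < end - start:
        raise ValueError("scratch memory smaller than the sublist")
    clp = start                  # current location pointer CLP (block 122)
    mem_front = 0                # MEM-FRONT: next free slot at the front
    mem_rear = len(mem) - 1      # MEM-REAR: next free slot at the rear
    kp = key(pivot)
    for p1 in range(start, end):           # P1 walks the sublist (142, 144)
        x = a[p1]
        if key(x) < kp:                    # decision 124: move 126, increment 128
            a[clp] = x                     # safe: clp never exceeds p1
            clp += 1
        elif kp < key(x):                  # decision 136: move 138, decrement 140
            mem[mem_rear] = x
            mem_rear -= 1
        else:                              # decision 130: move 132, increment 134
            mem[mem_front] = x
            mem_front += 1
    less_end = clp
    # copy 146: equal elements, front of memory up to MEM-FRONT - 1, in order
    for k in range(mem_front):
        a[clp] = mem[k]
        clp += 1
    equal_end = clp
    # copy 148: greater elements, from the end of memory down to MEM-REAR + 1
    for k in range(len(mem) - 1, mem_rear, -1):
        a[clp] = mem[k]
        clp += 1
    return less_end, equal_end


a = [5, 1, 9, 5, 3, 7, 5, 2]
bounds = partition_with_buffer(a, 0, len(a), 5, [None] * len(a))
print(a, bounds)  # [1, 3, 2, 5, 5, 5, 9, 7] (3, 6)
```

When the available memory 54 holds the whole list, the same routine applied to the full list corresponds to the non-adaptive path of FIG. 7C (blocks 160-188).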
Next, a decision 150 is made based on whether both sublists have been processed. If only the first sublist has been processed, pointer P1 is set 152 to the address of the first element in the second sublist, and blocks 122-148 are repeated for the second sublist.
At this point, both sublists have been partitioned individually. To obtain the desired partitioned list, these partitioned sublists must be combined. First, the first and second parts of the second sublist are swapped 154 with the third part of the first sublist. Then the first part of the second sublist is swapped 156 with the second part of the first sublist. The list now consists of the first parts of both sublists, followed by their second parts, followed by their third parts. Note that in this embodiment two swap operations are needed for the three-way partition, whereas in the first embodiment only one swap was needed.
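The two swaps 154 and 156 can be sketched in Python. The sketch interprets each "swap" as an exchange of two adjacent blocks of possibly different lengths. It performs each exchange in place with the well-known three-reversal rotation; this choice is an assumption, because the text does not say how the swap is carried out. Write the partitioned halves as L1 E1 G1 and L2 E2 G2 (less, equal, greater). Swap 154 turns L1 E1 G1 L2 E2 G2 into L1 E1 L2 E2 G1 G2, and swap 156 then yields L1 L2 E1 E2 G1 G2. The function names and the boundary arguments are hypothetical.

```python
def reverse_range(a, i, j):
    """Reverse a[i:j] in place."""
    j -= 1
    while i < j:
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1


def swap_adjacent_blocks(a, i, m, j):
    """Exchange adjacent blocks a[i:m] and a[m:j] in place (a rotation)."""
    reverse_range(a, i, m)
    reverse_range(a, m, j)
    reverse_range(a, i, j)


def combine_partitioned_halves(a, s1, lt1, eq1, s2, lt2, eq2, end):
    """a[s1:s2] holds L1 E1 G1 with boundaries lt1, eq1; a[s2:end] holds
    L2 E2 G2 with boundaries lt2, eq2. Returns the two partition pointers."""
    swap_adjacent_blocks(a, eq1, s2, eq2)            # swap 154: G1 <-> L2 E2
    l2_len = lt2 - s2
    swap_adjacent_blocks(a, lt1, eq1, eq1 + l2_len)  # swap 156: E1 <-> L2
    less_end = lt1 + l2_len
    equal_end = less_end + (eq1 - lt1) + (eq2 - lt2)
    return less_end, equal_end                       # partition pointers, 158


halves = [(1, "a"), (5, "b"), (9, "c"), (2, "d"), (5, "e"), (7, "f")]
print(combine_partitioned_halves(halves, 0, 1, 2, 3, 4, 5, 6))  # (2, 4)
print(halves)
# [(1, 'a'), (2, 'd'), (5, 'b'), (5, 'e'), (9, 'c'), (7, 'f')]
```

Both exchanges are rotations, so elements within each part keep their relative order, and the combined partition stays stable.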
Finally, partition pointers are set 158 for the partitioned list. Since the list is partitioned into three parts, two pointers are needed to point to the boundaries between the parts.
The method described above is the space-adaptive case. The second embodiment also operates when the available memory 54 is at least equal to the size of the list 56. In this case, the list 56 need not be divided or otherwise adapted to the size of the available memory 54, and the method operates as follows. When the amount of available memory 54 is at least equal to the size of the list 56, the decision 116 directs control to block 160 (FIG. 7C). First, the pointers are initialized: pointer P1 is set 160 to the address of the first element in the list 56, CLP is set 160 to the value of P1, pointer MEM-FRONT is set 160 to the first location of available memory 54, and pointer MEM-REAR is set 160 to the last location of available memory 54.
Once the pointers are initialized, the elements are compared one at a time with the predicate, which is the random element selected 108. As before, a comparison has three possible results: -1, 0 or +1, or equivalently <, = or >. The latter notation is used hereinafter. If decision 162 determines that the element at P1 is less than the predicate, the element at P1 is moved 164 to the list at the location indicated by CLP, and CLP is incremented 166. If decision 168 determines that the element at P1 is equal to the predicate, the element at P1 is moved 170 to the available memory 54 at the location of MEM-FRONT, and MEM-FRONT is incremented 172. If decision 174 determines that the element at P1 is greater than the predicate, the element at P1 is moved 176 to the available memory 54 at the location of MEM-REAR, and MEM-REAR is decremented 178. Whichever of these three branches is taken for a given element, the pointer P1 is then incremented 180 so that it points to the next element to be processed.
Next, a decision 182 is made based on a comparison of pointer P1 with the number of elements in the list 56. If not all the elements in the list 56 have been processed, blocks 162-180 are repeated until every element has been processed.
Once all the elements in the list 56 have been processed, the elements held in available memory starting from the front of the available memory 54 and ending with location MEM-FRONT - 1 are copied 184 in order to the list beginning at the location of CLP. Then, the elements held in available memory starting from the end of the available memory 54 and ending with location MEM-REAR + 1 are copied 186 in reverse order to the list 56.
At this point, the list 56 is partitioned. The partition pointers are set 188 for the partitioned list, and the processing according to the second embodiment is complete.
The embodiments described above physically move the elements within a sequential list in order to sort the list (or merge sorted lists). However, the underlying data need not be physically moved if the sequential list itself consists of pointers. For example, the sequential list may be a one-dimensional array of pointers to individual database records that are to be sorted. In that case, the objects in the list point to the storage locations (e.g., records) of the elements actually being sorted or merged.
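In Python terms, sorting through pointers can be sketched by sorting a list of indices into the records instead of the records themselves. The record layout and field names below are hypothetical. Only the small index entries move; the records stay where they are.

```python
records = [
    {"id": 101, "name": "delta"},
    {"id": 102, "name": "alpha"},
    {"id": 103, "name": "charlie"},
]
# The sequential list holds "pointers" (here, list indices) to the records.
pointers = list(range(len(records)))
pointers.sort(key=lambda i: records[i]["name"])  # list.sort is stable
print([records[i]["id"] for i in pointers])       # [102, 103, 101]
```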
In any event, the space-adaptive stable merging and sorting techniques described here can achieve substantial speed gains even when only a small amount of extra memory is available.
The disclosures in United States patent application no. 08/155,966, from which this application claims priority, and in the abstract accompanying this application are incorporated herein by reference.

Claims (9)

1. A method of adapting a processing operation to memory availability in a data processing system, comprising the steps of:
a) receiving at least one sequential list having a list size;
b) determining an amount of memory which is available for use;
c) comparing the size of the at least one sequential list with the amount of memory which is available for use;
d) dividing the at least one sequential list into sublists based on said comparison; and
e) performing the processing operation on each of the sublists.
2. A method as recited in claim 1, comprising the step of receiving first and second sorted sequential lists having list sizes, comparing the list sizes with the amount of the memory which is available for use, and, when the size of both of the sequential sorted lists exceeds the amount of the memory which is available for use, dividing one of the sequential sorted lists into first and second sublists, and wherein the processing operation merges in stable manner the first and second sequential sorted lists by segmenting the other of the sequential sorted lists in accordance with a selected element of the second sublist to produce third and fourth sublists, and merging in stable manner the first and third sublists using the memory and merging in stable manner the second and fourth sublists using the memory to produce a merged list.
3. A method as recited in claim 2, wherein the selected element is the first element in the second sublist.
4. A method as recited in claim 2 or 3, wherein the first and second sorted sequential lists form a contiguous input list, and the method includes the step of rearranging the elements in the input list to produce the sorted list.
5. A method as recited in claim 1, comprising the steps of receiving a sequential list having a list size, dividing the sequential list into first and second sublists, comparing the sizes of the sublists with the amount of the memory which is available for use, dividing the sublists into further sublists when said comparison determines that the size of the sublists exceeds the amount of the memory which is available for use, wherein the processing operation sorts in stable manner the sequential list by sorting each of the sublists to obtain sorted sublists, and merging in stable manner the sublists to obtain a sorted list.
6. A method as recited in claim 5, wherein said sorting is a mergesorting procedure.
7. A method as recited in claim 5 or 6, wherein said sorting comprises the steps of dividing one of the sublists into chunks of a predetermined size; sorting each of the chunks using an insertion sort procedure; and merging the sorted chunks to obtain one of the sorted sublists.
8. A method as recited in claim 1, comprising the steps of receiving a sequential list having a list size, comparing the size of the list with the amount of the memory which is available for use, wherein when said comparison determines that the size of the list exceeds the amount of the memory available for use the step of dividing the list comprises the steps of selecting one of the elements of the list as a predicate and partitioning in stable manner the list using the predicate to produce three sublists, the processing operation sorting the sequential list by sorting in stable manner certain of the sublists which have a size which does not exceed the amount of the memory which is available for use, thereby obtaining a sorted list.
9. A method of adapting a processing operation to memory availability substantially as hereinbefore described with reference to and as illustrated in the accompanying drawings.
GB9423239A 1993-11-19 1994-11-17 Sorting or merging lists Withdrawn GB2284079A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15596693A 1993-11-19 1993-11-19

Publications (2)

Publication Number Publication Date
GB9423239D0 GB9423239D0 (en) 1995-01-04
GB2284079A true GB2284079A (en) 1995-05-24

Family

ID=22557498

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9423239A Withdrawn GB2284079A (en) 1993-11-19 1994-11-17 Sorting or merging lists

Country Status (3)

Country Link
JP (1) JPH07191827A (en)
DE (1) DE4438652A1 (en)
GB (1) GB2284079A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942200A (en) * 2013-01-18 2014-07-23 佳能株式会社 Ordered list matching method and device and document character matching method and device
US8819376B2 (en) 2012-04-23 2014-08-26 Hewlett-Packard Development Company, L. P. Merging arrays using shiftable memory
WO2014186242A1 (en) * 2013-05-13 2014-11-20 Microsoft Corporation Merging of sorted lists using array pair
US10334011B2 (en) * 2016-06-13 2019-06-25 Microsoft Technology Licensing, Llc Efficient sorting for a stream processing engine
KR20200094852A (en) * 2019-01-25 2020-08-10 전자부품연구원 Connected car big data acquisition device, system and method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1289451A (en) * 1970-05-04 1972-09-20
GB1378982A (en) * 1971-11-17 1975-01-02 Ibm Data processing apparatus
GB1385893A (en) * 1971-02-12 1975-03-05 Honeywell Inf Systems Sorting of data records
EP0221358A2 (en) * 1985-11-07 1987-05-13 International Business Machines Corporation Sort string generation in a staged storage system
EP0378038A2 (en) * 1989-01-13 1990-07-18 International Business Machines Corporation Partitioning of sorted lists for multiprocessor sort and merge


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8819376B2 (en) 2012-04-23 2014-08-26 Hewlett-Packard Development Company, L. P. Merging arrays using shiftable memory
CN103942200A (en) * 2013-01-18 2014-07-23 佳能株式会社 Ordered list matching method and device and document character matching method and device
CN103942200B (en) * 2013-01-18 2017-08-18 佳能株式会社 Ordered list matching process and equipment, document character matching process and equipment
WO2014186242A1 (en) * 2013-05-13 2014-11-20 Microsoft Corporation Merging of sorted lists using array pair
CN105264488A (en) * 2013-05-13 2016-01-20 微软技术许可有限责任公司 Merging of sorted lists using array pair
US9418089B2 (en) 2013-05-13 2016-08-16 Microsoft Technology Licensing, Llc Merging of sorted lists using array pair
US20160350345A1 (en) * 2013-05-13 2016-12-01 Microsoft Technology Licensing, Llc Merging of sorted lists using array pair
CN105264488B (en) * 2013-05-13 2018-04-27 微软技术许可有限责任公司 For using array to merging the method and system of ordered list
US10002147B2 (en) 2013-05-13 2018-06-19 Microsoft Technology Licensing, Llc Merging of sorted lists using array pair
US10552397B2 (en) 2013-05-13 2020-02-04 Microsoft Technology Licensing, Llc Merging of sorted lists using array pair
US10334011B2 (en) * 2016-06-13 2019-06-25 Microsoft Technology Licensing, Llc Efficient sorting for a stream processing engine
KR20200094852A (en) * 2019-01-25 2020-08-10 전자부품연구원 Connected car big data acquisition device, system and method

Also Published As

Publication number Publication date
DE4438652A1 (en) 1995-05-24
GB9423239D0 (en) 1995-01-04
JPH07191827A (en) 1995-07-28


Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)