
package net.sf.saxon.sort;

/*
Copyright © 1999 CERN - European Organization for Nuclear Research.
Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose
is hereby granted without fee, provided that the above copyright notice appear in all copies and
that both that copyright notice and this permission notice appear in supporting documentation.
CERN makes no representations about the suitability of this software for any purpose.
It is provided "as is" without expressed or implied warranty.
*/

/**
 * Modified by Michael Kay to use the Saxon Sortable interface rather than a separate IntComparator and Swapper
 */

/**
Generically sorts arbitrarily shaped data (for example multiple arrays, 1-, 2- or 3-d matrices, and so on) using a
quicksort or mergesort. This class addresses two problems, namely
<ul>
  <li><i>Sorting multiple arrays in sync</i>
  <li><i>Sorting by multiple sorting criteria</i> (primary, secondary, tertiary,
    ...)
</ul>
<h4>Sorting multiple arrays in sync</h4>
<p>
Assume we have three arrays X, Y and Z. We want to sort all three arrays by
  X (or some arbitrary comparison function). For example, we have<br>
  <tt>X=[3, 2, 1], Y=[3.0, 2.0, 1.0], Z=[6.0, 7.0, 8.0]</tt>. The output should
  be <tt><br>
  X=[1, 2, 3], Y=[1.0, 2.0, 3.0], Z=[8.0, 7.0, 6.0]</tt>. </p>
<p>How can we achieve this? Here are several alternatives. We could ... </p>
<ol>
  <li> make a list of Point3D objects, sort the list as desired using a comparison
    function, then copy the results back into X, Y and Z. The classic object-oriented
    way. </li>
  <li>make an index list [0,1,2,...,N-1], sort the index list using a comparison function,
    then reorder the elements of X,Y,Z as defined by the index list. Reordering
    cannot be done in-place, so we need to copy X to some temporary array, then
    copy in the right order back from the temporary into X. Same for Y and Z.
  </li>
  <li> use a generic quicksort or mergesort which, whenever two elements in X are swapped,
    also swaps the corresponding elements in Y and Z. </li>
</ol>
Alternatives 1 and 2 involve quite a lot of copying and allocate significant amounts
of temporary memory. Alternative 3 involves more swapping and more polymorphic method calls, but no copying, and it needs no temporary memory.
<p> This class implements alternative 3. It operates on arbitrarily shaped data.
  In fact, it has no idea what kind of data it is sorting. Comparisons and swapping
  are delegated to a user-provided object which knows its data and can do the
  job.
<p> Let's call the generic data <tt>g</tt> (it may be one array, three linked lists
  or whatever). This class takes a user comparison function operating on two indexes
  <tt>(a,b)</tt>, namely a {@link Sortable}. The comparison function determines
  whether <tt>g[a]</tt> is equal to, less than or greater than <tt>g[b]</tt>. The sort,
  depending on its implementation, can decide to swap the data at index <tt>a</tt>
  with the data at index <tt>b</tt>. It then calls the same {@link Sortable}
  object, which knows how to swap the data at these indexes.
<p>The following snippet shows how to solve the problem.
<table>
<td class="PRE">
<pre>
// assumes: import java.util.Arrays;
final int[] x = {3, 2, 1};
final double[] y = {3.0, 2.0, 1.0};
final double[] z = {6.0, 7.0, 8.0};

// this one compares by X (ignoring Y and Z) and knows how to swap two indexes (a,b)
Sortable sortable = new Sortable() {
&nbsp;&nbsp;&nbsp;public int compare(int a, int b) {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return x[a]==x[b] ? 0 : (x[a]&lt;x[b] ? -1 : 1);
&nbsp;&nbsp;&nbsp;}
&nbsp;&nbsp;&nbsp;public void swap(int a, int b) {
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;int t1; double t2, t3;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;t1 = x[a]; x[a] = x[b]; x[b] = t1;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;t2 = y[a]; y[a] = y[b]; y[b] = t2;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;t3 = z[a]; z[a] = z[b]; z[b] = t3;
&nbsp;&nbsp;&nbsp;}
};

System.out.println("before:");
System.out.println("X="+Arrays.toString(x));
System.out.println("Y="+Arrays.toString(y));
System.out.println("Z="+Arrays.toString(z));

GenericSorter.quickSort(0, x.length, sortable);
// GenericSorter.mergeSort(0, x.length, sortable);

System.out.println("after:");
System.out.println("X="+Arrays.toString(x));
System.out.println("Y="+Arrays.toString(y));
System.out.println("Z="+Arrays.toString(z));
</pre>
</td>
</table>
<h4>Sorting by multiple sorting criteria (primary, secondary, tertiary, ...)</h4>
<p>Assume again we have three arrays X, Y and Z. Now we want to sort all three
  arrays, primarily by Y, secondarily by Z (if Y elements are equal). For example,
  we have<br>
  <tt>X=[6, 7, 8, 9], Y=[3.0, 2.0, 1.0, 3.0], Z=[5.0, 4.0, 4.0, 1.0]</tt>. The
  output should be <tt><br>
  X=[8, 7, 9, 6], Y=[1.0, 2.0, 3.0, 3.0], Z=[4.0, 4.0, 1.0, 5.0]</tt>. </p>
<p>Here is how to solve the problem. All code in the above example stays the same,
  except that the comparison function is modified as follows</p>
<table>
<td class="PRE">
<pre>
// compare by Y; if the Y values are equal, resort to Z
public int compare(int a, int b) {
&nbsp;&nbsp;&nbsp;if (y[a]==y[b]) return z[a]==z[b] ? 0 : (z[a]&lt;z[b] ? -1 : 1);
&nbsp;&nbsp;&nbsp;return y[a]&lt;y[b] ? -1 : 1;
}
</pre>
</td>
</table>

<h4>Notes</h4>
<p> Sorts on floating point data that do not use comparators, such as those
  provided in the JDK {@link java.util.Arrays} and in Colt
  (cern.colt.Sorting), handle floating point numbers in special ways to guarantee
  that NaNs are swapped to the end and -0.0 comes before 0.0. Methods delegating
  to comparators cannot do this; they rely on the comparator. Thus, if such boundary
  cases are an issue for the application at hand, the comparator must explicitly
  implement comparisons that are aware of -0.0 and NaN. Remember: <tt>(-0.0 &lt; 0.0) == false</tt>,
  <tt>(-0.0 == 0.0) == true</tt>, as well as <tt>(5.0 &lt; Double.NaN) == false</tt> and
  <tt>(5.0 &gt; Double.NaN) == false</tt>. The same holds for <tt>float</tt>.
<h4>Implementation </h4>
<p>The quicksort is a derivative of the JDK 1.2 V1.26 algorithms (which are, in
  turn, based on Bentley's and McIlroy's fine work).
  The mergesort is a derivative of the JAL algorithms, with optimisations taken from the JDK algorithms.
Both quick and merge sort are "in-place", i.e. they do not allocate temporary memory (helper arrays).
This mergesort is <i>stable</i>, while the quicksort is not.
A stable sort is helpful, for example, if matrices are sorted successively
by multiple columns, because it preserves the relative position of equal elements.

@author wolfgang.hoschek@cern.ch
@version 1.0, 03-Jul-99
*/

public class GenericSorter extends Object {

    private static final int SMALL = 7;
    private static final int MEDIUM = 7;
    private static final int LARGE = 40;


    /**
     * Makes this class non-instantiable, but still lets others inherit from it.
     */
    protected GenericSorter() {}

    /**
     * Sorts the specified range of elements according
     * to the order induced by the specified comparator. All elements in the
     * range must be <i>mutually comparable</i> by the specified comparator
     * (that is, <tt>c.compare(a, b)</tt> must not throw an
     * exception for any indexes <tt>a</tt> and
     * <tt>b</tt> in the range).<p>
     *
     * The sorting algorithm is a tuned quicksort,
     * adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a
     * Sort Function", Software-Practice and Experience, Vol. 23(11)
     * P. 1249-1265 (November 1993). For details, see
     * http://citeseer.ist.psu.edu/bentley93engineering.html.
     * This algorithm offers n*log(n) performance on many data sets that cause other
     * quicksorts to degrade to quadratic performance.
     *
     * @param fromIndex the index of the first element (inclusive) to be sorted.
     * @param toIndex the index of the last element (exclusive) to be sorted.
     * @param c the {@link Sortable} that determines the order of the generic data
     * and knows how to swap the elements at any two indexes (a,b).
     */
    public static void quickSort(int fromIndex, int toIndex, Sortable c) {
        quickSort1(fromIndex, toIndex-fromIndex, c);
    }

    /**
     * Sorts the specified sub-array into ascending order.
     */
    private static void quickSort1(int off, int len, Sortable comp) {
        // Insertion sort on smallest arrays
        if (len < SMALL) {
            for (int i=off; i<len+off; i++)
                for (int j=i; j>off && (comp.compare(j-1,j)>0); j--) {
                    comp.swap(j, j-1);
                }
            return;
        }

        // Choose a partition element, v
        int m = off + (len >>> 1); // len/2; // Small arrays, middle element
        if (len > MEDIUM) {
            int l = off;
            int n = off + len - 1;
            if (len > LARGE) { // Big arrays, pseudomedian of 9
                int s = len >>> 3; // len/8;
                l = med3(l, l+s, l+2*s, comp);
                m = med3(m-s, m, m+s, comp);
                n = med3(n-2*s, n-s, n, comp);
            }
            // m = med3(l, m, n, comp); // Mid-size, med of 3
            // manually inlined (most time is spent near the leaves of the recursion tree)
            //a = comp.compare(l,m);
            //b = comp.compare(l,n);
            int c = comp.compare(m,n);
            m = (comp.compare(l,m)<0 ?
                (c<0 ? m : comp.compare(l,n)<0 ? n : l) :
                (c>0 ? m : comp.compare(l,n)>0 ? n : l));
        }
        //long v = x[m];

        // Establish Invariant: v* (<v)* (>v)* v*
        int a = off, b = a, c = off + len - 1, d = c;
        while (true) {
            int comparison;
            while (b <= c && ((comparison=comp.compare(b,m))<=0)) {
                if (comparison == 0) {
                    if (a==m) m = b; // pivot is moving target; DELTA to JDK !!!
                    else if (b==m) m = a; // pivot is moving target; DELTA to JDK !!!
                    comp.swap(a++, b);
                }
                b++;
            }
            while (c >= b && ((comparison=comp.compare(c,m))>=0)) {
                if (comparison == 0) {
                    if (c==m) m = d; // pivot is moving target; DELTA to JDK !!!
                    else if (d==m) m = c; // pivot is moving target; DELTA to JDK !!!
                    comp.swap(c, d--);
                }
                c--;
            }
            if (b > c) break;
            if (b==m) m = d; // pivot is moving target; DELTA to JDK !!!
            else if (c==m) m = c; // pivot is moving target; DELTA to JDK !!!
            comp.swap(b++, c--);
        }

        // Swap partition elements back to middle
        int s = Math.min(a-off, b-a );
        // vecswap(swapper, off, b-s, s);
        // manually inlined
        int aa = off; int bb = b-s;
        while (--s >= 0) comp.swap(aa++, bb++);
        int n = off + len;
        s = Math.min(d-c, n-d-1);
        // vecswap(swapper, b, n-s, s); // manually inlined
        aa = b; bb = n-s;
        while (--s >= 0) comp.swap(aa++, bb++);

        // Recursively sort non-partition-elements
        if ((s = b-a) > 1)
            quickSort1(off, s, comp);
        if ((s = d-c) > 1)
            quickSort1(n-s, s, comp);
    }

    /**
     * Returns the index of the median of the three indexed elements.
     */
    private static int med3(int a, int b, int c, Sortable comp) {
        int bc = comp.compare(b,c);
        return (comp.compare(a,b)<0 ?
            (bc<0 ? b : comp.compare(a,c)<0 ? c : a) :
            (bc>0 ? b : comp.compare(a,c)>0 ? c : a));
    }


//  /**
//   * Swaps x[a .. (a+n-1)] with x[b .. (b+n-1)].
//   */
//  private static void vecswap(Swapper swapper, int a, int b, int n) {
//      for (int i=0; i<n; i++, a++, b++) swapper.swap(a, b);
//  }

    /**
     * Sorts the specified range of elements according
     * to the order induced by the specified comparator. All elements in the
     * range must be <i>mutually comparable</i> by the specified comparator
     * (that is, <tt>c.compare(a, b)</tt> must not throw an
     * exception for any indexes <tt>a</tt> and
     * <tt>b</tt> in the range).<p>
     *
     * This sort is guaranteed to be <i>stable</i>: equal elements will
     * not be reordered as a result of the sort.<p>
     *
     * The sorting algorithm is a modified mergesort (in which the merge is
     * omitted if the highest element in the low sublist is less than or equal to the
     * lowest element in the high sublist). This algorithm offers guaranteed
     * n*log(n) performance, and can approach linear performance on nearly
     * sorted lists.
     *
     * @param fromIndex the index of the first element (inclusive) to be sorted.
     * @param toIndex the index of the last element (exclusive) to be sorted.
     * @param c the {@link Sortable} that determines the order of the generic data
     * and knows how to swap the elements at any two indexes (a,b).
     */
    public static void mergeSort(int fromIndex, int toIndex, Sortable c) {
        /*
         * We retain the same method signature as quickSort. Given only a
         * comparator and swapper we do not know how to copy and move elements
         * from/to temporary arrays. Hence, in contrast to the JDK mergesorts
         * this is an "in-place" mergesort, i.e. it does not allocate any temporary
         * arrays. A non-inplace mergesort would be faster in most cases, but
         * would require non-intuitive delegate objects. Remember that an
         * in-place merge phase requires N logN swaps, while an out-of-place
         * merge phase requires only N swaps. This doesn't matter much if swaps
         * are cheap and comparisons are expensive. Nonetheless this can
         * certainly be suboptimal.
         */

        // Insertion sort on smallest arrays
        if (toIndex - fromIndex < SMALL) {
            for (int i = fromIndex; i < toIndex; i++) {
                for (int j = i; j > fromIndex && (c.compare(j - 1, j) > 0); j--) {
                    c.swap(j, j - 1);
                }
            }
            return;
        }

        // Recursively sort halves
        int mid = (fromIndex + toIndex) >>> 1; // (fromIndex + toIndex) / 2;
        mergeSort(fromIndex, mid, c);
        mergeSort(mid, toIndex, c);

        // If list is already sorted, nothing left to do. This is an
        // optimization that results in faster sorts for nearly ordered lists.
        if (c.compare(mid - 1, mid) <= 0) return;

        // Merge sorted halves
        inplaceMerge(fromIndex, mid, toIndex, c);
    }

    /**
     * Transforms two consecutive sorted ranges into a single sorted
     * range. The initial ranges are <code>[first, middle)</code>
     * and <code>[middle, last)</code>, and the resulting range is
     * <code>[first, last)</code>.
     * Elements in the first input range will precede equal elements in the
     * second.
     */
    private static void inplaceMerge(int first, int middle, int last, Sortable comp) {
        if (first >= middle || middle >= last)
            return;
        if (last - first == 2) {
            if (comp.compare(middle, first)<0) {
                comp.swap(first,middle);
            }
            return;
        }
        int firstCut;
        int secondCut;
        if (middle - first > last - middle) {
            firstCut = first + ((middle - first) >>> 1); // first + ((middle - first) / 2);
            // secondCut = lower_bound(middle, last, firstCut, comp);
            // manually inlined for speed (speedup = 2)
            int _first = middle;
            int len = last - _first;
            while (len > 0) {
                int half = len >>> 1; // len / 2;
                int mid = _first + half;
                if (comp.compare(mid, firstCut)<0) {
                    _first = mid + 1;
                    len -= half + 1;
                }
                else {
                    len = half;
                }
            }
            secondCut = _first;
        }
        else {
            secondCut = middle + ((last - middle) >>> 1); // middle + ((last - middle) / 2);
            // firstCut = upper_bound(first, middle, secondCut, comp);
            // manually inlined for speed (speedup = 2)
            int _first = first;
            int len = middle - _first;
            while (len > 0) {
                int half = len >>> 1; // len / 2;
                int mid = _first + half;
                if (comp.compare(secondCut, mid)<0) {
                    len = half;
                }
                else {
                    _first = mid + 1;
                    len -= half + 1;
                }
            }
            firstCut = _first;
        }

        // rotate(firstCut, middle, secondCut, swapper);
        // is manually inlined for speed
        // (hotspot compiler inlining in recursive methods seems to work only for
        // small call depths, even if methods are "static private")
        // speedup = 1.7
        // begin inline
        int first2 = firstCut; int middle2 = middle; int last2 = secondCut;
        if (middle2 != first2 && middle2 != last2) {
            int first1 = first2; int last1 = middle2;
            while (first1 < --last1) comp.swap(first1++,last1);
            first1 = middle2; last1 = last2;
            while (first1 < --last1) comp.swap(first1++,last1);
            first1 = first2; last1 = last2;
            while (first1 < --last1) comp.swap(first1++,last1);
        }
        // end inline

        middle = firstCut + (secondCut - middle);
        inplaceMerge(first, firstCut, middle, comp);
        inplaceMerge(middle, secondCut, last, comp);
    }

//  /**
//   * Performs a binary search on an already-sorted range: finds the first
//   * position where an element can be inserted without violating the ordering.
//   * Sorting is by a user-supplied comparison function.
//   * @param array Array containing the range.
//   * @param first Beginning of the range.
//   * @param last One past the end of the range.
//   * @param x Element to be searched for.
//   * @param comp Comparison function.
//   * @return The largest index i such that, for every j in the
//   * range <code>[first, i)</code>,
//   * <code>comp.apply(array[j], x)</code> is
//   * <code>true</code>.
//   * @see Sorting#upper_bound
//   * @see Sorting#equal_range
//   * @see Sorting#binary_search
//   */
//  private static int lower_bound(int first, int last, int x, IntComparator comp) {
//      int len = last - first;
//      while (len > 0) {
//          int half = len >>> 1; // len / 2;
//          int middle = first + half;
//          if (comp.compare(middle, x)<0) {
//              first = middle + 1;
//              len -= half + 1;
//          }
//          else {
//              len = half;
//          }
//      }
//      return first;
//  }
//
//  /**
//   * Performs a binary search on an already-sorted range: finds the last
//   * position where an element can be inserted without violating the ordering.
//   * Sorting is by a user-supplied comparison function.
//   * @param array Array containing the range.
//   * @param first Beginning of the range.
//   * @param last One past the end of the range.
//   * @param x Element to be searched for.
//   * @param comp Comparison function.
//   * @return The largest index i such that, for every j in the
//   * range <code>[first, i)</code>,
//   * <code>comp.apply(x, array[j])</code> is
//   * <code>false</code>.
//   * @see Sorting#lower_bound
//   * @see Sorting#equal_range
//   * @see Sorting#binary_search
//   */
//  private static int upper_bound(int first, int last, int x, IntComparator comp) {
//      int len = last - first;
//      while (len > 0) {
//          int half = len >>> 1; // len / 2;
//          int middle = first + half;
//          if (comp.compare(x, middle)<0) {
//              len = half;
//          }
//          else {
//              first = middle + 1;
//              len -= half + 1;
//          }
//      }
//      return first;
//  }

}
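The two examples in the class comment can be tried without Saxon on the classpath. The sketch below assumes a hypothetical nested interface `IndexSortable` that mirrors the two methods `GenericSorter` calls on `Sortable` (`compare(int, int)` and `swap(int, int)`), and it substitutes a plain insertion sort (the same loop `GenericSorter` uses for ranges shorter than `SMALL`) for `quickSort`, so it illustrates the callback style rather than the tuned algorithms. It sorts the multi-criteria example data primarily by Y and secondarily by Z, keeping X, Y and Z in sync.

```java
import java.util.Arrays;

class ParallelArraySortDemo {

    /** Hypothetical stand-in for Saxon's Sortable: compare and swap by index. */
    interface IndexSortable {
        int compare(int a, int b);
        void swap(int a, int b);
    }

    /** Stable insertion sort over [from, to), driven only by compare and swap. */
    static void insertionSort(int from, int to, IndexSortable s) {
        for (int i = from; i < to; i++) {
            for (int j = i; j > from && s.compare(j - 1, j) > 0; j--) {
                s.swap(j, j - 1);
            }
        }
    }

    /** Sorts x, y and z together: primarily by y, then by z where the y values are equal. */
    static void sortByYThenZ(int[] x, double[] y, double[] z) {
        insertionSort(0, x.length, new IndexSortable() {
            public int compare(int a, int b) {
                int primary = Double.compare(y[a], y[b]);
                return primary != 0 ? primary : Double.compare(z[a], z[b]);
            }

            public void swap(int a, int b) {
                // swap the same index pair in every array to keep them in sync
                int t1 = x[a]; x[a] = x[b]; x[b] = t1;
                double t2 = y[a]; y[a] = y[b]; y[b] = t2;
                double t3 = z[a]; z[a] = z[b]; z[b] = t3;
            }
        });
    }

    public static void main(String[] args) {
        int[] x = {6, 7, 8, 9};
        double[] y = {3.0, 2.0, 1.0, 3.0};
        double[] z = {5.0, 4.0, 4.0, 1.0};
        sortByYThenZ(x, y, z);
        System.out.println("X=" + Arrays.toString(x)); // X=[8, 7, 9, 6]
        System.out.println("Y=" + Arrays.toString(y)); // Y=[1.0, 2.0, 3.0, 3.0]
        System.out.println("Z=" + Arrays.toString(z)); // Z=[4.0, 4.0, 1.0, 5.0]
    }
}
```

The comparator uses `Double.compare` rather than the `==`/`<` tests of the class comment; for this data the result is the same, and it also gives -0.0 and NaN a consistent order.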
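The Notes section of the class comment warns that comparator-based sorts see -0.0 and NaN only through the comparator. A minimal JDK-only sketch of the problem: `naiveCompare` (an illustrative name) follows the `==`/`<` pattern of the example comparators, while `nanAwareCompare` delegates to `Double.compare`, which places -0.0 before 0.0 and NaN after every other value, the same total order that `java.util.Arrays.sort(double[])` uses.

```java
class NaNAwareCompareDemo {

    /** The ==/< pattern used by the example comparators in the class comment. */
    static int naiveCompare(double p, double q) {
        return p == q ? 0 : (p < q ? -1 : 1);
    }

    /** JDK total order: -0.0 before 0.0, NaN after everything, including +Infinity. */
    static int nanAwareCompare(double p, double q) {
        return Double.compare(p, q);
    }

    public static void main(String[] args) {
        // -0.0 == 0.0 is true, so the naive version reports the two zeros as equal
        System.out.println(naiveCompare(-0.0, 0.0));    // 0
        System.out.println(nanAwareCompare(-0.0, 0.0)); // negative: -0.0 sorts first
        // every comparison with NaN is false, so the naive version answers
        // "greater" in both directions, an inconsistent order for any sort
        System.out.println(naiveCompare(5.0, Double.NaN));    // 1
        System.out.println(naiveCompare(Double.NaN, 5.0));    // 1
        System.out.println(nanAwareCompare(5.0, Double.NaN)); // negative: NaN sorts last
    }
}
```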
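`inplaceMerge` moves the block `[firstCut, middle)` past `[middle, secondCut)` with an inlined rotation built from three reversals, using nothing but swaps (its commented-out call is `rotate(firstCut, middle, secondCut, swapper)`). The sketch below reproduces that technique on a plain `int[]` so the effect is visible; `reverse` and `rotate` are illustrative names, and the array stands in for the generic data behind a `Sortable`.

```java
import java.util.Arrays;

class RotateBySwapsDemo {

    /** Reverses a[first, last) with swaps only, mirroring the inlined loops in inplaceMerge. */
    static void reverse(int[] a, int first, int last) {
        while (first < --last) {
            int t = a[first]; a[first] = a[last]; a[last] = t;
            first++;
        }
    }

    /**
     * Rotates a[first, last) so that the element at index middle ends up at index first:
     * reverse the left block, reverse the right block, then reverse the whole range.
     */
    static void rotate(int[] a, int first, int middle, int last) {
        if (middle == first || middle == last) return; // nothing to move
        reverse(a, first, middle);
        reverse(a, middle, last);
        reverse(a, first, last);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7};
        rotate(a, 1, 3, 6); // blocks [2, 3] and [4, 5, 6] trade places
        System.out.println(Arrays.toString(a)); // [1, 4, 5, 6, 2, 3, 7]
    }
}
```

A rotation of n elements costs roughly n swaps, and the merge recursion performs such rotations at every level, which is why the `mergeSort` comment notes that an in-place merge needs more swaps than an out-of-place one.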