KickJava   Java API By Example, From Geeks To Geeks.

Java > Open Source Codes > org > apache > commons > jrcs > diff > SimpleDiff


1 /*
2  * ====================================================================
3  *
4  * The Apache Software License, Version 1.1
5  *
6  * Copyright (c) 1999-2003 The Apache Software Foundation.
7  * All rights reserved.
8  *
9  * Redistribution and use in source and binary forms, with or without
10  * modification, are permitted provided that the following conditions
11  * are met:
12  *
13  * 1. Redistributions of source code must retain the above copyright
14  * notice, this list of conditions and the following disclaimer.
15  *
16  * 2. Redistributions in binary form must reproduce the above copyright
17  * notice, this list of conditions and the following disclaimer in
18  * the documentation and/or other materials provided with the
19  * distribution.
20  *
21  * 3. The end-user documentation included with the redistribution, if
22  * any, must include the following acknowledgement:
23  * "This product includes software developed by the
24  * Apache Software Foundation (http://www.apache.org/)."
25  * Alternately, this acknowledgement may appear in the software itself,
26  * if and wherever such third-party acknowledgements normally appear.
27  *
28  * 4. The names "The Jakarta Project", "Commons", and "Apache Software
29  * Foundation" must not be used to endorse or promote products derived
30  * from this software without prior written permission. For written
31  * permission, please contact apache@apache.org.
32  *
33  * 5. Products derived from this software may not be called "Apache"
34  * nor may "Apache" appear in their names without prior written
35  * permission of the Apache Software Foundation.
36  *
37  * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
38  * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
39  * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
40  * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
41  * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
42  * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
43  * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
44  * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
45  * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
46  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
47  * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
48  * SUCH DAMAGE.
49  * ====================================================================
50  *
51  * This software consists of voluntary contributions made by many
52  * individuals on behalf of the Apache Software Foundation. For more
53  * information on the Apache Software Foundation, please see
54  * <http://www.apache.org/>.
55  *
56  */

57
58 package org.apache.commons.jrcs.diff;
59
60 import java.util.*;
61
62 /**
63  * Implements a simple differencing algortithm.<p>
64  *
65  * @date $Date: 2005/06/16 18:11:11 $
66  * @version $Revision: 1.2 $
67  * @author <a HREF="mailto:juanco@suigeneris.org">Juanco Anez</a>
68  *
69  * <p><b>Overview of Algorithm</b></p>
70  *
71  * <p><i>by <a
72  * HREF='http://www.topmeadow.net/bwm'> bwm</a>
73  * </p>
74  *
75  * <p>The algorithm is optimised for situations where the input sequences
76  * have few repeated objects. If it is given input with many repeated
77  * objects it will report sub-optimal changes. However, given appropriate
78  * input, it is fast, and linear in memory usage.</p>
79  *
80  * <p>The algorithm consists of the following steps:</p>
81  * <ul>
82  * <li>compute an equivalence set for the input data</li>
83  * <li>translate each element of the orginal
84  * and revised input sequences to a member of the equivalence set
85  * </li>
86  * <li>match the the input sequences to determine the deltas, i.e.
87  * the differences between the original and revised sequences.</li>
88  * </ul>
89  *
90  * <p>The first step is to compute a an equivalence set for the input data.
91  * The equivalence set is computed from objects that are in the original
92  * input sequence</p>
93  * <pre>
94  * eq(x) = the index of the first occurence of x in the original sequence.
95  * </pre>
96  *
97  * <p>With this equivalence function, the algorithm can compare integers rather
98  * than strings, which is considerably more efficient.</p>
99  *
100  * <p>The second step is to compute the datastructure on which the
101  * algorithm will operate. Having computed the equivalence function
102  * in the previous step, we can compute two arrays where
103  * indx[i] = eqs(orig[i]) and jndx[i] = eqs(rev[i]). The algorithm can
104  * now operate on indx and jndx instead of orig and rev. Thus, comparisons
105  * are then on O(int == int) instead of O(Object.equals(Object)).
106  * </p>
107  *
108  * <p>The algorithm now matches indx and jndx. Whilst indx[i] == jndx[i]
109  * it skips matching objects in the sequence. In seeking to match objects
110  * in the input sequence it assumes that each object is likely to be unique.
111  * It uses the known characteristics of the unique equivalence function. It can
112  * tell from the eq value if this object appeared in the other sequence
113  * at all. If it did not, there is no point in searching for a match.</p>
114  *
115  * <p>Recall that the eq function value is the index earliest occurrence in
116  * the orig sequence. This information is used to search efficiently for
117  * the next match. The algorithm is perfect when all input objects are
118  * unique, but degrades when input objects are not unique. When input
119  * objects are not unique an optimal match may not be found, but a
120  * correct match will be.</p>
121  *
122  * <p>Having identified common matching objects in the orig and revised
123  * sequences, the differences between them are easily computed.
124  * </p>
125  *
126  * @see Delta
127  * @see Revision
128  * Modifications:
129  *
130  * 27/Apr/2003 bwm
131  * Added some comments whilst trying to figure out the algorithm
132  *
133  * 03 May 2003 bwm
134  * Created this implementation class by refactoring it out of the Diff
135  * class to enable plug in difference algorithms
136  *
137  */

138 public class SimpleDiff
139     implements DiffAlgorithm
140 {
141
142     static final int NOT_FOUND_i = -2;
143     static final int NOT_FOUND_j = -1;
144     static final int EOS = Integer.MAX_VALUE;
145
146     public SimpleDiff()
147     {
148         // do nothing
149
}
150
151     protected int scan(int[] ndx, int i, int target)
152     {
153         while (ndx[i] < target)
154         {
155             i++;
156         }
157         return i;
158     }
159
160     /**
161      * Compute the difference between original and revised sequences.
162      *
163      * @param orig The original sequence.
164      * @param rev The revised sequence to be compared with the original.
165      * @return A Revision object describing the differences.
166      * @throws DifferenciationFailedException if the diff could not be computed.
167      */

168     public Revision diff(Object JavaDoc[] orig, Object JavaDoc[] rev)
169         throws DifferentiationFailedException
170     {
171         // create map eqs, such that for each item in both orig and rev
172
// eqs(item) = firstOccurrence(item, orig);
173
Map eqs = buildEqSet(orig, rev);
174
175         // create an array such that
176
// indx[i] = NOT_FOUND_i if orig[i] is not in rev
177
// indx[i] = firstOccurrence(orig[i], orig)
178
int[] indx = buildIndex(eqs, orig, NOT_FOUND_i);
179
180         // create an array such that
181
// jndx[j] = NOT_FOUND_j if orig[j] is not in rev
182
// jndx[j] = firstOccurrence(rev[j], orig)
183
int[] jndx = buildIndex(eqs, rev, NOT_FOUND_j);
184
185         // what in effect has been done is to build a unique hash
186
// for each item that is in both orig and rev
187
// and to label each item in orig and new with that hash value
188
// or a marker that the item is not common to both.
189

190         eqs = null; // let gc know we're done with this
191

192         Revision deltas = new Revision(); //!!! new Revision()
193
int i = 0;
194         int j = 0;
195
196         // skip matching
197
// skip leading items that are equal
198
// could be written
199
// for (i=0; indx[i] != EOS && indx[i] == jndx[i]; i++);
200
// j = i;
201
for (; indx[i] != EOS && indx[i] == jndx[j]; i++, j++)
202         {
203             /* void */
204         }
205
206         while (indx[i] != jndx[j])
207         { // only equal if both == EOS
208
// they are different
209
int ia = i;
210             int ja = j;
211
212             // size of this delta
213
do
214             {
215                 // look down rev for a match
216
// stop at a match
217
// or if the FO(rev[j]) > FO(orig[i])
218
// or at the end
219
while (jndx[j] < 0 || jndx[j] < indx[i])
220                 {
221                     j++;
222                 }
223                 // look down orig for a match
224
// stop at a match
225
// or if the FO(orig[i]) > FO(rev[j])
226
// or at the end
227
while (indx[i] < 0 || indx[i] < jndx[j])
228                 {
229                     i++;
230                 }
231
232                 // this doesn't do a compare each line with each other line
233
// so it won't find all matching lines
234
}
235             while (indx[i] != jndx[j]);
236
237             // on exit we have a match
238

239             // they are equal, reverse any exedent matches
240
// it is possible to overshoot, so count back matching items
241
while (i > ia && j > ja && indx[i - 1] == jndx[j - 1])
242             {
243                 --i;
244                 --j;
245             }
246
247             deltas.addDelta(Delta.newDelta(new Chunk(orig, ia, i - ia),
248                                            new Chunk(rev, ja, j - ja)));
249             // skip matching
250
for (; indx[i] != EOS && indx[i] == jndx[j]; i++, j++)
251             {
252                 /* void */
253             }
254         }
255         return deltas;
256     }
257
258     /**
259      * create a <code>Map</code> from each common item in orig and rev
260      * to the index of its first occurrence in orig
261      *
262      * @param orig the original sequence of items
263      * @param rev the revised sequence of items
264      */

265     protected Map buildEqSet(Object JavaDoc[] orig, Object JavaDoc[] rev)
266     {
267         // construct a set of the objects that orig and rev have in common
268

269         // first construct a set containing all the elements in orig
270
Set items = new HashSet(Arrays.asList(orig));
271
272         // then remove all those not in rev
273
items.retainAll(Arrays.asList(rev));
274
275         Map eqs = new HashMap();
276         for (int i = 0; i < orig.length; i++)
277         {
278             // if its a common item and hasn't been found before
279
if (items.contains(orig[i]))
280             {
281                 // add it to the map
282
eqs.put(orig[i], new Integer JavaDoc(i));
283                 // and make sure its not considered again
284
items.remove(orig[i]);
285             }
286         }
287         return eqs;
288     }
289
290     /**
291      * build a an array such each a[i] = eqs([i]) or NF if eqs([i]) undefined
292      *
293      * @param eqs a mapping from Object to Integer
294      * @param seq a sequence of objects
295      * @param NF the not found marker
296      */

297     protected int[] buildIndex(Map eqs, Object JavaDoc[] seq, int NF)
298     {
299         int[] result = new int[seq.length + 1];
300         for (int i = 0; i < seq.length; i++)
301         {
302             Integer JavaDoc value = (Integer JavaDoc) eqs.get(seq[i]);
303             if (value == null || value.intValue() < 0)
304             {
305                 result[i] = NF;
306             }
307             else
308             {
309                 result[i] = value.intValue();
310             }
311         }
312         result[seq.length] = EOS;
313         return result;
314     }
315
316 }
317
Popular Tags