Hidden Complexity

I've said it many times:

Never use a tool you couldn't write yourself.

That is - make sure you understand what's going on under the hood.

In AP we've been playing with ArrayLists. The problem for today? Create an ArrayList with consecutive integers and then write a routine that will randomize the ArrayList.

For example, you might start with this ArrayList:

0,1,2,3,4,5

and end up with

3,5,1,4,2,0

First cut, the students grabbed a random element from the ArrayList, removed it, and added it to the end of a new list. Then repeat until the original list is empty and the new one is randomized. Then return the list.

public ArrayList<Integer> shuffle1(ArrayList<Integer> l){
ArrayList<Integer> result = new ArrayList<Integer>();
while (l.size()>0){
int i = rnd.nextInt(l.size());
int v = l.remove(i);
result.add(v);
}
return result;
}

Looks good.

Version two was much the same but after it removed a random value, it added it to the end of the same ArrayList:

public ArrayList<Integer> shuffle2(ArrayList<Integer> l){
ArrayList<Integer> result = new ArrayList<Integer>();
for (int s=l.size();s>0;s--) {
int i = rnd.nextInt(s);
int v = l.remove(i);
l.add(v);
}
return l;
}

Then it was time to time. Both seemed pretty quick but as our data set grew things seemed strange:

Size	Time
100,000	2 seconds
200,000	7 seconds
400,000	26 seonds

We're just looping through an ArrayList, what's going on? When we double the size of the list, it should just take double the time.

Since the class already wrote their own ArrayList implementation, they were quick to realize that every time we removed an item from the original Array, we were doing a linear or O(n) operation. That means our algorithms, which look linear, are in fact O(N²).

Can we do better? You bet. They just changed the remove and add to using get and set. Instead off removing an item and re-inserting it, just swap the randomly selected item with the last element:

public ArrayList<Integer> shuffle3(ArrayList<Integer> l){
ArrayList<Integer> result = new ArrayList<Integer>();
for (int s=l.size();s>0;s--) {
int i = rnd.nextInt(s);

int tmp = l.get(i);
l.set(i, l.get(s-1));
l.set(s-1,tmp);
}
return l;
}

No removes so no hidden linear component.

The run time?

Size	Time
100,000	.15 seconds
200,000	.16 seconds
400,000	.17 seonds

In fact, it took data sets in the size of millions before we even broke more than a couple of seconds.

The algorithm looks the same but understanding what goes on under the hood can make a big difference.

2014-11-17

#pedagogy