# From selection to sorting

# COMMENTS<!DOCTYPE html>

<script type="text/javascript" src="http://orgmode.org/mathjax/MathJax.js"></script> <script type="text/javascript" src="assets/static/mj.js"></script>

<style> div.center {text-align:center;} </style>

<p> When I first saw the <a href="http://en.wikipedia.org/wiki/Quicksort">quicksort</a> it was in an algorithms class back in the day. We first learned the quicksort, then choosing a good pivot element and then as an afterthought we did <a href="http://en.wikipedia.org/wiki/Quickselect">quickselect</a>. </p>

<p> Fast forward to teaching. I was never really happy teaching quicksort. Mergesort is easy to motivate and it's pretty easy to write. Quicksort always felt a little forced. </p>

<p> I thought I'd try switching things up this time and doing quickselect first. </p>

<p> The motivating problem: find the K<sup>th</sup> smallest item in a list - in our case the list is an array of ints. </p>

<p> I want to start with the least efficient algorithm so I stack the deck. I remind them that we've been finding the smallest item in a list for two years now. </p>

<p> They don't disappoint and suggest something like this: </p>

<div class="org-src-container">

<pre class="src src-python"><span style="color: #DFAF8F;">L</span> = [10,3,28,82,14,42,66,74,81]

<span style="color: #F0DFAF; font-weight: bold;">def</span> <span style="color: #93E0E3;">findKth</span>(L,k): <span style="color: #DFAF8F;">omits</span>=[] <span style="color: #F0DFAF; font-weight: bold;">for</span> i <span style="color: #F0DFAF; font-weight: bold;">in</span> <span style="color: #DCDCCC; font-weight: bold;">range</span>(k): <span style="color: #DFAF8F;">ans</span>=<span style="color: #DCDCCC; font-weight: bold;">max</span>(L) <span style="color: #F0DFAF; font-weight: bold;">for</span> item <span style="color: #F0DFAF; font-weight: bold;">in</span> L: <span style="color: #F0DFAF; font-weight: bold;">if</span> item < ans <span style="color: #F0DFAF; font-weight: bold;">and</span> item <span style="color: #F0DFAF; font-weight: bold;">not</span> <span style="color: #F0DFAF; font-weight: bold;">in</span> omits: <span style="color: #DFAF8F;">ans</span>=item omits.append(ans) <span style="color: #F0DFAF; font-weight: bold;">return</span> ans

<span style="color: #F0DFAF; font-weight: bold;">print</span> findKth(L,3) </pre> </div>

<p> Clearly an \(O(n^2)\) algorithm. </p>

<p> Can we do better? </p>

<p> Certainly. </p>

<p> The students then suggest sorting the data set first. If we use mergesort, we can sort in \(O(nLg (n))\) time. This lead to a great conversation about sorting being so fast it's practically free and that you don't have to hard code everything from scratch. Not only is sorting the data set then plucking the k<sup>th</sup> item out much faster, if you already have a sort written or if you use your language's library's sort, it's much easier as well: </p>

<div class="org-src-container">

<pre class="src src-python"><span style="color: #F0DFAF; font-weight: bold;">def</span> <span style="color: #93E0E3;">findKth</span>(L,k): <span style="color: #DFAF8F;">tmp</span> = <span style="color: #DCDCCC; font-weight: bold;">sorted</span>(L) <span style="color: #F0DFAF; font-weight: bold;">return</span> tmp[k] </pre> </div>

<p> But we can do even better. So now we talk about <b>quickselect</b> </p>

<p> We pick a random pivot, partition the list a la quicksort (reorder the list such that all items less than the pivot are to its left, and all items greater than the pivot are to its right). </p>

<p> We now know that after partitioning. the pivot is in it's exact location. If its index is <b>k</b> then we're done. If not, we can recursively <b>quickselect</b> on either the left or right side. </p>

<p> Pretty cool, but is it faster? </p>

<p> It's easy to see that if we keep choosing a bad pivot (the smallest or largest in the list), each iteration takes \(n\) time to partition and each iteration takes one item out of contention. This takes us back to \(O(n^2)\). </p>

<p> However… </p>

<p> If we choose a good partition – at the middle of the list, each partition takes less and less time. We get a run time of: </p>

<p> \(n+\frac{n}{2} +\frac{n}{4}+\frac{n}{8}+\dots\) and since \(\frac{n}{2} +\frac{n}{4}+\frac{n}{8}\dots=n\) this becomes an \(O(2n)\), or \(O(n)\) algorithm. </p>

<p> That's really cool. </p>

<p> Homework was the actual implementation. </p>

<p> I think this might be a better way to approach quicksort. It seems less forced, plus the class gets to go through the exercise of taking an algorithm form \(O(n^2)\) to \(O(nlg(n))\) to \(O(n)\). </p>

<p> Next, moving to the quicksort and also showing that we can indeed avoid those really bad pivots. </p>

<h4>Addendum</h4>

We moved to quicksort today and overall I'm happy with this approach. The only thing I think needs tweaking is going from the idea of partitioning to Java code. Java makes it somewhat of a bear. <br>

Tweet