Alternate Representations

There was a comment on my last post about arbitary trees on Reddit talking about how this type of data structure was a hold over from the old days when computer resources were more limited. Nowadays having a list of children makes more sense. The comment was of course correct but I still think it's worth teaching representations like the one I spoke of in my last post.

Looking at interesting and different ways of representing data and modeling solutions is one of the things that separates programmers or coders from computer scientists and software engineers. I'm not talking about commonly accepted definitions or advanced classwork or theory and mathy stuff. I'm talking about how you approach a problem.

To me, the coder or programmer is someone who has learned rudimentary coding prowess but learned via rote example. This is typical of a code school graduate or many self taught programmers. This is not to say that people who learned this way are not computer scientists at heart nor that they don't have the ability to think deeply about problems but rather that their training was superficial - usually to get them to market quickly.

I know many people who started on their own following a book or mooc or code school and grew beyond those basics but we've also seen many "coders" who brute force using the same techniques regardless of appropriateness. I can't tell you how many times I've seen an SQL database used inappropriately as a flat file store or forcing relationships when not optimal in something like MongoDB.

We don't want our students to merely parrot the solution that we went over in class. We want them to consider the problem at hand and all the tools in their tool belt and come up with what makes sense.

I like the arbitrary tree example as it uses the same internal structure as a binary tree does:

For a binary search tree pointer1 is left and pointer 2 is right. For an arbitrary tree they represent first child and next sibling. The same structure can also be used for a doubly linked list with previous and next or a skip list. It's all about creatively using the tools at hand.

There are a number of lessons during which we can stretch our students thinking in terms of data representations in our early classes. One is when we can augment a data structure to avoid special cases. I wrote about this when solving a maze or the knight's tour.

Those problems are also great place to talk about implicit data structures. When we do our maze solver we talk about state space search. The idea that there's this huge graph where each node is the a potential state of the maze and the edges represent how we can move from state to state:#alternate-representations.org#

The above graph can be considered an implicit data structure - it never actually exists as a whole in our program. Our solver usually uses a two dimensional array to represent a specific state of our maze - that is, one node of the graph. We change the array as we make and return from recursive calls. That recursion creates and destroys nodes in our graph. At any point in time, each layer of stack in the recursion holds one node in our graph with the top of the stack being our current state but we never actually create or internally represent the full graph. It's an ephemeral implicit data structure.

Finally, one very early opportunity to discuss these types of ideas come when you use an array as a set of buckets where the index is the value and the entry is the count. It's pretty simple but a very different way of using an array or list when all you've seen is using it to store actual items. I like doing this in the context of a frequency lesson.

In any event, it's sometimes worthwhile to look at solutions that may no longer be optimal - they can be great triggers of both student thought and discussion.

2020-04-28

data structures

#data structures