Friday, December 6, 2013

Sorts Sorts Sorts

Selection sort is probably one of the most natural algorithms for a programmer to implement, and may come to mind even before insertion sort, as it is conceptually extremely simple. The algorithm consists of taking the smallest item in the original list and adding it to the empty sorted list. This could be done in place or using two different lists. It can also be implemented to be stable, keeping elements of equal value in the same order. However, the simplicity of this algorithm means that it needs to scan the whole list n times, for each time it adds an element to the sorted array it needs to find the next least element, preforming (i-1) comparisons, leading to (n-1)+(n-2)+(n-3)… comparisons, or (n-1)(n-2)/2 comparisons, or a big O of n^2.

While selection sort can be implemented on small lists (really small), the inefficiency of the algorithm prevents it from having much use in practical applications. The two staple sorts which are easy to implement and provide adequate performance are quick sort and merge sort.

Quick sort is the next fastest of the three, according to sticky bigO performance. It runs in an average n log n time, and it is implemented also quite simply. To quick sort, one picks an element of the list to act as a pivot. The list is then partitioned into a part that contains all elements smaller than the pivot, and a part that contains all elements larger than the pivot. The two partitions are then themselves quick sorted, until all partitions contain one or 0 elements, which are already sorted. On average the list is split log n times, if the list is split approximately in half by the pivot. It then takes n comparisons to do the actual split, as we need to compare every element to the pivot. So we do n comparisons log n times on average, leading to a performance of nlogn. However, in the worst case the list is nearly sorted, so the pivot ends up splitting the list into an empty list and a list of n-1 elements. Then we do n comparisons to split the list, and end up making n splits, leading to a worst case performance of n^2, similar to selection sort. However, the probability of encountering an already sorted list is relatively small, so in practice Quick Sort lives up to its name, and it was included as the standard sort for the C and C++ libraries until the more complicated hybrid sorts timsort and introsort became more common.

The third sorting algorithm is MergeSort, an algorithm which is a strictly n log n sort. Mergesort works by recursively splitting the list in half until you are left with lists of length 1, and then merging the lists so that the elements within the list are sorted, until you are left with one sorted list. The splitting once again takes log n steps, and then the merge takes n comparisons for each split up list, leading to a NlogN efficiency. As the lists are always split in half, the process is guaranteed to only make order of logN splits.

Wednesday, November 13, 2013

Evil sort

I found a sort I find quite impressive. It claims to be (n^2)!. While I haven't tried to verify the math, I doubt my attempts would be better than the authors, I did try running this search to sort the recommended max of seven integers. Just posting for the interest of others.

More on it here:
http://richardhartersworld.com/cri_d/cri/2001/badsort.html#evilsort

Tuesday, November 12, 2013

And now on my first disagreement with python

I understand that this is probably is nearly cliche in csc 148. People either hate on python or valiantly defend python. Well, I have always been on the dislike side, but I actually have a (almost) valid reason now.

While doing the final exercise, I ran into one of the downsides of python, the idea of not completely encapsulating the attributes and forcing getter and setter functions to be written. I was using a version of the BST file that was posted in the lab notes, while the version expected by the exercise auto mark was the one posted on the course homepage. The object stored at each node in one tree was named item, while in the other it was named value. So the automaker was looking for the object under a different name than it was stored. As the error wasn't part of the code I wrote, you can imagine how long it took me to find it.

Here is the place where I can argue explicit getter and setter function would have done me lots of good. Instead of having to notice a difference of a one word attribute in the code, I would have seen multiple lines of code defining a function that would have differed. I want to argue that two function definition point out the difference in the API of the object much more explicitly.

While I understand proper documentation of the BST class would have solved this error, and it would still have been possible in a more strictly encapsulated language, the face still remains that in this case, one word in the code caused me an hour of headache before, with help from a classmate on piazza, who noticed I was referring to the object by a different name in my own comments, I was able to catch my mistake and save my final exercise mark.

PS Yes this post is delayed. It started as a draft the night of the exercise and regretfully was only finished now.

Tuesday, October 29, 2013

Assignment 2 reflection

So, having finished up assignment two around thursday last week, I think its about time to write up a reflection onto this blog. Overall, I think assignment two was valuable as it enhanced my way of thinking about how to represent simple patterns or sequences in a way that is unambiguous to a machine that knows nothing about human conventions.

In combination with the reading this week, Assignment 2 really taught me how to think of statements such as pattern matching or mathematical formulas in terms of trees. More specifically, it is often ambiguous the order one must preform operations such as the binary comparisons of assignment 2 or the binary operators of simple arithmetic when the statements are written down in our standard form. From the statement, 2+3*5, the order of operations is not contained in the statement. So we need to learn separately the order of operations, and this is why schools teach us bedmas. However, consider the following binary tree:

Sum( 2, Product(3,5) )

The order in which these operations come is unambiguous. Binary trees are a powerful and natural tool to make an implicit order of binary operations explicit, and trees in general allow representation of both hierarchy and data.

Thursday, October 17, 2013

Recursion


In my opinion, recursion hails from the same neighbourhood as OOP. Both are abstractions designed to allow computer code to be designed cleaner and easier for human being to understand. However, recursive code is really just a way of paraphrasing a long series of operations out computer does in it's usual iterative, procedural manner.

Recursive code is code that solves a problem by braking it down into smaller subproblems with identical structure, and repeating the breaking down until it reaches a trivial base case. The code then can return the answer to this base case, and from this answer build up the answer to the original problem. It is a mirror of mathematical induction, the act of taking a problem indexed by the natural numbers, proving it is solve for 1, and then proving that the solution for any number implies the solution to the next. Recursion uses an inductive solution but runs through it backwards.

I have to say, recursion and induction are beautiful concepts in their own right, but their usage should be applied properly in programming. While induction is elegant and quick at proving the validity of an argument, recursion has to use the validity of that argument to compute an value. It is at this point that recursions detriments start to show.

A recursive function calling itself with a smaller case has to create another entry in the call stack of your program. This takes both time and memory. While for smaller cases this isn't necessarily a big detriments, recursive code that had to call a thousand or two thousand copies of itself to solve a case has to deal with the overhead of creating memory for this function call 1000 times. If this is a problem where an iterative solution with a integer used as a counter is possible, one can see how decrementing a counter is much faster and more memory efficient that creating an entry in the call stack.

Therefore, I believe recursion is a very powerful tool to understand a problem, and to write code that makes coincides with our human process of solving the problem. However, the elegance of recursion takes a load off of us and transfers it onto our computers.

OOP

Object Oriented Programming is, in my opinion, is simply a useful tool. However, I disagree with the idea of us being taught OOP from the very first courses in high school and university.

OOP is a method of organizing code into logical units called objects. Objects are supposed to reflect actual real items, ones which we would like to poke, and be able to observe. An object knows everything about itself, all the data that determines what it is, and an object knows how people could interact with it. However, it is a black box in how it manages to respond to our pokes. If we ask a table to record a value, how it stores that value does not matter to us, all we need is a key to later retrieve it with.

OOP has its benefits. It makes designing our code closer to the way we design actual objects. If we were designing a box, we would have a way to put things in it and take things out. So we makes those methods. The box also need to contain whats in it, so when we move the box, we move whats in it. So we give it that field. This is intuitive to think about, and we abstract away from the machine underneath our code.

However, I believe this is not the right way to learn computer programming. While it is an easier and more intuitive way to write code, I believe that writing code without an understanding of what goes on underneath in the machines most basic actions is not entirely correct. While I am not saying one should start programming with machine code, or with assembly, I believe a low level language like C would be a good place to start, and make the basic concepts behind a running program more accessible to delve into from that base, letting us understand the structure behind many things, such as abstract data types, the idea of linked lists, how memory is managed by the computer and our operating systems, the speed of specific operations done by our computer, the troubles of floating point arithmetic, and all these concepts that an OO interpreter takes care for us and we take for granted.

In a way, OOP is a way for the programmer, a translator between normal language and computer language, to think closer to our normal physical world than the world of binary digits and manipulations  of 0s and 1s.

First Post

So Assignment One and Test One are over, finally time to start this slog. I think appropriately the first post will be a reflexion on those two assessments.

So far I believe this course to be quite fair. Assignment one seemed momentous at first, but eventually I realized that the assignment sheet had all the computational problems done for us. It was simply a matter of realizing, after a hint from Professor Heap, that our computer could not only be programmed to move the blocks for us, but to calculate the best way to move the blocks as well.

After that point the assignment became an exercise in the thing I dislike most in computer programming, implementing an animation onto the GUI. After many false starts in trying to find an elegant solution to solve the puzzle whilst animating it, I gave up, and settled to having an algorithm compute the moves before the GUI started, and then simply read off the list when animating.

What surprised me the most about the assignment was, however, that I ended up using both the stacks and queues we implemented during the labs. It was nice how the disjoint components of the course ended up coming together.

The test, however, was another story. It pitted my lack of knowledge of python against my knowledge of simpler programming languages. Needless to say, not knowing python definitely did me harm.

If I had known the more about the standard libraries of python, I would have known there is a way to join lists of string using a str.join() method, saving me marks by preventing the errors I made in my code. However, it would also have saved me the trouble of implementing my own sorting algorithm in python, saving me much stress on the third question and maybe allowing me to put more insight into the other problems.