Today I Learned

Some of the things I've learned every day since Oct 10, 2016

Category Archives: programming

205: More Haskell Syntax

Continuing to familiarize myself with the basics of Haskell syntax, so here are some of the oddities thereof I came across today:

  • if statements are expressions, and thus must evaluate to a value, and thus require an else clause. Contrast this with imperative languages like Python and Java where if clauses are instructions, not expressions, and the else bit is optional.
  • Lists are homogenous, which isn’t that weird but is worth noting. For instance, you can’t have ints and strings in the same list. Also [1, 2, 3.0] will evaluate to the list [1.0, 2.0, 3.0]: since the list has to be homogenous the ints must be converted to equivalent floats.
  • There’s a really nice consistency in the syntax between variable definition (x = 0) and function definition (f x = 2*x). Both x and f are just expressions dependent on 0 or more arguments after all, so Haskell syntax reflects this view that values are just 0-ary functions.
  • ++ is used for list concatenation, e.g. [1, 2] ++ [3, 4] evaluates to [1, 2, 3, 4]. You can use sloppy infix notation without parentheses, so that "f" ++ "o" ++ "o" is a valid expression resulting in the string "foo". Concatenating lists s and t with s ++ t takes O(m) time, where m is the length of s. (Haskell walks through the entire first argument to ++.)
  • : is the symbol for the cons operator. For instance, 1 : [2, 3] evaluates to the list [1, 2, 3](and in fact [1, 2, 3] is just syntactic sugar for 1 : 2 : 3 : [].

 

 

Advertisements

204: Some Introductory Haskell Syntax

Picking up Haskell again. I taught myself a little bit of its syntax years ago, but I’d forgotten all of it so I’m starting from scratch. Here’s some of the basics of Haskell syntax I learned today:

  • /= is used for ‘not equal’.
  • &&  is used for logical AND, || for logical OR, but not is used for logical negation.
  • single quotes denote characters, while double quotes denote strings. For instance, 'h' == "h" results in error because you can’t compare the types Char and [Char].
  • You need to surround a negative number by parentheses (e.g. (-5)), otherwise the compiler thinks you’re trying to apply subtraction.
  • 5 + 4.0 is a valid expression because 5 can be treated as a float. However, the converse isn’t true: floats can’t act as ints in the same way.
  • Function application is typically in prefix notation and doesn’t require (or allow) parentheses surrounding arguments, as is common in other languages. For instance, min 4 5 is a valid expression and results in the value 4.
  • Function evaluations like that of min also have the highest priority in arithmetic/logical expressions. For instance, succ 3 * 10 gives 40 while succ (3 * 10) gives 31.
  • For convenience and readability, functions with exactly 2 arguments can be called in infix notation if the function name is sandwiched by backticks. For instance, div 9 4 and 9 `div` 4 both return 2.

 

 

 

203: The Unit Type

In type theory, a unit type is a type which can take on only one value. With its underlying set being a (the) singleton set, values of this type contain no information within its scope. (Contrast with the boolean type, which has 2 values, and where the value T does carry information because it is not F.)

This doesn’t mean the type itself can’t carry information in a larger scope, however. For instance, in programming languages it’s common to have a unit type and use its single value for representing a lack of a value. Python has the ‘None’, Java has ‘void’/’null’, and Haskell has ‘()’, to give a few examples. This value is useless within the scope of its type, but useful when other types are in scope.

198: Abstract Classes in Python

While Python doesn’t have a default, out-of-the-box option for abstract classes like other languages such as Java, there are several ways to achieve the equivalent: a class which can’t itself be instantiated but has associated methods which subclasses must implement.

In Python 3.4+, you can use the \texttt{abc} (Abstract Base Class) module:

from abc import ABC, abstractmethod
class Abstract(ABC):
    @abstractmethod
    def foo(self):
        pass

The class inheriting from the \texttt{ABC} class guarantees it can’t itself be instantiated, and the \texttt{@abstractmethod} decorator requires subclasses implement the method.

In Python 2 and 3.0+, the \texttt{abc} (Abstract Base Class) module can still be used, but its use in each is a little more complicated:

# Python 3.0+
from abc import ABCMeta, abstractmethod
class Abstract(metaclass=ABCMeta):
    @abstractmethod
    def foo(self):
        pass
# Python 2
from abc import ABCMeta, abstractmethod
class Abstract:
    __metaclass__ = ABCMeta

    @abstractmethod
    def foo(self):
        pass

An alternative to using \texttt{abc} which is less ‘proper’ in a sense but also less complicated to set up is to do

class Abstract(object):
    def __init__(self):
        raise Exception('Abstract is an abstract class and can't be instantiated')
    def foo(self):
        raise NotImplementedError('subclasses must override foo()!')

class Derived(Abstract):
    def foo(self):
        print 'Foo'

so that an Exception is raised when you try to directly instantiate the super class, and if the subclass doesn’t implement the abstract method then it’s super’s implementation will be called and will raise a \text{NotImplementedError}.

[Taken from stackoverflow.]

197: Instance, Class, Static Methods in Python

In Python, a method in a class can be either an instance method, class method, or static method.

Instance methods are the most common type of method and require no decorator to mark them as such. (The typical function with no decorator within a class is an instance method.) They require an associated instance of a class and as such require the additional argument \texttt{self} when they are called. Because they ‘belong’ to an instance, they also know their associated class.

Class methods, by contrast, do not have an associated instance but do know their associated class. They are designated with the \texttt{@classmethod} decorator and require an argument for their class, which is automatically passed when they are called from the class with dot notation. (e.g. \texttt{MyClass.my\_method()}.)

Static methods not only don’t belong to an instance, but don’t know anything about their associated class. When called from their class (e.g. \texttt{MyClass.my\_method()}), there’s no arguments automatically passed regarding information about the class. In this sense they’re pretty much just functions that are attached to the class for convenience. Static methods are designated with the \texttt{@staticmethod} decorator.

 

194: Array Stride

In an array, the stride is the space between adjacent elements. More precisely it is the number of memory locations in between the beginning addresses of 2 adjacent items in the array. An array with stride exactly equal to the size of its elements is said to have unit stride.

Stride can’t be smaller than the size of the elements in the array for obvious reasons, but can be bigger. In fact, despite the immediate reduction in space efficiency, it’s often useful to make the stride larger than the element size, e.g. for more efficiency in multi-dimensional arrays.

193: Fast-Forward Merge (Git)

In Git, a fast-forward merge is one in which the checked-out branch is an ancestor of the target branch being merged. That is, it’s the straightforward case of the command \texttt{git merge <branch>} where \texttt{<branch>} is simply ‘ahead’ of \texttt{HEAD} (the currently checked-out branch) and there is no real branching point in between the 2 commits in the tree. In a fast-forward merge, the current branch will just be reassigned to point to \texttt{<branch>}, essentially ‘fast-forwarding’ it.

The effect here is actually equivalent to \texttt{git branch -f <current-branch> <branch>}, since it just changes the \texttt{<current-branch>} pointer.

By contrast, a non-fast-forward merge is one in which the checked out branch is not an ancestor of \texttt{<branch>}, so it actually contains changes which need to be incorporated into the merge. In this case, a new commit incorporating the changes in both branches is created, and the current branch is re-pointed to this new commit.

Incidentally, to prevent overwriting commits, Git won’t let you push to a remote repository if the resulting merge in the remote is non-fast-forward. This is precisely the reason you often have to pull from a remote before pushing to it, if the remote has been modified since you last pulled from it.

184: TreeMap vs. HashMap vs. LinkedHashMap (Java)

In Java, TreeMap, HashMap, and LinkedHashMap are all implementations of the Map interface, and are all just different ways of mapping keys to values. Importantly, though, they differ in their lookup and insertion times, as well as ordering they maintain.

  • HashMap is implemented as an array of linked lists, and is the implementation of a hash table most people are probably most familiar with. It offers O(1)  lookup and insertion, but doesn’t maintain any ordering of its keys.
  • TreeMap is implemented as a red-black tree, and so offers O(\text{log} N ) lookup and insertion. However, its keys are ordered according to its keys’ implementation of the \texttt{Comparable} interface.
  • LinkedHashMap is implemented as a linked list and like HashMap offers O(1) time, but in addition maintains ordering of its keys according to the order in which they were added to the map.

When to use which? If you need to retrieve keys ordered by their implementation of \texttt{Comparable} (perhaps when the keys are Integers or Strings), TreeMap can do that. If you need to retrieve keys ordered by their insertion time (as in a caching system), LinkedHashMap can do that. If neither of these orderings is needed, HashMap is the best go-to, being typically faster with less overhead than the other two options.

182: final (Java)

In Java, the \texttt{final} keyword has several different uses depending on whether it’s used on a variable, method, or class. In all uses, it’s generally used to prevent something from being modified.

A variable declared \texttt{final} can’t have its value modified after declaration. When the variable is primitive, this is straightforward; \texttt{final int x = 5} will always have the value 5. If the variable is a reference type, however, it being final just means it will always point to the same object. The object itself, if mutable like a list, can still change.

A method declared \texttt{final} can’t be overridden.

A class declared \texttt{final} can’t be extended.

181: Vectors (Java)

The Vector class in Java is essentially a legacy version of ArrayList (ArrayList was actually the Java 1.2 Collections replacement for it) which by default synchronized each individual operation. While this makes the structure automatically thread-safe (unlike ArrayList), the better design is to lock sequences of operations rather than individual operations. The result is that in addition to not providing any control over synchronization, the Vector class performs considerably worse than alternatives, such as using Collections.synchronizedList to decorate a non-thread-safe collection such as ArrayList for concurrency. Thus it’s generally recommended that use of the Vectors class be avoided.