Today I Learned

Some of the things I've learned every day since Oct 10, 2016

Category Archives: software development

202: The XY Problem

The ambiguously-named XY Problem is a meta-problem at the intersection of communication and problem solving, common in places like technical support and Stack Exchange. Essentially, it’s what can happen when you ask how to implement a chosen solution to a problem rather than ask how to solve the problem itself.

Suppose a person A is trying to solve a problem X, and is attempting to solve it via another problem Y. (An equivalent view is that A is trying to do X by doing Y.) To this end, they ask person B for help with solving problem Y, but do not give B the context of why they want to solve Y, i.e. what they intend to use Y for.

Now suppose solving Y is an inefficient or otherwise bad way of going about solving X, or maybe not a valid way at all. B has no idea of this and will nevertheless waste time and possibly other resources helping A solve Y, which may or may not do any good in the end.

[If this is too abstract, there are a lot of good examples in this post.]

Clearly the better meta-solution here is for A to give the context of why they want to solve Y, thereby allowing B to infer that the real problem is X and help A solve that instead.

Moral of the story: context is important when asking for help. If you’re the asker you should try to provide it, and if you’re the asked you should ask questions to make sure you’re really dealing with the root problem.


201: ‘git add -A’ vs. ‘git add .’

Both of the git commands “git add -A” and “git add .” stage all changes, including new and deleted files (as of Git 2.0; in earlier versions “git add .” did not stage deletions). The difference between them is in their directory scope. “git add -A” covers the entire repository, including files above the current directory, while “git add .” only covers the current directory and its subdirectories.
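To make the scope difference concrete, here is a toy model in Python. It does not call Git at all; the function name `staged_paths` and the example file names are made up for illustration, and it assumes the Git 2.x behaviour described above.

```python
from pathlib import PurePosixPath


def staged_paths(changed, cwd, command):
    """Toy model of which changed paths a command would stage.

    changed: changed paths, relative to the repository root
    cwd:     current directory, relative to the repository root ("" = root)
    command: "git add -A" or "git add ."
    """
    if command == "git add -A":
        # Whole repository, regardless of where you are standing.
        return sorted(changed)
    if command == "git add .":
        # Only paths at or below the current directory.
        base = PurePosixPath(cwd)
        return sorted(p for p in changed if base in PurePosixPath(p).parents)
    raise ValueError(f"unsupported command: {command}")


changed = ["README.md", "docs/index.md", "src/main.py", "src/util/io.py"]
print(staged_paths(changed, "src", "git add -A"))
print(staged_paths(changed, "src", "git add ."))
```

Run from the hypothetical `src` directory, the first call lists all four paths while the second lists only the two under `src`.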

193: Fast-Forward Merge (Git)

In Git, a fast-forward merge is one in which the checked-out branch is an ancestor of the target branch being merged. That is, it’s the straightforward case of the command \texttt{git merge <branch>} where \texttt{<branch>} is simply ‘ahead’ of \texttt{HEAD} (the currently checked-out branch) and there is no real branching point in between the two commits in the commit graph. In a fast-forward merge, the current branch will just be reassigned to point to \texttt{<branch>}, essentially ‘fast-forwarding’ it.

The effect on the branch pointer is the same as what \texttt{git branch -f <current-branch> <branch>} would do, since the merge just moves the \texttt{<current-branch>} pointer. (Git refuses to run \texttt{git branch -f} on the checked-out branch, and unlike that command a merge also updates the index and working tree.)

By contrast, a non-fast-forward merge is one in which the checked-out branch is not an ancestor of \texttt{<branch>}, so it contains changes of its own which need to be incorporated into the merge. In this case, a new commit incorporating the changes in both branches is created, and the current branch is re-pointed to this new commit.

Incidentally, to prevent overwriting commits, Git by default rejects a push if updating the remote branch would not be a fast-forward. This is precisely the reason you often have to pull from a remote before pushing to it, if the remote has been modified since you last pulled from it.
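The fast-forward decision boils down to an ancestry check, which can be sketched in a few lines of Python. This is a toy model of a commit graph, not how Git is implemented; the `parents` dictionary, the commit names, and the functions `ancestors` and `merge` are all invented for the example.

```python
def ancestors(commit, parents):
    """Return the set containing `commit` and every commit reachable from it."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def merge(current, other, parents):
    """Return (kind, new_tip) for merging `other` into `current`."""
    if current in ancestors(other, parents):
        # `other` is simply ahead: just move the pointer.
        return ("fast-forward", other)
    if other in ancestors(current, parents):
        # Nothing new to bring in.
        return ("already up to date", current)
    # The histories diverged: record a new commit with two parents.
    merge_commit = f"merge({current},{other})"
    parents[merge_commit] = (current, other)
    return ("merge commit", merge_commit)


# A <- B <- C   and   B <- D   (C and D diverge at B)
parents = {"A": (), "B": ("A",), "C": ("B",), "D": ("B",)}
print(merge("B", "C", parents))
print(merge("C", "D", parents))
```

Merging C into B is a fast-forward because B is an ancestor of C, while merging D into C needs a new two-parent commit.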

180: Software Aging and Rejuvenation

Software aging is the degradation of a software system’s ability to perform correctly after running continuously for long periods of time. Common causes include memory bloating, memory leaks, and data corruption.

To prevent undesirable effects of software aging, software rejuvenation can be done proactively. This can take many forms, the most well-known of which is a simple system reboot. Other forms include flushing OS kernel tables, garbage collection, or Apache’s method of killing and then recreating processes after serving a certain number of requests.
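The "recycle a worker after a certain number of requests" idea can be simulated in Python. This is only a sketch of the general technique, not Apache’s actual implementation; the classes `LeakyWorker` and `RejuvenatingServer` and the limit of 100 requests are assumptions made for the example.

```python
class LeakyWorker:
    """Stand-in for a worker process that accumulates state it never frees."""

    def __init__(self):
        self.leaked = []
        self.served = 0

    def handle(self, request):
        self.leaked.append(request)  # simulated leak: grows with every request
        self.served += 1
        return f"ok:{request}"


class RejuvenatingServer:
    """Replace the worker with a fresh one after `max_requests` requests."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.worker = LeakyWorker()
        self.restarts = 0

    def handle(self, request):
        if self.worker.served >= self.max_requests:
            # Discard the aged worker together with everything it leaked.
            self.worker = LeakyWorker()
            self.restarts += 1
        return self.worker.handle(request)


server = RejuvenatingServer(max_requests=100)
peak = 0
for i in range(1000):
    server.handle(i)
    peak = max(peak, len(server.worker.leaked))

print(peak, server.restarts)
```

Without rejuvenation the leaked list would reach 1000 entries. With it, the leak never grows past the per-worker limit.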

168: Memory Leaks

In computing, a memory leak occurs when a program fails to release memory which is no longer required. For instance, a C program leaks memory when the programmer dynamically allocates memory for a task but forgets to de-allocate it.

In certain circumstances, memory leakage may not cause big problems or might even be asymptomatic, as in programs with a small leak that never run long enough for memory usage to become a problem. However, in worse cases memory leakage can lead to the complete failure of a program.

A couple of types of programs at high risk for memory leakage are those in embedded systems (which can run continuously for years at a time, giving even small leaks the potential for being problematic) and those where memory is allocated extremely frequently for one-time tasks (such as rendering frames in a video or game).

While many languages like Java have built-in garbage collection which automatically cleans up unused or inaccessible memory, other languages like C and C++ require the programmer to make sure these leaks don’t happen in the first place. However, there are memory management libraries and memory debugging tools written to help with these languages. It should also be noted that automatic garbage collection comes with a performance overhead, so whether it’s desirable varies depending on the situation.
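Garbage collection only reclaims memory that nothing refers to any more, so a lingering reference can still act like a leak. Here is a small Python sketch of that. The `Frame` class and the module-level `history` list are hypothetical, and weak references are used only to observe whether each object is still alive.

```python
import gc
import weakref


class Frame:
    """Stand-in for a large one-time allocation, such as a rendered frame."""

    def __init__(self, number):
        self.number = number
        self.pixels = bytearray(1024)


history = []  # hypothetical list that is appended to but never trimmed


def render_leaky(number):
    frame = Frame(number)
    history.append(frame)      # lingering reference keeps the frame alive
    return weakref.ref(frame)


def render_clean(number):
    frame = Frame(number)
    return weakref.ref(frame)  # no strong reference survives the call


leaked = render_leaky(1)
freed = render_clean(2)
gc.collect()

print(leaked() is not None)  # the leaked frame is still reachable
print(freed() is None)       # the other frame has been reclaimed
```

Calling `history.clear()` (or never storing the frame in the first place) is the analogue of remembering to de-allocate in C.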

163: Iron Law of Performance

In the study of computation, the Iron Law of Performance refers to the equation

\frac{\text{time}}{\text{program}} = \frac{\text{time}}{\text{cycle}} \cdot \frac{\text{cycles}}{\text{instruction}} \cdot \frac{\text{instructions}}{\text{program}},

which is just a way of roughly compartmentalizing the factors involved in determining how long it takes a program to execute.

While the first two terms depend on the instruction set architecture (ISA), microarchitecture, and base hardware technology involved, the third depends mostly on things like the source code, the language the source code is written in, and the way the code is compiled (as well as on the ISA, which determines the instructions the compiler can emit).
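As a quick worked example of the equation, here it is as a Python function. The numbers are invented purely to show how the three factors trade off against each other.

```python
def execution_time(instructions, cycles_per_instruction, clock_hz):
    """time/program = (time/cycle) * (cycles/instruction) * (instructions/program)."""
    # time/cycle is 1 / clock_hz, so the product reduces to the expression below.
    return instructions * cycles_per_instruction / clock_hz


# Hypothetical program on a 3 GHz machine.
baseline = execution_time(2_000_000_000, 1.5, 3_000_000_000)

# Same clock, 20% fewer instructions, but each one takes more cycles on average.
variant = execution_time(1_600_000_000, 2.0, 3_000_000_000)

print(f"baseline: {baseline:.3f} s, variant: {variant:.3f} s")
```

The variant executes fewer instructions yet ends up slower, which is the point of the law: no single factor determines performance on its own.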

160: Soak Testing (Software)

In software development, soak testing is a form of load testing where a system is exposed to some ‘normal’ (anticipated) load for a relatively long period of time to expose issues that would develop under these circumstances. For instance, certain problems like memory leaks in a system under normal usage might take a long time to have noticeable effects. The period of time required or desired varies depending on specifics, but some companies have been known to perform soak tests up to several months long. (This contrasts with the rapid deployment cycles of many systems.)

Ideally, one would be able to vary the load on the system in order to better simulate the fluctuations that it would encounter in deployment, instead of just using a flat average expected load. If this isn’t possible, one might instead set the flat load to one that’s higher than expected in order to get a better simulation. (“Passed tests give you no information.”)
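A miniature version of a soak test with a fluctuating load might look like the Python sketch below. Everything in it is an assumption for illustration: time is compressed into abstract steps rather than hours, the load follows a simple sine wave, and `probe` stands for whatever resource metric you would actually watch.

```python
import math


def load_profile(step, base=100, swing=40, period=24):
    """Requests to issue at `step`: a base load plus a sinusoidal fluctuation."""
    return max(0, round(base + swing * math.sin(2 * math.pi * step / period)))


def soak(handle_request, probe, steps, sample_every=24):
    """Drive handle_request with a varying load and sample probe() periodically."""
    samples = []
    for step in range(steps):
        for _ in range(load_profile(step)):
            handle_request()
        if (step + 1) % sample_every == 0:
            samples.append(probe())
    return samples


def grows_steadily(samples):
    """True if every sample is larger than the one before it."""
    return len(samples) > 1 and all(a < b for a, b in zip(samples, samples[1:]))


# A deliberately leaky system under test: it remembers every request.
retained = []
leaky_samples = soak(lambda: retained.append(0), lambda: len(retained), steps=96)

# A well-behaved one: its resource usage stays flat.
clean_samples = soak(lambda: None, lambda: 0, steps=96)

print(grows_steadily(leaky_samples), grows_steadily(clean_samples))
```

A metric that keeps climbing from sample to sample is the kind of slow-building problem a soak test is meant to surface.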

158: Stress Testing (Software Development)

In software testing, stress testing is generally testing software by subjecting it to a ‘heavier load’ than it would ordinarily be expected to encounter during ‘normal’ usage. It differs from load testing in that its focus is less on controlled variations in load and more on chaotic, unpredictable causes of stress to a system.

This can be advantageous because it’s not always possible to foresee all the different circumstances in which the software may be used in the future. For instance:

  • Operating systems and middleware will eventually have to support software which hasn’t even been written yet.
  • Users may run the software on systems with fewer computational resources than those on which the software was tested in development.

In some cases, stress testing over a short period of time can also serve as a good simulation of ‘normal’ use over a long period of time, potentially exposing bugs with effects that take time to become noticeable. Additionally, stress testing is good for finding concurrency problems like race conditions and deadlocks.
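For the concurrency case, a stress test can be as simple as calling the same code from many threads at once and checking an invariant afterwards. The Python sketch below is one way to do that; the `Counter` class is a made-up piece of code under test, and the thread and call counts are arbitrary.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Counter:
    """Code under test: a counter that is meant to be thread-safe."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # without this lock, concurrent updates may be lost
            self._value += 1

    @property
    def value(self):
        return self._value


def stress(counter, workers=16, calls_per_worker=5_000):
    """Call increment() from many threads at once; return the expected total."""

    def hammer():
        for _ in range(calls_per_worker):
            counter.increment()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(hammer) for _ in range(workers)]
        for future in futures:
            future.result()  # re-raise any exception from a worker thread
    return workers * calls_per_worker


counter = Counter()
expected = stress(counter)
print(counter.value == expected)
```

A race condition would show up here as a final value below the expected total, though a passing run does not prove that no race exists.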

128: Git Commit Message Subject Conventions

Some conventions that should be followed when writing the subject of a commit message in Git:

  • Separate the subject from the body (if it exists) with a blank line
  • Keep the length at no more than 50 characters
  • Capitalize the first letter and don’t end it with a period
  • Use the imperative mood. For instance, “Update getting started information” instead of “Updated getting started information”, “Refactor X for readability” instead of “Making X more readable”.

Additionally, a commit message subject should convey information about not only what was done but also why.
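The mechanical parts of these conventions are easy to check automatically. Below is a minimal Python sketch; the function name `check_subject` and the wording of the messages are my own. It does not try to check the imperative mood or whether the subject explains why, since those need human judgment.

```python
def check_subject(message):
    """Return a list of convention violations for a commit message."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between subject and body")
    if len(subject) > 50:
        problems.append("subject longer than 50 characters")
    if subject and not subject[0].isupper():
        problems.append("subject does not start with a capital letter")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    return problems


print(check_subject("Update getting started information"))
print(check_subject("updated getting started information."))
```

The first message passes every check, while the second is flagged for its lowercase first letter and trailing period.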

127: Linters

Linters in the modern sense generally refer to programs which check code for potential errors (or even just style errors) in a static context (without running the code). Such errors might include syntactic errors, “variables being used before being set, division by zero, conditions that are constant, and calculations whose result is likely to be outside the range of values representable in the type used.” [Wikipedia]

The name comes from a program called Lint that performed this function for C code; “lint” has since become a more general term.

Linting can be especially helpful for interpreted languages like Python, since they don’t have a separate ahead-of-time compilation step in which things like syntactic errors would be caught before the program runs.
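To give a feel for how static checking works, here is a tiny linter written with Python’s standard `ast` module. It is a sketch that looks for only three of the error types mentioned above (syntax errors, constant `if` conditions, and division by a literal zero). The function name `lint` and the message strings are my own, and real linters do far more.

```python
import ast


def lint(source):
    """Return (line, message) findings for `source` without executing it."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [(err.lineno, f"syntax error: {err.msg}")]

    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.If) and isinstance(node.test, ast.Constant):
            findings.append((node.lineno, "condition is constant"))
        elif (
            isinstance(node, ast.BinOp)
            and isinstance(node.op, (ast.Div, ast.FloorDiv))
            and isinstance(node.right, ast.Constant)
            and node.right.value == 0
        ):
            findings.append((node.lineno, "division by zero"))
    return sorted(findings)


code = "x = 1 / 0\nif True:\n    y = 2\n"
for line, message in lint(code):
    print(f"line {line}: {message}")
```

The code in `code` is never run, so the division by zero is reported rather than raised.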