About a year ago, I began giving some serious thought to an article named Functional Complexity Modulo a Test Suite by Reg Braithwaite. Today, I think I have something to say on the matter.

### Background

Suppose you’re a programmer embracing the trends of test or behavior driven development. You have a problem to solve and expectations on how the solution should behave. So, you think about the problem, enumerate the behaviors and write some tests to model this version of reality. You start writing code to satisfy these tests. Red becomes green, you refactor clumsy first drafts into terse and expressive statements that are almost a pleasure to read, when without warning, four years of undergraduate mathematics grab ahold of your brain and thrust you into theoretical domains. Your output of practical, working code is halted for the day, while you begin contemplating ways to measure “readability”, “maintainability”, “complexity” and how these ideas fit within the framework you have spent the past few years operating within. You immediately regret taking that second major in mathematics, but it’s an integral part of you now. You know you can’t run from it, so you do your best to appease it by firing up your favorite text editor, ensuring that your blog is running MathJax, and whipping out your fu!

### Some Preliminary Definitions

The following definitions have been directly derived from the original article linked above.

For our purposes, a program, , is a function of inputs that returns some kind of output. We will use to denote the set of all programs.

A test, , is a function that maps a program, , to either 1, for true, or 0, for false: .

A program, , is said to satisfy a test, , if and only if . We will represent this relationship with the symbol :

A test suite, , is a set of tests. We denote the number of tests in a suite as . A program is said to satisfy a test suite, if it satisfies each of the tests in . More formally, if, and only if .

We say two programs are functionally congruent modulo a test suite when they both satisfy the same test suite. When there is no danger of ambiguity, we may simply refer to two (or more) programs as being “congruent.” We represent this notion of congruence as follows:

Consider all of the programs that might satisfy the test suite. There are an infinite number of programs in this set (a trivial proof of this statement can be found in “Example Programs and Calculations.”) Now, let’s suppose we have a metric, , that measures the size of a program. As a trivial example, let’s suppose is the number of characters in the string representation of program . We could use a more interesting measurement, but for now let’s stick with the simple “string length” measurement. We take this measuring stick and apply it to each of the programs in the set of programs that satisfy a given test suite, and record the “shortest length” measured. This “shortest length” is the satisfaction complexity of the test suite. Given a metric for a program , the satisfaction complexity of a test suite, , is given by,

And finally, we define the functional complexity of a program, modulo a test suite, , to be the satisfaction complexity of the test suite, , if and only if the program satisfies the test suite. Alternatively,

The important point to take away from this is that every program that satisfies a given test suite has the same functional complexity relative to that test suite.

### Example Programs and Calculations

Suppose we want a program that takes the sum of all of the integers from
1 to . We begin by writing some tests (note: I am going to
make use of a hypothetical `assert`

function that returns 1 if the block it
is given evaluates to true, and 0 otherwise):

```
def test_1 program
assert { program.call(10) == 55 }
end
def test_2 program
assert { program.call(6) == 21 }
end
def test_3 program
assert { program.call(83) == 3486 }
end
```

Our test suite consists of three tests, each checking that our program produces the correct sum for distinct values of . As mentioned earlier, there are an infinite number of programs that can satisfy this test suite, and here is the trivial proof:

```
def sum_with_while k
i = 1
sum = 0
while i <= k
sum += i
i += 1
end
sum
end
def sum_with_while_1 k
unused_local = 1
sum = 0
i = 1
while i <= k
sum += i
i += 1
end
sum
end
def sum_with_while_2 k
unused_local = 2
sum = 0
i = 1
while i <= k
sum += i
i += 1
end
sum
end
```

Hopefully, the pattern is obvious: the program `sum_with_while_<x>`

will set
`unused_local = <x>`

. This assignment adds nothing of value to the program,
but the program still returns the appropriate result, and thus satisfies the
test suite.

Now, let’s consider a few non-trivial variations:

```
p1 = lambda do |k|
i = 1
sum = 0
while i <= k
sum += i
i += 1
end
sum
end
p2 = lambda do |k|
a = b = 0
while a < k
b += (a += 1)
end
b
end
p3 = lambda do |k|
(1..k).inject { |sum,i| sum + i }
end
# Only when Symbol#to_proc is available (eg: Ruby 1.8.7+)
p4 = lambda do |k|
(1..k).inject(&:+)
end
# Only if Enumerable#sum is available (eg: active_support)
p5 = lambda do |k|
(1..k).sum
end
p6 = lambda do |k|
k * (k + 1) / 2
end
p7 = lambda do |k|
case k
when 10 then 55
when 6 then 21
when 83 then 3486
end
end
```

Each of these programs, , satisfy our test suite.
Programs and are very similar, but with
some variables renamed and some operations switched about. Programs
, , and are also
similar: removes some verbosity by using Ruby’s
`Symbol#to_proc`

method while makes use of ActiveSupport’s
`Enumerable#sum`

method which in turn calls `inject`

. The program
calculates the sum analytically without iteration while
program provides results only for the values tested for.

We will ignore the variable assignment and line ending characters when
calculating the length of these programs. For example, when calculating the
length of , we count only the characters in “lambda do |k|”
(13 characters), “ k * (k + 1) / 2” (17 characters, including the
two leading spaces), and “end” (3 characters), for a total of 33 characters.
We ignore line ending characters (eg: `\n`

) because they aren’t readily
visible. Why make the process of verifying these numbers more tedious than
it already is?

Below are the lengths of each of the programs based upon this method of counting characters:

We see that is the shortest of our programs, weighing in
at 28 characters. One could argue that the size of is
misrepresented because the `sum`

method is not a native Ruby method, and we
could address that concern by adding the length of the definition of `sum`

to
. When measured in the same way as our programs, the
`Enumerable#sum`

method found in ActiveSupport 3.0.7 weighs in at 112
characters, so let’s tack that on, . Ensuring that
the metric accounts for the program’s definition as well as any external
dependencies keeps our measurements meaningful. Otherwise, we could create
a very small program that satisfies our test suite:

```
p8 = lambda do |k|
p1[k]
end
```

If our notion of size does not account for the size of all external dependencies, our metric (and that which we intend to build upon it) loses nearly all utility.

Taking into consideration the adjustment made to , the “smallest” example program that satisfies our test suite is . It is entirely possible that there exist programs even smaller that also satisfy the suite, but given that we can conclude

So, we only have an upper bound on ? Close enough!

### Kolmogorov Complexity and Functional Complexity

One way of measuring the complexity of a string is by measuring its Kolmogorov complexity. A quick overview of the process, taken straight from the linked Wikipedia article, is to take a string:

```
abababababababababababababababababababababababababababababababab
```

and search for a smaller representation of the it:

```
ab repeated 32 times
```

The first string has 64 characters, the second string has only 20. Our simplification of the original string may not be minimal, but it is substantially smaller, which suggests that the original string was not very complex. The actual Kolmogorov complexity of a string is the size of its minimal representation in some fixed universal description language. Provided that our language of choice is Turing complete, our measurements will vary from some other choice in language by a fixed constant. So, let’s pick Ruby.

`'ab'*32`

We now have a representation of our original string that is only 7 characters long. This representation may still not be minimal, but as with Functional complexity, we now have an upper bound — the Kolmogorov complexity of the original string in Ruby is at most 7. It’s been a while since I’ve thrown down some , so if we let represent the Kolmogorov complexity of a string, , and represent a minimal description of in Ruby, then:

So, unsurprisingly, or original string is really not that complex, it can be greatly compressed, and it is very “un-random.” All three of those statements are roughly synonymous. Now, what is the relationship, if any, between Kolmogorov complexity and Functional complexity modulo a test suite? Kolmogorov complexity deals with individual strings whereas Functional complexity deals in programs that satisfy a test suite, so to find a relationship between the two, we need to get them both working in the same domain. In our example test suite, we have a pretty simple mapping between input and expected output that can be represented by many different strings. Here’s one:

```
"10=55,6=21,83=3486".split(',').each_with_index do |io, i|
i, o = io.split('=')
define_method :"test_#{i+1}" do |p|
assert { p[i.to_i] == o.to_i }
end
end
```

This test suite builder weighs in at 159 characters (excluding `\n`

characters) compared to 169 characters in our original test suite. We could
use a regular expression instead of multiple calls to `String#split`

:

```
"10=55,6=21,83=3486".scan(/(\d+)=(\d+)/).each_with_index do |(i,o), k|
define_method :"test_#{k+1}" do |p|
assert { p[i.to_i] == o.to_i }
end
end
```

to bring our length down to 147 characters. The best part of this little
excursion is that none of the numbers I’ve thrown at you are important, I just
like to count things. What really matters is that you now see how
`10=55,6=21,83=3486`

serves as a complete representation of our original
test suite. We want to measure this string’s Kolmogorov complexity.

None of the programs we’ve written to satisfy our test suite will generate the string we’re now after, so we need to wrap them in a loving adapater:

```
adapter = lambda do |p|
[10,6,83].map { |i| "#{i}=#{p[i]}" }.join ','
end
```

Excluding line endings — Do I need me to keep repeating that? Let’s assume it’s implied from here on out — our adapter is 63 characters long. We can now represent one 20 character string as a string of at least 63 characters. In terms of compression, we’re doing it wrong, but fret not, for soon things will get better. In the meantime, we now have a program that takes old programs and turns them into new programs capable of producing the string we seek: . Taking as an example, we can do it all in Ruby with roughly 128 characters. Hopefully it is clear that if , then generates our desired string.

Now comes the improvement to our compression fail: when measuring Kolmogorov
complexity, we are free to pick our language. Instead of using Ruby, we
could use Python or Pascal. We are also free to pick a superset of Ruby, say
Ruby + `adapter`

. We’re going to be little pickier than that, though. Our
language of choice is the one where every program written is fed into the
adapter. With this restriction, the program:

`lambda { |*_| "10=55,6=21,83=3486" }`

will not produce the desired string. Instead, it will be wrapped by `adapter`

and evaluate to:

```
10=10=55,6=21,83=3486,6=10=55,6=21,83=3486,83=10=55,6=21,83=3486
```

Thus, if our wrapped program, , produces the desired string, then our original program satisfies the original test suite. Combine this with our earlier statement, and we’ve got an equivalence:

where is our encoded test suite `10=55,6=21,83=3486`

. Thus,
given a minimal description, , of our encoded test suite
in this adapter wrapped Ruby language, we can infer that
(perhaps with the help of `eval`

.) We also
know that , since is
minimal in length. So, gives us our Kolmogorov
complexity and our
Functional complexity*.

From our previous excursions in counting characters and whitespace (but not newlines!), we see that all of the programs that satisfy the test suite are longer than the 20 characters. Bummer, we still fail at compression. But what happens if we add a few more tests? Let’s expand our encoded test suite to the following:

```
10=55,6=21,83=3486,99=4950,1019=519690,9001=40513501,15146=114708231
```

Now our desired string is 68 characters long. We know that , and we have finally un-failed at compression! In addition to finding a representation for the string that is half as long, we also kicked program out of the set of programs that satisfy our test suite. Keeping with the spirit of how satisfies the test suite, we can get it back in line with the following change:

```
p7 = lambda do |k|
case k
when 10 then 55
when 6 then 21
when 83 then 3486
when 99 then 4950
when 1019 then 519690
when 9001 then 40513501
when 15146 then 114708231
end
end
```

In adjusting to satisfy the new test suite, we have increased its length to 175 characters. So when our test suite was 20 characters long, satisfied it in 81 characters. When the test suite grew by 48 characters, was forced to grow by 94 characters. Changing a test changes a program… I smell a potentially useful metric in there, and we will dig deeper into this next time.

The chosen problem of summing the integers from 1 to is
certainly a trivial one, and because of its simplicity, encoding the test
suite as a string was also pretty easy. However, with the right `adapter`

we could encode any test suite as a string even if each test spends a lot of
time setting up the initial state before verifying its expectations.

The goal here was to show a relationship between Kolmogorov and Functional
complexity measurements, so we tried to keep the components simple and
manageable. In doing so, we were able to find a direct relationship between
the two by considering a description language that, informally, is the result
of augmenting our host language (Ruby) with the expectations of our test suite
(this is how the `adapter`

was constructed.) With a dash of mathematics and
a pinch of hand-waving, we have shown that if Kolmogorov complexity is a
meaningful measurement of a program then so to is Functional complexity
modulo a test suite.

### A Taste of Things to Come

I had intended to explore the relationship between these measures of complexity and somewhat vague notions such as “readability” and “maintainability” in this article, but it’s already quite a bit longer than I had anticipated. I will definitely be talking about “maintainability” next time: I believe there is a relationship between how much code must change when test suites are modified, but I need some time to think more about the math (I may have to whip out calculus or difference equations.) I would also like to explore “readability,” though I may refer to it as “comprehensibility” instead, by inverting a lot of the work done in this article, but I may have to turn this series of articles into a trilogy to do so. Stay tuned… or don’t, I’m still going to write it anyway!

### Foot Noted

#### Note: Approximate Functional Complexity

I played a little fast and loose with notation earlier. The description
string, , is just that a *string*. As I mentioned, we may
need the help of `eval`

to map it from the world of characters to the domain
of programs, so may be a few characters longer than
. However, the number of additional characters is
constant, so I’m okay with saying the two measures of complexity are roughly
equivalent.
[ jump back ]