Tag Archives: programming

Nonintrusive concurrency with Clojure

最近对Clojure比较感兴趣,写了一个quick sort做测试。

单线程版:

(defn qsort-seq [seq]
  (if (<= (.length seq) 25)
    (selection-sort seq)
    (let [[first-half second-half] (qsort-partition seq 0)
          sorted-first-half (qsort-seq first-half)
          sorted-second-half (qsort-seq second-half)]
          (concat sorted-first-half sorted-second-half))))

多线程版:

(defn qsort [seq]
  (if (<= (.length seq) 25)
    (selection-sort seq)
    (let [[first-half second-half] (qsort-partition seq 0)
      sorted-first-half (future (qsort first-half))
      sorted-second-half (qsort second-half)]
      (concat @sorted-first-half sorted-second-half))))

可以看到两个版本基本是一样的,所以在Clojure里通常可以很快把单线程的程序并行化。下面是一些调用到的函数:

(def random (new java.util.Random))

(defn gen_rand_ints [n]
  (loop [result [], i n]
    (if (zero? i)
      result
      (recur (conj result (.nextInt random)) (dec i)))))

(def values (vec (gen_rand_ints 10000)))

(defn swap [vec i j]
  (let [iv (vec i),
    jv (vec j)]
    (assoc vec i jv j iv)))

(defn selection-sort [seq]
  (defn index_of_smaller [seq i j]
    (if (< (seq i) (seq j)) i j))
  (loop [seq- seq start 0]
    (if (< start (.length seq))
      (let [index_of_smallest_v
            (reduce (partial index_of_smaller seq-)
                    (range start (.length seq-)))
            new_seq (swap seq- start index_of_smallest_v)]
        (recur new_seq (inc start)))
      seq-)))

(defn qsort-partition [seq pivot_index]
  (let [pivot (seq pivot_index)
    pair (loop [i 0, le [], gt []]
           (if (< i (.length seq))
         (if (<= (seq i) pivot)
           (let [le (cons (seq i) le)]
             (recur (inc i) le gt))
           (let [gt (cons (seq i) gt)]
             (recur (inc i) le gt)))
         [(vec le), (vec gt)]))]
    pair))

比较一下两个版本排列10000个数的时间:

(time (qsort-seq values))
(time (qsort values))
(shutdown-agents)

输出

"Elapsed time: 1532.536441 msecs"
"Elapsed time: 1121.529135 msecs"

C++ Pragmatics: Testing (Part 2, Mechanics)

Jelly fish

There are many tools for testing, and there are different kinds of testing at different granularities, for different phases of the development cycle.

For tests of the largest granularity, such as acceptance tests, the tools and methods are usually language neutral. For example, you can test a Ruby web application using webrat or watir, and you can use the same set of tools to test a C++ web server. In this section, I’m just going to focus on small tests, in particular, tests of individual methods and classes.

There are a number of test frameworks for C++. Boost has a test library, Qt has a unit test library, and Google also open-sourced its C++ testing framework. There are a few others. You could write tests without any of them, but the tools help to reduce repetition and make tests more concise. In my experience this is quite important, because the easier it is to write tests, the more motivated I am to write them.

The testing frameworks all use similar terminology. Usually a test source file contains one or more test cases, and a test case contains one or more tests. It’s customary to write one test case for each class, and one or more tests for each public method. Most of those tools are similar, and it doesn’t really matter which one you use. I like Google’s framework because it’s the least verbose and it’s also what I use at work. Here’s a quick example of one test in a test case for an RSS feed parser:

class FeedParserTest: public ::testing::Test {
  protected:
    void SetUp() {
        feed_.reset(new Feed);
    }

    RssFeedParser parser_;
    shared_ptr<Feed> feed_;
};

TEST_F(FeedParserTest, ParseSlashdot) {
    QFile rss(TEST_DATA_DIR "/slashdot.rss");
    ASSERT_TRUE(rss.exists());
    rss.open(QIODevice::ReadOnly);
    parser_.startNewFeed(feed_);
    while (rss.bytesAvailable() > 0) {
        parser_.append(rss.read(CHUNK_SIZE));
    }
    EXPECT_TRUE(parser_.finished());
    EXPECT_EQ("Slashdot", parser_.feed()->title());
    EXPECT_EQ("http://slashdot.org/", parser_.feed()->site_url());
}

Each test class inherit from testing::Test, and there are two virtual functions you could override to set up and tear down your test fixture. void SetUp() and void TearDown() are called before and after each test respectively. You can also define two static class functions static void SetUpTestCase() and static void TearDown() which are called before and after all tests in the test case respectively. In other words, they are only called once for the test case. Ideally, you should implement the test fixture in SetUp() and TearDown(), because you don’t want to introduce dependencies between the tests. For example, you don’t want to be in a situation where just rearranging the order of tests would break them. SetUpTestCase() and TearDownTestCase() are mainly used for expensive initialization such as reading large amounts of data from disk, which you don’t want to do for every single test.

The TEST_F() macro defines a test with fixture, which has access to the test class defined earlier. There is also a TEST() macro that defines a test without fixture which is essentially just a simple function. The framework provides two families of assertions that you can use in tests. The ASSERT_* family of assertions immediately abort the current test if the expected condition is not true. They are usually used when it doesn’t make sense to continue the test, for example, where there is a NULL pointer that you need to dereference later, or some critical data is missing. The EXPECT_* family of assertions cause the test fail if the condition does not hold, but they allow the test to continue, so that you can find all failing expectations in one run.

There is no need to write a main() function. The testing framework provides a standard main() that you can link with, which runs all all defined tests. The resulting test executable has a few command line options that you can use to filter tests or change output formats. Just run your test executable with the --help flag to see all available options.

(To be continued …)

C++ Pragmatics: Testing (Part 1)

Tree @ Reyes Point

The first time that I truly appreciated the benefit of automated software testing was when I served as the teaching assistant for CS323 at Yale University, taught by Professor Stanley Eisenstat. If you have read Joel on Software, either the book or the blog, you would have heard of Stan and CS323@Yale a few times.

The title of the course was Introduction to Systems Programming and Computer Organization. The student were taught computer systems, UNIX C programming, and some PERL. To quote Dan Andrei Iancu (from his homepage):

The turning point for many wannabe CS majors at Yale, Professor Eisenstat’s class can easily turn into a marathon of sleepless nights and endless frustration… However, the survivors can claim to have coded (among other things) a fully functional C shell, a Perl server, and an LZW compressor.

The students turned in their assignments by placing the source code under a specific path under their home directories on NFS. Stan gave me a sheet of very specific programming style criteria complete with how much weight to assign to each of the 25 or so items. I would read each solution and grade it for style — and for style only, because there is no way to reliably judge the correctness of a program just by reading it.

To check correctness, I wrote acceptance tests for the assignments. For each assignment, I had a PERL script to compile the students’ solutions and then ran them with STDIN and STDOUT redirected to my script with UNIX pipes. The script then went through each test case, feeding specified input and checking if the output was expected. Stan let me design the final project, which turned out to be a good exercise for writing requirements specification. I asked the students to design an old-school multi-user BBS system. The users would login with a Telnet client and can read, post, and reply with commands. I thought It would be a good practice for multi-threaded programming. Some of the programs turned in by students were quite impressive. I tested all solutions in a similar way, using a PERL script to simulate multiple users in the system and test various functionalities.

Interestingly, the students’ programs also served as tests for my scripts. I usually did the assignment myself and made sure my grading script could properly run my solutions and correctly parse the output, but it is hard to make sure that the script was robust enough to telerate minor differences. Sadly, I was not able to pass all the tests. One student wrote her solutions in Microsoft Word. Of course my script was never ready to parse .doc files. Talk about unexpected user behavior!

Enough history. I will start writing the technical content next time.

To be continued…

[Ruby notes] Beware of name collisions with mixed-in modules

This is mostly just note to self, since most Ruby developers probably know this.

The following code snippet:

module TestModule
  def change_name
    @name = "TestModule"
  end

  private
  def private_fun
    puts "private_fun"
  end
end

class TestClass
  attr_accessor :name
  include TestModule

  def initialize
    @name = "TestClass"
  end

  def call_private
    private_fun
  end
end

tc = TestClass.new
puts tc.name
tc.change_name
puts tc.name
tc.call_private

Outputs

TestClass
TestModule
private_fun

It shows that when you include a module, everything is brought into the Class namespace, including private functions and instance variables. Coming from the C++ world, I found this somewhat surprising. This shows one of the practical difference between C++ multiple inheritance and Ruby mix-ins: included modules don’t have its own copy of instance variables, and private functions are visible.

Emacs smart split for programmers

I spend most of my conscious hours in front of Emacs in a terminal window these days, and I share my configuration across all my computers. At work I have a huge monitor, so I split the emacs frame into 3 side-by-side 80-column windows. At home I have a smaller screen with room only enough for two windows. To share the same configuration file, I use the following snippet:

(defun smart-split ()
  "Split the frame into 80-column sub-windows, and make sure no window has
   fewer than 80 columns."
  (interactive)
  (defun smart-split-helper (w)
    "Helper function to split a given window into two, the first of which has 
     80 columns."
    (if (> (window-width w) (* 2 81))
    (let ((w2 (split-window w 82 t)))
      (smart-split-helper w2))))
  (smart-split-helper nil))

(smart-split)

The smart-split function split the emacs frame into a maximum number of 80-column windows. A very portable solution.

JSON++: A JSON parser for C++

I wrote a very simple JSON parser in C++ a while ago. It’s just a pair of header and source files, thus very easy to use (compile the .cc file and link with your source). Check it out here.

Here are a few examples:

string teststr(
        "{" 
        "  \"foo\" : 1," 
        "  \"bar\" : false," 
        "  \"person\" : {\"name\" : \"GWB\", \"age\" : 60}," 
        "  \"data\": [\"abcd\", 42]" 
        "}" 
               );
istringstream input(teststr);
Object o;
assert(o.parse(input));
assert(1 == o.get<long>("foo"));
assert(o.has<bool>("bar"));
assert(o.has<Object>("person"));
assert(o.get<Object>("person").has<long>("age"));
assert(o.has<Array>("data"));
assert(o.get<Array>("data").get<long>(1) == 42);
assert(o.get<Array>("data").get<string>(0) == "abcd");
assert(!o.has<long>("data"));

Caveat: It currently does not support floating-point numbers.

LINQ++: An embeded DSL for C++

LINQ have been the new hotness in Microsoft’s .Net platform. Well, you can have the same syntactic sugar in C++ without writing a new compiler. I wrote a small library (just a short header file right now) that can do some interesting things. (source available at github) The following snippets from the companion unit test shows a few:

// Count the number of people older than 30
cout << from(guests)
        .where(&_1 ->* &Person::age > 30)
        .count()

// combine the people older than 30 with the person with name
// "joe" into one table.
DataSet<vector<Person> > results =
        insert(
                from(guests)
                .where(&_1 ->* &Person::age > 30))
        .into(
                from(guests)
                .where(&_1 ->* &Person::name == "joe"));

// select the age column from the previous table.
shared_ptr<vector<int> > ages = results
                                .select<int>(&_1 ->* &Person::age)
                                .get();

It should work with all STL-compatible sequence containers and requires the boost library. You can chain the clauses to form complicated queries.

I wrote it up on the shuttle from work to home. Hopefully I can find some time to polish it up and make it actually useful.

Coroutine and continuation in C

An article about the EVE Online game and stackless Python got me thinking about how to implement coroutines and continuations in C. After a short research, I found a simple implementation (without using longjmp!).

Coroutines can be thought of as cooperative multithreading, sometimes also known as user threads. Two or more procedures can yield the CPU to each other in the middle of their execution, so that they appear to run concurrently. It’s a very well-known feature in more academic programming languages such as Lisp and Scheme which have the Call/CC (call-with-current-continuation) mechanism. A continuation is a way of representing “the rest of the execution” at a certain point of time.

Here’s the C implementation of two functions that run “concurrently”. First, the required headers:

#include <signal.h>
#include <stdio.h>
#include <ucontext.h>

Because the two routines need to cooperate, they need to communicate with each other. I will just use a global variable to signal whose turn it is. The volatile key word is needed to prevent possible compiler optimization. I will explain later.

// Signals which routine should run.
volatile int g_turn;

Let the first routine print the elements of the array {1, 2, 3} each in a line.

void routineOne(ucontext_t* self, ucontext_t* other) {
    int numbers[] = {1, 2, 3};
    int i;
    for (i = 0; i < sizeof(numbers)/sizeof(int); ++i) {
        printf("Routine one: %d\n", numbers[i]);
        // Call other with current continuation
        if (g_turn != 1) {
            g_turn = 1;
            swapcontext(self, other);
        }
    }
}

A few things to notice here. ucontext_t is a struct that stores a user thread’s execution state, including it’s stack pointers and instruction pointer. We pass two contexts to the function, one for itself and one for the other routine. This is very similar to the continuation-passing programming style in functional languages such as Haskell. The swapcontext() call stores the current state of execution to the first argument and switches to the state stored in the second argument. Notice that in this function, there is only one assignment to g_turn. The compiler does not know the semantics of swapcontext(). To the compiler, it’s just a function, and there’s no way to know it switches execution context. Therefore the compiler can assume g_turn never changes within this function after the first assignment, and replace all subsequent references to g_turn with the constant 1. That’s why we need to use volatile in the declaration.

We let the second routine print the array `{-1, -2, -3}. It’s completely symmetric to the first one.

void routineTwo(ucontext_t* self, ucontext_t* other) {
    int numbers[] = {-1, -2, -3};
    int i;
    for (i = 0; i < sizeof(numbers)/sizeof(int); ++i) {
        printf("Routine two: %d\n", numbers[i]);
        if (g_turn != 2) {
            g_turn = 2;
            swapcontext(self, other);
        }
    }
}

Unlike Stackless Python and most symbolic or functional languages, C is a stack based language. The main function needs to set up a stack for each user thread before starting them.

int main() {
    // Continuations
    ucontext_t cont_one;
    ucontext_t cont_two;
    ucontext_t cont_main;

    // one stack for each thread
    char stack_one[SIGSTKSZ];
    char stack_two[SIGSTKSZ];

    // Initialize the coutinuations.
    cont_one.uc_link = &cont_main;
    cont_one.uc_stack.ss_sp = stack_one;
    cont_one.uc_stack.ss_size = sizeof(stack_one);
    cont_two.uc_link = &cont_main;
    cont_two.uc_stack.ss_sp = stack_two;
    cont_two.uc_stack.ss_size = sizeof(stack_two);
    getcontext(&cont_one);
    makecontext(&cont_one, (void (*)())routineOne, 2, &cont_one, &cont_two);
    getcontext(&cont_two);
    makecontext(&cont_two, (void (*)())routineTwo, 2, &cont_two, &cont_one);
    g_turn = 0;

    // Call routineOne with current continuation. Continue from here
    // after routineOne finishes.
    getcontext(&cont_main);
    if (g_turn == 0) {
        setcontext(&cont_one);
    }
    return 0;
}

Run this program and you should see the following output:1

Routine one: 1
Routine two: -1
Routine one: 2
Routine two: -2
Routine one: 3
Routine two: -3

Hopefully you’ve found something interesting in this article. Now an exercise for the reader: If you step through this program in gdb, what do you think will happen? Will gdb get confused? Can you step from main() into routineOne() and then into routineTwo()? Try it. :-)


  1. This program runs fine in Linux, but unfortunately it causes a bus error in Mac OS X. I’m not sure why. Perhaps there is some issue in the Mac OS X system library. 

Building Thrift

Thrift is Facebook’s cross-language data-exchange and RPC library, similar to Google’s Protocol Buffers. In my opinion, it also seems to be better (supports more data structures and programming languages) than Protocol Buffers. Unfortunately, its documentation isn’t very comprehensive, and many people have trouble getting it to work. In Ubuntu/Debian Linux, besides the requirements on the official homepage, you need the following packages to compile Thrift:

libboost-dev
python-dev
ruby1.8-dev
byacc
flex

Hopefully after thrift comes out of the Apache Incubator, the documentation would be more complete.

Writing Tests First

The test-driven development (TDD) methodology advocates the following practice (in that order):

  1. Write tests for the feature you want to implement.
  2. Watch the tests fail.
  3. Write enough code for the tests to pass.
  4. Refactor your code.

I usually don’t care about the order in which the tests and the production code are written. I am used to a more traditional approach — write the code, and then the tests. Recently I realized one big benefit of writing tests first (in addition to all the other benefits the TDD advocates have been saying). Writing the tests first and watching them fail make sure the tests are indeed working, and after you write more code to make the tests pass, you are sure that the new code is indeed doing the work. If you write the production code first and then write the tests, the tests pass, but then you cannot be sure whether your code is indeed correct or your tests are broken (pass when they shouldn’t have).

In a few places in one of my personal projects, I mistakenly used assert() instead of the assertTrue() JUnit function. assert() is only effective in debug mode, so these tests end up useless.