More parrallel programming

As a faithful follower of the Fedora Planet, today I stumbled upon a post about parallel programming in Python. Having made similar experiences myself, I would like to add another alternative for parallel programming in Python. I could have posted this in the comments of the original post, but this way the formatting is nicer.

My point is, that Parallel Python is a really nice library, but the functionality (at least at the level demonstrated here) is also provided by the multiprocessing module included with Python.

Here is my slightly modified implementation of the same program:

#!/usr/bin/python
 
"""
Another asynchronous python example
"""
 
import multiprocessing
import time
 
def background_stuff(num):
  time.sleep(5)
  return "%s I'm done" % num
 
if __name__ == "__main__":
    print "Start at:" , time.asctime(time.localtime())
    pool = multiprocessing.Pool()
 
    print "Start doing something"
    it = pool.imap(background_stuff, [(1,), (2,)])
 
    print "Do something..."
    print " ... do something else..."
 
    print it.next()
    print it.next()
 
    print "End at:", time.asctime(time.localtime())

Go C10k

Inspired by (admittedly „ancient“) part 1 of a very instructive post series by Richard Jones and while waiting for Channels API support in Google’s new Go App Engine, I decided to make a quick port of Richard’s mochiconntest_web Erlang module to Go.

Source first, discussion later:

package main

import (
	"fmt"
	"http"
	"log"
	"time"
)

func main() {
	http.HandleFunc("/test/", feed)
	if err := http.ListenAndServe("127.0.0.1:8080", nil); err != nil {
		log.Fatal(err)
	}
}

func feed(w http.ResponseWriter, r *http.Request) {
	ch := time.Tick(1000000000)
	fmt.Fprintf(w, "Goconntest welcomes you! Your Id: %s\n", r.URL.String())
	w.(http.Flusher).Flush()
	for N := 1; ; N++ {
		<-ch
		n, err := fmt.Fprintf(w, "Chunk %d for id %s\n", N, r.URL.String())
		if err != nil {
			log.Print(err)
			break
		} else if n == 0 {
			log.Print("Nothing written")
			break
		}
		w.(http.Flusher).Flush()
	}
}

Of course this could be stripped down by ignoring (i.e. not handling) errors, but IMHO this doesn’t fall behind the Erlang implementation in terms of beauty; well, my sense of „beauty“ anyway.

You might have noted, that I’m not using the time.Sleep function, but instead a seemingly „artificial“ construct with a time.Ticker channel. This internally allows for a multiplexing of the Goroutines spawned by the http module to only a few system threads, scheduling the network operations via epoll.

Update 2011-05-23: Removed usage of ChunkedWriter from Go source according to a hint from Brad Fitzpatrick.

Update 2011-05-25: The flushing by w.(http.Flusher).Flush() is necessary to make sure that the chunks get written in-time. Thanks to Brad Fitzpatrick for pointing this out.

For the client side, I used Richard’s floodtest.erl module, only removing the {version, 1.1} line from the http:request, because that option doesn’t seem to be supported by Erlang R14B.

The result from my Intel(R) Core(TM) i7 CPU M 620 @ 2.67GHz:

The RSS usage of goconntest_web converges at 321MB i.e. roughly 32KB per connection (13KB less than the ad-hoc Erlang implementation). Also note, that Go indeed spawned only 4 threads on this quad-core machine to handle the 10k open connections.

The only disappointment was the rather high CPU usage. Although I’m aware that the screen-shot is far from any acceptable benchmark measure, still 1.5h CPU time seems to be more than „practically nothing“ Richard reported for the Erlang implementation.

Update 2011-05-25: While testing the new code above, I logged the process statistics with pidstat and then made the following plot (Python script, data file):

Mind the logarithmic time scale! The dashed line indicates the ramp-up of opened connections to 10k (theoretical, not measured). The colorful background is a stacked plot of the per-thread CPU usage with one color per kernel thread-id. Obviously Go spawns more threads than there are cores. Finally, the thick black line is the resident memory usage, which seems to converge much later than the CPU usage.

Thanks for reading. Maybe, if the App Engine Channels API takes long enough, I will continue my investigations.