Cross-posted from the Arc90 blog.

I’ve been a fan of Ryan Tomayko ever since I stumbled across his “How I Explained REST to My Wife” in 2005. (That piece was so good that I actually performed it live during a company lunch in early 2006, with the inimitable Kamni Khan.) He’s had a special place in my feed reader since before the Avi Flax release of FeedLounge (yes, that actually happened). So as soon as his most recent essay, “I like Unicorn because it’s Unix,” appeared in my current feed reader, as usual I read it immediately, and as usual I learned something.

The essay’s main points are that core Unix concepts such as fork(), accept(), select(), and Unix sockets have lamentably been ignored or neglected by proponents of Ruby and Python; that Unicorn is undoubtedly one of the best, most densely packed examples of Unix programming in Ruby I’ve come across; and that Ruby (and Python) developers should learn more about these paradigms, and consider using them where appropriate.

While I learned much from the essay, and enjoyed reading it and following the “* is Unix” meme it sparked, after I read it I had some questions rolling around my head which I couldn’t quiet. So, since comments are currently disabled on Ryan’s site, I wrote him an email. He was kind enough to respond quickly and at length, illuminating a subject I’d been fuzzy on for years. In fact, I thought his responses were edifying to the degree that they should be published. So after securing his permission, I’m reproducing our dialog here.

If you haven’t read it yet, I suggest reading “I like Unicorn because it’s Unix” before proceeding.

I wrote Ryan:

Hi Ryan, thanks for your recent piece on Unicorn and Unix. Enjoyed it, and learned… well, I learned that there’s even more important stuff I don’t know. (It’s funny sometimes being a software developer/architect and not having a CS background.)

Anyway, you definitely got me interested in learning more about the Unix programming and concurrency models, thank you for that.

One thing that has me a little confused: I got the impression that a lot of the difference between the Unix and Java/Windows approaches boils down to processes versus threads. Is that right? Is it an oversimplification?

(The process vs. threads thing seems related to the discussions going on about how to write concurrent programs; specifically the debates about shared data vs. message passing. Isn’t IPC essentially message passing?)

If that’s right, then what about Apache? Wasn’t the point of 2.x that it moved to a thread-based model over the process-based model of 1.x? And if so, then… what’s up with that? I’d think there were a ton of very smart, very unixy people involved in that effort; they must have had some good reasons for the switch, right? Or, more likely, I’m missing something.

Thanks!

Avi

And he responded, that night:

On Wed, Oct 7, 2009 at 7:44 AM, Avi Flax <avif@arc90.com> wrote:

> Hi Ryan, thanks for your recent piece on Unicorn and Unix. Enjoyed it, and learned… well, I learned that there’s even more important stuff I don’t know. (It’s funny sometimes being a software developer/architect and not having a CS background.)

Tell me about it. I don’t have a high school diploma :)

> Anyway, you definitely got me interested in learning more about the Unix programming and concurrency models, thank you for that. One thing that has me a little confused: I got the impression that a lot of the difference between the Unix and Java/Windows approaches boils down to processes versus threads. Is that right? Is it an oversimplification?

I think that’s about right. There’s a lot of other important differences but, now that I think about it, they all seem to stem from the basic model of concurrency. Java/Windows people pretty much have to use threads because their processes are handicapped. Unix has good processes, so Unix people can use either processes or threads.

Now, the Unicorn piece was talking specifically about threads in Ruby. Native threads are a lot more powerful than Ruby threads. I always choose processes over threads (even native threads) when I can get away with it, but native threads are useful, whereas Ruby threads really aren’t.

> (The process vs. threads thing seems related to the discussions going on about how to write concurrent programs; specifically the debates about shared data vs. message passing. Isn’t IPC essentially message passing?)

That’s precisely what it is. “Message passing” usually implies a specific kind of object encoding and message dispatch, though, whereas IPC is a bit more general. You have to build up messages and dispatch on top of simple binary streams between processes, which is what most message passing systems do but IPC usually implies you build that part yourself.

It’s good to think of Unix IPC as message passing, conceptually.
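(An illustrative aside, not part of Ryan’s email: the sketch below shows one way to build "messages" on top of a plain byte stream between two processes. It assumes a Unix-like system with fork(). The framing scheme, a 4-byte length prefix followed by JSON, and the names `send_message`, `recv_message`, and `ask_child` are hypothetical choices made for the example and are not taken from Unicorn or from Ryan’s essay.)

```python
import json
import os
import socket
import struct


def send_message(sock, obj):
    """Encode obj as JSON and write it with a 4-byte big-endian length prefix."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exactly(sock, n):
    """Read exactly n bytes from the stream, or raise if it ends early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise EOFError("peer closed the stream mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Read one length-prefixed JSON message and decode it."""
    (length,) = struct.unpack("!I", recv_exactly(sock, 4))
    return json.loads(recv_exactly(sock, length).decode("utf-8"))


def ask_child(request):
    """Fork a child, send it one message, and return the child's reply."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child process: answer a single request, then exit immediately.
        parent_sock.close()
        try:
            message = recv_message(child_sock)
            send_message(child_sock, {"echo": message, "pid": os.getpid()})
        finally:
            child_sock.close()
            os._exit(0)
    # Parent process: send the request and wait for the reply.
    child_sock.close()
    try:
        send_message(parent_sock, request)
        reply = recv_message(parent_sock)
    finally:
        parent_sock.close()
        os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return reply
```

Calling `ask_child({"op": "ping"})` returns a dictionary containing the echoed request and the child’s process ID. The two processes share no memory; everything they exchange passes through the socket as framed bytes, which is the "build that part yourself" work Ryan describes.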

> If that’s right, then what about Apache? Wasn’t the point of 2.x that it moved to a thread-based model over the process-based model of 1.x? And if so, then… what’s up with that? I’d think there were a ton of very smart, very unixy people involved in that effort; they must have had some good reasons for the switch, right? Or, more likely, I’m missing something.

Process-per-connection — the model used by unicorn and the echo server example and apache’s preforking mpm (I think) — falls down at a certain level of high concurrency. There’s too much overhead in processes to solve C10K (http://www.kegel.com/c10k.html) problems. At that point, you basically have to use native threads or async/events. This doesn’t mean process-per-connection isn’t the right solution for a wide range of problems. Further, the basic technique of forking and sharing a socket between processes underlies async/event solutions — it’s just that each process can handle more than one connection in parallel (without threads). High concurrency servers that use fork(2), a shared socket, and async IO include nginx, memcached, lighttpd, and HAProxy. High concurrency servers that use threads include Apache, Varnish, and Squid. All are stable and production proven.
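(Another illustrative aside, not part of Ryan’s email: a minimal sketch of the "fork and share a socket" technique he mentions, assuming a Unix-like system. The parent opens one listening socket, forks a few workers, and each worker blocks in accept() on the inherited socket, handling one connection at a time. The function names, the worker count, and the toy echo protocol are all hypothetical; this is not how Unicorn or Apache is actually implemented.)

```python
import os
import signal
import socket


def serve_forever(listener):
    """Worker loop: accept and handle one connection at a time."""
    while True:
        conn, _addr = listener.accept()  # the kernel gives each connection to one worker
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"%d: " % os.getpid() + data)


def start_workers(listener, count):
    """Fork `count` workers that all share the same listening socket."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: restore default SIGTERM handling, then serve until killed.
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            try:
                serve_forever(listener)
            finally:
                os._exit(0)
        pids.append(pid)
    return pids


def stop_workers(pids):
    """Terminate the workers and reap them."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    for pid in pids:
        os.waitpid(pid, 0)


def read_all(sock):
    """Read from sock until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def demo(num_workers=3, num_requests=6):
    """Start the workers, send a few requests, and return (pids, replies)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]
    pids = start_workers(listener, num_workers)
    replies = []
    try:
        for i in range(num_requests):
            with socket.create_connection(("127.0.0.1", port)) as client:
                client.sendall(b"hello %d" % i)
                replies.append(read_all(client))
    finally:
        stop_workers(pids)
        listener.close()
    return pids, replies
```

Each reply is prefixed with the process ID of whichever worker happened to accept that connection. The parent never calls accept() itself; the kernel distributes connections among the workers, and nothing is shared between them except the listening socket.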

Here’s what I’m personally trying to forward in all of this: people confuse the idea that threads can always be applied with the idea that threads are always *right*. Just because a general solution exists (threads) doesn’t mean a more specific technique that uses processes isn’t the better solution. I really like process-per-connection where I can get away with it (and Unicorn is a perfect example of where you can get away with it) because it’s so amazingly simple; much simpler than threaded code and definitely much simpler than async/events. I like processes + async IO for highly concurrent systems because you just can’t beat it performance-wise and I personally enjoy working with event-based code. But threads? I can’t think of a problem I like solving with them, even though they can be used to solve a wide range of them.

Thanks,

Ryan

As you can see, Ryan was extremely generous with his time in responding to my email. Thanks, Ryan! I learned a lot from this exchange, and I really appreciate it. Congratulations on your new position at GitHub; it’s clearly a perfect match.