Tuesday 10 February 2009

C ain't so bad

I've been catching up with the Stack Overflow podcast recently, and the usual conversation about C, and whether a developer should learn it, came up a few times.

Instead of telling you why you should or should not learn C, I'm going to talk about the observations I've made while programming in it. It really isn't as bad as people make out.

I've entered an interesting stage in my evolution as a programmer... after 6 months of Java at my previous employer, and seeing how far OOP can be taken, I decided to get back to my roots and work on some C projects in my spare time.

Abstract types and OOP

One of the first things you need to understand when coding C is the importance of the typedef keyword. When I first learnt C I must have been around 14, and at that time typedef was just a useful way of creating aliases in code. Instead of writing struct foo all over the place, I could just write Foo. Neat.

But now, with everything I've learnt about abstraction and types over the years, I realize the power this keyword has. With typedef I can create abstract types, and as long as everybody follows the rules, these abstractions are just as solid as in any other language:

typedef struct {
        llist_t** buckets;
        size_t capacity;
        size_t size;

        /* ... */

} hashmap_t;

I've just created an abstract hashmap type. And whatever is inside that struct must not be touched by anybody, except for the functions that operate on that type.

Sure, somebody could access these members, but that's breaking the rules, and there are rules in every language. In C, you don't touch the members of a typedef'd struct. Got it?
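If you would rather have the compiler enforce that rule than rely on good manners, C can do that too. Declare the struct in the header without its members (an incomplete type) and define it only in the implementation file. Callers can then hold a hashmap_t* but cannot dereference it or reach its members. The sketch below is a simplified, hypothetical hashmap, not the one from this post: it uses char* keys with a built-in djb2-style hash instead of hash and compare callbacks, has a fixed bucket count, and does not own the values. Both "files" are shown in one listing so it compiles as-is; in a real project the struct definition would live only in hashmap.c.

```c
/* ---- hashmap.h: public interface; the struct is declared, not defined ---- */
#include <stddef.h>

typedef struct hashmap hashmap_t;   /* incomplete type: callers can't see members */

hashmap_t *hashmap_create(void);
int        hashmap_put(hashmap_t *map, const char *key, void *value);
void      *hashmap_get(const hashmap_t *map, const char *key);
size_t     hashmap_size(const hashmap_t *map);
void       hashmap_destroy(hashmap_t *map);

/* ---- hashmap.c: the only file that sees the members ---- */
#include <stdlib.h>
#include <string.h>

#define HASHMAP_BUCKETS 64          /* fixed size keeps the sketch short: no resizing */

struct entry {
    char *key;                      /* private copy of the caller's key */
    void *value;                    /* not owned: the caller manages its lifetime */
    struct entry *next;             /* chaining for colliding keys */
};

struct hashmap {
    struct entry *buckets[HASHMAP_BUCKETS];
    size_t size;
};

/* djb2-style string hash, reduced to a bucket index */
static size_t bucket_of(const char *key)
{
    size_t h = 5381;
    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return h % HASHMAP_BUCKETS;
}

hashmap_t *hashmap_create(void)
{
    hashmap_t *map = malloc(sizeof *map);
    if (!map)
        return NULL;
    for (size_t i = 0; i < HASHMAP_BUCKETS; i++)
        map->buckets[i] = NULL;
    map->size = 0;
    return map;
}

/* Inserts or overwrites. Returns 0 on success, -1 if allocation fails. */
int hashmap_put(hashmap_t *map, const char *key, void *value)
{
    size_t b = bucket_of(key);
    for (struct entry *e = map->buckets[b]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            e->value = value;       /* existing key: replace the value */
            return 0;
        }
    }
    struct entry *e = malloc(sizeof *e);
    char *copy = malloc(strlen(key) + 1);
    if (!e || !copy) {
        free(e);
        free(copy);
        return -1;
    }
    strcpy(copy, key);
    e->key = copy;
    e->value = value;
    e->next = map->buckets[b];      /* push onto the front of the chain */
    map->buckets[b] = e;
    map->size++;
    return 0;
}

/* Returns the stored value, or NULL if the key is absent. */
void *hashmap_get(const hashmap_t *map, const char *key)
{
    for (const struct entry *e = map->buckets[bucket_of(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e->value;
    return NULL;
}

size_t hashmap_size(const hashmap_t *map)
{
    return map->size;
}

void hashmap_destroy(hashmap_t *map)
{
    if (!map)
        return;
    for (size_t i = 0; i < HASHMAP_BUCKETS; i++) {
        struct entry *e = map->buckets[i];
        while (e != NULL) {
            struct entry *next = e->next;
            free(e->key);
            free(e);
            e = next;
        }
    }
    free(map);
}
```

Code that only includes the header part can pass hashmap_t* around freely, but writing map->size there is a compile error rather than a broken convention.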

So now I have this type, and I can operate on it with the functions provided:

hashmap_t* hashmap;

hashmap = hashmap_create(&hash_fp, &key_compare_fp);
hashmap_put(hashmap, L"key", L"value");

/* ... */

hashmap_destroy(hashmap);
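The post doesn't show what hashmap_create() expects for its two arguments. Assuming they are a hash function and a key-comparison function that both receive keys as const void *, a pair suited to the wide-string keys above might look like this. The typedef names hash_fn and compare_fn and both function names are hypothetical, and the hash is FNV-1a applied to each wide character as one unit rather than byte by byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical callback types; the real hashmap_create() signature isn't shown. */
typedef uint32_t (*hash_fn)(const void *key);
typedef int (*compare_fn)(const void *a, const void *b);

/* FNV-1a style hash over a NUL-terminated wide string. */
uint32_t hash_wstr(const void *key)
{
    const wchar_t *s = key;
    uint32_t h = 2166136261u;           /* FNV offset basis */
    while (*s) {
        h ^= (uint32_t)*s++;            /* mix in one wide character */
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

/* Returns 0 when the keys are equal, like wcscmp(). */
int compare_wstr(const void *a, const void *b)
{
    return wcscmp(a, b);
}
```

With these, the call would read hashmap_create(hash_wstr, compare_wstr). A function's name already converts to a pointer to that function, so &hash_wstr means the same thing; if hash_fp were instead a variable holding a function pointer, &hash_fp would pass a pointer to that variable, which is a different type.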

This is perfectly abstract, and the implementation details are just as hidden as in any other language. I can see the members of the typedef, but I can see the members of a C++ or Java class too. So what are we missing as far as OOP goes?

OOP is not abstraction or information hiding. It's not even associating types with the methods that operate on them... there's not much difference between writing this:

byte[] a = myStream.read(512);

...and writing this:

byte_vector_t* a = stream_read(my_stream, 512);

stream_read() can only be used on a stream_t. So why do we need to bundle them together? It basically amounts to a syntactic difference. And with the C way of doing things, we get mixins for free.
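One way to read the mixin point: because the operations on a type are just functions that take it as an argument, anybody can add a new operation without touching the type's definition. Even stdio's FILE, whose members portable code never touches, can pick up a new "method" this way. fcount_lines() is a hypothetical helper written only against the public stdio API.

```c
#include <stdio.h>

/* A hypothetical new operation on FILE, written purely against the public
   stdio API. Counts '\n' characters from the current position to EOF. */
long fcount_lines(FILE *fp)
{
    long lines = 0;
    int c;      /* int, not char, so EOF can be told apart from a real byte */
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            lines++;
    return lines;
}
```

Calling fcount_lines(fp) reads exactly like fread(buf, 1, 512, fp): the type goes in as the first argument, and nothing about FILE had to change to support it.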

So take away all these things, and we are left with the essence of what OOP is: inheritance and polymorphism. OK, so we've learnt nothing, except that C has all the power for creating abstract types that other languages have.

Oh, and no, I do not think implementing polymorphism in C is a good idea!

Productivity

This is one topic where I have to agree with the anti-C folks. In C, you have to build your entire world before you can get anything done. Take my current pet project as an example. In the past few days I've created the following types:

llist_t          /* linked-list, naturally */
hashmap_t
string_t         /* Every C programmer will end up writing a string type */
stringbuilder_t
stream_t         /* Simple stream type allowing multiple
                    backend implementations. */

bytestream       /* implements stream_t */
filestream       /* implements stream_t */
tcpsocket_t
tcpsocket_stream /* implements stream_t */
udpsocket_t
thread_t
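The post doesn't say how stream_t supports several backends. One common approach, sketched here under that assumption, is a small struct of function pointers that each backend fills in, with a front-end stream_read() that simply dispatches to it. This stream_read() copies into a caller-supplied buffer rather than returning the byte_vector_t* shown earlier, and bytestream_state and bytestream_init() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stream interface: each backend supplies its own read function. */
typedef struct stream stream_t;
struct stream {
    size_t (*read)(stream_t *self, void *buf, size_t len);
    void *state;                        /* backend-specific data */
};

/* The front-end function callers use, whatever the backend is. */
size_t stream_read(stream_t *s, void *buf, size_t len)
{
    return s->read(s, buf, len);
}

/* A memory-backed backend, loosely modelled on the list's "bytestream". */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;                         /* next byte to hand out */
} bytestream_state;

static size_t bytestream_read(stream_t *self, void *buf, size_t len)
{
    bytestream_state *st = self->state;
    size_t avail = st->len - st->pos;
    size_t n = len < avail ? len : avail;
    memcpy(buf, st->data + st->pos, n);
    st->pos += n;
    return n;                           /* 0 signals end of stream */
}

/* Wires a stream_t up to read from an in-memory buffer. */
void bytestream_init(stream_t *s, bytestream_state *st, const void *data, size_t len)
{
    st->data = data;
    st->len = len;
    st->pos = 0;
    s->read = bytestream_read;
    s->state = st;
}
```

A filestream or tcpsocket_stream would provide its own read function and state in the same way, and code written against stream_read() would not need to change.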

I've also written functions for logging, UTF-8 encoding/decoding, and a whole host of other basic requirements... and I'm still not done. This is quite a lot of work, especially if you want to build robust and well-tested code.
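As an example of that groundwork, here is a minimal UTF-8 encoder for a single code point. It is a sketch rather than the code from this project, and utf8_encode() is a hypothetical name. It follows the standard one-to-four-byte layout and refuses UTF-16 surrogates and anything above U+10FFFF, since neither is a valid Unicode scalar value.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 for invalid input. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                        /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx + three continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}
```

For example, U+20AC (the euro sign) comes out as the three bytes E2 82 AC.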

I could use libraries for this functionality, but the fact remains that you don't get this as standard. I also think C programmers don't trust other programmers' code, even their own code from a few years ago. This means rewriting your own personal C library every 3 years.

Resource management

Every C programmer will create memory leaks. Every C programmer will corrupt the heap or stack. Every C programmer will forget to call close() on something they open()'d. This means hours of frustrated debugging and lost time.

Even if they don't screw up, there is significant overhead in writing all that open/close and create/destroy code in the first place. Not to mention it's ugly as hell.
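The usual way to keep that code manageable is a single exit path: acquire resources in order and, on any failure, jump to a label that releases whatever has been acquired so far, in reverse order. sum_file_bytes() is a hypothetical function that shows the pattern with a file handle and a heap buffer.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 4096

/* Hypothetical example: sums every byte in a file.
   Returns 0 on success and -1 on any failure. Resources are acquired in
   order and released in reverse order through the labels at the bottom,
   so every exit path closes and frees exactly what was acquired. */
int sum_file_bytes(const char *path, unsigned long *out)
{
    int status = -1;
    unsigned long sum = 0;
    unsigned char *buf = NULL;
    size_t n;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        goto done;              /* nothing acquired yet */

    buf = malloc(CHUNK);
    if (buf == NULL)
        goto close_file;        /* only the file needs releasing */

    while ((n = fread(buf, 1, CHUNK, fp)) > 0)
        for (size_t i = 0; i < n; i++)
            sum += buf[i];

    if (ferror(fp))
        goto free_buf;          /* read error: release everything */

    *out = sum;
    status = 0;

free_buf:
    free(buf);
close_file:
    fclose(fp);
done:
    return status;
}
```

It is still create/destroy code, but each release is written exactly once, which removes most of the room for leaks and double frees.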

I would kill puppies to get better resource management in C. I'm not talking about garbage collection here, but at least some sort of stack-based destructor support so we can do RAII.
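Standard C has no destructors, but GCC and Clang offer a non-standard extension that comes close: the cleanup variable attribute calls a function with a pointer to the variable when it goes out of scope. This is compiler-specific and won't build with MSVC. AUTO_FREE, free_charp() and copy_length() are hypothetical names, and the free_calls counter exists only to make the cleanup observable.

```c
#include <stdlib.h>
#include <string.h>

/* Counts how many times the cleanup ran; only here to make it observable. */
static int free_calls = 0;

/* Cleanup handler: receives a pointer to the variable, i.e. a char **. */
static void free_charp(char **p)
{
    free(*p);                   /* free(NULL) is a no-op, so a failed malloc is fine */
    free_calls++;
}

/* GCC/Clang extension, not standard C: run free_charp when the variable leaves scope. */
#define AUTO_FREE __attribute__((cleanup(free_charp)))

/* Hypothetical example: copies s to the heap and returns the copy's length.
   There is no explicit free(): the buffer is released on every return path. */
size_t copy_length(const char *s)
{
    AUTO_FREE char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return 0;
    strcpy(copy, s);
    return strlen(copy);        /* evaluated before the cleanup runs */
}
```

It only covers stack variables in a single scope, so it's closer to a scoped guard than to full C++ RAII, but it removes the cleanup labels entirely.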

Polish

In my experience, anything in C takes about three times as long as it would in a higher-level language with good libraries. This means you don't have as much time to add polish to your program, because you are still trying to get the damned thing to work correctly.

You can get a polished product in C, but it will take you longer. And your competitor who has written the same application in C# for the .NET platform will have V2.0 out before you've even finished your new localization library.

It makes me wonder... maybe this is why so much open source software sucks. A lot of it is written in C, and it's just harder to get a decent product at the end. With Java and NetBeans I can drag and drop buttons onto a form, and spend 20 minutes making the interface look perfect. In my C projects I end up hand-coding pixel values and recompiling before I can see how it looks. It's not hard to see which version will be better.

A few plus points

Everybody who has written C will remember with pain the hours spent hunting down a memory corruption bug. You know you've screwed up when you see 'impossible' behaviour and random crashes which mysteriously disappear when you add some trace statements.

But searching for these mistakes develops your debugging skills. You are forced to think things through logically and narrow down the possible locations of the bug. You become a master of printf debugging, which is still an essential skill, even in today's world of interactive debuggers.
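A small trace helper makes printf debugging less tedious by stamping each message with its file and line. This is a hypothetical sketch; trace_at() and TRACE are illustrative names. It writes to stderr because stderr is unbuffered by default, while stdout is usually buffered (fully so when redirected to a file), so the last lines printed before a crash can be lost from stdout.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: writes "file:line: message\n" to stderr and returns
   the number of characters written, or -1 if any write failed. */
int trace_at(const char *file, int line, const char *fmt, ...)
{
    va_list ap;
    int prefix, body, tail;

    prefix = fprintf(stderr, "%s:%d: ", file, line);
    va_start(ap, fmt);
    body = vfprintf(stderr, fmt, ap);
    va_end(ap);
    tail = fputs("\n", stderr);
    fflush(stderr);             /* harmless for stderr, matters if redirected elsewhere */

    if (prefix < 0 || body < 0 || tail == EOF)
        return -1;
    return prefix + body + 1;
}

/* Captures the call site automatically. Needs at least a format string. */
#define TRACE(...) trace_at(__FILE__, __LINE__, __VA_ARGS__)
```

A call such as TRACE("len = %zu", len) then prints something like "parser.c:42: len = 17", which makes it easy to see how far the program got before things went wrong.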

I spoke about how you have to build your own world in C. But this is also a good thing. You learn so much from implementing things yourself, whether it's just a hash table, or some HTTP handling code for your file download library. You end up with a broad knowledge of different protocols and algorithms, which puts you on a totally different level to the kind of programmers that stick with their Java or .NET libraries.
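As a small taste of the HTTP side, here is a hypothetical helper that pulls the status code out of an HTTP/1.x status line such as "HTTP/1.1 200 OK". It is deliberately minimal: it checks the "HTTP/" prefix, reads the version and a three-digit code with sscanf(), and rejects codes outside 100 to 599. A real client would need stricter parsing than this.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the status code from an HTTP/1.x status line
   such as "HTTP/1.1 200 OK", or -1 if the line doesn't parse. */
int http_status_code(const char *line)
{
    int major, minor, code, used = 0;

    if (strncmp(line, "HTTP/", 5) != 0)
        return -1;
    /* %n records how many characters were consumed; it isn't counted in the result */
    if (sscanf(line + 5, "%d.%d %3d%n", &major, &minor, &code, &used) != 3)
        return -1;
    /* reject things like "2000": the code must be followed by a space or end of line */
    char next = line[5 + used];
    if (next != ' ' && next != '\r' && next != '\0')
        return -1;
    if (code < 100 || code > 599)
        return -1;
    return code;
}
```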

Jeff Atwood has written about this very same topic in "Don't Reinvent The Wheel, Unless You Plan on Learning More About Wheels". Just because it's been done before, or there is a library for it, does not mean it's wrong to implement it yourself. The learning experience alone justifies the effort.

Another thing I like about C is that it is possible to write cross-platform code. The standard library doesn't give you much, but it's easy to write your own cross-platform types for threads, sockets, file system interaction, etc. So you get cross-platform code, with the leanness and speed of C.
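To illustrate, here is a hypothetical thread_t that hides the platform handle behind one interface, using POSIX threads everywhere except Windows, where it uses CreateThread(). It is a sketch: error handling is minimal, and the thread_t must stay at the same address until thread_join() returns, because the new thread reads fn and arg through a pointer to it. On some older Linux systems you also need to link with -pthread.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <pthread.h>
#endif

/* Hypothetical portable thread type: callers never touch the platform handle. */
typedef struct {
#ifdef _WIN32
    HANDLE handle;
#else
    pthread_t handle;
#endif
    void (*fn)(void *);         /* the caller's thread body */
    void *arg;                  /* its argument */
} thread_t;

/* Adapts the platform's thread entry signature to void fn(void *). */
#ifdef _WIN32
static DWORD WINAPI thread_entry(LPVOID p)
{
    thread_t *t = p;
    t->fn(t->arg);
    return 0;
}
#else
static void *thread_entry(void *p)
{
    thread_t *t = p;
    t->fn(t->arg);
    return NULL;
}
#endif

/* Starts fn(arg) on a new thread. Returns 0 on success, -1 on failure.
   *t must stay valid until thread_join() returns. */
int thread_start(thread_t *t, void (*fn)(void *), void *arg)
{
    t->fn = fn;
    t->arg = arg;
#ifdef _WIN32
    t->handle = CreateThread(NULL, 0, thread_entry, t, 0, NULL);
    return t->handle != NULL ? 0 : -1;
#else
    return pthread_create(&t->handle, NULL, thread_entry, t) == 0 ? 0 : -1;
#endif
}

/* Blocks until the thread finishes and releases its resources. */
void thread_join(thread_t *t)
{
#ifdef _WIN32
    WaitForSingleObject(t->handle, INFINITE);
    CloseHandle(t->handle);
#else
    pthread_join(t->handle, NULL);
#endif
}

/* Example thread body for the usage note below: increments an int. */
static void increment(void *p)
{
    ++*(int *)p;
}
```

Usage is the same on every platform: with int counter = 0 and a thread_t t, calling thread_start(&t, increment, &counter) and then thread_join(&t) leaves counter at 1, and the join guarantees the main thread sees the update.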

The next stage for me

So that's what I think about C, but the question of productivity is still strong in my mind. I think the next stage for me is to say goodbye to the build-your-own-world philosophy and really immerse myself in the maximum-productivity space for a while.

It's strange, though: I feel a slight apprehension when I picture myself seated in the world of high-level .NET or Java programming. There are people who naturally stay at the low levels, writing operating systems and low-level APIs, or compilers and runtimes for new languages. Maybe that's where I belong.

Meh, I'll just get back to building my world. It's fun.