Object-oriented programming in C - Florian octo Forster's Homepage

Introduction

"Object-oriented programming" (OOP) is a programming paradigm in which data is bundled with functionality, such that the structure of the data and the implementation of the functionality operating on it are "encapsulated", i. e. hidden from other parts of the program which don't need to know how something is done. Other aspects of OOP are "polymorphism" and "inheritance", both of which are used excessively among OO-programmers, often without much justification. But I take it you're not reading this to learn what OOP is, but to learn how this idea can be used in C and when it makes sense to use it.

A common and good example of where OOP shines is storage data structures such as AVL-trees: they are complex enough that you don't want to rewrite them from scratch every time you need to store some data, yet their functionality is strictly limited. To implement AVL-trees you need some data structures, primarily a tree, of course. Each "node" has a pointer to its "parent", there's possibly some kind of comparator, and there's a pointer to user data in each node or only in the leaf nodes. Also, there are functions to do a left rotation, a right rotation and combinations of the two. All of this is absolutely uninteresting for the code using the tree. When using a tree you want to put stuff in and you want to get it back again, and that's all there is to it.

Method calls

It's important to note at this point that OOP is a programming paradigm, not a language feature. C++ has some special syntax which is supposed to help with writing OO-code, but that doesn't mean that you need these features to write OO-code. The most important OOP-feature is probably that you can call "methods" (functions that are associated with the data) in such a way that these methods can access the data they're supposed to work on. This is often done through a so-called "self pointer" (named "this" in C++). C doesn't have this feature, so functions cannot automatically access the data. As a consequence, a call like the following cannot work the way it does in C++, since my_method would not be told which object it was called on:

my_object->my_method (args);

Instead you will have to use:

my_method (my_object, args);

Sure, the first syntax has some appeal, but the second one does basically the same thing, just without an automatic self pointer.
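To make the comparison concrete, here is a minimal sketch of the explicit self pointer. The counter_t type and the counter_* functions are made up for illustration and have nothing to do with the AVL-tree example. The function pointer member only imitates the "->" call syntax; the object still has to be passed by hand.

```c
#include <assert.h>

/* Hypothetical example type: a counter with a step size. */
typedef struct counter_s counter_t;
struct counter_s
{
  int value;
  int step;
  /* Optional: a function pointer member imitating "obj->method (...)". */
  int (*add) (counter_t *self, int times);
};

/* A "method": an ordinary function taking the object as first argument. */
int counter_add (counter_t *self, int times)
{
  self->value += self->step * times;
  return (self->value);
}

/* Demonstrates both call styles; returns the final counter value. */
int counter_demo (void)
{
  counter_t c = { 0, 2, counter_add };

  counter_add (&c, 3); /* explicit self pointer: 0 + 2 * 3 = 6 */
  c.add (&c, 1);       /* member syntax, self still passed by hand: 8 */

  assert (c.value == 8);
  return (c.value);
}
```

Note that this struct is still fully visible to its users, so nothing stops them from writing to c.value directly.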

Opaque data types

If you hand out a struct which is then used as your object (i. e. passed to every "method"), you won't have to wait long until someone messes with the data directly instead of calling your methods. It's astonishing how creative people become when telling you why they did this, but they will do it, trust me! The only way to prevent this is to not tell them how an object is organized. Fortunately, C has the concept of opaque data types (incomplete types, in the language of the C standard), which accomplishes exactly that. It is possible to declare a structure but not define it. Other code can then use pointers to such a data type, but it can neither dereference them nor instantiate such a struct. To come back to the AVL-tree example, the data type could be declared (but not defined) as follows:

struct my_avl_tree_s;
typedef struct my_avl_tree_s my_avl_tree_t;
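What such a declaration allows and forbids can be tried out in a single file. The following is only a sketch under assumptions, not the AVL-tree code: token_t and the token_* functions are hypothetical names, and the "header", "client" and "module" parts are separated by comments rather than by files. The client part compiles while the struct is still incomplete; the lines that would be rejected by the compiler are left in as comments.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- What a header would contain: declarations only ---- */
struct token_s;
typedef struct token_s token_t;

token_t *token_create (int id);
int token_get_id (const token_t *tok);
void token_destroy (token_t *tok);

/* ---- Client code: struct token_s is an incomplete type here ---- */
int token_client_demo (void)
{
  token_t *tok; /* pointers to an incomplete type are fine */
  int id;

  /* token_t on_stack;            error: storage size is not known */
  /* size_t n = sizeof (token_t); error: sizeof on an incomplete type */

  tok = token_create (42);
  if (tok == NULL)
    return (-1);

  /* tok->id = 7;                 error: dereferencing an incomplete type */
  id = token_get_id (tok); /* the only way to get at the data */
  token_destroy (tok);
  return (id);
}

/* ---- Module file: only from here on is the layout known ---- */
struct token_s
{
  int id;
};

token_t *token_create (int id)
{
  token_t *tok = malloc (sizeof (*tok));
  if (tok == NULL)
    return (NULL);
  tok->id = id;
  return (tok);
}

int token_get_id (const token_t *tok)
{
  return (tok->id);
}

void token_destroy (token_t *tok)
{
  free (tok);
}
```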

Declaring and defining methods

To use an AVL-tree you need to get a pointer to such a structure from somewhere and pass it to all methods that operate on the tree. Since you can neither allocate nor instantiate such a structure yourself, this must be done by the module implementing the object, too. In OOP-talk this is called a constructor:

my_avl_tree_t *my_avl_create (void);

The internal structure is unknown to you and may need some cleaning up (e. g. closing file handles, freeing more memory) when the object is no longer needed. Thus, there needs to be a function which does that for you, called the destructor:

void my_avl_destroy (my_avl_tree_t *obj);

Now for some more methods that operate on the tree:

void my_avl_insert (my_avl_tree_t *obj, const char *key, void *value);
void *my_avl_search (my_avl_tree_t *obj, const char *key);
void *my_avl_remove (my_avl_tree_t *obj, const char *key);

To implement the public functions, the module in which the functions reside must know the layout of the my_avl_tree_t structure. All you have to do is define the structure in the module file, not in the header file:

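#include <stdint.h> /* uint32_t */
#include <stdlib.h> /* malloc, free */
#include <string.h> /* memset */

#include "my_avl_tree.h"

/* Node type, private to this module (assumed name; definition not shown). */
typedef struct my_avl_node_s my_avl_node_t;

struct my_avl_tree_s
{
  my_avl_node_t *root;
  uint32_t depth;
  /* more information as needed */
};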

The constructor then knows how big the structure is and can easily allocate enough memory:

my_avl_tree_t *my_avl_create (void)
{
  my_avl_tree_t *obj;

  /* The struct is defined in this file, so its size is known here. */
  obj = malloc (sizeof (*obj));
  if (obj == NULL)
    return (NULL);
  /* Zero all fields before any further initialization. */
  memset (obj, '\0', sizeof (*obj));

  /* more initialization */

  return (obj);
} /* my_avl_tree_t *my_avl_create */
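The remaining methods follow the same pattern. The sketch below fills in the destructor, my_avl_insert and my_avl_search so that the whole round trip can be tried out, but it deliberately leaves out rebalancing: it is a plain binary search tree, not a real AVL-tree. The node layout, the find_link helper, the "replace the value of an existing key" behavior, the "NULL means not found" convention and the my_avl_demo function are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct my_avl_tree_s my_avl_tree_t; /* as in the header */
typedef struct my_avl_node_s my_avl_node_t;

/* Assumed node layout; no parent pointer or height, nothing is balanced. */
struct my_avl_node_s
{
  const char *key; /* not copied: must outlive the tree (assumption) */
  void *value;
  my_avl_node_t *left, *right;
};

struct my_avl_tree_s
{
  my_avl_node_t *root;
};

my_avl_tree_t *my_avl_create (void)
{
  my_avl_tree_t *obj = malloc (sizeof (*obj));
  if (obj != NULL)
    obj->root = NULL;
  return (obj);
}

/* Private helper: frees a subtree. User values are not freed. */
static void free_nodes (my_avl_node_t *n)
{
  if (n == NULL)
    return;
  free_nodes (n->left);
  free_nodes (n->right);
  free (n);
}

void my_avl_destroy (my_avl_tree_t *obj)
{
  if (obj == NULL)
    return;
  free_nodes (obj->root);
  free (obj);
}

/* Private helper: returns the link (pointer to a node pointer) that holds
 * the node for key, or the empty link where such a node would be attached. */
static my_avl_node_t **find_link (my_avl_tree_t *obj, const char *key)
{
  my_avl_node_t **link = &obj->root;
  while (*link != NULL)
  {
    int cmp = strcmp (key, (*link)->key);
    if (cmp == 0)
      break;
    link = (cmp < 0) ? &(*link)->left : &(*link)->right;
  }
  return (link);
}

void my_avl_insert (my_avl_tree_t *obj, const char *key, void *value)
{
  my_avl_node_t **link = find_link (obj, key);
  if (*link == NULL) /* new key: attach a fresh node */
  {
    *link = malloc (sizeof (**link));
    if (*link == NULL)
      return; /* the void return type leaves no way to report this */
    (*link)->key = key;
    (*link)->left = (*link)->right = NULL;
  }
  (*link)->value = value; /* new or existing key: store the value */
}

void *my_avl_search (my_avl_tree_t *obj, const char *key)
{
  my_avl_node_t *n = *find_link (obj, key);
  return ((n != NULL) ? n->value : NULL); /* NULL: not found (assumed) */
}

/* Usage as seen by the rest of the program; returns 2 on success. */
int my_avl_demo (void)
{
  static int one = 1, two = 2;
  my_avl_tree_t *tree = my_avl_create ();
  int *hit;
  if (tree == NULL)
    return (-1);
  my_avl_insert (tree, "one", &one);
  my_avl_insert (tree, "two", &two);
  assert (my_avl_search (tree, "three") == NULL);
  hit = my_avl_search (tree, "two");
  my_avl_destroy (tree);
  return ((hit != NULL) ? *hit : -1);
}
```

The calling code in my_avl_demo never touches a node or the root pointer, so the tree could later be replaced by a properly balanced one without changing the caller.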

Closing thoughts

Here are the advantages of this method as I see them:

The "user" (in this case: the author of other parts of the program or the programmer of a program using your library) is forced to use your interface. If you make internal changes (and you do them right ;) other parts of the program are not affected.
A modular layout of functionality is enforced, improving the overall structure of a program.
Last but not least: Splitting up the declaration and definition of a structure is not much work. Passing the object as the first argument isn't either.

And some final thoughts on OOP in general and OOP in C in particular:

OOP is a nice concept when used in the right dose. I've seen cases where there was an abstract class, a class which inherited from it, another class that inherited from that one and another one (using multiple inheritance) and then there was a synchronized version from that last class, too. And then that very last object was instantiated exactly once. WTF?
I've seen many cases where inheritance was used, and in almost none of them did it improve the program (as far as the code is concerned). In fact, right now I cannot recall a positive example of the usage of inheritance. There's a reason why OOP tutorials talk about dogs and cars and stuff, but not about actual real-world code examples, if you ask me.
In many cases it's best to have a new object which uses another object internally. For example you might want to write a cache which stores data and purges old data automatically. This cache will need some sort of storage internally. The die-hard OOP way would be to inherit from such a storage class, i. e. some sort of tree, list, hash table or similar.
If you reorganize the inner workings of your cache (e. g. you just implemented the AVL-trees and now want to use them instead of your simple linked list implementation) you need to inherit from another class, breaking a lot of code (as a consequence of Murphy's Law). Thus in OOP libraries you'll probably find AVLCache, LLCache (linked list cache), HTCache (hash table cache) and so on. So much for "abstraction".
Operator overloading is made by the devil.
Polymorphism makes writing bad code easy and reading good code hard.
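To illustrate the cache example from the list above, here is a rough sketch of a cache that uses a storage object internally instead of inheriting from one. Everything in it is hypothetical: store_t is a tiny fixed-size array standing in for a list, tree or hash table, the purge policy (drop the oldest key once CACHE_CAP entries are held) is just one simple choice, and cache_t is left visible only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ---- Hypothetical storage object; stands in for a list, tree, ... ---- */
#define STORE_SLOTS 4
typedef struct store_s
{
  const char *keys[STORE_SLOTS]; /* NULL marks a free slot */
  void *values[STORE_SLOTS];
} store_t;

/* Returns the slot holding key, or -1. A NULL key finds a free slot. */
static int store_find (const store_t *s, const char *key)
{
  int i;
  for (i = 0; i < STORE_SLOTS; i++)
  {
    if ((key == NULL) ? (s->keys[i] == NULL)
        : ((s->keys[i] != NULL) && (strcmp (s->keys[i], key) == 0)))
      return (i);
  }
  return (-1);
}

static void store_put (store_t *s, const char *key, void *value)
{
  int i = store_find (s, key);
  if (i < 0)
    i = store_find (s, NULL); /* new key: take a free slot */
  if (i < 0)
    return; /* full; cannot happen while CACHE_CAP <= STORE_SLOTS */
  s->keys[i] = key;
  s->values[i] = value;
}

static void *store_get (const store_t *s, const char *key)
{
  int i = store_find (s, key);
  return ((i >= 0) ? s->values[i] : NULL);
}

static void store_remove (store_t *s, const char *key)
{
  int i = store_find (s, key);
  if (i >= 0)
    s->keys[i] = NULL;
}

/* ---- The cache has a store; it does not inherit from one ---- */
#define CACHE_CAP 2
typedef struct cache_s
{
  store_t store;                /* the only place that names the storage */
  const char *order[CACHE_CAP]; /* keys in insertion order (ring buffer) */
  size_t next;                  /* ring slot to reuse on the next insert */
} cache_t;

void cache_put (cache_t *c, const char *key, void *value)
{
  if (c->order[c->next] != NULL) /* ring is full: purge the oldest key */
    store_remove (&c->store, c->order[c->next]);
  store_put (&c->store, key, value);
  c->order[c->next] = key;
  c->next = (c->next + 1) % CACHE_CAP;
}

void *cache_get (const cache_t *c, const char *key)
{
  return (store_get (&c->store, key));
}

/* Returns 1 if the oldest entry was purged and the newer ones survived. */
int cache_demo (void)
{
  static int a = 1, b = 2, d = 3;
  cache_t c = { 0 }; /* all pointers NULL, next is 0 */

  cache_put (&c, "a", &a);
  cache_put (&c, "b", &b);
  cache_put (&c, "d", &d); /* capacity is 2, so "a" is purged here */

  assert (cache_get (&c, "a") == NULL);
  return ((cache_get (&c, "b") == &b) && (cache_get (&c, "d") == &d));
}
```

Switching the cache to a different storage would touch only the store member and the store_* calls inside cache_put and cache_get; code calling cache_put and cache_get stays as it is. In a real module, struct cache_s would be defined in the module file, as described for the AVL-tree.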

Thanks for your attention. If you have comments, questions or have found a typo, please let me know.