This article is part of the series Wandering in CRuby


What is an object?

We're used to saying that in Ruby everything is an object, so it seems only logical to start with this simple question: what is an object?

The RObject struct is defined as:

/* include/ruby/internal/core/robject.h */

struct RObject {

    /** Basic part, including flags and class. */
    struct RBasic basic;

    /** Object's specific fields. */
    union {

        /**
         * Object that use  separated memory region for  instance variables use
         * this pattern.
         */
        struct {
            /** Pointer to a C array that holds instance variables. */
            VALUE *ivptr;

            /**
             * This  is a  table that  holds  instance variable  name to  index
             * mapping.  Used when accessing instance variables using names.
             *
             * @internal
             *
             * This is a shortcut for `RCLASS_IV_INDEX_TBL(rb_obj_class(obj))`.
             */
            struct rb_id_table *iv_index_tbl;
        } heap;

        /* Embedded instance variables. When an object is small enough, it
         * uses this area to store the instance variables.
         *
         * This is a length 1 array because:
         *   1. GCC has a bug that does not optimize C flexible array members
         *      (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102452)
         *   2. Zero length arrays are not supported by all compilers
         */
        VALUE ary[1];
    } as;
};

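It contains a struct RBasic basic element and a union, as, that holds the instance variables. The union stores them in one of two ways: as a pointer, VALUE *ivptr, to a separately allocated array (as.heap), or, when the object is small enough, directly inside the object itself (as.ary).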

More specifically, ivptr (or ary, when the values are embedded) contains the instance variable values. For optimization reasons, instance variable names are stored apart from the values, and they can even be stored in different ways. They deserve, and will get, a dedicated article.
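To make the union concrete, here is a toy sketch in C of how code can pick the right storage. All names in it (toy_object, TOY_EMBED_FLAG, TOY_EMBED_CAPACITY, toy_object_ivptr) are hypothetical. The sketch assumes that a per-object flag records whether the values are embedded. As far as I can tell, CRuby's ROBJECT_IVPTR helper in robject.h makes the same choice based on an ROBJECT_EMBED flag. One difference: CRuby declares ary[1] and allocates a larger slot, while the toy simply reserves a fixed number of slots.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t toy_value;                  /* hypothetical stand-in for VALUE */

#define TOY_EMBED_FLAG ((toy_value)1 << 13)   /* hypothetical "ivars are embedded" bit */
#define TOY_EMBED_CAPACITY 3                  /* hypothetical number of embedded slots */

struct toy_object {
    toy_value flags;
    toy_value klass;
    union {
        struct {
            toy_value *ivptr;                 /* separately allocated array */
        } heap;
        toy_value ary[TOY_EMBED_CAPACITY];    /* values stored inside the object */
    } as;
};

/* Return where the ivar values live, checking the embed flag first. */
static toy_value *toy_object_ivptr(struct toy_object *obj)
{
    if (obj->flags & TOY_EMBED_FLAG) {
        return obj->as.ary;
    }
    return obj->as.heap.ivptr;
}

/* Create an object for `count` ivars, embedding them when they fit. */
static struct toy_object *toy_object_new(size_t count)
{
    struct toy_object *obj = calloc(1, sizeof(*obj));
    assert(obj != NULL);
    if (count <= TOY_EMBED_CAPACITY) {
        obj->flags |= TOY_EMBED_FLAG;
    } else {
        obj->as.heap.ivptr = calloc(count, sizeof(toy_value));
        assert(obj->as.heap.ivptr != NULL);
    }
    return obj;
}

/* Free the separate array only when the values were not embedded. */
static void toy_object_free(struct toy_object *obj)
{
    if (!(obj->flags & TOY_EMBED_FLAG)) {
        free(obj->as.heap.ivptr);
    }
    free(obj);
}
```

Callers always go through toy_object_ivptr, so they never need to know which branch of the union is active.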

As the comment says, the RBasic struct contains flags and class information. Without the C++-specific code, it is defined as follows:

/* include/ruby/internal/core/rbasic.h */

struct
RUBY_ALIGNAS(SIZEOF_VALUE)
RBasic {

    /**
     * Per-object  flags.  Each  ruby  objects have  their own  characteristics
     * apart from their  classes.  For instance whether an object  is frozen or
     * not is not  controlled by its class.  This is  where such properties are
     * stored.
     *
     * @see enum ::ruby_fl_type
     */
    VALUE flags;

    /**
     * Class of an object.  Every object has its class.  Also, everything is an
     * object  in Ruby.   This means  classes are  also objects.   Classes have
     * their own classes,  classes of classes have their classes,  too ...  and
     * it recursively continues forever.
     *
     * Also note the `const` qualifier.  In  ruby an object cannot "change" its
     * class.
     */
    const VALUE klass;

};

To sum up:

  • An object has flags (see include/ruby/internal/fl_type.h). These flags indicate whether it is frozen, whether it is a singleton class (we'll talk about singleton classes in an upcoming article), etc.
  • An object has a class (in the sense of "is an instance of", as in "is an instance of String").
  • An object has instance variables.
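The built-in type and properties such as frozenness all live in the single flags word, so checking them comes down to bit masking. Here is a toy sketch in C of that idea. It assumes a layout where the low 5 bits hold the built-in type and one higher bit marks the object as frozen. CRuby does keep the type in the low bits of flags (that is what BUILTIN_TYPE reads). However, the names toy_basic, TOY_T_MASK and TOY_FL_FREEZE, and the exact bit values, are illustrative assumptions rather than copies of the headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t toy_value;                 /* hypothetical stand-in for VALUE */

/* Hypothetical layout: low 5 bits = built-in type, higher bits = flags. */
#define TOY_T_MASK    ((toy_value)0x1f)
#define TOY_T_OBJECT  ((toy_value)0x01)
#define TOY_T_CLASS   ((toy_value)0x02)
#define TOY_FL_FREEZE ((toy_value)1 << 11)

struct toy_basic {
    toy_value flags;
    const toy_value klass;                   /* set once at creation, never reassigned */
};

/* Extract the type stored in the low bits. */
static toy_value toy_builtin_type(const struct toy_basic *b)
{
    return b->flags & TOY_T_MASK;
}

/* Only this object's flags change, not its class. */
static void toy_freeze(struct toy_basic *b)
{
    b->flags |= TOY_FL_FREEZE;
}

static bool toy_frozen_p(const struct toy_basic *b)
{
    return (b->flags & TOY_FL_FREEZE) != 0;
}
```

Freezing sets a bit in the object's own flags word and leaves the type bits untouched. This matches the RBasic comment: frozenness is per object, not per class.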

[Figure: RObject]

Notice there is absolutely no method stored in an object.

But objects have instance methods, right?

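Objects don't have instance methods; they respond to instance methods, which is different. The methods live in the object's class (or its ancestors), and the object only points to that class through klass.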

Since in Ruby methods are defined in classes, let's talk about classes.

What is a class?

The RClass struct is defined as:

/* internal/class.h */

struct RClass {
    struct RBasic basic;
    VALUE super;
    struct rb_id_table *m_tbl;
};

It contains:

  • an RBasic struct (as RObject does),
  • a super value that represents its superclass (which class the current class inherits from),
  • and a methods table named m_tbl.
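Because an object only points to its class, "responding to" a method means searching for it. The search looks in the class's m_tbl and, if the method isn't there, follows super and tries again. The toy sketch below shows that walk. Real CRuby lookup also goes through method caches and through the internal classes (ICLASS) used for included modules, both of which the toy ignores. The names toy_class and toy_method_lookup, and the sample methods, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy: a method table is a small array of name/function pairs. */
typedef const char *(*toy_method_fn)(void);

struct toy_method_entry {
    const char *name;
    toy_method_fn fn;
};

struct toy_class {
    const struct toy_class *super;            /* like RClass.super; NULL at the top */
    const struct toy_method_entry *m_tbl;     /* like RClass.m_tbl */
    size_t m_count;
};

/* Walk the superclass chain until some class's table contains `name`. */
static toy_method_fn toy_method_lookup(const struct toy_class *klass, const char *name)
{
    for (; klass != NULL; klass = klass->super) {
        for (size_t i = 0; i < klass->m_count; i++) {
            if (strcmp(klass->m_tbl[i].name, name) == 0) {
                return klass->m_tbl[i].fn;
            }
        }
    }
    return NULL;  /* Ruby would end up in method_missing instead */
}

/* Two sample classes: toy_cString inherits from toy_cObject. */
static const char *object_to_s(void) { return "Object#to_s"; }
static const char *string_upcase(void) { return "String#upcase"; }

static const struct toy_method_entry object_methods[] = { { "to_s", object_to_s } };
static const struct toy_method_entry string_methods[] = { { "upcase", string_upcase } };

static const struct toy_class toy_cObject = { NULL, object_methods, 1 };
static const struct toy_class toy_cString = { &toy_cObject, string_methods, 1 };
```

Looking up "to_s" from toy_cString misses in its own table and finds the method one step up, in toy_cObject.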

[Figure: RClass]

Hold on... Aren't Ruby classes supposed to store more information than that? Where are class variables stored? And the constants defined within classes? Instance variables can also be defined in classes; where are they?

Good questions. Maybe we'll find hints if we look at how classes are allocated and/or initialized. I don't know exactly where to start looking, so let's run git grep "struct RClass" and see what comes out.

include/ruby/internal/core/rclass.h:#define RCLASS(obj)  RBIMPL_CAST((struct RClass *)(obj))
=> A macro to cast a value into `struct RClass`. Not helping for the current matter.

include/ruby/internal/core/rclass.h:struct RClass; /* Opaque, declared here for RCLASS() macro. */
=> "Opaque". I agree...

internal/class.h:#define RCLASS_EXT(c) ((rb_classext_t *)((char *)(c) + sizeof(struct RClass)))
=> Interesting. This RCLASS_EXT macro takes the address of an RClass and returns the address of some rb_classext_t structure
=> that apparently exists right after it in memory. I need to check what this struct is.

class.c:    size_t alloc_size = sizeof(struct RClass) + sizeof(rb_classext_t);
=> Consistent with the above, this computes the memory size required for both the RClass struct and the rb_classext_t struct.

Let's read the function that uses this alloc_size variable:

/* class.c */

/**
 * Allocates a struct RClass for a new class.
 *
 * \param flags     initial value for basic.flags of the returned class.
 * \param klass     the class of the returned class.
 * \return          an uninitialized Class object.
 * \pre  \p klass must refer \c Class class or an ancestor of Class.
 * \pre  \code (flags | T_CLASS) != 0  \endcode
 * \post the returned class can safely be \c #initialize 'd.
 *
 * \note this function is not Class#allocate.
 */
static VALUE
class_alloc(VALUE flags, VALUE klass)
{
    size_t alloc_size = sizeof(struct RClass) + sizeof(rb_classext_t);
    /* ... */
    NEWOBJ_OF(obj, struct RClass, klass, flags, alloc_size, 0); /* <== allocates a GC-managed slot of alloc_size bytes */
    memset(RCLASS_EXT(obj), 0, sizeof(rb_classext_t));
    /* ... */
    return (VALUE)obj;
}

It confirms what I thought: when a class's memory is allocated, that single allocation is large enough for both an RClass struct and an rb_classext_t, and the rb_classext_t part is then zeroed with memset.
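The RCLASS_EXT trick is plain pointer arithmetic over a single allocation. Here is a toy sketch of the same layout. For simplicity it uses malloc, whereas CRuby allocates through NEWOBJ_OF. The structs toy_rclass and toy_classext and the TOY_CLASS_EXT macro are hypothetical stand-ins, each with only a few fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uintptr_t toy_value;        /* hypothetical stand-in for VALUE */

struct toy_rclass {                 /* fixed-size part, like struct RClass */
    toy_value flags;
    toy_value klass;
    toy_value super;
    void *m_tbl;
};

struct toy_classext {               /* extension, like rb_classext_t */
    toy_value *iv_ptr;
    void *const_tbl;
    size_t superclass_depth;
};

/* Same idea as RCLASS_EXT: the extension starts right after the RClass part. */
#define TOY_CLASS_EXT(c) \
    ((struct toy_classext *)((char *)(c) + sizeof(struct toy_rclass)))

/* One allocation for both parts, then zero each part (like class_alloc's memset). */
static struct toy_rclass *toy_class_alloc(void)
{
    size_t alloc_size = sizeof(struct toy_rclass) + sizeof(struct toy_classext);
    struct toy_rclass *obj = malloc(alloc_size);
    assert(obj != NULL);
    memset(obj, 0, sizeof(struct toy_rclass));
    memset(TOY_CLASS_EXT(obj), 0, sizeof(struct toy_classext));
    return obj;
}
```

This only works if the size of the first struct keeps the second one suitably aligned. The toy's fields are all pointer-sized, so that holds here.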

Let's now read the definition of rb_classext_t:

/* internal/class.h */

struct rb_classext_struct {
    VALUE *iv_ptr;
    struct rb_id_table *const_tbl;
    struct rb_id_table *callable_m_tbl;
    struct rb_id_table *cc_tbl; /* ID -> [[ci, cc1], cc2, ...] */
    struct rb_id_table *cvc_tbl;
    size_t superclass_depth;
    VALUE *superclasses;
    struct rb_subclass_entry *subclasses;
    struct rb_subclass_entry *subclass_entry;
    /**
     * In the case that this is an `ICLASS`, `module_subclasses` points to the link
     * in the module's `subclasses` list that indicates that the klass has been
     * included. Hopefully that makes sense.
     */
    struct rb_subclass_entry *module_subclass_entry;
    const VALUE origin_;
    const VALUE refined_class;
    union {
        struct {
            rb_alloc_func_t allocator;
        } class;
        struct {
            VALUE attached_object;
        } singleton_class;
    } as;
    const VALUE includer;
    attr_index_t max_iv_count;
    unsigned char variation_count;
    bool permanent_classpath : 1;
    bool cloned : 1;
    VALUE classpath;
};
typedef struct rb_classext_struct rb_classext_t;

#define RCLASS_EXT(c) ((rb_classext_t *)((char *)(c) + sizeof(struct RClass)))
#define RCLASS_CONST_TBL(c) (RCLASS_EXT(c)->const_tbl)
#define RCLASS_M_TBL(c) (RCLASS(c)->m_tbl)
#define RCLASS_IVPTR(c) (RCLASS_EXT(c)->iv_ptr)
/* ... */

Great! We found the missing elements we were looking for:

  • iv_ptr: instance variables pointer. This is where instance and class variables are stored.
  • const_tbl: constants table. This is where the constants defined within the class are stored.

[Figure: Extended RClass]

Wait... You said that both instance and class variables defined within a class are stored in the same location?

Yes. As you know, in a Ruby script, instance variable names must start with @, and class variable names must start with @@. In CRuby, instance and class variables are stored in the same place and are simply differentiated by their names. Again, the CRuby implementation of instance variables will be explained further in a dedicated article, so let's not dig into it for the moment.
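Here is a toy sketch of "same place, told apart by name". It uses a single table keyed by the full variable name, sigils included, so that @count and @@count are simply two different keys. The toy_var type and its helpers are hypothetical. A real class stores values in iv_ptr and keeps the names elsewhere, as mentioned earlier, so the sketch only illustrates the naming idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy: one table per class holds both @ivars and @@cvars. */
struct toy_var {
    const char *name;   /* full name, sigils included */
    long value;
};

/* A single lookup serves both kinds: "@count" and "@@count" are just different keys. */
static const struct toy_var *toy_var_find(const struct toy_var *tbl, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].name, name) == 0) {
            return &tbl[i];
        }
    }
    return NULL;
}

/* The kind of variable is recovered from its name alone. */
static int toy_is_class_var_name(const char *name)
{
    return strncmp(name, "@@", 2) == 0;
}
```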

Okay, so from what we saw, a class has:

  • flags (flags),
  • a class (klass),
  • a superclass (super),
  • constants (const_tbl),
  • an array of superclasses (superclasses),
  • a linked list of immediate subclasses (subclasses),
  • instance and class variables (iv_ptr),
  • instance methods (m_tbl),
  • and other stuff.

Beware that superclasses is not to be confused with the ancestors method, which returns more than just the content of the superclasses array. Likewise, the subclasses linked list references all immediate subclasses, whereas the Ruby Class#subclasses method excludes singleton classes. Also note that, as far as my understanding goes, Class#subclasses is a tricky tool to use: the order of the returned classes is not guaranteed and, more importantly, garbage collection can interfere with it while it computes its result and cause your application to crash. We will talk about superclasses, ancestors, and subclasses in a dedicated article.

We now know what objects and classes are. Or do we?

"A class is an object"

Is it really? I mean, if everything in Ruby is an object, why don't we find an RObject element in either RClass or rb_classext_struct? Wouldn't that make sense?

I spent quite some time thinking about this before I realized I might be mistakenly thinking of objects as necessarily being RObject elements. What if an object were anything that behaves as an object? If it has flags, a class, and instance variables, then it's an object. After all, Ruby is known for duck typing, so why not find the same feel in its implementation? I might be wrong, though. Let's verify this hypothesis.

Both RObject and the extended form of RClass have an RBasic element as their first member, so they store their flags and class in exactly the same way. So far, nothing contradicts the idea that both structs behave as objects. What remains is the instance variables pointer, which is called ivptr in RObject and iv_ptr in RClass's rb_classext_t. If both used the same field for storing instance variables, the matter would be closed, but they don't.
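A shared RBasic element matters because, in C, a pointer to a struct can be converted to a pointer to its first member. Code that only needs flags or klass can therefore read them without knowing which struct it is looking at. The toy sketch below shows the pattern. The names toy_basic, toy_object, toy_class and toy_class_of are hypothetical, loosely modeled on RBasic, RObject and RClass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t toy_value;   /* hypothetical stand-in for VALUE */

struct toy_basic { toy_value flags; toy_value klass; };

/* Both toy "objects" start with the same header, as RObject and RClass do. */
struct toy_object { struct toy_basic basic; toy_value ivs[2]; };
struct toy_class  { struct toy_basic basic; toy_value super; toy_value *iv_ptr; };

/* Works on either struct: C guarantees a pointer to a struct, suitably
 * converted, points to its first member, so the header is at offset 0. */
static toy_value toy_class_of(const void *any)
{
    return ((const struct toy_basic *)any)->klass;
}
```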

We're going to look at how each of them handles its instance variables and try to understand how those variables magically become accessible in the same manner from within a Ruby script.

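In Ruby, we fetch an instance variable's value using #instance_variable_get. In CRuby, this method is implemented by rb_obj_ivar_get (object.c). That function converts the given name into an ID and then calls rb_ivar_get, which in turn relies on rb_ivar_lookup: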

/* object.c */

rb_define_method(rb_mKernel, "instance_variable_get", rb_obj_ivar_get, 1);

/* variable.c */

VALUE
rb_ivar_get(VALUE obj, ID id)
{
    VALUE iv = rb_ivar_lookup(obj, id, Qnil);
    /* ... */
    return iv;
}

VALUE
rb_ivar_lookup(VALUE obj, ID id, VALUE undef)
{
    /* ... */
    switch (BUILTIN_TYPE(obj)) {
      case T_CLASS:
      case T_MODULE:
        {
            /* ... */
        }
      case T_OBJECT:
        {
            /* ... */
        }
      default:
        /* ... */
    }
    /* ... */
}

Aha moment

There is a switch statement!

rb_ivar_lookup takes anything that can be considered an object (obj) and, depending on its built-in type (T_CLASS, T_MODULE, T_OBJECT or any other type), looks for instance variables in a different manner. They all behave as objects, although they're not all RObject objects.

This, for me, was an "Aha moment".
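The same dispatch can be reproduced in a few lines. This toy sketch reads the type from the shared header, then picks the storage that matches it. To keep it short, it looks up instance variables by index rather than by name; how CRuby maps names to indexes is the topic of the next article. Every toy_* name and TOY_* constant is hypothetical. The type values happen to match T_OBJECT, T_CLASS and T_MODULE as I understand them, but treat them as illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t toy_value;                 /* hypothetical stand-in for VALUE */

#define TOY_T_MASK   ((toy_value)0x1f)
#define TOY_T_OBJECT ((toy_value)0x01)
#define TOY_T_CLASS  ((toy_value)0x02)
#define TOY_T_MODULE ((toy_value)0x03)
#define TOY_QNIL     ((toy_value)0)          /* hypothetical "nil" */

struct toy_basic  { toy_value flags; toy_value klass; };
struct toy_object { struct toy_basic basic; toy_value *ivptr; };   /* like RObject */
struct toy_class  { struct toy_basic basic; toy_value *iv_ptr; };  /* like RClass + ext */

/* Same shape as rb_ivar_lookup: read the type from the header, then pick the storage. */
static toy_value toy_ivar_lookup(const void *obj, size_t index)
{
    const struct toy_basic *basic = obj;
    switch (basic->flags & TOY_T_MASK) {
      case TOY_T_CLASS:
      case TOY_T_MODULE:
        return ((const struct toy_class *)obj)->iv_ptr[index];
      case TOY_T_OBJECT:
        return ((const struct toy_object *)obj)->ivptr[index];
      default:
        return TOY_QNIL;  /* CRuby consults a separate "generic ivar" table here */
    }
}
```

The caller passes any object and an index. Only the switch knows that the values live in different fields for different types.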

So the instance_variable_get method uses a switch statement, but is it an exception or is this really how CRuby always magically turns anything into objects? Let's look at another method just to double-check: instance_variables.

/* object.c */

rb_define_method(rb_mKernel, "instance_variables", rb_obj_instance_variables, 0);

/* variable.c */

VALUE
rb_obj_instance_variables(VALUE obj)
{
    VALUE ary;

    ary = rb_ary_new();
    rb_ivar_foreach(obj, ivar_i, ary);
    return ary;
}

void
rb_ivar_foreach(VALUE obj, rb_ivar_foreach_callback_func *func, st_data_t arg)
{
    if (SPECIAL_CONST_P(obj)) return;
    switch (BUILTIN_TYPE(obj)) {
      case T_OBJECT:
        obj_ivar_each(obj, func, arg);
        break;
      case T_CLASS:
      case T_MODULE:
        /* ... */
        class_ivar_each(obj, func, arg);
        /* ... */
        break;
      default:
        /* ... */
        gen_ivar_each(obj, func, arg);
        /* ... */
        break;
    }
}
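rb_obj_instance_variables never touches the storage itself. It hands rb_ivar_foreach a callback (ivar_i) and an accumulator (ary), and rb_ivar_foreach decides how to walk the variables for each type. A toy sketch of that callback pattern follows. If I read variable.c correctly, ivar_i only pushes names that are instance variable IDs, which is how @@class variables stored in the same place stay out of the result. The toy imitates that filter with a string prefix check. The real callback receives an ID, a VALUE and an st_data_t, whereas the string-based toy_ivar_cb type and the other toy_* names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy mirroring the rb_ivar_foreach / ivar_i pattern. */
enum toy_iter_status { TOY_CONTINUE, TOY_STOP };

typedef enum toy_iter_status toy_ivar_cb(const char *name, long value, void *arg);

struct toy_ivar { const char *name; long value; };

/* The iterator owns the walk; the callback only sees one pair at a time. */
static void toy_ivar_foreach(const struct toy_ivar *tbl, size_t n, toy_ivar_cb *func, void *arg)
{
    for (size_t i = 0; i < n; i++) {
        if (func(tbl[i].name, tbl[i].value, arg) == TOY_STOP) {
            break;
        }
    }
}

/* Accumulator passed through `arg`, playing the role of the Ruby array `ary`. */
struct toy_name_list { const char *names[8]; size_t len; };

/* Like ivar_i: keep instance variable names, skip @@class variable names. */
static enum toy_iter_status toy_collect_ivar(const char *name, long value, void *arg)
{
    struct toy_name_list *list = arg;
    (void)value;  /* only names are collected */
    if (strncmp(name, "@@", 2) != 0 && list->len < 8) {
        list->names[list->len++] = name;
    }
    return TOY_CONTINUE;
}
```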

From now on, when you hear that in Ruby everything is an object, you'll know that CRuby makes everything behave as an object.

In the next episode...

In the next article, we'll talk about how instance variables are stored in CRuby 3.3. You will see that it's not the straightforward implementation you'd think of.

Meanwhile, if you too want to wander in the CRuby source code, here are some must-read files:

  • include/ruby/internal/core/*.h
  • include/ruby/internal/intern/*.h
  • object.c
  • variable.c
  • class.c

Thank you for reading!
Younes SERRAJ