Lessons from reading ostruct.rb
During this month’s Scottish Ruby User Group meeting we paired up to read some code. We chose the source for OpenStruct as it was small and self-contained enough to get through in the hour or so available.
I expected it to be dull, but it was great fun and we all learnt a lot, mostly about stuff I should have known about Ruby, but had missed or forgotten. Here are some highlights:-
First, let’s quickly revise what an OpenStruct does. From the documentation:
An OpenStruct is a data structure, similar to a Hash, that allows the definition of arbitrary attributes with their accompanying values. This is accomplished by using Ruby’s metaprogramming to define methods on the class itself.
So you can do the following:-
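A minimal sketch of typical usage; the attribute names and values are made up for illustration:

```ruby
require "ostruct"

# Attributes can be supplied up front as a Hash...
person = OpenStruct.new(name: "Alice")

# ...or added later simply by assigning to them.
person.age = 30

person.name       # reads the attribute given at construction
person.age        # reads the attribute added afterwards
person[:age]      # Hash-style access works too
person.nickname   # an attribute that was never set reads as nil
```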
While OpenStruct is similar to Hash, it isn’t a Hash; it does not extend Hash (or include Enumerable). The attributes are stored in a Hash instance variable (@table) (see the initialize method). New attributes are captured using method_missing and the accessors are defined as methods on the object.
Freezing an OpenStruct
Freezing a Ruby object is supposed to prevent modifications. By default, this is achieved by disallowing assignment to instance variables. As the OpenStruct’s attributes are stored within a Hash (@table) that is assigned on initialisation, this alone would not prevent assigning values to an OpenStruct; while the OpenStruct would be frozen, @table would not be.
OpenStruct prevents assignment to frozen objects by routing all write operations to @table through the method modifiable.
Assigning a value to the instance variable @modifiable (which modifiable does before handing back @table) will raise an error if the object has been frozen.
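Here is a simplified sketch of that technique. GuardedBag is a hypothetical class written for illustration, not the library source; only the method name modifiable and the @modifiable assignment follow the description above.

```ruby
# Hypothetical class illustrating the "modifiable" guard.
class GuardedBag
  def initialize
    @table = {}
  end

  # Every write goes through modifiable rather than touching @table directly.
  def []=(key, value)
    modifiable[key] = value
  end

  def [](key)
    @table[key]
  end

  private

  # Assigning an instance variable fails on a frozen object, so this
  # assignment doubles as a "am I frozen?" check before exposing @table.
  def modifiable
    begin
      @modifiable = true
    rescue
      raise TypeError, "can't modify frozen #{self.class}"
    end
    @table
  end
end

bag = GuardedBag.new
bag[:colour] = "red"   # fine, the object is not frozen
bag.freeze
# bag[:size] = 3       # would now raise TypeError
```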
Another way of ensuring an OpenStruct is properly frozen might be to override the freeze method.
My guess is that this approach was not taken because it would have made it harder to control the error message and backtrace; the error would be “can’t modify frozen Hash”, not “can’t modify frozen OpenStruct”.
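A rough sketch of that alternative, using a hypothetical FrozenTableBag class. Freezing the inner Hash does block writes, but the error then comes from the Hash rather than from the wrapper:

```ruby
# Hypothetical class: freeze is overridden so the inner Hash is frozen too.
class FrozenTableBag
  def initialize
    @table = {}
  end

  def freeze
    @table.freeze
    super
  end

  # No guard here: a frozen @table rejects the write by itself.
  def []=(key, value)
    @table[key] = value
  end
end

bag = FrozenTableBag.new.freeze

begin
  bag[:colour] = "red"
rescue StandardError => e
  puts e.message   # the message mentions a frozen Hash, not FrozenTableBag
end
```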
Massaging the backtrace
When errors are raised (in modifiable and method_missing) the backtrace is modified to start at the offending piece of client code. I like this - that’s where the debugging programmer needs to look to work out a fix, not in the middle of the library code which has had its contract violated.
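The general trick is to pass a backtrace as the third argument to raise. A minimal sketch, with a hypothetical StrictSetting class and client_code method standing in for the library and its caller:

```ruby
# Hypothetical library class.
class StrictSetting
  def value=(new_value)
    unless new_value.is_a?(Integer)
      # caller(1) is the stack starting at whoever called this method,
      # so the library's own frame is left out of the backtrace.
      raise ArgumentError, "value must be an Integer", caller(1)
    end
    @value = new_value
  end
end

# Hypothetical client code that violates the contract.
def client_code
  StrictSetting.new.value = "oops"
end

begin
  client_code
rescue ArgumentError => e
  puts e.backtrace.first   # points at the line inside client_code
end
```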
define_singleton_method is a method on Object that was introduced in Ruby 1.9, but had passed all us ScotRUG members by. It does what it says - defines a method on an object’s singleton class: that is, it defines a method on an object instance without affecting other instances of its class. Prior to 1.9, the singleton class would need to be retrieved first - messy business.
This is the current way OpenStruct dynamically defines methods:-
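A simplified sketch of the idea rather than the library source; FieldBag and add_field are hypothetical names:

```ruby
# Hypothetical class that grows accessors per instance.
class FieldBag
  def initialize
    @table = {}
  end

  # Defines a reader and a writer on this one object only.
  def add_field(name)
    name = name.to_sym
    unless respond_to?(name)
      # Inside these blocks self is the object, so @table is its own Hash.
      define_singleton_method(name) { @table[name] }
      define_singleton_method("#{name}=") { |value| @table[name] = value }
    end
    name
  end
end

first = FieldBag.new
first.add_field(:colour)
first.colour = "red"
first.colour                          # the value just assigned

FieldBag.new.respond_to?(:colour)     # false: other instances are untouched
```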
The 1.8.7 way is a little less readable:-
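Again a sketch under assumptions, with the hypothetical name LegacyFieldBag: without define_singleton_method, the singleton class has to be fetched first and define_method, which was private, called via send.

```ruby
# Hypothetical class using the pre-1.9 style.
class LegacyFieldBag
  def initialize
    @table = {}
  end

  def add_field(name)
    name = name.to_sym
    unless respond_to?(name)
      # Retrieve the singleton class the long way round.
      meta = class << self; self; end
      meta.send(:define_method, name) { @table[name] }
      meta.send(:define_method, :"#{name}=") { |value| @table[name] = value }
    end
    name
  end
end

legacy = LegacyFieldBag.new
legacy.add_field(:colour)
legacy.colour = "blue"
legacy.colour   # the value just assigned
```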
There’s no opposite of define_singleton_method; remove_singleton_method isn’t a thing. So delete_field finds itself dealing directly with the object’s singleton class.
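A sketch of what removing a field might look like, assuming the accessors were created with define_singleton_method. RemovableFieldBag is a hypothetical class; only the name delete_field comes from the source.

```ruby
# Hypothetical class with per-instance accessors that can be removed again.
class RemovableFieldBag
  def initialize
    @table = {}
  end

  def add_field(name)
    name = name.to_sym
    define_singleton_method(name) { @table[name] }
    define_singleton_method("#{name}=") { |value| @table[name] = value }
    name
  end

  # remove_method lives on Module, so it is called on the singleton class.
  def delete_field(name)
    sym = name.to_sym
    singleton_class.send(:remove_method, sym, :"#{sym}=")
    @table.delete(sym)
  end
end

bag = RemovableFieldBag.new
bag.add_field(:colour)
bag.colour = "green"
removed = bag.delete_field(:colour)   # returns the stored value
bag.respond_to?(:colour)              # false once the accessors are gone
```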
singleton_class was introduced in 1.9.2 to be used in place of
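I take the older idiom in question to be the class << obj; self; end trick; that is an assumption on my part. A small sketch showing that both routes arrive at the same class:

```ruby
obj = Object.new

# Older idiom (assumed): open the singleton class and return self from it.
old_way = class << obj; self; end

# The 1.9.2 replacement.
new_way = obj.singleton_class

old_way.equal?(new_way)   # both are the very same class object
```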
This is the feature request thread for singleton_class.
In method_missing we found:-
I had never seen id2name before. It is a method on Symbol that returns the string corresponding to the symbol. I’ve always used to_s for that, which apparently is a synonym for id2name.
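To illustrate why a method_missing handler wants the name as a string, here is a hedged sketch. describe_call is a hypothetical helper, not something from ostruct.rb; it only shows how a trailing "=" in the name separates writes from reads.

```ruby
# Hypothetical helper: classify a missing-method name as a read or a write.
def describe_call(mid)
  mname = mid.id2name                      # Symbol -> String, same as to_s
  if mname.end_with?("=")
    [:write, mname.chomp("=").to_sym]      # e.g. :colour= writes :colour
  else
    [:read, mid]
  end
end

describe_call(:colour)
describe_call(:colour=)
```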
Being a bit like a Hash, OpenStruct provides the each_pair method for iterating over the key-value pairs:-
Delegating to the @table Hash is straightforward enough. Using to_enum to return an enumerator needed a bit more reading.
to_enum is defined on Object and creates a new enumerator which, when iterated, calls the named method. So when getting an enumerator from each_pair, here’s what happens:-
- each_pair is called without a block
- to_enum is called on the instance, passing in each_pair as the method name.
- When the enumerator is iterated, each_pair is called again. This time a block will be passed in, allowing the iteration (delegated to @table)
The number of attributes stored (@table.size) is given to to_enum as the return value of a block; to_enum accepts an optional block which it uses to work out the enumerator’s size lazily, without iterating.
Using the return value of a block to supply an optional value is a bit unusual. to_enum does this because its method signature already takes optional values: the arguments to pass on to the method that takes the block.
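The each_pair pattern described above, including the size block, can be sketched like this. PairBag is a hypothetical stand-in for OpenStruct:

```ruby
# Hypothetical class wrapping a Hash, as OpenStruct wraps @table.
class PairBag
  def initialize(hash)
    @table = hash.dup
  end

  def each_pair
    # No block: hand back an enumerator that will call each_pair again later.
    # The block given to to_enum reports the size without iterating.
    return to_enum(__method__) { @table.size } unless block_given?

    # Block given: delegate the iteration to the Hash.
    @table.each_pair { |pair| yield pair }
    self
  end
end

bag = PairBag.new(colour: "red", size: 3)
pairs = bag.each_pair   # an Enumerator, nothing has been iterated yet
pairs.size              # answered by the block passed to to_enum
pairs.to_a              # iterating calls each_pair with a block
```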
initialize_copy is a private method on Object which is called when dup or clone is used to create a copy. See Jon Leighton’s blog post.
OpenStruct overrides initialize_copy to ensure that a copied object gets a duplicate of the @table Hash holding the key-value pairs; otherwise the copy would share that data store, which would get weird. It also ensures that the dynamic methods are defined on the new copy; dup (unlike clone) does not duplicate the singleton class, so they would otherwise be missing.
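A small sketch of both problems, using hypothetical classes SharingBag and CopyingBag. The first relies on the default copy behaviour, the second overrides initialize_copy:

```ruby
# Hypothetical class with the default (shallow) copy behaviour.
class SharingBag
  attr_reader :table

  def initialize
    @table = {}
  end
end

# Hypothetical subclass that gives each copy its own Hash.
class CopyingBag < SharingBag
  def initialize_copy(orig)
    super
    @table = @table.dup
  end
end

shared = SharingBag.new
shared.dup.table[:colour] = "red"
shared.table     # the original sees the write made through the copy

separate = CopyingBag.new
separate.dup.table[:colour] = "red"
separate.table   # still empty: the copy had its own Hash

# dup drops singleton methods, clone keeps them.
greeter = Object.new
def greeter.hello
  "hello"
end
greeter.dup.respond_to?(:hello)     # false
greeter.clone.respond_to?(:hello)   # true
```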
I don’t see the protected keyword used much in application Ruby code. I think being able to override encapsulation with send has made us a bit lazy. Allowing the @table data store to be read through a protected accessor means it can be accessed by other OpenStruct instances when checking equality.
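A minimal sketch of that pattern; ComparableBag is a hypothetical class, not the OpenStruct source:

```ruby
# Hypothetical class comparing instances through a protected reader.
class ComparableBag
  def initialize(hash)
    @table = hash.dup
  end

  # Another ComparableBag may call other.table because table is protected,
  # not private.
  def ==(other)
    other.is_a?(ComparableBag) && table == other.table
  end

  protected

  attr_reader :table
end

left  = ComparableBag.new(colour: "red")
right = ComparableBag.new(colour: "red")
left == right   # true: each side can read the other's table
# left.table    # from outside the class this raises NoMethodError
```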
Inspect shows the contents of the OpenStruct in “key=value” form, where inspect is called on each of the values. Straightforward? You would think so, but here’s the implementation:
The use of Thread.current storage is a bit confusing at first. Its purpose is to guard against infinite recursion if an OpenStruct instance is stored in itself.
The object ids of all the OpenStructs currently being inspected are stored in Thread.current, to ensure that they are only inspected once.
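The guard can be sketched as follows. Node and the key :__node_inspect_ids__ are hypothetical names chosen for illustration; the real implementation is more involved.

```ruby
# Hypothetical class that may end up containing itself.
class Node
  INSPECT_KEY = :__node_inspect_ids__

  attr_accessor :child

  def inspect
    # Per-thread list of the objects whose inspect is currently running.
    ids = (Thread.current[INSPECT_KEY] ||= [])

    # Already being inspected further up the stack: stop recursing.
    return "#<Node ...>" if ids.include?(object_id)

    ids << object_id
    begin
      "#<Node child=#{child.inspect}>"
    ensure
      ids.pop   # always clean up, even if a nested inspect raises
    end
  end
end

node = Node.new
node.child = node
puts node.inspect   # prints #<Node child=#<Node ...>>
```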
Evan Phoenix suggested that we should read code, in his keynote at this year’s Scottish Ruby Conference. Picking apart some well-written code is a great way to pick up on all the things you should know, but have somehow missed or forgotten.