You should not implement #to_a for your classes

Well, occasional reader of this blog is no stranger to strong statements, so here is one more of them.

For a short preface, lets reiterate about one useful Ruby idiom: Kernel#Array method, a.k.a. “I don’t care; it should be an Array!”

If at some point of working with data you have a method that could’ve received, or returned, both singular value and array of values, and at next point you need to remove this distinction, and have definitely an array, you can do that:

Array(1) # => [1]
Array('test') # => ["test"]
Array([1, 2, 3]) # => [1, 2, 3]  -- nothing changed, it is already an array
Array(nil) # => []  -- well... makes sense

That’s one of the small brilliant pieces of functionality that is easy to overlook, but nice to have, and it follows your intuitions most of the time. Array(nil) is a nice touch here, being “less logical” than returning [nil], but in fact more of “what you really wanted” in real life.

But sometimes, this intiutions got broken:

val = Time.now
Array(val)
# => [3, 21, 10, 2, 2, 2018, 5, 33, false, "EET"]

Ugh… WHAT? The thing is, Array method works by first trying val#to_ary, and if it doesn’t respond, val#to_a, which Time eventually implements.

Note: here is an explanation of #to_ary vs #to_a difference and its importance.

Which leads us to a valuable lesson:

Never implement #to_a for your class (especially if it is representing value objects), unless it IS a collection.

But why?..

I’ll ask you this: what is the semantics of #to_a, if the object is not a collection?

Most of the time (like in Time’s case) the existence of this method is caused by the fact that Ruby does not have “array vs. tuple” distinction, so it means “convert to some meaningful tuple of internal values.”

But for this task, there are no benefits in using “standard” name, as compared to specialized one. There is no situation (at least, I can’t imagine one) when you have, say, an array of heterogeneous objects, and you want to call array.map(&:to_a) and if one of the objects is instance of Time, it is expected and desired to have “list of time components” in its place. It instead means you have some bug (like something returned Time when you’ve expected an array).

Moreover, selection of specialized method name almost always leads to a cleaner design. As an example, I recently run into the same problem when playing with my own Geo::Coord datatype. Being designed closely after Time’s design, it used to have #to_a method, and return [lat, lng]. Again, it didn’t play well with Array() idiom… So, I decided to rename #to_a into …. just #latlng.

Which, by the way, advice the existence of symmetric #lnglat method, and, indeed, some APIs expect a pair of coordinates in that order. It can be modeled with to_a(order: :lnglat) or something like this, but two methods (and NO #to_a) seems the cleanest and most discoverable way.

Q: What about other `#to_<x>` methods?

You may ask: if it is so bad about #to_a, should we also abandon #to_h or #to_s for our value objects? You should not. Because:

Their semantics is different, and less questionable one.
- #to_s is “how the object would be represented on puts and string interpolation.”
- #to_h, in those JSON-fueled times, became a useful idiom for “dump entire object into basic data types, while preserving its meaning visible,” in fact, a proper and modern replacement for “dump entire object into some tuple.”
Despite the fact Hash() method also exists, it is much more restrictive, and it is much harder to imagine code using the idiom “convert any value into hash” while NOT expecting some value objects to be convertible.

Q: Why does `Time` implement it?

Historical reasons, I believe. At the time of Time design (pun is not intended), there was no notion of keyword arguments, or even key: val Hash syntax shorthands. So, core classes designed at that dark ages, if they should’ve been able to be constructed from many values (like Time does), just had to accept this long list of values to the constructor. Like Time still does, in several different ways – because with just positional arguments it is hard to design proper constructor from 1 to 10 values.

Therefore, it seem logical and symmetrical to have “deconstruct” method, so that it can be easily constructed back:

Time.mktime(*Time.now.to_a)
# => 2018-02-02 06:57:57 +0200

The funny thing is to present date time have no keyword arguments constructor, and no #to_h method either.

Q: What other core classes do implement `to_a`?

Object.constants.map(&Object.method(:const_get))
  .grep(Class).select { |c| c.instance_methods.include?(:to_a) }
# => [Array, Hash, NilClass, File, Struct, Enumerator, MatchData, Dir, Time, Range, IO, StringIO]

Alongside already mentioned Time, what stands out here is Struct, which is core language base for all value objects… And fall into the same problems with “is it Array” question. But with Struct it is even stranger (if you think in “value objects” mindset): it not only implements #to_a but also #each and includes entire Enumerable! So it probably thinks of itself more as a “fancy hash with the dot-key notation” than value object.

Which leads us, by the way, to one more questionable rule:

Bonus: You should not implement `#each` either (unless it is a collection)

The reasons are the same: what is semantics for “each value in X” if X is not a collection?.. It is pretty vague, and some core classes still demonstrate that:

Quick, from the top of your head, what IO#each enumerates?..

OK, most of us already know the answer (lines, which are split by… $\, or $OUTPUT_FIELD_SEPARATOR, or else what you’ve passed to the method as a string separator), but intuitively it is neither only possible implementation, nor the most expectable one. Well, it was in the age of “scripts” and plaintext, but let’s face the truth: this age is far gone, and now “iostream, each (something)” can mean bytes, or Unicode codepoints, or buffer chunks or whatnot. IO#each_byte and IO#each_line(separator) have the same flexibility, and much more readability.

Historical note: String used to have #each and be Enumerable (with the same meaning as for IO), but it was rethought, and now we have #each_<byte,char,codepoint,line> and no bare #each. Which is good.

So, please read Arcency’s excellent “Stop including Enumerable, return Enumerator instead”, and name the method #each_<something_meaningful>, and don’t include Enumerable.

Unless it is a collection.

Q: But what if it IS a collection?

Then implement #each, include Enumerable (which will give you free #to_a, which, though, you may reimplement more effectively if you need), and also probably implement #to_ary, which will give you a bonus of playing well in *your_collection context.

Side note: Being bold, I’d even say that Hash#each and #to_a never felt that comfortable for me. Again, there is hard to imagine the context where Array({some: 'hash'}) # => [[:some, "hash"]] would be most expected outcome, and sometimes I feel like #each_pair # => Enumerator could suite for a better code than included Enumerable. But maybe I am just high on architectural astronautics.

But why?..

Q: What about other #to_<x> methods?

Q: Why does Time implement it?

Q: What other core classes do implement to_a?

Bonus: You should not implement #each either (unless it is a collection)

Q: But what if it IS a collection?

Q: What about other `#to_<x>` methods?

Q: Why does `Time` implement it?

Q: What other core classes do implement `to_a`?

Bonus: You should not implement `#each` either (unless it is a collection)