What one must pass to includes() to include Active Storage attachments
If you're using Active Storage, eager-loading nested associations that contain attachments in order to avoid the "N + 1" query problem can quickly reach the point of absurdity.
Working on the app for Becky's strength-training business, I got curious about how large the array of hashes passed to includes() gets whenever the server loads the overall strength-training program. (This only happens on a few pages, like the program overview page, which genuinely does contain a boatload of information and images.)
Each symbol below refers to a reference from one table to another. Every one that descends from :file_attachment is a reference to one of the tables managed by Active Storage for keeping track of cloud-hosted images and videos. Those hashes were extracted from the with_all_variant_records scope that Rails provides.
I mean, look at this:
[{:overview_video=>
{:file_attachment=>
{:blob=>
{:variant_records=>{:image_attachment=>:blob}, :preview_image_attachment=>{:blob=>{:variant_records=>{:image_attachment=>:blob}}}}}}},
{:overview_thumbnail=>
{:file_attachment=>
{:blob=>
{:variant_records=>{:image_attachment=>:blob}, :preview_image_attachment=>{:blob=>{:variant_records=>{:image_attachment=>:blob}}}}}}},
{:warmup_movement=>
{:movement_video=>
{:file_attachment=>
{:blob=>
{:variant_records=>{:image_attachment=>:blob}, :preview_image_attachment=>{:blob=>{:variant_records=>{:image_attachment=>:blob}}}}}},
:movement_preview=>
{:file_attachment=>
{:blob=>
{:variant_records=>{:image_attachment=>:blob}, :preview_image_attachment=>{:blob=>{:variant_records=>{:image_attachment=>:blob}}}}}}}},
{:workouts=>
{:blocks=>
{:mobility_movement=>
[{:primary_equipment=>
{:equipment_image=>
{:file_attachment=>
{:blob=>
{:variant_records=>{:image_attachment=>:blob},
:preview_image_attachment=>{:blob=>{:variant_records=>{:image_attachment=>:blob}}}}}}},
:secondary_equipment=>
{:equipment_image=>
{:file_attachment=>
{:blob=>
{:variant_records=>{:image_attachment=>:blob},
:preview_image_attachment=>{:blob=>{:variant_records=>{:image_attachment=>:blob}}}}}}},
:tertiary_equipment=>
{:equipment_image=>
{:file_attachment=>
{:blob=>
{:variant_records=>{:image_attachment=>:blob},
:preview_image_attachment=>{:blob=>{:variant_records=>{:image_attachment=>:blob}}}}}}},
:movement_video=>
{:file_attachment=>
{:blob=>
{:variant_records=>{:image_attachment=>:blob},
:preview_image_attachment=>{:blob=>{:variant_records=>{:image_attachment=>:blob}}}}}},
:movement_preview=>
{:file_attachment=>
{:blob=>
{:variant_records=>{:image_attachment=>:blob},
:preview_image_attachment=>{:blob=>{:variant_records=>{:image_attachment=>:blob}}}}}}}],
:exercises=>
{:exercise_options=>
{:movement=>
[{:primary_equipment=>
{:equipment_image=>
{:file_attachment=>
{:blob=>
{:variant_records=>{:image_attachment=>:blob},
:preview_image_attachment=>{:blob=>{:variant_records=>{:image_attachment=>:blob}}}}}}},
:secondary_equipment=>
{:equipment_image=>
{:file_attachment=>
{:blob=>
{:variant_records=>{:image_attachment=>:blob},
:preview_image_attachment=>{:blob=>{:variant_records=>{:image_attachment=>:blob}}}}}}},
:tertiary_equipment=>
{:equipment_image=>
{:file_attachment=>
{:blob=>
{:variant_records=>{:image_attachment=>:blob},
:preview_image_attachment=>{:blob=>{:variant_records=>{:image_attachment=>:blob}}}}}}},
:movement_video=>
{:file_attachment=>
{:blob=>
{:variant_records=>{:image_attachment=>:blob},
:preview_image_attachment=>{:blob=>{:variant_records=>{:image_attachment=>:blob}}}}}},
:movement_preview=>
{:file_attachment=>
{:blob=>
{:variant_records=>{:image_attachment=>:blob},
:preview_image_attachment=>{:blob=>{:variant_records=>{:image_attachment=>:blob}}}}}}}]}}}}}]
By my count, that's 167 relationships! Of course, in practice it's not quite this bad since the vast majority are repeated, and as a result this winds up executing "only" 50 queries or so. But that's… a lot!
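For anyone who wants to check a count like this on their own includes() argument, here is a minimal sketch in plain Ruby. The method name count_associations and the constant ATTACHMENT_INCLUDES are made up for illustration; the constant simply reproduces one :file_attachment subtree from the dump above. It tallies every symbol, whether it appears as a hash key or as a leaf value.

```ruby
# Hypothetical helper: counts every association name (symbol) in a nested
# includes structure made of symbols, arrays, and hashes.
def count_associations(node)
  case node
  when Symbol then 1
  when Array  then node.sum { |child| count_associations(child) }
  when Hash   then node.sum { |key, value| count_associations(key) + count_associations(value) }
  else 0
  end
end

# One attachment's worth of eager-loading, copied from the dump above.
ATTACHMENT_INCLUDES = {
  file_attachment: {
    blob: {
      variant_records: { image_attachment: :blob },
      preview_image_attachment: { blob: { variant_records: { image_attachment: :blob } } }
    }
  }
}.freeze

puts count_associations(ATTACHMENT_INCLUDES) # prints 10
puts count_associations([{ overview_video: ATTACHMENT_INCLUDES },
                         { overview_thumbnail: ATTACHMENT_INCLUDES }]) # prints 22
```

Each attachment costs 10 symbols on its own, so the 14 attachments in the full structure account for 140 of the 167, with the remaining 27 being the app's own associations.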
I've run into a lot of papercuts with Active Storage since starting to work with it in January of this year. I still believe it's the best tool for the job, but qualitatively it feels like it really would benefit from some simplification and refactoring, even if that would require some breaking changes to its (mostly undocumented, thankfully) rough edges.
An example frustration: 14 of these includes hashes point to preview_image_attachment, and each of those includes four more associations, for a total of 70 out of 167 relationships. But preview_image_attachment is actually a specially-named variant record that only exists separate and apart from variant_records because of a quirk in how non-image files like videos and PDFs are processed. Videos are analyzed for their first frame of content and PDFs for their first page, which is saved as an image attachment-of-the-attachment, and it's from that second-order attachment that all other image variants (thumbnails, etc.) are derived. However (and I could be wrong about this, because my own efforts to unwind Active Storage's code have been unsuccessful), that preview image could have just been stored as a normal variant record itself (perhaps referenced as the root variant via a referential column on active_storage_attachments) rather than factored as a full-blown attachment that requires 5 additional eager-load declarations for each attachment in a tree of models.
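To put a number on that, here is a rough sketch in plain Ruby that strips every :preview_image_attachment branch from a nested includes structure and compares symbol counts before and after. The names count_symbols, without_key, and PER_ATTACHMENT are hypothetical, and this only measures the size of the hash; it says nothing about how Active Storage would actually need to change.

```ruby
# Hypothetical helper: counts every symbol (hash key or leaf) in the structure.
def count_symbols(node)
  case node
  when Symbol then 1
  when Array  then node.sum { |child| count_symbols(child) }
  when Hash   then node.sum { |_key, value| 1 + count_symbols(value) }
  else 0
  end
end

# Hypothetical helper: returns a copy with every `key` entry and its subtree removed.
def without_key(node, key)
  case node
  when Array then node.map { |child| without_key(child, key) }
  when Hash
    node.reject { |k, _| k == key }
        .transform_values { |value| without_key(value, key) }
  else node
  end
end

# One attachment's worth of eager-loading, copied from the dump above.
PER_ATTACHMENT = {
  file_attachment: {
    blob: {
      variant_records: { image_attachment: :blob },
      preview_image_attachment: { blob: { variant_records: { image_attachment: :blob } } }
    }
  }
}.freeze

PRUNED = without_key(PER_ATTACHMENT, :preview_image_attachment)

puts count_symbols(PER_ATTACHMENT) # prints 10
puts count_symbols(PRUNED)         # prints 5
```

Five symbols saved per attachment, times 14 attachments, is the 70 mentioned above, which would leave 97 of the original 167.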
Does the number of symbols in the above really matter from a performance perspective? I don't know! But if we could cut the size of the mess I just pasted above by nearly half, that would certainly feel nice.