justin․searls․co

Drive-by Active Storage advice

I'm working on a conference talk and there won't be time for me to detail each and every piece of advice I've accrued for each technical topic, so I'm going to dump some of them here and link back to them from the slides.

Today's topic is Active Storage, the Ruby on Rails feature that makes it easy to store user-generated assets like photos and videos in the cloud without clogging up your application or database servers.

Before you do anything, read this absolutely stellar post describing how to get the most out of the feature and avoid its most dangerous foot-guns.

Here goes.

Wrap each attachment in a model

You never know when an attachment will need the other trappings of a model (behavior, validation, etc.), so I future-proofed mine by wrapping each type of attachment in a model.

Here's an example. This model represents the video attachment for each movement in Build with Becky:

module Build
  class MovementVideo < ApplicationRecord
    include Attachable

    belongs_to :movement, touch: true
  end
end

Keep consistent names, variants, and validations

The class above is so empty because I have an Attachable concern that does all the basic attachment-y stuff:

module Attachable
  extend ActiveSupport::Concern

  included do
    has_one_attached :file, dependent: :purge_later do |attachable|
      attachable.variant :preview, resize_to_fill: [400, 400], preprocessed: true
      attachable.variant :still, format: "jpg", resize_to_limit: [2000, 2000], saver: {quality: 85}, preprocessed: true
    end
    validate :file_seems_legit

    def preview_image_representation(process: false, variant: :preview)
      return if file.blank?
      representation = file.representation(variant)
      representation = representation.processed if process
      representation
    rescue ActiveStorage::Preview::UnprocessedError
      nil
    end

    def feed_ready?
      file.attached? &&
        file.representation(:preview).key.present? &&
        file.representation(:still).key.present?
    rescue ActiveStorage::Preview::UnprocessedError
      false
    end

    # This will enqueue a job to reprocess the variants for this visual.
    # It will only do so if the variants are not already processed, unless force: true
    def reprocess_variants!(force: false)
      file_attachment.send(:named_variants).each do |name, named_variant|
        if named_variant.preprocessed?(self) && (force || file.representation(name).key.blank?)
          file_attachment.blob.preprocessed(named_variant.transformations)
        end
      end
    end

    def video?
      file.attached? && file.video?
    end

    def media_type
      if file.video?
        :video
      elsif file.image?
        :image
      end
    end

    def aspect_ratio
      return unless file.metadata.key?(:width) && file.metadata.key?(:height)

      file.metadata[:width] / file.metadata[:height].to_d
    end

    def file_seems_legit
      if !file.attached?
        errors.add(:file, "must be attached")
      elsif !file.content_type.match?(/\A(image|video)\//)
        errors.add(:file, "must be an image or video")
      elsif file.image? && file.byte_size >= 8.megabytes
        errors.add(:base, "images must be smaller than 8 MB")
      elsif file.video? && file.byte_size >= 1.gigabyte
        errors.add(:base, "videos must be smaller than 1 GB")
      end
    end
  end
end

This obviously does quite a few things. Notably:

  • Calls has_one_attached. Never use has_many_attached. If your Movement needs multiple videos, it should have a has_many to the MovementVideo model instead. Trust me
  • Names the attachment :file. It has been extremely nice knowing that I can rely on the assumption that every single attachment across dozens of attachment types is named file
  • Defines a consistent set of variants and schedules them to be pre-processed asynchronously
  • Validates basic presence, type, and size rules
  • Can quickly answer whether it's an image or video
  • feed_ready? answers the very useful question of whether the attachment's variants have been processed. Since processing is asynchronous and (for large videos, especially) can be slow, this lets the UI skip unprocessed attachments and avoid the risk of triggering synchronous processing on the application server (which would be very bad)

For most apps, it's unlikely any of this stuff has a valid reason to be different from attachment to attachment and the risk of duplicating all this everywhere is that you forget something important (like a size limit validation).
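
To illustrate how feed_ready? can gate rendering, here's a rough plain-Ruby sketch. FakeAttachment and renderable are stand-ins I made up so the example runs without Rails; in a real app you'd call feed_ready? on whatever models include Attachable.

```ruby
# Hypothetical stand-in for a model that includes Attachable; only the
# feed_ready? predicate matters for this sketch.
FakeAttachment = Struct.new(:name, :ready) do
  def feed_ready?
    ready
  end
end

# Keep only the attachments whose variants have already been processed, so
# rendering the feed never triggers synchronous processing.
def renderable(attachments)
  attachments.select(&:feed_ready?)
end

attachments = [
  FakeAttachment.new("squat.mp4", true),
  FakeAttachment.new("deadlift.mp4", false)
]

puts renderable(attachments).map(&:name).inspect
```

The unprocessed attachment simply doesn't show up until its background job finishes, which is usually a better experience than a request that hangs while a video gets transcoded.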

Turn on Direct Upload

You almost certainly want to utilize Direct Upload so that you don't have application servers clogged waiting for a user's crappy connection to finish uploading an 8GB video.

And if you do enable direct upload, you should realize that if validation fails when persisting the model after a form submission, the uploaded file will be orphaned by default upon re-render. This is bad. So to avoid it, you can do this weird thing I hacked together to render a hidden input and a file input with the exact same name.

Here's my _direct_upload_file_field.html.erb partial, which will ensure the file is still attached upon resubmission while still allowing the user to choose a different file:

<%# This is some real horseshit. If you upload a file with direct_upload and
  validation/save fails server-side, it's up to you to check on re-render
  for a blob ID and re-embed it in the form in a hidden field. Mercifully it
  works even though two inputs are named the same thing (this and the file field). %>
<%= f.file_field name, direct_upload: true, **local_assigns.fetch(:input_options, {}) %>
<% if f.object.new_record? && f.object.send(name).blob.present? %>
  <%= f.hidden_field name, id: nil, value: f.object.send(name).blob.signed_id %>
  <% unless local_assigns[:hide_reassurance] %>
    <span class="font-bold text-danger">
      Don't worry, we haven't lost your upload of <span class="font-mono"><%= f.object.file.blob.filename %></span>, you don't need to upload it again
    </span>
  <% end %>
<% end %>

Remember to includes everything

If you don't love N+1 queries, you're going to want to get in the habit of auditing every route for cases where attachments are referenced. Prosopite seems good.

You get a magic scope for each attachment, and since I name mine file, it's always something like MovementVideo.with_attached_file. Of course, since I'm usually loading several layers of nested models, this helper isn't very useful without overly precious use of Arel's merge method, so I wound up writing my own helper to assemble all the deeply-nested hashes I need to pass to includes:

# app/lib/includes_hashes.rb
module IncludesHashes
  # Extracted from with_all_variant_records
  # https://github.com/rails/rails/blob/f4a9b7618fc32f0d3b2c0ff03a3f34f4964cc553/activestorage/app/models/active_storage/attachment.rb#L45
  INCLUDES_WITH_ATTACHED_FILE = {
    file_attachment: {
      blob: {
        variant_records: {image_attachment: :blob}
      }
    }
  }.freeze
  INCLUDES_WITH_ATTACHED_FILE_PREVIEW = {
    file_attachment: {
      blob: {
        preview_image_attachment: {blob: {variant_records: {image_attachment: :blob}}}
      }
    }
  }.freeze
  def self.includes_with_attached_file_hash(image: false)
    INCLUDES_WITH_ATTACHED_FILE.deep_merge(
      image ? {} : IncludesHashes::INCLUDES_WITH_ATTACHED_FILE_PREVIEW
    )
  end
end

This way, I can assemble an includes like this one to eager-load my attachments and variants without risking an N+1:

Movement.includes(
  video: IncludesHashes.includes_with_attached_file_hash,
  thumbnail: IncludesHashes.includes_with_attached_file_hash(image: true),
)
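
If it's not obvious what that helper hands to includes, this plain-Ruby sketch reproduces the merge. The deep_merge method here is a hand-rolled stand-in (my assumption, so the example runs without Active Support), and the variable names are made up; for nested hashes like these it should behave the same as the real Hash#deep_merge.

```ruby
# Hand-rolled stand-in for Active Support's Hash#deep_merge so this runs in
# plain Ruby: nested hashes are merged recursively, other values are replaced.
def deep_merge(left, right)
  left.merge(right) do |_key, a, b|
    a.is_a?(Hash) && b.is_a?(Hash) ? deep_merge(a, b) : b
  end
end

with_file = {
  file_attachment: {blob: {variant_records: {image_attachment: :blob}}}
}

with_preview = {
  file_attachment: {
    blob: {
      preview_image_attachment: {blob: {variant_records: {image_attachment: :blob}}}
    }
  }
}

merged = deep_merge(with_file, with_preview)

# Both branches survive side by side under the same :blob key.
p merged[:file_attachment][:blob].keys
```

The point is that a plain merge would clobber the first :file_attachment branch with the second, whereas the deep merge keeps both the variant records and the preview image under one :blob key.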

Using the proxy or redirect servers in production seems like a Bad Idea™. Proxying will block your application server, and redirecting will result in bizarro bugs in Safari, which will flip the fuck out when you have a page of 100 images that are all resolved by redirect.

Personally, my preferred approach is to generate URLs that point to a CDN of the assets. I use Amazon CloudFront and configure it to be backed by an Amazon S3 bucket. There is a correct and arcane way to configure all this, but what do I look like, a DevOps consultant?

On the Rails side, there's almost nothing to do, since your server won't be serving assets. I just defined a route like this 8 months ago and forgot about it until I sat down to write this:

# config/routes.rb
direct :public_cdn do |representation, options|
  if Rails.configuration.active_storage.service == :amazon
    "https://#{ENV["CDN_HOST"]}/#{representation.key}"
  else
    url_for(representation)
  end
end

This way, I can generate URLs with public_cdn_url(attachment_holder.file) that will resolve correctly in both development and production.
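
To make the branching concrete, here's a plain-Ruby sketch of the same decision outside of Rails. cdn_url_for, its keyword arguments, the FakeRepresentation struct, and the /local/ fallback path are all hypothetical; the real route leans on Rails.configuration and url_for instead.

```ruby
# Hypothetical stand-in for an Active Storage blob or variant; only the
# storage key matters when building a CDN URL.
FakeRepresentation = Struct.new(:key)

# Mirrors the route's logic: point at the CDN when the :amazon service is
# active, otherwise defer to whatever fallback the caller supplies.
def cdn_url_for(representation, service:, cdn_host:, fallback:)
  if service == :amazon
    "https://#{cdn_host}/#{representation.key}"
  else
    fallback.call(representation)
  end
end

video = FakeRepresentation.new("abc123")
# Placeholder for url_for in development; not a real Active Storage path.
local = ->(rep) { "/local/#{rep.key}" }

puts cdn_url_for(video, service: :amazon, cdn_host: "cdn.example.com", fallback: local)
puts cdn_url_for(video, service: :local, cdn_host: "cdn.example.com", fallback: local)
```

This only works because the CDN serves objects straight out of the bucket by key, so the URL needs nothing from the application beyond the key itself.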

Variant processing is non-trivial

Processing preview images and variants requires non-trivial amounts of bandwidth and computational resources. You need vips and ffmpeg by default. On Heroku, that means adding the activestorage-preview buildpack.

You will run into issues

I ran into about a dozen Active Storage bugs over the course of working with it for a few months. Of these, I worked around almost all of them and only fixed one or two, because I'm a busy person.

My general advice is to be more paranoid than usual when Active Storage is involved:

  1. Have a backup strategy in place for your assets. I've already needed mine once after a pre-production background worker wound up purging production assets
  2. Audit how much space you're consuming on your cloud provider, looking for (and potentially purging) blob keys that are no longer known to your database
  3. Exploratory test regularly in production
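
The audit in item 2 boils down to a set difference. Here's a minimal sketch in plain Ruby, assuming you've already exported the list of keys in your bucket and the list of keys your database knows about (the key column on active_storage_blobs is the obvious source). orphaned_keys and the sample keys are made up for illustration.

```ruby
require "set"

# Returns the keys present in cloud storage that the database doesn't know
# about. Both arguments are plain arrays of key strings.
def orphaned_keys(bucket_keys, known_keys)
  known = known_keys.to_set
  bucket_keys.reject { |key| known.include?(key) }
end

bucket_keys = %w[a1 b2 c3 d4]
known_keys = %w[a1 c3]

p orphaned_keys(bucket_keys, known_keys)
```

Treat the result as a list of candidates to review rather than a purge list, since depending on your configuration some legitimately stored objects (untracked variants, for instance) may not have a row in the blobs table.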

Some of these issues are due to the fact that Active Storage isn't widely used. Some are because the code has experienced a lot of churn, particularly where it tries to distinguish images and non-image previewable attachments (e.g. PDF and video). Some are because the problem domain is inherently resource-intensive and rife with failure states. GLHF.

