Drive-by Active Storage advice
This post is also available in Japanese, care of Shozo Hatta
I'm working on a conference talk and there won't be time for me to detail each and every piece of advice I've accrued for each technical topic, so I'm going to dump some of them here and link back to them from the slides.
Today's topic is Active Storage, the Ruby on Rails feature that makes it easy to store user-generated assets like photos and videos in the cloud without clogging up your application or database servers.
Before you do anything, read this absolutely stellar post describing how to get the most out of the feature and avoid its most dangerous foot-guns.
Here goes.
Wrap each attachment in a model
You never know when an attachment will need the other trappings of a model (behavior, validation, etc.), so I future-proofed mine by wrapping each type of attachment in a model.
Here's an example. This model represents the video attachment for each movement in Build with Becky:
module Build
  class MovementVideo < ApplicationRecord
    include Attachable

    belongs_to :movement, touch: true
  end
end
Keep consistent names, variants, and validations
The class above is so empty because I have an Attachable concern that does all the basic attachment-y stuff:
module Attachable
  extend ActiveSupport::Concern

  included do
    has_one_attached :file, dependent: :purge_later do |attachable|
      attachable.variant :preview, resize_to_fill: [400, 400], preprocessed: true
      attachable.variant :still, format: "jpg", resize_to_limit: [2000, 2000], saver: {quality: 85}, preprocessed: true
    end

    validate :file_seems_legit

    def preview_image_representation(process: false, variant: :preview)
      return if file.blank?

      representation = file.representation(variant)
      representation = representation.processed if process
      representation
    rescue ActiveStorage::Preview::UnprocessedError
      nil
    end

    def feed_ready?
      file.attached? &&
        file.representation(:preview).key.present? &&
        file.representation(:still).key.present?
    rescue ActiveStorage::Preview::UnprocessedError
      false
    end

    # This will enqueue a job to reprocess the variants for this visual.
    # With the default force: true, every preprocessed variant is re-enqueued.
    # Pass force: false to only enqueue variants that are not already processed.
    def reprocess_variants!(force: true)
      file_attachment.send(:named_variants).each do |name, named_variant|
        if named_variant.preprocessed?(self) && (force || file.representation(name).key.blank?)
          file_attachment.blob.preprocessed(named_variant.transformations)
        end
      end
    end

    def video?
      file.attached? && file.video?
    end

    def media_type
      if file.video?
        :video
      elsif file.image?
        :image
      end
    end

    def aspect_ratio
      return unless file.metadata.key?(:width) && file.metadata.key?(:height)

      file.metadata[:width] / file.metadata[:height].to_d
    end

    def file_seems_legit
      if !file.attached?
        errors.add(:file, "must be attached")
      # \A anchors to the start of the string (^ would also match after a newline)
      elsif !file.content_type.match?(/\A(image|video)\//)
        errors.add(:file, "must be an image or video")
      elsif file.image? && file.byte_size >= 8.megabytes
        errors.add(:base, "images must be smaller than 8 MB")
      elsif file.video? && file.byte_size >= 1.gigabyte
        errors.add(:base, "videos must be smaller than 1 GB")
      end
    end
  end
end
This obviously does quite a few things. Notably:
- Calls has_one_attached. Never use has_many_attached. If your Movement needs multiple videos, it should have a has_many to the MovementVideo model instead. Trust me
- Names the attachment :file. It has been extremely nice knowing that I can rely on the assumption that every single attachment across dozens of attachment types is named file
- Defines a consistent set of variants and schedules them to be pre-processed asynchronously
- Validates basic presence, type, and size rules
- Can quickly answer whether it's an image or video
- feed_ready? answers whether or not the attachment's variants have been processed. Since processing is asynchronous and (for large videos, especially) can be slow, this lets the UI skip unprocessed attachments and avoid the risk of triggering synchronous processing on the application server (which would be very bad)
For most apps, it's unlikely any of this stuff has a valid reason to be different from attachment to attachment and the risk of duplicating all this everywhere is that you forget something important (like a size limit validation).
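To make the type and size rules concrete, here is a minimal plain-Ruby sketch of the same decision table that file_seems_legit encodes, with no Rails involved. The method name attachment_error and its keyword arguments are hypothetical stand-ins for the blob's content_type and byte_size, and the two byte constants are assumptions that replace ActiveSupport's 8.megabytes and 1.gigabyte.

```ruby
# Hypothetical stand-ins for ActiveSupport's 8.megabytes and 1.gigabyte.
MAX_IMAGE_BYTES = 8 * 1024 * 1024
MAX_VIDEO_BYTES = 1024 * 1024 * 1024

# Returns an error message for a would-be attachment, or nil if it passes.
# content_type is nil when nothing is attached.
def attachment_error(content_type:, byte_size:)
  if content_type.nil?
    "must be attached"
  elsif !content_type.match?(%r{\A(image|video)/})
    "must be an image or video"
  elsif content_type.start_with?("image/") && byte_size >= MAX_IMAGE_BYTES
    "images must be smaller than 8 MB"
  elsif content_type.start_with?("video/") && byte_size >= MAX_VIDEO_BYTES
    "videos must be smaller than 1 GB"
  end
end
```

Because the checks are ordered, a missing file or a wrong type is reported before any size limit is considered, so each upload gets at most one error message.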
Turn on Direct Upload
You almost certainly want to utilize Direct Upload so that you don't have application servers clogged waiting for a user's crappy connection to finish uploading an 8GB video.
And if you do enable direct upload, you should realize that if validation fails when persisting the model after a form submission, the uploaded file will be orphaned by default upon re-render. This is bad. So to avoid it, you can do this weird thing I hacked together to render a hidden input and a file input with the exact same name.
Here's my _direct_upload_file_field.html.erb partial, which will ensure the file is still attached upon resubmission while still allowing the user to choose a different file:
<%# This is some real horseshit. If you upload a file with direct_upload and
    validation/save fails server-side, it's up to you to check on re-render
    for a blob ID and re-embed it in the form in a hidden field. Mercifully it
    works even though two inputs are named the same thing (this and the file field). %>
<%= f.file_field name, direct_upload: true, **(local_assigns[:input_options] || {}) %>
<% if f.object.new_record? && f.object.send(name).blob.present? %>
  <%= f.hidden_field name, id: nil, value: f.object.send(name).blob.signed_id %>
  <% unless local_assigns[:hide_reassurance] %>
    <span class="font-bold text-danger">
      Don't worry, we haven't lost your upload of <span class="font-mono"><%= f.object.send(name).blob.filename %></span>. You don't need to upload it again.
    </span>
  <% end %>
<% end %>
Remember to includes everything
If you don't love N+1 queries, you're going to want to get in the habit of auditing every route for cases where attachments are referenced. Prosopite seems good.
You get a magic scope for each attachment, and since I name mine file, it's always something like MovementVideo.with_attached_file. Of course, since I'm usually loading several layers of nested models, this helper isn't very useful without overly precious use of Active Record's merge method, so I wound up writing my own helper to assemble all the deeply-nested hashes I need to pass to includes:
# app/lib/includes_hashes.rb
module IncludesHashes
  # Extracted from with_all_variant_records
  # https://github.com/rails/rails/blob/f4a9b7618fc32f0d3b2c0ff03a3f34f4964cc553/activestorage/app/models/active_storage/attachment.rb#L45
  INCLUDES_WITH_ATTACHED_FILE = {
    file_attachment: {
      blob: {
        variant_records: {image_attachment: :blob}
      }
    }
  }.freeze

  INCLUDES_WITH_ATTACHED_FILE_PREVIEW = {
    file_attachment: {
      blob: {
        preview_image_attachment: {blob: {variant_records: {image_attachment: :blob}}}
      }
    }
  }.freeze

  def self.includes_with_attached_file_hash(image: false)
    INCLUDES_WITH_ATTACHED_FILE.deep_merge(
      image ? {} : IncludesHashes::INCLUDES_WITH_ATTACHED_FILE_PREVIEW
    )
  end
end
This way, I can assemble an includes like this one to eager-load my attachments and variants without risking an N+1:
Movement.includes(
  video: IncludesHashes.includes_with_attached_file_hash,
  thumbnail: IncludesHashes.includes_with_attached_file_hash(image: true),
)
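For a sense of what the non-image case actually hands to includes, here is a plain-Ruby sketch that reproduces the merge outside of Rails. The deep_merge method below is a hand-rolled stand-in for ActiveSupport's Hash#deep_merge (an assumption made so the snippet runs on its own), and the constant names WITH_FILE and WITH_PREVIEW are shortened for illustration.

```ruby
# Hand-rolled stand-in for ActiveSupport's Hash#deep_merge, so this runs without Rails.
def deep_merge(left, right)
  left.merge(right) do |_key, left_value, right_value|
    if left_value.is_a?(Hash) && right_value.is_a?(Hash)
      deep_merge(left_value, right_value)
    else
      right_value
    end
  end
end

WITH_FILE = {
  file_attachment: {blob: {variant_records: {image_attachment: :blob}}}
}.freeze

WITH_PREVIEW = {
  file_attachment: {
    blob: {
      preview_image_attachment: {blob: {variant_records: {image_attachment: :blob}}}
    }
  }
}.freeze

# Videos need the file's own variant records plus the preview image and its variants.
video_includes = deep_merge(WITH_FILE, WITH_PREVIEW)
pp video_includes
```

The merged hash carries both variant_records and preview_image_attachment under blob, which is why a single includes call can cover the file, its preview image, and the variants of each. Neither frozen constant is mutated, since merge returns a new hash.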
Generate links to a CDN and bypass your app server
Using the proxy or redirect servers in production seems like a Bad Idea™. Proxying will block your application server, and redirecting will trigger bizarro bugs in Safari, which will flip the fuck out when you have a page of 100 images that are all resolved by redirect.
Personally, my preferred approach is to generate URLs that point to a CDN of the assets. I use Amazon CloudFront and configure it to be backed by an Amazon S3 bucket. There is a correct and arcane way to configure all this, but what do I look like, a DevOps consultant?
On the Rails side, there's almost nothing to do, since your server won't be serving assets. I just defined a route like this 8 months ago and forgot about it until I sat down to write this:
# config/routes.rb
direct :public_cdn do |representation, options|
  if Rails.configuration.active_storage.service == :amazon
    "https://#{ENV["CDN_HOST"]}/#{representation.key}"
  else
    url_for(representation)
  end
end
This way, I can generate URLs with public_cdn_url(attachment_holder.file) that will resolve correctly in both development and production.
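The branching in that route is simple enough to sketch in plain Ruby. This is a hypothetical helper, not part of Rails: cdn_url_for, its keyword arguments, and the example hostnames are all made up for illustration, and the fallback lambda stands in for url_for.

```ruby
# Hypothetical plain-Ruby version of the route's logic; nothing here is a Rails API.
# service:  the configured storage service name (e.g. :amazon or :local)
# cdn_host: the CDN hostname, as would come from ENV["CDN_HOST"]
# fallback: callable standing in for Rails' url_for in non-CDN environments
def cdn_url_for(key, service:, cdn_host:, fallback:)
  if service == :amazon
    "https://#{cdn_host}/#{key}"
  else
    fallback.call(key)
  end
end

# Stand-in for whatever URL the app would serve in development.
local_url = ->(key) { "http://localhost:3000/files/#{key}" }

puts cdn_url_for("abc123", service: :amazon, cdn_host: "cdn.example.com", fallback: local_url)
puts cdn_url_for("abc123", service: :local, cdn_host: "cdn.example.com", fallback: local_url)
```

Since the CDN URL is built directly from the key, an object has to exist in the bucket under that key before the link will resolve.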
Variant processing is non-trivial
Processing preview images and variants requires non-trivial amounts of bandwidth and computational resources. You need vips and ffmpeg by default. On Heroku, that means adding the activestorage-preview buildpack.
You will run into issues
I ran into about a dozen Active Storage bugs over the course of working with it for a few months. Of these, I worked around almost all of them and only fixed one or two, because I'm a busy person.
My general advice is to be more paranoid than usual when Active Storage is involved:
- Have a backup strategy in place for your assets. I've already needed mine once after a pre-production background worker wound up purging production assets
- Audit how much space you're consuming on your cloud provider, looking for (and potentially purging) blob keys that are no longer known to your database
- Exploratory test regularly in production
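The space audit in the second bullet boils down to a set difference. Here is a minimal sketch in plain Ruby, assuming you have already listed the keys in your bucket and loaded the keys your database knows about (for example, the key column of the active_storage_blobs table). The method name orphaned_keys and the sample keys are hypothetical.

```ruby
require "set"

# Returns bucket keys that the database does not know about, sorted for stable output.
def orphaned_keys(bucket_keys, known_keys)
  known = known_keys.to_set
  bucket_keys.reject { |key| known.include?(key) }.sort
end

bucket_keys = %w[abc123 def456 ghi789] # hypothetical listing from the storage provider
known_keys = %w[abc123 ghi789]         # hypothetical keys loaded from the database

p orphaned_keys(bucket_keys, known_keys) # => ["def456"]
```

Treat the result as a list of candidates to review rather than something to purge automatically, especially if more than one environment writes to the same bucket.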
Some of these issues are due to the fact that Active Storage isn't widely used. Some are because the code has experienced a lot of churn, particularly where it tries to distinguish images and non-image previewable attachments (e.g. PDF and video). Some are because the problem domain is inherently resource-intensive and rife with failure states. GLHF.