Most startups have been there: you have a simple site, and you want to let users upload photos of themselves, or something else, to share. We were there as well just a few years ago, when building out the very first versions of TeachStreet. Having previously worked on a few image hosting solutions at Amazon, I already knew some of the pitfalls and challenges of building a system that scales.
Here were some of our high-level requirements:
- Keep redundant copies of images in case of failure
- Allow dynamic resizing and cropping of images (so we don't have to pre-generate them)
- Must be fast (but cheap)
- Must scale independently of our core web application

Each image request goes through these steps:
- Handle request
- Fetch original source image from S3
- Resize/apply effects
- Return result back to user
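
The four steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the URL shape (`/images/<guid>?w=..&h=..`), the names `parse_request` and `serve_image`, and the stand-ins `fake_s3` and `fake_resize` are all hypothetical. A real service would fetch from S3 and resize with an actual imaging library.

```python
from dataclasses import dataclass
from typing import Callable, Dict
from urllib.parse import parse_qs, urlparse


@dataclass(frozen=True)
class ImageRequest:
    guid: str    # external identifier of the original image
    width: int   # requested width in pixels
    height: int  # requested height in pixels


def parse_request(url: str) -> ImageRequest:
    # Step 1: handle the request. The URL layout here is an assumed
    # convention for illustration, not the one the real service used.
    parsed = urlparse(url)
    guid = parsed.path.rstrip("/").rsplit("/", 1)[-1]
    query = parse_qs(parsed.query)
    return ImageRequest(guid, int(query["w"][0]), int(query["h"][0]))


def serve_image(
    url: str,
    fetch_original: Callable[[str], bytes],
    transform: Callable[[bytes, int, int], bytes],
) -> bytes:
    request = parse_request(url)
    # Step 2: fetch the original source image (from S3 in production).
    original = fetch_original(request.guid)
    # Step 3: resize / apply effects.
    result = transform(original, request.width, request.height)
    # Step 4: return the result to the caller.
    return result


# Stand-ins so the sketch runs without S3 or an imaging library.
fake_s3: Dict[str, bytes] = {"abc123": b"original-bytes"}


def fake_resize(data: bytes, width: int, height: int) -> bytes:
    # Tags the payload with the requested size instead of really resizing.
    return b"%dx%d:" % (width, height) + data
```

Passing the fetch and transform steps in as functions keeps the flow independent of the storage backend and of the image library, which also makes each step easy to swap out or test on its own.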

Further steps/more optimization?

There are more steps we could take to optimize this further, if needed. First, if we know the commonly requested image sizes and effects, we could prime the cache on image upload. This would avoid the extra lookup to S3 except in a failure case.

If our caches begin to get very large (as we scale), we could use DNS to map requests to different servers, even increasing the number of DNS entries for servers (modding out to a larger set), or route to different servers based on URL (for different image sizes, etc.).

Right now, most of our users are in the US. If we had an international site, we might consider using different S3 backends for storage (in Singapore, Hong Kong, Japan, or Europe), as well as using a CDN to front images. Generally speaking, CDNs are quite expensive for scrappy little startups like us. We could even consider using Amazon CloudFront as our CDN.

Alternatives?

We've seen a variety of other approaches to this problem. Paperclip is a great plugin that provides much of the same functionality, but it doesn't provide on-the-fly resizing and is usually applied to a database model (our solution relies on external GUIDs for each image). Cassandra (or MongoDB with GridFS) could also be an alternative backend to S3 if the latency on non-cached requests needs further improvement.
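
As a rough illustration of the DNS "modding" idea from the optimization notes, here is a minimal sketch that maps an image GUID onto one of several hostnames. The function name `image_host`, the `i<N>.` hostname pattern, and the `img.example.com` domain are assumptions made up for this example.

```python
import zlib


def image_host(guid: str, shard_count: int, domain: str = "img.example.com") -> str:
    # CRC32 gives the same value in every process, unlike Python's
    # built-in hash() for strings, so a GUID always maps to the same host.
    shard = zlib.crc32(guid.encode("utf-8")) % shard_count
    # Hypothetical naming scheme: i0.img.example.com, i1.img.example.com, ...
    return f"i{shard}.{domain}"


host = image_host("abc123", 4)
```

Several of these DNS names can point at the same physical server to begin with and be split apart later. Note that changing `shard_count` remaps some GUIDs to a different name, so those images start with a cold cache on their new host.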