The Secret to Scaling Ruby on Rails Apps: Battle-tested Solutions

Kacper Rafalski

Jul 17, 2025 • 21 min read

Traffic spikes can turn a perfectly functioning Ruby on Rails application into a nightmare overnight. Your MVP might handle a few dozen users without breaking a sweat, but then an influencer's tweet brings 10,000 visitors in a single hour. When your application slows down or crashes during this peak usage, users lose confidence and might never return.

Rails applications need a strategic approach to handle growth effectively. Building a scalable Rails architecture is like constructing a skyscraper: you need foundations that support not just today's load, but tomorrow's expansion. Developers expecting user growth beyond 10,000 active users face particular challenges in maintaining performance under increasing demand. What stands in the way? Often, it's the lack of proper caching to avoid recomputing the same results, and the failure to move time-consuming tasks to background workers.

This guide presents battle-tested solutions for Ruby on Rails scalability, from performance monitoring tools like AppSignal to database optimization techniques. Whether you're running on a Digital Ocean Droplet with Dokku or managing a more complex infrastructure, these strategies will help ensure your application performs reliably as user requests per minute increase.

Start with Monitoring and Profiling

You can't fix what you don't measure. This principle becomes especially critical when scaling Rails applications, where performance bottlenecks often hide in places you'd least expect. Performance monitoring provides the roadmap for your optimization efforts, helping you invest resources where they'll have the greatest impact.

Use tools like AppSignal or New Relic

Several specialized tools exist for monitoring Ruby on Rails applications, each offering unique features to help pinpoint performance issues:

AppSignal provides tailored performance insights specifically for Rails apps, with features including exception tracking, anomaly detection, and comprehensive logging. This combination gives developers 360-degree visibility into their application's performance, allowing them to scale confidently without compromising stability.

New Relic supports Rails versions 2.1 through 6.x and automatically measures render time of ERB and Haml templates while exposing bottlenecks like slow queries triggered from views. Beyond just counting database calls, New Relic captures slow queries and reports the underlying database's query analysis, making it simple to identify missing indexes and verify improvements.

Scout offers similar functionality, providing insights into where performance bottlenecks occur and empowering developers to make data-driven optimization decisions.

Identify slow endpoints and bottlenecks

Once monitoring tools are in place, you can systematically identify where your application is struggling. Transaction monitoring serves as an excellent starting point. New Relic's transaction monitor allows you to select a broad time range (like 7 days) and examine captured transaction traces. Look for transactions with long-running queries or numerous database calls—these often indicate optimization opportunities.

The Rails Logger itself provides valuable insights into execution times for different application components. Examining these logs helps developers uncover slow endpoints and methods requiring optimization.

N+1 queries represent one of the most common performance killers in Rails applications. These occur when your application makes multiple database queries to load associated records instead of using a single efficient query. Tools like AppSignal can visually highlight these problems—for example, showing when an endpoint hits Memcached 62 times followed by 47 database queries.

Measure before scaling

The golden rule of scaling: profile first, optimize second. A systematic approach to measurement can yield significant performance improvements without requiring additional infrastructure.

When users experience latency, pinpointing the source becomes essential. Various factors contribute to sluggishness, from inefficient database interactions to poor indexing strategies. Measuring first creates a performance baseline against which you can compare your optimizations.

Development environments often differ dramatically from production, so profile against realistic workloads. One developer reduced average memory usage from 85MB to 7MB and average request duration from 3000ms to 150ms simply by identifying and fixing performance issues through profiling.

Rack Mini Profiler stands out as a particularly useful tool, providing database and memory profiling in real-time. It saves profiler output from previous requests and injects a badge into the next HTML page loaded, offering continuous feedback on your optimization efforts.
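
Setup is typically a one-line Gemfile addition; a minimal sketch:

# Gemfile
gem "rack-mini-profiler"

After a bundle install and a server restart, the badge appears on rendered pages in development by default, with no further configuration required.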

Before scaling your infrastructure, ensure you've thoroughly monitored your application's performance. As one expert notes, "Before you decide to rewrite your app from scratch in Rust, it's worth taking a step back and profiling your application". This approach not only saves resources but often reveals simple optimizations that can dramatically improve performance.

Optimize Performance with Caching

Caching represents one of the most effective techniques for scaling Ruby on Rails applications without adding hardware resources. When traffic increases, repetitive operations quickly become bottlenecks that drag down your entire application. Storing precomputed data for reuse eliminates these performance killers.

Fragment and view caching

Fragment caching allows developers to cache specific portions of a view rather than entire pages. This approach works particularly well for pages containing both static and dynamic elements, where Rails can serve cached fragments while still generating fresh content for dynamic sections.

Consider frequently accessed UI components like headers, footers, or product cards. Fragment caching can dramatically reduce rendering times:

<% cache [@product] do %>
  <%= render "product", product: @product %>
<% end %>

This code generates a unique cache key based on the product's ID and updated_at timestamp. When any attribute of the product changes, the cache key automatically changes too, invalidating outdated content.

Russian Doll caching builds on fragment caching by nesting caches inside one another. When an outer fragment expires, Rails can rebuild it quickly by reusing any nested fragments that are still valid; in a collection, a change to one product regenerates only that product's fragments and the outer wrapper, while sibling fragments are served straight from cache:

<% cache [@product] do %>
  <%= render "details", product: @product %>
  <% cache [@product, "price"] do %>
    <%= render "price", product: @product %>
  <% end %>
<% end %>

This pattern proves especially effective for collections with numerous items where only a few might change between requests.
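
For such collections, Rails 5 and later can also fetch all fragment caches in a single multi-read via collection caching; a sketch assuming an app/views/products/_product.html.erb partial:

<%= render partial: "product", collection: @products, cached: true %>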

Low-level value caching

Low-level caching provides direct access to Rails' cache store, offering maximum flexibility for caching arbitrary data beyond view fragments. This approach excels when dealing with expensive computations, API responses, or database query results.

The Rails.cache.fetch method forms the foundation of low-level caching, elegantly combining both reading and writing operations:

def competing_price
  Rails.cache.fetch("#{cache_key_with_version}/competing_price", expires_in: 12.hours) do
    Competitor::API.find_price(id)
  end
end

This method first attempts to retrieve the cached value. If none exists (or it has expired), it executes the block, stores the result, and returns it. This pattern works perfectly for operations involving external APIs or complex calculations that don't need real-time accuracy.

For effective cache management, consider these strategies:

  • Use model-based cache keys that automatically invalidate when records change
  • Set appropriate expiration times based on data volatility
  • Consider dependencies between cached items
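
A sketch of the first strategy (the model and association names are illustrative): passing the record itself into the cache key ties the entry to the record's updated_at, so any save invalidates it automatically:

class Product < ApplicationRecord
  has_many :reviews

  def review_summary
    # [self, "review_summary"] expands to a key containing the product's
    # id and updated_at, so the entry invalidates when the record changes
    Rails.cache.fetch([self, "review_summary"], expires_in: 1.hour) do
      reviews.group(:rating).count
    end
  end
end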

Choosing the right cache store (Redis, Memcached)

Rails supports several cache stores, but Redis and Memcached dominate production environments due to their performance and reliability.

Redis offers versatility with support for multiple data types (strings, lists, sets, hashes) and persistence options. Its ability to store complex data structures makes it ideal for session management, real-time data operations, and more sophisticated caching needs. Redis also provides sub-millisecond latency, replication capabilities, and data partitioning via sharding for horizontal scalability.

Memcached focuses exclusively on being a high-performance, distributed memory caching system. It excels at caching strings and simple data types with minimal overhead. Memcached's multithreaded architecture allows it to handle multiple operations in parallel across CPU cores, making it highly efficient for read-heavy workloads.

Performance comparisons indicate Redis typically outperforms Memcached in both read and write operations by approximately 1.5x. Furthermore, Redis offers additional benefits like optional disk persistence and support for transactions, neither of which Memcached provides.

To implement either solution, configure your application in config/environments/production.rb:

# For Redis
config.cache_store = :redis_cache_store, { url: "redis://localhost:6379/0" }

# For Memcached
config.cache_store = :mem_cache_store, "cache-1.example.com", "cache-2.example.com"

The choice between Redis and Memcached depends on your specific requirements. If you need only basic caching functionality, Memcached might be sufficient. However, if you anticipate needing additional features beyond simple caching, Redis typically offers greater long-term flexibility.

Use Background Jobs to Offload Work

Background processing represents a critical strategy for scaling Ruby on Rails applications by moving time-consuming tasks away from the main request-response cycle. As applications grow, this approach becomes increasingly necessary to maintain responsiveness and efficiency.

When to use background jobs

Background jobs shine in multiple scenarios where immediate processing isn't essential. They're ideal for sending emails, processing images, generating reports, and performing data synchronization with third-party services. Any operation taking longer than a second in controller actions becomes a prime candidate for background processing.

Consider offloading these tasks:

  • Time-consuming operations that don't need immediate user feedback
  • Regular maintenance tasks like database clean-ups
  • Resource-intensive tasks such as logging and tracking
  • Operations reading large amounts of data for reporting purposes
  • Tasks involving external services with potential latency issues

Moving these operations to background jobs frees web servers to handle more user requests, resulting in better overall application performance.

Rails provides several options for background job processing, each with distinct advantages:

Sidekiq utilizes a multi-threaded approach where jobs are processed within the same process using multiple worker threads. This allows for better resource utilization and faster job processing compared to alternatives. Sidekiq relies on Redis for storing job queues and provides built-in support for retrying failed jobs with backoff strategies.

Resque, created by GitHub, uses a multi-process approach where each job is processed by a separate worker process. Although potentially more resource-intensive, this isolation ensures one job's failure doesn't affect others. Like Sidekiq, it depends on Redis for job management.

Delayed Job, extracted from Shopify, uses your existing database to store background jobs. Many teams choose it because of this simplicity, although performance benchmarks indicate Sidekiq processes jobs approximately 30x faster.

Rails' built-in Active Job framework provides a standardized interface that works with all these backends, allowing developers to switch between them without changing application code.
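
A minimal Active Job sketch (the job, mailer, and model names are hypothetical); the same job runs unchanged on Sidekiq, Resque, or Delayed Job once the adapter is configured:

# app/jobs/weekly_report_job.rb
class WeeklyReportJob < ApplicationJob
  queue_as :default

  def perform(user_id)
    # Pass ids rather than records so the queued payload stays small
    user = User.find(user_id)
    ReportMailer.weekly_report(user).deliver_now
  end
end

# config/application.rb
config.active_job.queue_adapter = :sidekiq

# Enqueue from a controller without blocking the request
WeeklyReportJob.perform_later(current_user.id)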

Scaling background workers independently

The major advantage of background processing lies in its scalability options. Background workers can be scaled independently from web servers, offering significant flexibility as application demands grow.

Several scaling approaches exist:

  1. Horizontal scaling - Deploy multiple Sidekiq worker instances to distribute workload
  2. Vertical scaling - Increase resources allocated to Redis instances
  3. Dedicated Redis instances - Use separate Redis servers solely for background job processing
  4. Queue-specific workers - Deploy distinct Sidekiq workers per queue, allowing granular resource allocation

For high-volume applications processing millions of jobs, combining these approaches yields optimal results. Companies like Shopify implement multi-layered strategies, including both vertical scaling of Redis instances and horizontal scaling of worker processes with autoscaling.

Understanding these options ensures your Ruby on Rails application maintains performance as it grows from handling a few users to supporting thousands of concurrent connections.

Improve Database Performance for Scalability

Database performance often becomes the first casualty as Rails applications grow. Even well-architected applications can grind to a halt when data volume increases without proper optimization strategies.

Add indexes to speed up queries

Indexes serve as vital data structures that help databases quickly locate and retrieve records, improving query performance from O(n) to O(log(n)). The impact becomes dramatic as tables grow—a query that takes milliseconds on a thousand records might take seconds on a million records without proper indexing.

Strategic indexing approaches include:

  • Adding indexes on foreign keys to accelerate association lookups
  • Using composite indexes for queries filtering on multiple columns
  • Implementing partial indexes that only cover specific subsets of data
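
A migration sketch showing all three (table and column names are illustrative):

class AddOrderIndexes < ActiveRecord::Migration[7.0]
  def change
    add_index :orders, :user_id                 # foreign key lookups
    add_index :orders, [:user_id, :created_at]  # composite index for multi-column filters
    add_index :orders, :completed_at,
              where: "completed_at IS NULL",    # partial index covering only incomplete orders
              name: "index_orders_on_incomplete"
  end
end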

There's a tradeoff here that many developers overlook. Indexes increase storage requirements and can slow write operations. Always measure performance before and after adding indexes to verify improvements rather than assuming they'll help.

Avoid N+1 queries with eager loading

N+1 queries represent one of the most insidious performance killers in Rails applications. They occur when your application makes one query to retrieve parent records, then makes additional queries for each child record. Development environments often mask this problem because small datasets don't expose the performance penalty.

ActiveRecord provides several solutions:

  • includes: Preloads associations using separate queries
  • joins: Uses SQL JOIN statements for more complex filtering
  • preload: Loads associations in separate queries without immediate joining

A simple change from @posts = Post.all to @posts = Post.includes(:comments).all transforms N+1 queries into just two database hits. The difference becomes significant when dealing with hundreds or thousands of records.
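
Side by side, assuming a Post model with a comments association:

# N+1: one query for posts, then one COUNT query per post
Post.all.each { |post| puts post.comments.size }

# Eager loading: two queries total, regardless of post count
Post.includes(:comments).each { |post| puts post.comments.size }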

Use read replicas and sharding

Database load distribution becomes essential as your application scales beyond single-server capacity. Read replicas allow you to offload query traffic from your primary database, freeing it to handle writes. Rails 6+ provides native support for database switching between writers and replicas.
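
With a replica entry defined in config/database.yml, the switch is a few lines in the base model; a sketch of the Rails 6+ API (the database entry names are assumptions):

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  # Assumes "primary" and "primary_replica" entries in config/database.yml
  connects_to database: { writing: :primary, reading: :primary_replica }
end

# Route a block of read-only work to the replica explicitly
ActiveRecord::Base.connected_to(role: :reading) do
  Report.generate_weekly_summary  # hypothetical read-heavy call
end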

Database sharding splits data across multiple databases based on criteria like user_id or account_id. This horizontal scaling approach proves valuable when working with massive datasets that exceed single-server capacity, though it adds complexity to your application architecture.

Monitor PostgreSQL with pg_stat_statements

The pg_stat_statements module tracks planning and execution statistics for all SQL statements. Once enabled, it provides insights into:

  • Total execution time of queries
  • Number of calls per statement
  • Rows affected or retrieved
  • I/O operations performed

This data helps identify problematic queries that need optimization, enabling proactive performance tuning. Without this visibility, you're essentially flying blind when it comes to database performance.
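
Once the extension is enabled (via shared_preload_libraries in postgresql.conf, then CREATE EXTENSION pg_stat_statements), you can pull the worst offenders straight from a Rails console; a sketch:

# Note: the column is total_time on PostgreSQL 12 and earlier,
# total_exec_time on PostgreSQL 13 and later
ActiveRecord::Base.connection.select_all(<<~SQL).each { |row| puts row.inspect }
  SELECT query, calls, total_exec_time, rows
  FROM pg_stat_statements
  ORDER BY total_exec_time DESC
  LIMIT 10
SQL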

Infrastructure and Deployment Strategies for Scaling

Your perfectly optimized Rails code means nothing if your infrastructure can't handle the traffic. Even the most elegant caching strategies and background job implementations need proper hardware and deployment configurations to support growing user demands.

Vertical vs horizontal scaling in practice

Vertical scaling means throwing more resources at your existing servers—more RAM, faster CPUs, or better storage. This approach works well for predictable growth and is often the first scaling method teams implement. It's simple to understand and execute. However, vertical scaling eventually hits a wall where additional hardware upgrades become either technically impossible or financially impractical.

Horizontal scaling takes a different approach by distributing work across multiple servers. This method operates on a three-tier architecture: load balancer, web app instances, and database instances. What makes horizontal scaling particularly attractive is its flexibility—you can use different server types for different operations. Less powerful servers can handle image processing, while more powerful ones tackle resource-intensive tasks.

Use load balancers for traffic distribution

Load balancers act as traffic directors, routing incoming requests across your application servers. Nginx, commonly used with Rails applications, requires only a medium-powered server yet effectively filters and distributes traffic. When properly configured, load balancers prevent any single server from becoming overwhelmed, improving overall application availability.

Think of load balancers as the host at a popular restaurant: they seat customers at available tables without overwhelming any single server.

Auto-scaling with AWS, Heroku, or Kubernetes

Auto-scaling automatically adjusts server resources based on real-time demand. On Heroku, Rails Autoscale monitors request queue time and job queue time to scale both web and worker dynos accordingly. AWS offers Elastic Load Balancing for distribution and auto-scaling groups that respond to CPU utilization triggers. Kubernetes provides the Horizontal Pod Autoscaler, which adjusts replica counts based on defined metrics such as 70% CPU utilization.

The beauty of auto-scaling lies in its responsiveness—your application can handle traffic spikes without manual intervention, then scale back down when demand decreases.

Choose the right deployment tool (Dokku, Kamal, Capistrano)

Dokku offers Heroku-like simplicity on your own infrastructure, making it ideal for smaller applications. Kamal, described as "Capistrano for Containers," streamlines deployments without server preparation: simply add an Ubuntu server to your configuration list. It also excels at asset bridging between deployments, ensuring static files are served without downtime. Capistrano, by contrast, provides extensive customization options through its script-based approach.

Which deployment tool should you choose? It depends on your team's expertise and infrastructure complexity. Dokku works well for straightforward deployments, while Kamal offers more flexibility for containerized applications.

Conclusion

Rails applications face real challenges when traffic suddenly multiplies. The strategies outlined in this guide provide a systematic path from struggling with performance issues to confidently handling growth.

Performance monitoring forms the foundation of any scaling effort. Tools like AppSignal or New Relic reveal where your application actually struggles, not where you think it struggles. This data-driven approach prevents wasted optimization efforts and ensures resources target genuine bottlenecks.

Caching delivers immediate performance gains without additional infrastructure costs. Fragment caching, Russian Doll patterns, and low-level value caching each solve different problems depending on your application's specific needs. The choice between Redis and Memcached significantly impacts overall performance as traffic grows.

Background job processing keeps your application responsive when traffic spikes occur. Moving time-consuming tasks away from the request-response cycle allows web servers to handle more user requests. Tools like Sidekiq, Resque, and Delayed Job each offer distinct advantages for different scaling requirements.

Database performance often determines whether scaling succeeds or fails. Proper indexing, eager loading to prevent N+1 queries, and strategic use of read replicas all contribute to database efficiency. Monitoring your database with tools like pg_stat_statements provides crucial insights for ongoing optimization.

Infrastructure decisions underpin every successful scaling effort. The balance between vertical and horizontal scaling, effective load balancers, and appropriate deployment tools creates a foundation that supports growth. Auto-scaling solutions from AWS, Heroku, or Kubernetes enhance your application's ability to handle traffic spikes without manual intervention.

Scaling a Ruby on Rails application doesn't have to be overwhelming. This systematic approach—monitoring, caching, background processing, database optimization, and infrastructure planning—creates a clear roadmap for growth. Rails continues to prove itself capable of supporting applications serving thousands or even millions of users when these battle-tested solutions are properly implemented.
