ActiveRecord - Part 2: Useful methods
In the first part, we discussed the "N+1 queries" problem and how to deal with it using Rails' ActiveRecord.
In this part, we will discuss some other useful ActiveRecord methods that can help us achieve some results faster or just in a more elegant way.
#map vs #pluck
Suppose we want to fetch the email addresses of all our clients. We can do it this way:
What happens behind the scenes in such case? Well, in simplification:
1. Rails fetches the data from the database using the following query:
SELECT "clients".* FROM "clients"
2. From the acquired data, a collection of Client objects is created
3. Ruby goes over the whole collection, takes the email property from each object and stores it in the new collection (an array)
4. The newly created array is returned
Now, let's move the responsibility of fetching only client email addresses from Ruby straight to database. We can achieve that by using the #pluck method:
In this case, the following query will be executed
SELECT "clients"."email" FROM "clients"
and voila, we can observe the same result.
The difference is that in the first example, we fetched all the data and then Ruby had to process it. In the second example, we told the database to fetch only the email column/attribute.
From the table below, you can see that the database coped with that task much faster.
|(1000 clients in the database)|
The argument(s) (yes, we can fetch multiple attributes at once) that we passed to the #pluck method are nothing but names of columns in the database. For this reason, we cannot use this method to map records, for example, by using a custom method in our model.
class Client < ApplicationRecord ... def visible? [true, false].sample end end
For obvious reasons, we cannot invoke the #pluck method with the visible? argument, simply because there is no visible? column in the database table. In this case, we can only use the #map method.
Another useful thing
A typical use case of the #pluck method is when we want to retrieve a list of IDs (primary keys) from a database.
We can do this by writing even less code and using the #ids method:
Assume that we want to know the sum of all orders in our store.
There are two simple ways of achieving that:
1) Order.sum(:total) 2) Order.sum(&:total)
These two code lines look very similar. The difference is that in one case, we pass a simple symbol as an argument, and the other time, we pass a block. The difference in execution is significant. The following queries will be executed
1) SELECT SUM("orders"."total") FROM "orders" 2) SELECT "orders".* FROM "orders"
As we can see, in the first case, we transferred the responsibility of calculating the sum of orders onto the database.
The second case is more complicated. First, Rails fetches all orders from the database. A collection of Order objects is created. Then Ruby maps the collection – it goes over all items and takes the total property from them. We end up with an array containing the total attribute from each order. Now, ruby calls the #sum method on the created array, which results in the desired value.
Again, as you can see in the table below, the database does it much faster.
|(5000 orders in the database)|
THIS IS NO MORE VALID FOR RAILS 5.x. In Rails 5.x, there is no static .uniq method for ActiveRecord::Base class.
Another interesting case that shows how important the order of executing methods is. Suppose that we want to list all email addresses of clients who made at least one order in our shop. This list should contain unique values, so that if someone has made more than one order, their email should appear only once.
Okay. There are two simple ways of achieving that:
1) Order.pluck(:email).uniq 2) Order.uniq.pluck(:email)
As you can see, these examples differ only in the order in which we called #pluck & #uniq methods. In both examples, the database will be responsible for fetching only email addresses (so no extra data that Ruby would have to map). So where is the difference? Please, look on queries that will be executed in each of these cases:
1) SELECT "orders"."email" FROM "orders" 2) SELECT DISTINCT "orders"."email" FROM "orders"
Here is the difference. In the second case, the database is responsible for filtering unique values (by using the DISTINCT directive). In the first case, Rails fetches all email addresses, creates an array from them, and then calls the Array#uniq method on it.
The table below shows difference in performance:
|#uniq - Array#uniq||0.0762|
|#uniq - DISTINCT||0.0197|
|(5000 orders in the database)|
This method is used for... updating records in bulk :) Very useful, but not without flaws.
Imagine that we have some orders in our shop system. We want to identify all orders that have not been finalized (i.e. they remain in the cart state) and have not been updated for a week. Then we have to set the inactive flag to true on all such orders.
The first idea for doing this is:
Order.where(state: 'cart').where('updated_at < ?', 7.days.ago).each do |order| order.update(inactive: true) end
And of course, it works. But let's analyze the cost of the solution. One query is executed to fetch all orders that match given criteria, and n queries get executed to update each order. Of course, n is the number of orders returned by the first query. The cost is quite high, especially when we have a lot of records to update.
ActiveRecord gives us a solution for that too. It's the #update_all method:
Order.where(state: 'cart').where('updated_at < ?', 7.days.ago).update_all(inactive: true)
All the records will be updated in a single query.
Caveats? Of course, there are some. By using the #update_all method, we create a raw SQL query and send it straight to the database, which means that this method omits all the callbacks set on our model: validations, after_save, etc. Because of that, we have to be very careful using this method if we depend on such callbacks to a large extent. Otherwise, we can end up with data inconsistencies in our app.
Looking at all these tables comparing performance, we can reach a simple conclusion: the database is always faster than Ruby in retrieving, parsing, or processing data. So should we always ask the database to do such things? Of course not. As always, there is no silver bullet. The immortal statement: "IT DEPENDS" is the valid here too.
Can we afford to lock the database for a long time instead of making a simpler request and letting Ruby run the more complex operations? Of course, Ruby will take longer to process the same query, but at the same time, the database will available for performing other operations.
The answer to the question above and other related questions will be different depending on your app's logic, the applied architecture, or hardware capability.
What we need to know is that ActiveRecord is fairly flexible. It does many things for us based on certain conventions, but at the same time, it allows us to override these default conventions and do things the way we want. We have real control over how the active record performs its actions.
Measurements were made on Ruby 2.5.0 and Ruby on Rails 5.2.0