Cover Image for SQL query optimization with Index Only Scan

SQL query optimization with Index Only Scan

Indexing is a useful technique to speed up database queries. Almost every database management system offers an indexing mechanism, no matter the database type: relational, document-oriented, key-value, graph-based... whatever. In this post, we will focus on PostgreSQL.

PostgreSQL automatically creates an index on every single table primary key. For instance, if we create a 'users' table with two columns, ID (primary key) and email, it will create an index on the former:

postgres=# create table users(id serial primary key, email varchar(40));
CREATE TABLE

postgres=# select tablename, indexname, indexdef from pg_indexes where tablename = 'users';
  users | users_pkey | CREATE UNIQUE INDEX users_pkey ON public.users USING btree (id)

If we make a query searching for a non-indexed column (email, in this example), no index will be used, but full table scan (Seq Scan) will be performed (we must avoid this like the plague):

postgres=# explain analyze select * from users where email = 'user_40@gmail.com';        
 Seq Scan on users  (cost=0.00..17.75 rows=3 width=102) (actual time=0.025..0.097 rows=1 loops=1)
   Filter: ((email)::text = 'user_40@gmail.com'::text)
   Rows Removed by Filter: 499
 Planning Time: 0.067 ms
 Execution Time: 0.165 ms

However, if we search by ID, the query planner indicates that an Index Scan will be used:

postgres=# explain analyze select * from users where id = 40;
 Index Scan using users_pkey on users  (cost=0.27..8.29 rows=1 width=23) (actual time=0.010..0.026 rows=1 loops=1)
   Index Cond: (id = 40)
 Planning Time: 0.134 ms
 Execution Time: 0.082 ms

Here, PostgreSQL is querying the index and then querying the table in order to fetch the email. This is obviously faster than Seq Scan, but there is a way to get a better performance: including the email in the index. This way, PostgreSQL doesn't need to fetch the "extra data" from the table because it's already present in the index itself. This is a Index Only Scan query:

postgres=# create index id_idx on users(id) include(email);
CREATE INDEX

postgres=# explain analyze select * from users where id = 40;
 Index Only Scan using id_idx on users  (cost=0.27..8.29 rows=1 width=23) (actual time=0.048..0.074 rows=1 loops=1)
   Index Cond: (id = 40)
   Heap Fetches: 1
 Planning Time: 0.072 ms
 Execution Time: 0.079 ms

Of course, results may vary depending on the number of rows in the table. When you have a non-indexed column that normally it's included in your query, you should consider include it in the index as well.

(NOTE: In examples above, SELECT * has been used. Since users table has only ID and email columns, it makes sense in this case. However, you should never include all the table columns in your query. That's another performance tip.)


More posts...