Primary key, unique constraint or unique index?

The Solution

Note: Multiple NULL values are not considered equal, so they do not count as duplicate records.
Postgres automatically creates a unique index when a primary key or a unique constraint is defined on a table. Creating a separate unique index on the same columns would therefore be redundant, and unnecessary indexes degrade Postgres's write performance. The recommendation from the Postgres team is to define the constraint and let Postgres build the backing index itself: Postgres creates an index for a defined primary key on its own, and when we create a unique constraint, it likewise creates an index behind the scenes, so there is no need to create a unique index on those columns by hand.
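As a minimal sketch (the table definition and the scheme_code column are hypothetical; only the schemes table and scheme_name column come from this post):

```sql
-- Hypothetical definition: the PRIMARY KEY makes Postgres create a
-- unique index (schemes_pkey) automatically.
CREATE TABLE schemes (
    id          serial PRIMARY KEY,
    scheme_name varchar(64),
    scheme_code varchar(16)  -- hypothetical column for the example below
);

-- A UNIQUE constraint also creates a backing unique index behind the scenes,
-- so a separate CREATE UNIQUE INDEX on scheme_code would be redundant.
ALTER TABLE schemes ADD CONSTRAINT schemes_scheme_code_key UNIQUE (scheme_code);
```

Running \d schemes in psql would then list both schemes_pkey and schemes_scheme_code_key as indexes, even though we never ran CREATE INDEX ourselves.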
However, there are cases where even indexing can’t improve performance.
One such case is when we do case-insensitive searches.
Let’s compare the query cost of a case-sensitive and a case-insensitive search on our schemes table.
Given we have an index on the column scheme_name.
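For reference, such an index could have been created like this (the exact definition is an assumption; only the index and column names come from the post):

```sql
CREATE INDEX idx_scheme_name ON schemes (scheme_name);
```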
```sql
EXPLAIN ANALYSE SELECT * FROM schemes WHERE scheme_name = 'weekend_scheme';
```

```
QUERY PLAN
Index Scan using idx_scheme_name on schemes (cost=0.29 rows=1 width=384)
Planning Time: 0.155 ms
Execution Time: 0.063 ms
```

```sql
EXPLAIN ANALYSE SELECT * FROM schemes WHERE lower(scheme_name) = 'weekend_scheme';
```

```
QUERY PLAN
Seq Scan on schemes (cost=0.00 rows=5 width=384)
  Filter: (lower((scheme_name)::text) = 'weekend_scheme'::text)
  Rows Removed by Filter: 999
Planning Time: 0.119 ms
Execution Time: 0.721 ms
```

Even though we have an index on scheme_name, the lower function degrades performance: the planner cannot use the index, so Postgres scans the table and converts every scheme_name value to lower case.
Cases when an index is not used (although it is defined):

- LIKE '%scheme' will never use an index, but LIKE 'scheme%' possibly can.
- An upper/lower function applied to an indexed column in the WHERE clause.

So whenever we want to use a function in the WHERE clause, we can create an expression index to optimise the query.
```sql
CREATE INDEX idx_scheme_name_lower ON schemes (lower(scheme_name));

EXPLAIN ANALYSE SELECT * FROM schemes WHERE lower(scheme_name) = 'weekend_scheme';
```

```
QUERY PLAN
Bitmap Heap Scan on schemes (cost=4.83 rows=5 width=384)
  Recheck Cond: (lower((scheme_name)::text) = 'weekend_scheme'::text)
  Heap Blocks: exact=1
  Bitmap Index Scan on schemes (cost=0.32 rows=5 width=0)
    Index Cond: (lower((scheme_name)::text) = 'weekend_scheme'::text)
Planning Time: 1.784 ms
Execution Time: 0.079 ms
```

Partial Index

Postgres supports an index over a subset of the rows of a table, known as a partial index.
It’s often the best way to index our data if we want to repeatedly analyse rows that match a given WHERE clause.
Let us see how we can enhance the performance of Postgres using partial indexing.
Problem statement

We want to return all the schemes that are supposed to run before 11:00 am.
```sql
EXPLAIN ANALYSE SELECT * FROM schemes WHERE start_time < '10:00:00';
```

```
QUERY PLAN
Seq Scan on schemes (cost=0.50 rows=9 width=23)
  Filter: (start_time < '10:00:00')
  Rows Removed by Filter: 991
Planning Time: 0.082 ms
Execution Time: 0.226 ms
```

We could create an index on the start_time column, but assuming we have a huge table, that would not be optimal for inserts, updates and deletes.
So we create an index with a condition. This kind of index is useful when we know in advance what our SELECT queries will ask for. Say we read heavily from the schemes that start before 10:00:00 and rarely from those that start later.
```sql
CREATE INDEX idx_scheme_start_time ON schemes (start_time) WHERE start_time < '11:00:00';

EXPLAIN ANALYSE SELECT * FROM schemes WHERE start_time < '10:00:00';
```

```
QUERY PLAN
Bitmap Heap Scan on schemes (cost=4.30 rows=9 width=23)
  Recheck Cond: (start_time < '10:00:00')
  Heap Blocks: exact=8
  Bitmap Index Scan on schemes (cost=0.21 rows=9 width=0)
    Index Cond: (start_time < '10:00:00')
Planning Time: 1.729 ms
Execution Time: 0.075 ms
```

This reduces the execution time from 0.226 ms to 0.075 ms.
Let’s validate that schemes whose start_time is after 11:00 AM are not indexed.
```sql
EXPLAIN ANALYSE SELECT * FROM schemes WHERE start_time > '12:00:00';
```

```
QUERY PLAN
Seq Scan on schemes (cost=0.50 rows=6 width=23)
  Filter: (start_time > '12:00:00')
  Rows Removed by Filter: 993
Planning Time: 0.101 ms
Execution Time: 0.228 ms
```

This proves that only part of the schemes table is indexed; the rest of the data is not.
Our index is small and easy to maintain, which also makes routine reindexing cheaper.
Query plan on JOINs

The optimizer needs to pick the correct join algorithm when multiple tables are joined in a SELECT statement. Postgres chooses between three different join algorithms, depending on the query and the data.
- Nested Loop: the planner can use either a sequential scan or an index scan for each element of the first table. It uses a sequential scan when the second table is small. The usual logic for choosing between a sequential scan and an index scan applies here too.
- Hash Join: the planner builds a hash table of the smaller table on the join key, then scans the larger table, probing the hash table for rows that meet the join condition. This requires enough memory to hold the hash table in the first place.
- Merge Join: similar to the merge step of merge sort. The planner sorts both tables on the join attribute, then scans them in parallel to find matching values.
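To see which algorithm the planner picked for a given query, run EXPLAIN; to experiment, the planner's strategies can be disabled per session (a debugging technique only, never something to do in production):

```sql
-- Temporarily forbid hash and merge joins so the planner falls back
-- to a nested loop, then compare the resulting plans.
SET enable_hashjoin = off;
SET enable_mergejoin = off;

EXPLAIN SELECT schemes.rules
FROM scheme_rules
JOIN schemes ON (scheme_rules.scheme_id = schemes.id);

RESET enable_hashjoin;
RESET enable_mergejoin;
```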
```sql
EXPLAIN SELECT schemes.rules
FROM scheme_rules
JOIN schemes ON (scheme_rules.scheme_id = schemes.id)
WHERE scheme_name = 'weekend_scheme';
```

Downsides of indexes in production environments

Finding unused indexes

In a large production environment, it is advisable to hunt down unused indexes, because every index consumes memory and disk space and must be updated on writes. The Postgres wiki details how to get an index summary and find duplicate indexes and index sizes.
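One common starting point (a sketch; the versions on the wiki are more complete) is to look for indexes that have never been scanned, using the pg_stat_user_indexes statistics view:

```sql
-- Indexes never used by any query since the statistics were last reset,
-- largest first: candidates for dropping.
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Before dropping anything, remember that these counters only reflect activity since the last statistics reset, and that unique indexes enforce constraints even when they are never scanned.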
CREATE/DROP INDEX vs CREATE/DROP INDEX CONCURRENTLY

Creating or dropping an index in a large database can take hours or even days, and the CREATE INDEX command blocks all writes on the table (it doesn’t block reads, but this is still not ideal). However, an index created with CREATE INDEX CONCURRENTLY does not acquire locks that block writes. When creating an index concurrently, Postgres first scans the table to build the index, then makes a second pass to add anything written since the first pass.
Creating an index concurrently has a downside, though: if something goes wrong during the process, it does not roll back, and it leaves an invalid index behind.
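A sketch of the concurrent workflow (the index and column names reuse this post's examples):

```sql
-- Builds the index without blocking writes; note that it cannot
-- run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_scheme_start_time ON schemes (start_time);

-- If the build failed and left an INVALID index behind, drop it
-- (again without blocking writers) and retry the CREATE.
DROP INDEX CONCURRENTLY idx_scheme_start_time;
```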
Invalid indexes can be found using the following query.
```sql
SELECT * FROM pg_class, pg_index
WHERE pg_index.indisvalid = false
  AND pg_index.indexrelid = pg_class.oid;
```

Rebuilding indexes

REINDEX rebuilds an index from the data stored in the index’s table, replacing the old copy of the index. If we suspect corruption of an index on a table, we can simply rebuild that index, or all indexes on the table, using REINDEX INDEX or REINDEX TABLE. REINDEX is similar to dropping and recreating the index, in that the index contents are rebuilt from scratch.
However, locking considerations are rather different.
REINDEX locks out writes but not reads of the index’s parent table.
It also takes an exclusive lock on the specific index being processed, which will block reads that attempt to use that index.
Another option is to drop the index concurrently and recreate it concurrently.
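The two options sketched side by side (index and table names taken from the earlier examples):

```sql
-- Option 1: rebuild in place. Locks out writes to the table, and blocks
-- reads that would have used this particular index.
REINDEX INDEX idx_scheme_name;
-- or rebuild every index on the table:
REINDEX TABLE schemes;

-- Option 2: drop and recreate concurrently, avoiding the write lock.
DROP INDEX CONCURRENTLY idx_scheme_name;
CREATE INDEX CONCURRENTLY idx_scheme_name ON schemes (scheme_name);
```

Postgres 12 and later also support REINDEX INDEX ... CONCURRENTLY, which combines both approaches in a single command.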
Conclusion

This post aimed to provide an overview of how Postgres queries the database.
By understanding query plans better and carefully taking measures (mostly through indexes), we can get the best performance out of the Postgres database.
There are other ways to enhance query performance, but we’ll save those for later posts.
Further Reading

- Index locking considerations
- Locking indexes
- Routine reindexing
- Examining index usage
- Monitoring stats

Efficient, optimised, lean… These are words you’ll often hear at our offices.
At GOJEK, getting something to work is only half the battle.
Getting it to work as efficiently as possible with minimal intervention, that’s when we do the high fives.
We are also looking for more people to join our teams. Head over to gojek.jobs and grab the chance to help us build better, more efficient solutions.