Partial Indexing in PostgreSQL

In this blog, we are going to learn about partial unique indexing in PostgreSQL and Rails. But before that, you should know about duplicate null values in unique columns in PostgreSQL, which I covered in my previous blog.

So, first of all, what is partial indexing?

According to the PostgreSQL documentation,

A partial index is an index built over a subset of a table; the subset is defined by a conditional expression (called the predicate of the partial index). The index contains entries only for those table rows that satisfy the predicate.

In simple terms, it is the process of adding an index (here, a unique index) on only a particular portion of the table's rows, where that portion is defined by a condition.

To understand this, let's take a scenario: we have a users table with an email column whose value must be unique, meaning no two users can have the same email. We delete users using a soft-deletion mechanism. Soft deletion means that deleting a user will not remove the row from the database permanently; instead, it sets the deleted_at column to the current timestamp. This column is used to distinguish deleted records from active ones.

Here, we have a uniqueness constraint applied to the email column. Suppose we delete a user with the email, say, "test@test.com". If we later want to create a user with the same email, the conventional unique constraint will throw an error, because even though we deleted the user, the row still persists in the database as it is only soft deleted.

What is the solution, then?

Okay, at first you might think that creating a unique index on the combination of these two columns (email, deleted_at) will solve the problem:

  CREATE UNIQUE INDEX users_email_deleted_at_idx
  ON users (email, deleted_at);

But dear folks, this will not work.

WHY NOT?

Do you all remember this?

PostgreSQL, following the SQL standard, does not consider two null values to be equal, which I had already discussed in my previous blog.

  -> user1 = { "email": "test@test.com", "deleted_at": null }
  -> user2 = { "email": "test@test.com", "deleted_at": null }

Because of this, user1 and user2 will be treated as different entities, and the composite unique index will happily accept both rows.
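We can watch this loophole in action. As a quick sketch, I am using Python's built-in sqlite3 module here, since SQLite shares the same SQL-standard treatment of null values in unique indexes; the same thing happens in PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, deleted_at TIMESTAMP)")

# The composite unique index that supposedly prevents duplicate emails
conn.execute(
    "CREATE UNIQUE INDEX users_email_deleted_at_idx ON users (email, deleted_at)"
)

# Two active users with the same email: BOTH inserts succeed,
# because the two NULL deleted_at values are not considered equal
conn.execute("INSERT INTO users (email, deleted_at) VALUES ('test@test.com', NULL)")
conn.execute("INSERT INTO users (email, deleted_at) VALUES ('test@test.com', NULL)")

count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email = 'test@test.com'"
).fetchone()[0]
print(count)  # 2 -- the duplicate slipped through the composite index
```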

So, is there any solution for this?

Yes, we have our saviour: the PARTIAL INDEX.

But how will we implement a partial index?

To achieve this, we add a conditional unique index using a WHERE clause, as mentioned below:

  CREATE UNIQUE INDEX users_email_deleted_at_null_idx
  ON users (email)
  WHERE deleted_at IS NULL;

Yayyy, now this will solve all our issues 🙂 Uniqueness is enforced only among rows where deleted_at IS NULL, i.e. among active users, so an email belonging to a soft-deleted user can be reused.
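Here is the full soft-delete flow as a runnable sketch. Again I am using Python's built-in sqlite3 module for the demo, since SQLite also supports partial indexes with the same WHERE syntax; PostgreSQL behaves the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, deleted_at TIMESTAMP)")

# Uniqueness enforced only for active rows (deleted_at IS NULL)
conn.execute(
    "CREATE UNIQUE INDEX users_email_deleted_at_null_idx "
    "ON users (email) WHERE deleted_at IS NULL"
)

conn.execute("INSERT INTO users (email) VALUES ('test@test.com')")

# A second ACTIVE user with the same email is rejected by the partial index
try:
    conn.execute("INSERT INTO users (email) VALUES ('test@test.com')")
    duplicate_blocked = False
except sqlite3.IntegrityError:
    duplicate_blocked = True

# Soft delete the existing user...
conn.execute(
    "UPDATE users SET deleted_at = CURRENT_TIMESTAMP WHERE email = 'test@test.com'"
)
# ...and now the same email can be registered again, no error this time
conn.execute("INSERT INTO users (email) VALUES ('test@test.com')")

print(duplicate_blocked)  # True
```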

BONUS: We also have an Active Record helper for creating a partial index in Rails:

  add_index :users, :email, unique: true, where: "deleted_at is null"

Hope you liked my blog. For more information like this, stay tuned 🙂


PostgreSQL Duplicate Null Values in Unique Column

In this blog, I am going to tell you about some interesting behavior of the PostgreSQL database. But before diving deep into it, we should be aware of some terminology, namely the unique constraint.

A unique constraint ensures that the data stored in a column is always unique when the constraint is applied to that column. In short, unique indexes are a way to avoid data inconsistency and duplication.

So, what is the interesting behavior ?

Consider a scenario where we have to add a unique index on a particular column in an already existing table. We are confident that there are no duplicate values in that column, but we are not sure about null values; it may be possible that the column contains duplicate null values.

So, what will happen if we create a unique index on that column, which may have duplicate null values?

Here comes the BOOM !!!

The answer is: we will not get any error when we create the unique index on that column.

You must be thinking: WHY?

So, let me tell you about one interesting, or you could say weird, behavior of PostgreSQL.

Did you know that duplicate null values do not violate a uniqueness constraint in PostgreSQL?

That was exactly my reaction when I came to know about it. Ideally, when we insert the same value twice into a column with a uniqueness constraint, the second insert should fail. But for null values it doesn't.

  testname=# CREATE TABLE Demo (name varchar unique);
  CREATE TABLE

  testname=# INSERT INTO Demo (name) VALUES ('TestName');
  INSERT 0 1

  testname=# INSERT INTO Demo (name) VALUES ('TestName');
  ERROR:  duplicate key value violates unique constraint "demo_name_key"
  DETAIL:  Key (name)=(TestName) already exists.

In the above example, we see the normal behavior: inserting the same value ('TestName') twice into the unique column (name) fails.

  testname=# INSERT INTO Demo (name) VALUES (null);
  INSERT 0 1

  testname=# INSERT INTO Demo (name) VALUES (null);
  INSERT 0 1

But what is happening here? Why does it not show an error when we insert the duplicate value (null) twice into the unique column (name)?

Yes, I know this is kind of confusing. But this is the behavior PostgreSQL follows, as per the SQL standard.

According to the PostgreSQL documentation:
“In general, a unique constraint is violated when there are two or more rows in the table where the values of all of the columns included in the constraint are equal. However, two null values are not considered equal in this comparison. That means even in the presence of a unique constraint it is possible to store duplicate rows that contain a null value in at least one of the constrained columns. This behavior conforms to the SQL standard, but we have heard that other SQL databases may not follow this rule. So be careful when developing applications that are intended to be portable.”

For more information like this, stay tuned 🙂