3 Points about Indexes and Order

When dealing with indexes, order may be very significant, in several aspects.
Perhaps it’s not surprising after all when talking about a data structure that its purpose is to keep data in order
We’ll refer to three different points:
1. Is the order of columns in a composite index important?
2. Which order is better: filling a table with data and then creating indexes, or creating indexes and then filling the table with data?
3. When creating several indexes, is the order of creation important?
Not always there is one correct answer that covers all the different cases, but it is always worth asking the questions before rushing to execute.

Is the order of columns in a composite index important?
Certainly yes. Let’s take a look at the following two indexes – they both contain the same two columns, but in a different order:

CREATE INDEX T_IDX_1 ON T (COL_A,COL_B);
CREATE INDEX T_IDX_2 ON T (COL_B,COL_A);

Let’s assume that the table T contains many records and that the following queries are highly selective (i.e., they return a relatively small number of records from the table), so it is likely that using an index is better than full scanning the table.

For the following query both indexes are equally good:

SELECT *
FROM T
WHERE COL_A = :VAL1
  AND COL_B = :VAL2;

But for the following query, the index T_IDX_1 is good, while T_IDX_2 is not:

SELECT *
FROM T
WHERE COL_A = :VAL1;

Although the index T_IDX_2 contains the column COL_A, it is not enough, as it does not contain it in its leading part. The order of columns in the index matters.
It’s easy to understand the difference when thinking of the following example: in the phone book the names are ordered first by last name, and then by first name. We can find very quickly all the subscribers whose last name is “Nakdimon”, but we must scan the whole phone book for finding all the subscribers whose first name is “Oren”.

Note: if the table contains a relatively small number of different unique values in the column COL_B, Oracle will still be able to use the index T_IDX_2 for answering the last query by using the Index Skip Scan access path, but still using T_IDX_1 with Index Range Scan will be better.

For the next two questions we’ll consider the following case: we need to create a new table with the following two indexes, and fill it with plenty of data:

CREATE TABLE T (
   COL_A NUMBER,
   COL_B DATE,
   COL_C VARCHAR2(4000),
   …
);
CREATE INDEX T_IDX_A ON T (COL_A);
CREATE INDEX T_IDX_BA ON T (COL_B,COL_A);

Which order is better: filling a table with data and then creating indexes, or creating indexes and then filling the table with data?
Usually working in the former order (creating the indexes when the table is already filled) will take less time than working in the latter order (filling the table when the indexes already exist), since in the latter case the indexes need to be maintained with the insertion of each and every record.

When creating several indexes, is the order of creation important?
Here the answer is positive in certain cases.
Suppose that we created the table T and filled it with many rows, and now it’s time to create the indexes. We can create T_IDX_A first and T_IDX_BA second, or vice versa. Let’s examine both options:

Option 1:

  • We’ll create T_IDX_A first. For that, Oracle will do a Full Table Scan of T (and will take the value of COL_A from every record, and of course the ROWID of every record to know where to point to from the index)
  • Now we’ll create T_IDX_BA. Once again, Oracle will do a Full Table Scan of T (and will take the values of COL_B and COL_A and the ROWID from every record)

Option 2:

  • We’ll create T_IDX_BA first. For that, Oracle will do a Full Table Scan of T (and will take the values of COL_B and COL_A and the ROWID from every record)
  • Now we’ll create T_IDX_A, and this is where the plot changes. Oracle can do a Full Table Scan of T here as well, but in this case it has another alternative, a better one in most cases. The only details that are needed in order to build the index are the values of COL_A and the ROWID of all the records in the table (where COL_A is not null), and these details already exist in the index T_IDX_BA. Therefore, Oracle can do an Index Fast Full Scan of T_IDX_BA, instead of Full Table Scan of the table.

So, if all the columns of one index are included in a second index, it is recommended to create the second index first, and only then the first index, and enable Oracle to consider more alternatives. The more the number of columns in the table that do not exist in the indexes, the more significant the improvement in the creation time of the second index (by doing Index Fast Full Scan instead of Full Table Scan) is.

You are most welcome to comment or to ask questions in this page, or to write me at oren@db-oriented.com.

One Problem – Many Solutions

One of the things I love in the SQL language is that one problem may have many different solutions, or more specifically, one functional question may be solved by different SQL queries.

It doesn’t mean that all the solutions are equivalent in all aspects. If all the solutions solve the functional question correctly, then all the result sets must be the same. But they may be different in performance, for example (it’s so satisfying reducing the execution time of a query from hours to seconds just by rewriting it).

Another example: one query may be short and elegant while another “equivalent” query may be long and cumbersome.

And not all the solutions are even legitimate in all versions of the database – many times a new feature may make a query much simpler (analytic functions are a great example for that – I still remember how their appearance [in Oracle 8.1.6 if I remember correctly] had a huge effect on my query writing – making them shorter, and many times faster – and actually I see the same effect on people today when I rewrite their queries and at the same time introduce them to analytic functions).

Oracle 12c introduced some new features that let us write less for getting the same functionality, like Lateral Inline Views or the Row Limiting clause.

Here is a nice example for one problem with several solutions.

A few days ago, Lucas Jellema from AMIS raised in this blog post the question of “packing” multiple rows that represent adjacent time periods into a single row that represents the unified period, and suggested the following solution, that demonstrates well some important features such as the LAG and RANK analytic functions, Subquery Factoring (supported since Oracle 9i), and Recursive Subquery Factoring (introduced in Oracle 11g):

with chairpeople as
( select chairperson
  ,      date_from
  ,      date_to
  ,      case date_from - lag(date_to) over (partition by chairperson order by date_from asc)
         when 1 then 'PACK'
         end action
  ,      rank()  over (partition by chairperson order by date_from asc) rnk
  from   chairmanships
)
, packing_chairs (chair, date_from, date_to, lvl) as
( select chairperson, date_from, date_to, 1
  from   chairpeople
  where  action is null
  union all
  select p.chair, p.date_from, c.date_to, lvl+1
  from   chairpeople c
         join
         packing_chairs p
         on (c.chairperson = p.chair and c.rnk = p.lvl+1)
  where  c.action='PACK'
  )
, packed_chairs as
( select chair, date_from, nullif(max(nvl(date_to,date'9999-12-31')),date'9999-12-31') date_to
  from   packing_chairs
  group
  by     chair, date_from
)
select *
from   packed_chairs
order
by     date_from;

-- note: this is a slightly revised version of the query from the original post

I suggested another solution, based on the LAST_VALUE analytic function:

select chairperson,
       date_from,
       max(date_to) keep(dense_rank last order by date_to) date_to
from   (select chairperson,
               last_value(new_period_date_from ignore nulls) over(partition by chairperson order by date_from) date_from,
               date_to
        from   (select chairperson,
                       date_from,
                       date_to,
                       case when lnnvl(date_from – lag(date_to) over(partition by chairperson order by date_from) = 1) then date_from end new_period_date_from
                from   chairmanships))
group  by chairperson,
          date_from
order  by date_from;

In another comment to Lucas’ post, Sayan Malakshinov suggested an even simpler and shorter solution:

select 
  chairperson
 ,min(date_from) keep (dense_rank first order by date_from,date_to) as date_from
 ,max(date_to  ) keep (dense_rank last  order by date_from,date_to) as date_to
from (
      select
           chairperson 
         , date_from 
         , date_to 
         , sum(flag) over(partition by chairperson order by date_from,date_to) grp
      from (
            select 
                 chairperson 
               , date_from 
               , date_to 
               , decode( 1 + lag(date_to)over(partition by chairperson order by date_from,date_to), date_from, 0, 1) flag
            from chairmanships
           )
     )
group by chairperson, grp
order by chairperson, grp;

Finally (for now at least), I suggested yet another solution, using a new Oracle 12c feature – Pattern Matching:

select * from chairmanships 
  match_recognize (
     partition by chairperson
     order by date_from
     measures frst.date_from as date_from,
              date_to as date_to
     one row per match
     pattern (frst nxt*)
     define nxt as nxt.date_from = prev(nxt.date_to)+1) 
order by chairperson,date_from;

So, there you go – one question, four very different solutions.