Indexing Only Some of the Rows-Indexes

On 24/01/2022 by Robert Corvino

In addition to transparently helping out queries that use built-in functions like UPPER, LOWER, and so on, function-based indexes can be used to selectively index only some of the rows in a table. As we’ll discuss a little later, B*Tree indexes do not contain entries for entirely NULL keys. That is, if you have an index I on a table T (as follows) and you have a row where A and B are both NULL, there will be no entry in the index structure:

Create index I on t(a,b);

This comes in handy when you are indexing just some of the rows in a table. Consider a large table with a NOT NULL column called PROCESSED_FLAG that may take one of two values, Y or N, with a default value of N. New rows are added with a value of N to signify not processed, and as they are processed, they are updated to Y to signify processed. We would like to index this column to be able to retrieve the N records rapidly, but there are millions of rows and almost all of them are going to have a value of Y.

The resulting B*Tree index will be large, and the cost of maintaining it as we update from N to Y will be high. This table sounds like a candidate for a bitmap index (this is low cardinality, after all), but this is a transactional system and lots of people will be inserting records at the same time with the processed column set to N, and, as we discussed earlier, bitmaps are not good for concurrent modifications. When we factor in the constant updating of N to Y in this table as well, then bitmaps would be out of the question, as this process would serialize entirely.

So, what we would really like is to index only the records of interest (the N records). We’ll see how to do this with function-based indexes, but before we do, let’s see what happens if we just use a regular B*Tree index. Using the standard BIG_TABLE script described in the setup section at the beginning of the book, we’ll update the TEMPORARY column, flipping the Ys to Ns and the Ns to Ys:
$ sqlplus eoda/foo@PDB1
SQL> update big_table set temporary = decode(temporary,’N’,’Y’,’N’); 1000000 rows updated.

And we’ll check out the ratio of Ys to Ns:
SQL> select temporary, cnt, round( (ratio_to_report(cnt) over ()) * 100, 2 ) rtrfrom (select temporary, count(*) cntfrom big_tablegroup by temporary);

As we can see, of the 1,000,000 records in the table, only about one-fifth of one percent of the data should be indexed. If we use a conventional index on the TEMPORARY column (which is playing the role of the PROCESSED_FLAG column in this example), we would discover that the index has 1,000,000 entries, consumes almost 14MB of space, and has a height of 3:
SQL> create index processed_flag_idx on big_table(temporary); Index created.
SQL> analyze index processed_flag_idx validate structure; Index analyzed.
SQL> select name, btree_space, lf_rows, height from index_stats;

Any retrieval via this index would incur three I/Os to get to the leaf blocks. This index is not only wide but also tall. To get the first unprocessed record, we will have to perform at least four I/Os (three against the index and one against the table).

How can we change all of this? We need to make it so the index is much smaller and easier to maintain (with less runtime overhead during the updates). Enter the function-based index, which allows us to simply write a function that returns NULL when we don’t want to index a given row and returns a non-NULL value when we do. For example, since we are interested just in the N records, let’s index just those:
SQL> drop index processed_flag_idx;Index dropped.
SQL> create index processed_flag_idx on big_table( case temporary when ‘N’then ‘N’ end );
Index created.

That is quite a difference—the index is some 32KB, not 14MB. The height has decreased as well. If we use this index, we’ll perform one less I/O than we would using the previous taller index.

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30

Indexing Only Some of the Rows-Indexes

Leave a Reply Cancel reply

Archives

Calendar

Categories