When Should You Use a Bitmap Index?-Indexes
On 06/10/2022 by Robert CorvinoBitmap indexes are most appropriate on low distinct cardinality data (i.e., data with relatively few discrete values when compared to the cardinality of the entire set). It is not really possible to put a value on this—in other words, it is difficult to define what low distinct cardinality is truly. In a set of a couple thousand records, 2 would be low distinct cardinality, but 2 would not be low distinct cardinality in a two-row table. In a table of tens or hundreds of millions records, 100,000 could be low distinct cardinality.
So, low distinct cardinality is relative to the size of the resultset. This is data where the number of distinct items in the set of rows divided by the number of rows is a small number (near zero). For example, a GENDER column might take on the values M, F, and NULL. If you have a table with 20,000 employee records in it, then you would find that 3/20000 = 0.00015. Likewise, 100,000 unique values out of 10,000,000 results in a ratio of 0.01—again, very small.
These columns would be candidates for bitmap indexes. They probably would not be candidates for having BTree indexes, as each of the values would tend to retrieve an extremely large percentage of the table. BTree indexes should be selective in general, as outlined earlier. Bitmap indexes should not be selective—on the contrary, they should be very unselective in general.
Bitmap indexes are extremely useful in environments where you have lots of ad hoc queries, especially queries that reference many columns in an ad hoc fashion or produce aggregations such as COUNT.
For example, suppose you have a large table with three columns: GENDER, LOCATION, and AGE_GROUP. In this table, GENDER has a value of M or F, LOCATION can take on the values 1 through 50, and AGE_GROUP is a code representing 18 and under, 19-25, 26-30, 31-40, and 41 and over. You have to support a large number of ad hoc queries that take the following form:
SQL> select count(*)from twhere gender = ‘M’and location in ( 1, 10, 30 )and age_group = ’41 and over’;
SQL> select *from twhere ( ( gender = ‘M’ and location = 20 )or ( gender = ‘F’ and location = 22 ))and age_group = ’18 and under’;
SQL> select count() from t where location in (11,20,30); SQL> select count() from t where age_group = ’41 and over’ and gender = ‘F’;
You would find that a conventional BTree indexing scheme would fail you. If you wanted to use an index to get the answer, you would need at least three and up to six combinations of possible BTree indexes to access the data via the index.
Since any of the three columns or any subset of the three columns may appear, you would need large concatenated B*Tree indexes on the following:
•\ GENDER, LOCATION, AGE_GROUP: For queries that used all three, or GENDER with LOCATION, or GENDER alone
•\ LOCATION, AGE_GROUP: For queries that used LOCATION and AGE_GROUP or LOCATION alone
•\ AGE_GROUP, GENDER: For queries that used AGE_GROUP with GENDER or AGE_GROUP alone
To reduce the amount of data being searched, other permutations might be reasonable as well in order to decrease the size of the index structure being scanned. This is ignoring the fact that a B*Tree index on such low cardinality data is not a good idea.
Here is where the bitmap index comes into play. With three small bitmap indexes, one on each of the individual columns, you will be able to satisfy all of the previous predicates efficiently. Oracle will simply use the functions AND, OR, and NOT, with the bitmaps of the three indexes together, to find the solution set for any predicate that references any set of these three columns. It will take the resulting merged bitmap, convert the 1s into rowids if necessary, and access the data (if you are just counting rows that match the criteria, Oracle will just count the 1 bits). Let’s take a look at an example. First, we’ll generate test data that matches our specified distinct cardinalities, index it, and gather statistics. We’ll make use of the DBMS_RANDOM package to generate random data fitting our distribution:
Predicate Information (identified by operation id):
5 – access(“LOCATION”=1)
6 – access(“LOCATION”=10)
7 – access(“LOCATION”=30)
8 – access(“AGE_GROUP”=’41 and over’)
9 – access(“GENDER”=’M’)
This example shows the power of the bitmap indexes. Oracle is able to see the location in (1, 10, 30) and knows to read the index on location for these three values and logically OR together the “bits” in the bitmap. It then takes that resulting bitmap and logically ANDs that with the bitmaps for AGE_GROUP=’41 AND OVER’ and GENDER=’M’. Then a simple count of 1s and the answer is ready.
This shows similar logic: the plan shows the OR’d conditions are each evaluated by AND-ing together the appropriate bitmaps and then OR-ing together those results. Throw in another AND to satisfy the AGE_GROUP=’18 AND UNDER’ and we have it all. Since we asked for the actual rows this time, Oracle will convert each bitmap 1 and 0 into rowids to retrieve the source data.
In a data warehouse or a large reporting system supporting many ad hoc SQL queries, this ability to use as many indexes as make sense simultaneously comes in very handy indeed. Using conventional BTree indexes here would not be nearly as usual or usable, and as the number of columns that are to be searched by the ad hoc queries increases, the number of combinations of BTree indexes you would need increases as well.
However, there are times when bitmaps are not appropriate. They work well in a read-intensive environment, but they are extremely ill-suited for a write-intensive environment. The reason is that a single bitmap index key entry points to many rows. If a session modifies the indexed data, then all of the rows that index entry points to are effectively locked in most cases. Oracle cannot lock an individual bit in a bitmap index entry; it locks the entire bitmap index entry. Any other modifications that need to update that same bitmap index entry will be locked out.
This will seriously inhibit concurrency, as each update will appear to lock potentially hundreds of rows preventing their bitmap columns from being concurrently updated. It will not lock every row as you might think—just many of them. Bitmaps are stored in chunks, so using the earlier EMP example we might find that the index key ANALYST appears in the index many times, each time pointing to hundreds of rows. An update to a row that modifies the JOB column will need to get exclusive access to two of these index key entries: the index key entry for the old value and the index key entry for the new value. The hundreds of rows these two entries point to will be unavailable for modification by other sessions until that UPDATE commits.
Archives
- July 2024
- June 2024
- May 2024
- April 2024
- March 2024
- February 2024
- January 2024
- December 2023
- November 2023
- October 2023
- September 2023
- August 2023
- July 2023
- May 2023
- April 2023
- March 2023
- February 2023
- January 2023
- December 2022
- November 2022
- October 2022
- May 2022
- April 2022
- March 2022
- February 2022
- January 2022
- December 2021
- November 2021
Calendar
M | T | W | T | F | S | S |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | ||
6 | 7 | 8 | 9 | 10 | 11 | 12 |
13 | 14 | 15 | 16 | 17 | 18 | 19 |
20 | 21 | 22 | 23 | 24 | 25 | 26 |
27 | 28 | 29 | 30 | 31 |
Leave a Reply