Bitmap Indexes Wrap-Up-Indexes-2
On 02/12/2022 by Robert CorvinoWe can see this query took two-tenths of a CPU second to execute and had to do a full scan on the table. The function MY_SOUNDEX was invoked almost 10,000 times (according to our counter). We’d like it to be called much less frequently, however.
Let’s see how indexing the function can speed up things. The first thing we’ll do is create the index as follows:
SQL> create index emp_soundex_idx on emp( substr(my_soundex(ename),1,6) ); Index created.
The interesting thing to note in this CREATE INDEX command is the use of the SUBSTR function. This is because we are indexing a function that returns a string. If we were indexing a function that returned a number or date, this SUBSTR would not be necessary. The reason we must SUBSTR the user-written function that returns a string is that such functions return VARCHAR2(4000) types. That may well be too big to be indexed—index entries must fit within about three quarters the size of a block. If we tried, we would receive (in a tablespace with a 4KB block size) the following:
SQL> create index emp_soundex_idx on
emp( my_soundex(ename) ) tablespace ts4k; emp( my_soundex(ename) ) tablespace ts4k
It is not that the index actually contains any keys that large, but that it could as far as the database is concerned. But the database understands SUBSTR. It sees the inputs to SUBSTR of 1 and 6 and knows the biggest return value from this is six characters; hence, it permits the index to be created. This size issue can get you, especially with concatenated indexes. Here is an example on an 8KB block size tablespace:
SQL> create index emp_soundex_idx on
emp( my_soundex(ename), my_soundex(job) ); emp( my_soundex(ename), my_soundex(job) )
The database thinks the maximum key size is 8000 bytes and fails the CREATE statement once again. So, to index a user-written function that returns a string, we should constrain the return type in the CREATE INDEX statement. In this example, knowing that MY_SOUNDEX returns at most six characters, we are substringing the first six characters.
We are now ready to test the performance of the table with the index on it. We would like to monitor the effect of the index on INSERTs as well as the speedup for SELECTs to see the effect on each. In the unindexed test case, our queries take over one second, and if we were to run SQL_TRACE and TKPROF during the inserts, we could observe that without the index, the insert of 9999 records took about 0.30 seconds:
insert into emp NO_INDEX (empno,ename,job,mgr,hiredate,sal,comm,deptno) select rownum empno,
initcap(substr(object_name,1,10)) ename, substr(object_type,1,9) JOB,
This was the overhead introduced in the management of the new index on the MY_SOUNDEX function—both in the performance overhead of simply having an index (any type of index will affect insert performance) and the fact that this index had to call a stored procedure 9999 times.
If we compare the two examples (unindexed vs. indexed), we find that the insert into the indexed table was affected by a little more than twice the runtime. However, the select went from two-tenths of a second to effectively instantly. The important things to note here are the following:
•\ The insertion of 9999 records took approximately two times longer. Indexing a user-written function will necessarily affect the performance of inserts and some updates. You should realize that any index will impact performance, of course. For example, I did a simple test without the MY_SOUNDEX function, just indexing the ENAME column itself. That caused the INSERT to take about one second to execute—the PL/SQL function is not responsible for the entire overhead. Since most applications insert and update singleton entries, and each row took less than 1/10,000 of a second to insert, you probably won’t even notice this in a typical application. Since we insert a row only once, we pay the price of executing the function on the column once, not the thousands of times we query the data.
•\ While the insert ran two times slower, the query ran many times faster. It evaluated the MY_SOUNDEX function a few times instead of almost 10,000 times. The difference in performance of our query here is measurable and quite large. Also, as the size of our table grows, the full scan query will take longer and longer to execute. The index- based query will always execute with nearly the same performance characteristics as the table gets larger.
•\ We had to use SUBSTR in our query. This is not as nice as just coding WHERE MY_SOUNDEX(ename)=MY_SOUNDEX( ‘King’ ), but we can easily get around that, as we will see shortly.
So, the insert was affected, but the query ran incredibly fast. The payoff for a small reduction in insert/update performance is huge. Additionally, if you never update the columns involved in the MY_SOUNDEX function call, the updates are not penalized at all (MY_SOUNDEX is invoked only if the ENAME column is modified and its value changed).
Let’s see how to make it so the query does not have to use the SUBSTR function call. The use of the SUBSTR call could be error-prone—our end users have to know to SUBSTR from 1 for six characters. If they use a different size, the index will not be used. Also, we want to control in the server the number of bytes to index. This will allow us to reimplement the MY_SOUNDEX function later with 7 bytes instead of 6 if we want to. We can hide the SUBSTR with a virtual column—or a view in any release quite easily as follows:
SQL> create or replace view emp_v as
select ename, substr(my_soundex(ename),1,6) ename_soundex, hiredate from emp;
View created.
We would see the same sort of query plan we did with the base table. All we have done here is hide the SUBSTR( F(X), 1, 6 ) function call in the view itself. The optimizer still recognizes that this virtual column is, in fact, the indexed column and so does the right thing. We see the same performance improvement and the same query plan. Using this view is as good as using the base table—better even, because it hides the complexity and allows us to change the size of the SUBSTR later.
Archives
- July 2024
- June 2024
- May 2024
- April 2024
- March 2024
- February 2024
- January 2024
- December 2023
- November 2023
- October 2023
- September 2023
- August 2023
- July 2023
- May 2023
- April 2023
- March 2023
- February 2023
- January 2023
- December 2022
- November 2022
- October 2022
- May 2022
- April 2022
- March 2022
- February 2022
- January 2022
- December 2021
- November 2021
Calendar
M | T | W | T | F | S | S |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | ||
6 | 7 | 8 | 9 | 10 | 11 | 12 |
13 | 14 | 15 | 16 | 17 | 18 | 19 |
20 | 21 | 22 | 23 | 24 | 25 | 26 |
27 | 28 | 29 | 30 | 31 |
Leave a Reply