Physical Organization-Indexes-2

On 28/05/2022 by Robert Corvino

THE EFFECT OF ARRAYSIZE ON LOGICAL I/O

It is interesting to note the effect of ARRAYSIZE on the logical I/O performed. ARRAYSIZE is the number of rows Oracle returns to a client when it asks for the next row. The client buffers these rows and uses them before asking the database for the next set. ARRAYSIZE may have a very material effect on the logical I/O performed by a query, because if you have to access the same block over and over again across calls (across fetch calls specifically, in this case), Oracle must retrieve that block again from the buffer cache each time. Therefore, if you ask for 100 rows from the database in a single call, Oracle might be able to fully process a database block and not need to retrieve that block again. If you ask for 15 rows at a time, Oracle might well have to get the same block over and over again to return that same set of rows.

In the example earlier in this section, we were using the SQL*Plus default array fetch size of 15 rows (if you divide the total rows fetched (100005) by the number of fetch calls (6675), the result is very close to 15). If we were to compare the execution of the previous queries using 15 rows per fetch vs. 100 rows per fetch, we would observe the following for the COLOCATED table:

The first query was executed with the ARRAYSIZE of 15, and the (cr=nnnn) values in the Row Source Operation show we performed 1374 logical I/Os against the index and then 1525 logical I/Os against the table (2899 - 1374; the numbers are cumulative in the Row Source Operation steps).

When we increased the ARRAYSIZE to 100 from 15 (via the SET ARRAYSIZE 100 command), the amount of logical I/O against the index dropped to 245, which was the direct result of not having to reread the index leaf blocks from the buffer cache every 15 rows, but only every 100 rows. To understand this, assume that we were able to store 200 rows per leaf block. As we are scanning through the index reading 15 rows at a time, we would have to retrieve the first leaf block 14 times to get all 200 entries off it. On the other hand, when we array fetch 100 rows at a time, we need to retrieve this same leaf block only two times from the buffer cache to exhaust all of its entries.
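The leaf block arithmetic above can be checked with a small model. This is only a sketch under simplifying assumptions: rows are stored in the order they are fetched, every fetch call must get each block it touches once, and nothing else contributes to logical I/O. The function name `block_gets` and its parameters are hypothetical and are not part of any Oracle API.

```python
def block_gets(total_rows, rows_per_block, arraysize):
    """Count block gets when rows are stored in the order they are fetched.

    Simplified model: each fetch call gets every block it touches exactly
    once, and a block must be gotten again on the next fetch call.
    """
    gets = 0
    for start in range(0, total_rows, arraysize):
        end = min(start + arraysize, total_rows) - 1   # last row of this fetch
        first_block = start // rows_per_block
        last_block = end // rows_per_block
        gets += last_block - first_block + 1           # blocks touched by this fetch
    return gets


# One leaf block holding 200 entries, as assumed in the text.
print(block_gets(200, 200, 15))    # gets needed at 15 rows per fetch
print(block_gets(200, 200, 100))   # gets needed at 100 rows per fetch
```

Under these assumptions the model gives 14 gets of the leaf block at 15 rows per fetch and 2 gets at 100 rows per fetch, matching the reasoning in the text.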

The same thing happened in this case with the table blocks. Since the table was sorted in the same order as the index keys, we would tend to retrieve each table block less often, as we would get more of the rows from it with each fetch call.

So, if this was good for the COLOCATED table, it must have been just as good for the DISORGANIZED table, right? Not so. The results from the DISORGANIZED table would look like this:

The results against the index here were identical, which makes sense, as the data stored in the index is the same regardless of how the table is organized. The logical I/O against the index went from 1374 for a single execution of this query to 245, just as before. But overall, the amount of logical I/O performed by this query did not differ significantly: 21,365 vs. 20,236. The reason? The amount of logical I/O performed against the table did not differ at all. If you subtract the logical I/O against the index from the total logical I/O performed by each query (21,365 - 1374 and 20,236 - 245), you'll find that both queries did 19,991 logical I/Os against the table. This is because, every time we wanted N rows from the database, the odds that any two of those rows would be on the same block were very small, so there was no opportunity to get multiple rows from a table block in a single call.
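The contrast between the two tables can be illustrated with a toy simulation. It is a sketch, not a reproduction of the traced numbers: the row count, the 100 rows per block, the random seed, and the function name `table_gets` are all made up for illustration, and the model assumes a table block is gotten again whenever the next row (in index order) lives on a different block or a new fetch call begins.

```python
import random


def table_gets(block_of_row, arraysize):
    """Count table block gets for rows visited in index order.

    block_of_row[i] is the table block holding the i-th row in index order.
    Within one fetch call, consecutive rows on the same block share one get.
    """
    gets = 0
    for start in range(0, len(block_of_row), arraysize):
        previous = None                      # nothing is held across fetch calls
        for block in block_of_row[start:start + arraysize]:
            if block != previous:
                gets += 1
                previous = block
    return gets


rows, rows_per_block = 20_000, 100           # hypothetical sizes

# COLOCATED-style layout: table order matches index order.
colocated = [i // rows_per_block for i in range(rows)]

# DISORGANIZED-style layout: the same rows scattered randomly across blocks.
disorganized = colocated[:]
random.Random(42).shuffle(disorganized)

for name, layout in (("colocated", colocated), ("disorganized", disorganized)):
    for arraysize in (15, 100):
        print(name, arraysize, table_gets(layout, arraysize))
```

In this model the colocated layout needs far fewer gets at 100 rows per fetch than at 15, while the disorganized layout needs close to one get per row at either setting, which is the same pattern the text describes.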

Every professional programming language I have seen that can interact with Oracle implements this concept of array fetching. In PL/SQL, you may use BULK COLLECT or rely on the implicit array fetch of 100 that is performed for implicit cursor FOR loops. In Java/JDBC, there is a prefetch method on a connection or statement object. Oracle Call Interface (OCI; a C API) allows you to programmatically set the prefetch size, as does Pro*C. As you can see, array fetching can have a material and measurable effect on the amount of logical I/O performed by your query, and it deserves your attention.
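As a rough illustration of what array fetching looks like from client code, the sketch below uses Python's DB-API, where a cursor's `arraysize` attribute sets how many rows `fetchmany()` returns per call. It runs against an in-memory SQLite database purely so that it is self-contained; SQLite is a stand-in and says nothing about Oracle's logical I/O. The table `t` and the function `count_fetch_batches` are hypothetical. Oracle drivers for Python expose a cursor attribute of the same name, but their defaults and prefetch behavior differ, so check the driver documentation before relying on this pattern.

```python
import sqlite3


def count_fetch_batches(total_rows, arraysize):
    """Fetch all rows in batches of `arraysize`; return (batches, rows seen)."""
    conn = sqlite3.connect(":memory:")       # stand-in database, not Oracle
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)",
                     ((i,) for i in range(total_rows)))

    cur = conn.cursor()
    cur.arraysize = arraysize                # rows returned per fetchmany() call
    cur.execute("SELECT x FROM t ORDER BY x")

    batches = 0
    rows_seen = 0
    while True:
        rows = cur.fetchmany()               # uses cur.arraysize by default
        if not rows:
            break
        batches += 1
        rows_seen += len(rows)

    conn.close()
    return batches, rows_seen


print(count_fetch_batches(1000, 15))
print(count_fetch_batches(1000, 100))
```

With 1,000 rows, a batch size of 15 needs 67 non-empty batches and a batch size of 100 needs 10. Against a real database, each batch corresponds to a round of fetch work, which is where the repeated block gets discussed above come from.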

Thus, in this particular case, the full scan is very appropriate due to the way the data is physically stored on disk. So why didn't the optimizer full scan in the first place for this query? Well, it would have if left to its own devices, but in the first example query against DISORGANIZED, I purposely hinted the query and told the optimizer to construct a plan that used the index. In the second case, I let the optimizer pick the best overall plan.
