This page contains materials that were copied from Oracle's on-line documents.

The SQL processing architecture (see the following figure) contains four main components: parser, optimizer, row source generator, and SQL execution engine. 

Text description of stu81185.gif follows

The parser performs syntax and semantics analysis of the SQL statements. The optimizer uses internal rules or costing methods to determine the most efficient way of producing the result of the query. The output from the optimizer is a plan that describes an optimum method of execution. The Oracle server provides two methods of optimization: cost-based optimizer (CBO) and rule-based optimizer (RBO). The row source generator receives the optimal plan from the optimizer. It outputs the execution plan for the SQL statement. The execution plan is a collection of row sources structured in the form of a tree. Each row source returns a set of rows for that step. SQL execution is the component that operates on the execution plan associated with a SQL statement. It then produces the results of the query. Each row source produced by the row source generator is executed by the SQL execution engine.

By default, the goal of the CBO is the best throughput. This means that it chooses the least amount of resources necessary to process all rows accessed by the statement. Also, Oracle can optimize a statement with the goal of best response time. This means that it uses the least amount of resources necessary to process the first row accessed by a SQL statement. In general, use the cost-based approach. Oracle Corporation is continually improving the CBO, and new features work only with the CBO. The rule-based approach is available for backward compatibility with legacy applications. The CBO architecture is illustrated as follows:

Text description of stu81184.gif follows

The input to query transformer is a parsed query. The main objective of the query transformer is to determine if it is advantageous to change the form of the query so that it enables generation of a better query plan. Four different query transformation techniques are employed by the query transformer: view merging, predicate pushing, subquery unnesting, and query rewrite using materialized views. Any combination of these transformations might be applied to a given query.

The estimator generates three different types of measures: selectivity, cardinality, and cost. The selectivity, which represents a fraction of rows from a row set, lies in the value range 0.0 to 1.0. It indicates how many rows from a row set will pass the predicate test. Cardinality represents the number of rows in a row set. There are several types of cardinality measures: effective, join, distinct, and group cardinality. The cost represents units of work or resource used in performing an operation. The CBO uses disk I/O, CPU usage, and memory usage as units of work. The operation can be scanning a table, accessing rows from a table using an index, joining two tables together, or sorting a row set.

The main function of the plan generator is to try out different possible plans for a given query and pick the one that has the lowest cost. Many different plans are possible because of the various combinations of different access paths, join methods, and join orders that can be used to access and process data in different ways and produce the same result. The number of possible plans for a query block is proportional to the number of join items in the FROM clause. This number rises exponentially with the number of join items. Because of this reason, the plan generator uses an internal cutoff to reduce the number of plans it tries when finding the one with the lowest cost. The cutoff is based on the cost of the current best plan.

Access Path

Access paths are ways in which data is retrieved from the database. For any row in any table accessed by a SQL statement, there are three common ways by which that row can be located and retrieved:

  1. A row can be retrieved through a full table scan, which reads all rows from a table and filters out those that do not meet the selection criteria. 
  2. A row can be retrieved also through the use of an index, by traversing the index using the indexed column values specified by the statement.
  3. A row can be retrieved by specifying the ROWID. ROWID access is the fastest way to retrieve a row, because this specifies the exact location of the row in the database.

In general, index access paths should be used for statements that retrieve a small subset of the table's rows, while full scans are more efficient when accessing a large portion of the table.

The CBO chooses an access path based on the available access paths for the statement and the estimated cost of executing the statement using each access path or combination of paths. The following examples illustrate how the optimizer uses selectivity.

SELECT * 
  FROM emp 
  WHERE ename = 'JACKSON'; 

If the ename column is a unique or primary key, then the optimizer determines that there is only one employee named Jackson, and the query returns only one row. In this case, the query is very selective, and the optimizer is most likely to access the table using a unique scan on the index that enforces the unique or primary key.

If the ename column is not a unique or primary key, then the optimizer can use the following statistics to estimate the query's selectivity:

bulletUSER_TAB_COLUMNS.NUM_DISTINCT is the number of values for each column in the table.
bulletUSER_TABLES.NUM_ROWS is the number of rows in each table.

By dividing the number of rows in the emp table by the number of distinct values in the ename column, the optimizer estimates what percentage of employees have the same name. By assuming that the ename values are distributed uniformly, the optimizer uses this percentage as the estimated selectivity of the query.

SELECT * 
  FROM emp 
  WHERE empno < 7500; 

To estimate the selectivity of the query, the optimizer uses the boundary value of 7500 in the WHERE clause condition and the values of the HIGH_VALUE and LOW_VALUE statistics for the empno column, if available. These statistics can be found in the USER_TAB_COL_STATISTICS view (or the USER_TAB_COLUMNS view). The optimizer assumes that empno values are distributed evenly in the range between the lowest value and highest value. The optimizer then determines what percentage of this range is less than the value 7500 and uses this value as the estimated selectivity of the query.

The following query uses a bind variable rather than a literal value for the boundary value in the WHERE clause condition:

SELECT * 
  FROM emp 
  WHERE empno < :e1; 

The optimizer does not know the value of the bind variable e1. Indeed, the value of e1 might be different for each execution of the query. For this reason, the optimizer cannot use the means described in the previous example to determine selectivity of this query. In this case, the optimizer heuristically guesses a small value for the selectivity. This is an internal default value. The optimizer makes this assumption whenever a bind variable is used as a boundary value in a condition with one of the operators <, >, <=, or >=.

The optimizer's treatment of bind variables can cause it to choose different execution plans for SQL statements that differ only in the use of bind variables rather than constants. In one case in which this difference can be especially apparent, the optimizer might choose different execution plans for an embedded SQL statement with a bind variable in an Oracle precompiler program and the same SQL statement with a constant in SQL*Plus.

The following query uses two bind variables as boundary values in the condition with the BETWEEN operator:

SELECT * 
  FROM emp 
  WHERE empno BETWEEN :low_e AND :high_e; 

The optimizer decomposes the BETWEEN condition into these two conditions:

empno >= :low_e 
empno <= :high_e 

The optimizer heuristically estimates a small selectivity (an internal default value) for indexed columns in order to favor the use of the index.

Join

To choose an execution plan for a join statement, the optimizer must choose an access path to retrieve data from each table in the join statement, determine appropriate join method as well as join order. Here're some join methods supported by Oracle:

bulletNested Loop Joins: The optimizer determines a driving table (the outer loop). The other table is designated the inner table. For every row in the first (outer) table, Oracle accesses all the rows in the second (inner) table. The outer loop appears above the inner loop in the execution plan.
bulletNested Loop Outer Joins: This operation is used when an outer join is used between two tables. The outer join returns the outer (preserved) table rows, even when there are no corresponding rows in the inner (optional) table.
bulletHash Joins: The optimizer uses the smaller of the two tables/data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.
bulletHash Join Outer Joins: This operation is used for outer joins where the optimizer decides that the amount of data is large enough to warrant an hash join, or it is unable to drive from the outer table to the inner table. The order of tables is not determined by the cost, but by the join condition. The outer table (whose rows are being preserved) is used to build the hash table, and the inner table is used to probe the hash table.
bulletSort Merge Joins: In a merge join, there is no concept of a driving table. Both the inputs are sorted on the join key. The sorted lists are then merged together.
bulletSort Merge Outer Joins: When an outer join cannot drive from the outer (preserved) table to the inner (optional) table, it cannot use hash join or nested loop joins. Then it uses the sort merge join for executing the join operation.
bulletCartesian Joins: A Cartesian join happens when one or more of the tables does not have any join conditions to any other tables in the statement. The optimizer joins every row from one data source with every row from the other data source (Cartesian product of the two sets). Cartesian joins generally result from poorly-written SQL.
bulletFull Outer Joins: Full outer joins let you join tables together, yet still show rows which do not have corresponding rows in tables joined-to.

Optimizer Hints

Hints allow you to make decisions usually made by the optimizer. They apply only to the optimization of the statement block in which they appear. You can send hints for a SQL statement to the optimizer by enclosing them in a comment within the statement. The syntax below shows hints contained in both styles of comments that Oracle supports within a statement block.

{DELETE|INSERT|SELECT|UPDATE} /*+ hint [text] [hint[text]]... */
{DELETE|INSERT|SELECT|UPDATE} --+ hint [text] [hint[text]]...

where:

DELETE 
SELECT 
UPDATE

Is a keyword that begins a statement block. Comments containing hints can appear only after these keywords.

+

Causes Oracle to interpret the comment as a list of hints. The plus sign must immediately follow the comment delimiter (no space is permitted).

hint

Is one of the hints discussed in this section. If the comment contains multiple hints, then each hints must be separated by at least one space.

text

Is other commenting text that can be interspersed with the hints.

There are different categories of hints:
bulletHints for Optimization Approaches and Goals
bulletHints for Access Paths
bulletThe FULL hint explicitly chooses a full table scan for the specified table.
SELECT /*+ FULL(A) */ accno, bal FROM accounts a WHERE accno = 7086854;
bulletThe ROWID hint explicitly chooses a table scan by rowid for the specified table.
SELECT /*+ROWID(emp)*/ * FROM emp WHERE rowid > 'AAAAtkAABAAAFNTAAA' AND empno = 155;
bulletThe HASH hint explicitly chooses a hash scan to access the specified table. It applies only to tables stored in a cluster.
bulletThe INDEX hint explicitly chooses an index scan for the specified table.
SELECT /*+ INDEX(patients sex_index) */ name, height, weight FROM patients WHERE sex = 'm';
bulletThe NO_INDEX hint explicitly disallows a set of indexes for the specified table.
SELECT /*+NO_INDEX(emp emp_empno)*/ empno FROM emp WHERE empno > 200;
bulletHints for Query Transformations
bulletHints for Join Orders
bulletThe ORDERED hint causes Oracle to join tables in the order in which they appear in the FROM clause.
SELECT /*+ ORDERED */ tab1.col1, tab2.col2, tab3.col3 FROM tab1, tab2, tab3 WHERE tab1.col1 = tab2.col1 AND tab2.col1 = tab3.col1;
bulletThe STAR hint forces a star query plan to be used, if possible. A star plan has the largest table in the query last in the join order and joins it with a nested loops join on a concatenated index. The STAR hint applies when there are at least three tables, the large table's concatenated index has at least three columns, and there are no conflicting access or join method hints.
SELECT /*+ STAR */ tab1.col1, tab2.col2, tab3.col3 FROM tab1, tab2, tab3 WHERE tab1.col1 = tab2.col1 AND tab2.col1 = tab3.col1;
bulletHints for Join Methods
bulletThe USE_NL hint causes Oracle to join each specified table to another row source with a nested loops join using the specified table as the inner table.
SELECT /*+ ORDERED USE_NL(customers) to get first row faster */
accounts.balance, customers.last_name, customers.first_name
FROM accounts, customers
WHERE accounts.custno = customers.custno;
bulletThe USE_MERGE hint causes Oracle to join each specified table with another row source with a sort-merge join.
SELECT /*+USE_MERGE(emp dept)*/ *
FROM emp, dept
WHERE emp.deptno = dept.deptno;
bulletThe USE_HASH hint causes Oracle to join each specified table with another row source with a hash join.
SELECT /*+use_hash(emp dept)*/ *
FROM emp, dept
WHERE emp.deptno = dept.deptno;
bulletThe DRIVING_SITE hint forces query execution to be done at a different site than that selected by Oracle.
SELECT /*+DRIVING_SITE(dept)*/ *
FROM emp, dept@rsite
WHERE emp.deptno = dept.deptno;
bulletThe LEADING hint causes Oracle to use the specified table as the first table in the join order.
SELECT /*+LEADING(dept)*/ *
FROM emp, dept
WHERE emp.deptno = dept.deptno;

 

Back to CS643 schedule