Performance Tuning :: How The Oracle Optimizer Chooses Joins (Hash, Merge And Nested Loop Join)
Oct 18, 2012
I want to know how the Oracle optimizer chooses join methods and applies them while executing a query, so that I can be sure about the join the optimizer will pick before writing any query.
explain plan for select count(*) from orders, lineitem where o_orderkey = l_orderkey;
The 10053 trace for this query (shown below) picks a nested loop join with LINEITEM as the outer table and ORDERS as the inner table. It is effectively a join on the composite index (pk_lineitem) of LINEITEM and the unique index (pk_orderkey) of ORDERS. The cost calculation formula given in the book, "outer table cost + cardinality of outer table * inner table cost", fails here, and I am not able to understand why.
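For reference, the textbook formula reads like this, shown with made-up numbers (an illustration only, not figures from this trace):

cost(nested loop) = cost(outer access) + cardinality(outer) * cost(one inner probe)

For example, if the scan of pk_lineitem costs 400 and returns 150,000 rows, and each unique-index probe of pk_orderkey costs 1, the formula predicts 400 + 150,000 * 1 = 150,400. When the trace shows something much lower, one common reason is that parameters such as optimizer_index_caching and optimizer_index_cost_adj scale down the per-probe cost of the inner unique index access, so the raw trace lines no longer match the unadjusted formula.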
I'm looking to see if there is a way (fully expecting it to be an underscore parameter, or two...) to force the optimizer to keep churning until all join permutations are exhausted. I'm aware that, to paraphrase, it cuts out when it has spent more time parsing than its estimates say the query would take to just run.
I've got some irritating problems with XML rewrite, XML indexes, and access paths/cardinalities etc., and I really need the entire search space considered as a one-off for debugging this. I've already cranked the maximum permutations up to the max, but it's not enough; it shorts out after 5,041 permutations (I'd set the max to 80,000).
I know you'd not want to do this in the real world, but I can't get the damned thing to run the plan I want in a 10053 trace so I can see the values it has there. I know I can hint it, but I'm trying to ascertain why it's not even considering that plan in a "normal" parse.
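For what it's worth, a typical recipe for this kind of debugging session looks like the following. Hedged: _optimizer_max_permutations is an undocumented underscore parameter, so its behaviour can vary by version and it should only be touched in a throwaway session.

alter session set "_optimizer_max_permutations" = 80000;
alter session set events '10053 trace name context forever, level 1';
explain plan for <your query here>;
alter session set events '10053 trace name context off';

The <your query here> placeholder is whatever statement you are investigating; the trace file lands in user_dump_dest.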
I'm joining two tables, event_types and tmp_acc.
event_types contains 2 billion records; tmp_acc contains 20,000 records.
The result is about 300,000 rows. In event_types, the join columns end_t and account_obj_id0 are indexed.
There are no indexes on tmp_acc.
When I run the query below with a nested loop it takes 6 hours to complete. But when I ran it with a hash join, it was still running after 4 days. What is wrong with the hash join here? Why does it take so long? I'm joining only 20,000 rows, so I think there should be a way to get the result rows quickly.
show parameters hash_area_size
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
hash_area_size                       integer     2097152
explain plan for select --+ parallel(e,6) [code]....
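One thing worth checking, as a sketch: with workarea_size_policy = MANUAL, a hash_area_size of 2,097,152 bytes (2 MB) is tiny, so if the hash table is built on event_types (2 billion rows) it will spill to TEMP in many passes. Forcing the 20,000-row tmp_acc to be the build table is usually what you want. The hints below are a hypothetical sketch, since the real select list and join condition were elided; the tmp_acc column name is assumed:

select /*+ leading(t) use_hash(e) parallel(e,6) */ count(*)
from   tmp_acc t, event_types e
where  e.account_obj_id0 = t.account_obj_id0;  -- assumed join column on tmp_acc

Switching to workarea_size_policy = AUTO with a reasonable pga_aggregate_target generally beats hand-tuning hash_area_size.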
I want to make sure I am describing correctly what happens in a query where there is distributed database access and it is participating in a NESTED LOOPS JOIN. Below is an example query, the query plan output, and the remote SQL information for such a case. Of particular note are line#4 (NESTED LOOPS) and line#11 (REMOTE TABLE_0002).
What I want to know is more detail on how this NESTED LOOPS JOIN handles the remote operation. For example, for each row that comes out of line#5 and thus goes into the NESTED LOOPS JOIN operation @line#4, does the database jump across the network to do the remote lookup? Thus if there are 1 million rows, does that mean 1 million network hops? Does batch size play a role? For example, if the database batches in groups of 100, does that mean 10 thousand network hops?
I think each row that comes out of line#5 means a network hop to the remote database, but I do not know that for a fact. I have done some abbreviating in the plan in an attempt to make it fit on the page (line#7 TA = TABLE ACCESS).
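A hedged way to confirm this from the plan itself: the 'ALL' format of DBMS_XPLAN prints a "Remote SQL Information" section showing the exact statement shipped to the remote site:

explain plan for <your distributed query>;
select * from table(dbms_xplan.display(null, null, 'ALL'));

If the remote SQL contains a bind (e.g. :1) on the join column, the lookup runs once per outer row, which would match the one-hop-per-row suspicion; row prefetching between the sites, where it applies, reduces the number of round trips rather than the number of executions.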
I see that one of my SQLs, which is run by a user on a 10.2.0.3 database, changes its SQL_ID after some runs even though the query has not changed a bit! However, the HASH VALUE for this query remains the same.
How can the same query have different SQL_IDs but the same HASH_VALUE?
Note: Statistics are not modified on the base tables of this query.
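One sketch for investigating: pull all the hash columns from v$sql and compare them across runs. In 10g, SQL_ID and HASH_VALUE are derived from the same hash of the statement text (HASH_VALUE is effectively the low-order bits of it), while OLD_HASH_VALUE uses the pre-10g algorithm, so it matters which "hash value" you are comparing:

select sql_id, hash_value, old_hash_value, plan_hash_value
from   v$sql
where  sql_text like '<start of your statement>%';

Invisible differences in the text (whitespace, case, a trailing blank) produce a new SQL_ID, so diffing the SQL_FULLTEXT of the two cursors is also worth doing.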
I have two tables, with 113M records in DWH_BILL_DET and 103M in prd_rerate_chg_que, and I'm running the following MERGE query; it runs for 13 hours to update the records, which is quite a long time.
SQL> explain plan for
MERGE /*+ parallel (rq, 16) */ INTO DWH_BILL_DET rq
USING (SELECT rated_que_rowid, detail_rerate_flag_code, rerate_sel_key,
Query - SELECT * FROM sysadm.ps_tmtl_post_vw a WHERE a.month_prepared_for = 'JUNE,2012' AND a.ca_status = 'P5 CUST GO AHEAD'
[code]...
When I try to use SQL Tuning Sets, it throws this error:
ADDITIONAL INFORMATION SECTION
-------------------------------------------------------------------------------
- The optimizer could not merge the view at line ID 2 of the execution plan.
  The optimizer cannot merge a view that contains a set operator.
I read an earlier forum thread saying that the optimizer is unable to merge views with conditions like ORDER BY, etc. Now, there is one view used in the query; when I did a SELECT * FROM that view, it took more than 16 hours to complete (a bad view).
Attached: exec_plan.txt (2.06 MB), view_def.txt (14.12 K)
I want to understand how joins work with in-line views. I have a query and its explain plan below:
SELECT e.ename, e.deptno, d.dname
FROM   dept d, emp e
WHERE  e.deptno = d.deptno
AND    e.deptno = 20
[code]....
I do not find any difference between the two explain plans; both are the same. In my second query, the filtered rows are joined to the dept table, and hence the baggage should reduce. But how can I verify that the in-line view has worked better?
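One hedged way to check: if the optimizer merges the in-line view, the two plans will be identical by design. You can prove the merge happened by comparing against a plan where merging is forbidden with a NO_MERGE hint. A sketch of the in-line view variant (reconstructed here, since the second query was elided):

SELECT e.ename, e.deptno, d.dname
FROM   dept d,
       (SELECT /*+ NO_MERGE */ ename, deptno FROM emp WHERE deptno = 20) e
WHERE  e.deptno = d.deptno;

If this hinted plan shows a VIEW operation while your original does not, the original in-line view was merged, and there is genuinely no runtime difference to verify.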
mbr has 60,000 rows and member has approximately 60,000 rows. Both tables have indexes on ssn and citi_no.
PK of mbr: mbr_id. PK of member: mbr_id.
The other columns are not part of the PK and have no indexes on them.
I'm wondering why the statement doesn't use an index, while ssn and citi_no are indexed.
MERGE INTO mbr t
USING (SELECT mbr_id, citi_no FROM member) a
ON (t.ssn = a.citi_no)
WHEN MATCHED THEN UPDATE SET t.asis_mbr_id = a.mbr_id
WHERE t.ssn NOT IN (SELECT ssn FROM mbr GROUP BY ssn HAVING COUNT(*) > 1)
Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("A"."DEPTNO"="B"."DEPTNO")
A nested loop, by definition, means that for every row returned by the outer query, the inner query is executed once; with N outer rows, it runs N times.
In the above example, Oracle does a full table scan of emp and returns 14 rows. Then for the dept table, it does an index unique scan, applies the predicate a.deptno = b.deptno, and returns 1 row.
My question is: why does it return only 1 row? Does that mean that for the 14 outer rows, this one row is fetched 14 times?
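To make the mechanics concrete, the row-source behaviour can be sketched in PL/SQL (purely illustrative; this is not how the engine is actually implemented):

BEGIN
  FOR outer_row IN (SELECT * FROM emp) LOOP            -- full table scan: 14 rows
    FOR inner_row IN (SELECT * FROM dept d
                      WHERE d.deptno = outer_row.deptno) LOOP
      NULL;  -- index unique scan: at most 1 row per probe
    END LOOP;
  END LOOP;
END;
/

The plan reports the inner row source's cardinality per execution, so "1 row" means one row per probe; across 14 outer rows, the unique scan is executed 14 times.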
Oracle Version: 11.2.0.2.0. I have two explicit cursors and I would like to choose at run time which one to run. Here is a simplified code snippet of what I am doing today:
DECLARE
  CURSOR Cursor_A IS SELECT * FROM EMP_A;
  CURSOR Cursor_B IS SELECT * FROM EMP_B;
  RUNA CHAR(1) := 'Y';
[code]....
I want to avoid maintaining the same long list of transformations twice. I also want to avoid, if possible, an explicit FETCH INTO, because there are hundreds of fields in both tables. I'm looking for something like this (and I know this doesn't work):
DECLARE
  CURSOR Cursor_A IS SELECT * FROM EMP_A;
  CURSOR Cursor_B IS SELECT * FROM EMP_B;
  RUNA CHAR(1) := 'Y';
  CursorToRun IS REF CURSOR;
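A workable sketch under one assumption (EMP_A and EMP_B have identical column lists, which seems implied since the transformations are shared): fold the choice into a single parameterised cursor, so the implicit FOR-loop record is kept and no FETCH INTO is needed:

DECLARE
  CURSOR cursor_ab (p_runa CHAR) IS
    SELECT * FROM emp_a WHERE p_runa = 'Y'
    UNION ALL
    SELECT * FROM emp_b WHERE p_runa <> 'Y';
BEGIN
  FOR r IN cursor_ab('Y') LOOP
    NULL;  -- the single shared list of transformations goes here
  END LOOP;
END;
/

The constant predicate makes the unused branch return no rows. A REF CURSOR would also let you pick the query at run time, but it forces exactly the explicit FETCH INTO you are trying to avoid.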
Can this be optimized? In DEV and IST we didn't notice it because there were only 1,000 rows, but in PERF, where there are 2 million rows, this is taking a long time.
SET SERVEROUTPUT ON
DECLARE
  counter number := 0;
  CURSOR insertValues IS
    select roleid, productcode, functioncode, typecode, restrictiontype, value1
    from restrictions
    where actionmode = 'INSERT';
[code]...
Can this be done in a single UPDATE, since the SELECTs and UPDATEs are happening on the same table?
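A set-based sketch, purely illustrative since the loop body is elided; the correlation columns below (roleid, productcode) come from the cursor, but the real matching rules may well differ:

UPDATE restrictions r
SET    r.value1 = (SELECT MAX(s.value1)
                   FROM   restrictions s
                   WHERE  s.roleid      = r.roleid
                   AND    s.productcode = r.productcode
                   AND    s.actionmode  = 'INSERT')
WHERE  r.actionmode <> 'INSERT'
AND    EXISTS (SELECT 1
               FROM   restrictions s
               WHERE  s.roleid      = r.roleid
               AND    s.productcode = r.productcode
               AND    s.actionmode  = 'INSERT');

Even when the real logic is more involved, a single self-referencing UPDATE or MERGE usually replaces a 2-million-iteration cursor loop with one or two scans of the table.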
So usually, when joining a local table and a (much larger) remote table, the best practice is to use a /*+ DRIVING_SITE(remote_table) */ hint.
*CASE1:
INSERT INTO another_local_table
SELECT /*+ DRIVING_SITE(remote_table) */ ...
FROM local_table, remote_table@remotedb
WHERE <join condition>
*CASE2: However, I saw this particular pattern:
FOR x IN (SELECT id FROM local_table l) LOOP
  INSERT INTO another_local_table
[code]...
So far I haven't seen the explain plan for either case, since most of the tables are global temporary tables. But in terms of the logic of the two cases and common best practice, shouldn't CASE1 logically have better performance?
For me, it's like taking a trip to the grocery store. CASE1 buys everything you need in one trip; it might take you half a day. CASE2 is like quickly buying one grocery item at a time, over several short trips. You'll save on gas with CASE1, right?
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Solaris: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
5 rows selected.
I have a problem with views and nested selects which I cannot explain. Here is a trimmed-down version of the research I have done. Notice the following:
1) All code is executed as the same user, CDRNORMALCODE. This user has all the views and procedural code.
2) All data is owned by a different user, CDRDATA. This user has no views and no code.
My problem is this:
If I reference the table directly with a DELETE statement that uses a nested select (i.e. an IN clause with a SELECT), the index I expect and want is used. But if I execute the same DELETE referencing even the simplest of views (SELECT * FROM <table>) instead of the table itself, then a full table scan of the table is done.
Here is an execution against the table directly (owned by CDRDATA). Notice the reference to the table in the table's schema on line 3. Also please notice the INDEX RANGE SCAN of BSNSS_CLSS_CASE_RULE_FK1 at the bottom of the plan.
SQL> show user
USER is "CDRNORMALCODE"
SQL>
SQL> explain plan for
  2  delete
[code]...
OK, here is an update. The views I am using normally have INSTEAD OF triggers on them. If I remove the INSTEAD OF trigger, the problem looks like it goes away; when I put the trigger back, the problem comes back. But why would an INSTEAD OF trigger change the query plan for a view?
SQL> DELETE FROM PLAN_TABLE;
5 rows deleted.
SQL> explain plan for
  2  delete
  3  from BSNSS_CLSS_MNR_CASE_RULE_SV
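A minimal reproduction sketch of the pattern described (hypothetical simplified names; the real view and trigger were not posted):

CREATE OR REPLACE VIEW case_rule_sv AS
  SELECT * FROM cdrdata.case_rule;

CREATE OR REPLACE TRIGGER case_rule_sv_iod
  INSTEAD OF DELETE ON case_rule_sv
  FOR EACH ROW
BEGIN
  DELETE FROM cdrdata.case_rule
  WHERE case_rule_id = :OLD.case_rule_id;  -- hypothetical key column
END;
/

With an INSTEAD OF trigger present, the DELETE against the view is decomposed into a query that identifies the view's rows plus a per-row trigger execution, rather than being rewritten as one DML against the base table, which is one plausible reason the access path changes.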
SELECT v.key$
FROM VERSION_TABLE v, DOCUMENT_TABLE d, CLASS_TABLE z
WHERE v.documentKey = d.key$
  AND d.classKey = z.key$
  AND z.key$ IN (SELECT zz.key$
                 FROM CLASS_TABLE zz
                 START WITH zz.name = 'esDTTemplate'
                 CONNECT BY PRIOR zz.key$ = zz.parentKey)
  AND v.ESGROUP = 'SearchOperatorsMapping'
ORDER BY d.name
Now I noticed that the subquery is never used to seed the join: indexes, if any, are used; otherwise a full table scan is performed. In the example, if ESGROUP is indexed, then it is chosen to start the join evaluation; if not, a full table scan is performed. Is there any way to suggest to the optimizer that it use the subquery as a fallback in case there are no indexes?
In the above example, where VERSION_TABLE contains nearly two million records, the no-index solution takes 60 seconds vs. less than 1 second in the indexed case. Wrapping the hierarchical query in an inline view leads to the same result.
PS: the execution plan (without index) is:

-------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     9 (100)|          |
|   1 |  SORT ORDER BY     |      |     1 |   171 |     9  (23)| 00:00:01 |
|*  2 |   HASH JOIN SEMI   |      |     1 |   171 |     8  (13)| 00:00:01 |
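One hedged thing to try: the PUSH_SUBQ hint asks the optimizer to evaluate a subquery at the earliest possible step rather than leaving it as a late filter. Placed inside the subquery (10g-style placement), it would look like this:

SELECT v.key$
FROM VERSION_TABLE v, DOCUMENT_TABLE d, CLASS_TABLE z
WHERE v.documentKey = d.key$
  AND d.classKey = z.key$
  AND z.key$ IN (SELECT /*+ PUSH_SUBQ */ zz.key$
                 FROM CLASS_TABLE zz
                 START WITH zz.name = 'esDTTemplate'
                 CONNECT BY PRIOR zz.key$ = zz.parentKey)
  AND v.ESGROUP = 'SearchOperatorsMapping'
ORDER BY d.name;

Whether it helps depends on the version and on how the IN gets unnested, so treat it as an experiment rather than a guaranteed fallback.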
I have to do the optimization of a query that has the following characteristics:
- Takes 3 hours to process
- Performs inner joins with 30 tables
- Produces an output of 280 million records with 450 fields
First of all, it is not feasible to make 30 updates (one for each table) to 280 million records.
The best solution I have found so far is to create 3 temporary tables, where each one joins a third of the 30 tables, and at the end I join the main table with these three temporary tables.
I know that you will ask (or maybe not) for the query and sample data, but it is impossible to create 30 examples.
So: how do you optimize this type of query, which joins many tables and produces a large output with (too) many columns?
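For what it's worth, here is a skeletal version of the staged-join idea (hypothetical names, two joined tables per stage instead of ten, just to show the shape):

CREATE TABLE stage1 NOLOGGING AS
SELECT m.id, t1.col_a, t2.col_b
FROM   main_tab m
JOIN   tab1 t1 ON t1.id = m.id
JOIN   tab2 t2 ON t2.id = m.id;

-- stage2 and stage3 are built the same way from the remaining tables,
-- then the final output is assembled with three joins on id:
CREATE TABLE final_out NOLOGGING AS
SELECT s1.*, s2.col_c, s3.col_d
FROM   stage1 s1
JOIN   stage2 s2 ON s2.id = s1.id
JOIN   stage3 s3 ON s3.id = s1.id;

Building each stage with CREATE TABLE ... AS SELECT (optionally NOLOGGING and PARALLEL) is usually far cheaper than updating 280 million rows in place, which is the main reason this staging pattern tends to win.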
The product I work on requires a query to tell us what tables are dependent on certain types.
SELECT dba_tab_cols.owner, dba_tab_cols.table_name,
       dba_tab_cols.data_type_owner, dba_tab_cols.data_type
FROM dba_tab_cols
JOIN dba_types
  ON dba_types.owner = dba_tab_cols.data_type_owner
 AND dba_types.type_name = dba_tab_cols.data_type
WHERE (dba_types.owner IN ('SCHEMA1', 'SCHEMA2'......))
I find this query to be pretty slow. I think it is because data_type_owner in dba_tab_cols is not indexed. Adding an index is not an option, because users expect our product to be read-only.
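One hedged rewrite to try: since the join equates dba_types.owner with data_type_owner, the IN-list can be applied to dba_tab_cols directly, so the filter cuts the driving row source before the dictionary join instead of after it:

SELECT c.owner, c.table_name, c.data_type_owner, c.data_type
FROM   dba_tab_cols c
JOIN   dba_types t
  ON   t.owner = c.data_type_owner
 AND   t.type_name = c.data_type
WHERE  c.data_type_owner IN ('SCHEMA1', 'SCHEMA2');

The optimizer can often generate this transitive predicate itself, so it may or may not change the plan; comparing the two explain plans would confirm.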
However, I was able to identify a poorly performing query that seemed to be maxing out our CPU, and I have been trying to understand its explain plan. The plan below is from our test system, which has considerably less data in its tables than our PROD system.
I can see there are a bunch of table scans at the end, which may indicate missing indexes, but I am unclear on whether this is actually the problem, as the %CPU seems worse for the JOIN near the top of the plan.
-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time      | Inst  |IN-OUT|
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      | 1870M | 3018G |       |  677M   (1)| 999:59:59 |       |      |
|   1 |  SORT ORDER BY   |      | 1870M | 3018G | 3567G |  677M   (1)| 999:59:59 |       |      |
Production system: 11.2.0.1 on Windows Server x64. Test system: 9.2.0.1 on Windows XP.
Problem preface: I need to get all unique CASEIDs that should be checked by a biometric system. What I need to find: all CASEs for different PERSONs that share the same PHONE in at least one of the phone types (1..4). The real table contains a little more than 10 million records. I made test scripts.
Below is the DDL for test table creation:

------------------------------------------
-- Create CASEINFO test table
------------------------------------------
DROP TABLE CASEINFO;
CREATE TABLE CASEINFO
[code]...
Below is the SQL/DDL to generate the test data; the number of records inserted is 2 million (PERSON_COUNT := #/8).

------------------------------------------
-- fill CASEINFO with sample data
------------------------------------------
DECLARE
  I INTEGER;
[code]...
Below is a SQL select to check the data in the created table.

------------------------------------------
-- Check test data counters
------------------------------------------
SELECT 'TOTAL', count(*) from CASEINFO
UNION ALL
SELECT 'LEGAL', count(*) from CASEINFO where
[code]...
The PROBLEM is that I am experiencing HUGE performance problems on both the test and production systems with this query:
select distinct b.caseid
from CASEINFO a, CASEINFO b
where (a.person <> b.person)
  and (a.sex = b.sex)
  and ( (a.phone1 = b.phone1) or
        (a.phone1 = b.phone2) or
        (a.phone1 = b.phone3) or
[code]...
This query takes almost 90 minutes to execute, and I do not know how to avoid this. The full SQL file to build the test case is attached.
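One hedged restructuring idea, scaled down to the test columns: instead of the self-join with OR'ed phone comparisons (which tends to defeat both hash joins and indexes), unpivot the four phone columns so each (person, phone) pair becomes one row, find the phones shared by more than one person, and join back. UNPIVOT needs 11g, so the 9.2.0.1 test box would need a UNION ALL over the four columns instead; the a.sex = b.sex condition from the original query is omitted here for brevity:

select distinct c.caseid
from   caseinfo c
join  (select phone
       from   caseinfo
       unpivot (phone for phone_type in (phone1, phone2, phone3, phone4))
       group by phone
       having count(distinct person) > 1) shared
  on   shared.phone in (c.phone1, c.phone2, c.phone3, c.phone4);

This turns an O(n^2) pairwise comparison into one aggregation plus one join, which on 10 million rows is usually the difference between 90 minutes and a few minutes.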