Performance Tuning :: Force Optimizer To Consider All Join Permutations?
Oct 14, 2013
I'm looking to see if there is a way (fully expecting it to be an underscore parameter, or two...) to force the optimizer to keep churning until all join permutations are exhausted. I'm aware that it, to paraphrase, cuts out when it estimates it has spent more time parsing than it would spend just running the query.
I've got some irritating problems with XML rewrite, XML indexes and access paths/cardinalities etc., and I really need the entire search space considered, as a one-off, for debugging this. I've already cranked the maximum-permutations setting up as far as it will go, but it's not enough: it shorts out after 5041 permutations (I'd set the maximum to 80000).
I know you'd not want to do this in the real world, but I can't get the damned thing to run the plan I want in a 10053 trace so I can see the values it has there. I know I can hint it, but I'm trying to ascertain why it's not even considering that plan in a "normal" parse.
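For reference, this is roughly what I mean (the underscore parameter is undocumented, so its exact name and behaviour on a given version are an assumption to verify rather than a guarantee):

-- Undocumented parameter, believed to cap the number of join permutations
-- considered during optimization (assumption; check your version).
ALTER SESSION SET "_optimizer_max_permutations" = 80000;

-- Dump the optimizer trace for the next hard parse of the statement.
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

-- run the problem query here, forcing a hard parse (e.g. change a comment in it)

ALTER SESSION SET EVENTS '10053 trace name context off';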
I want to know how the Oracle optimizer chooses joins and applies them while executing a query, so that I can be confident about the join order before writing any query.
Select tag0.TAG_VALUE pid, tag1.TAG_VALUE, tag2.TAG_VALUE
From TAGGER.TAGGABLE_RESOURCE r, TAGGER.TAG tag0, TAGGER.TAG tag1, TAGGER.TAG tag2
where 1=1
[code]....
This runs in about 400ms. Now I replace this:
AND tag0.TAG_TYPE in (4602, 5228)
AND tag1.TAG_TYPE in (4612, 5225)
AND tag2.TAG_TYPE in (4613, 5226)
with this:
AND tag0.TAG_TYPE in (select COLUMN_VALUE from ( select * from table( TAGGER.GET_IDS_OF_SIMILAR_TAG_TYPES('Patient ID') ) x1 ))
AND tag1.TAG_TYPE in (select COLUMN_VALUE from ( select * from table( TAGGER.GET_IDS_OF_SIMILAR_TAG_TYPES('Patients Sex') ) x2 ))
AND tag2.TAG_TYPE in (select COLUMN_VALUE from ( select * from table( TAGGER.GET_IDS_OF_SIMILAR_TAG_TYPES('Patients Birth Date') ) x3 ))
So instead of hard-coding the IDs there is a function that looks them up. The function itself reports that it runs in 0ms. But when I run the new query:
Select tag0.TAG_VALUE pid, tag1.TAG_VALUE, tag2.TAG_VALUE
From TAGGER.TAGGABLE_RESOURCE r, TAGGER.TAG tag0, TAGGER.TAG tag1, TAGGER.TAG tag2
where 1=1
[code]....
it takes around 6s to run. I have looked at the explain plans, and it seems as though the function-based approach is triggering a full table scan of TAG.
I have tried query hints to force the index, but they don't change the execution plan or the query time.
The explain plan for the quick query is:
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
Plan hash value: 1031492929
---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows | Bytes | Cost (%CPU)| Time |
[code]....
And the slow one is:
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 2741657371
----------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows | Bytes |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |   20M| 1602M|
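One angle worth noting (a sketch, not from the original post): the optimizer typically assumes a large default cardinality (often 8168 rows with an 8K block size) for a table function, which alone can push it toward a full scan of TAG. The undocumented CARDINALITY hint can tell it each collection is tiny; the "2" below is a guess at the real row count:

Select tag0.TAG_VALUE pid, tag1.TAG_VALUE, tag2.TAG_VALUE
From TAGGER.TAGGABLE_RESOURCE r, TAGGER.TAG tag0, TAGGER.TAG tag1, TAGGER.TAG tag2
where 1=1
AND tag0.TAG_TYPE in (select /*+ cardinality(x1 2) */ COLUMN_VALUE
                      from table( TAGGER.GET_IDS_OF_SIMILAR_TAG_TYPES('Patient ID') ) x1)
-- same pattern for tag1 and tag2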
SELECT v.key$
FROM VERSION_TABLE v, DOCUMENT_TABLE d, CLASS_TABLE z
WHERE v.documentKey = d.key$
AND d.classKey = z.key$
AND z.key$ IN (SELECT zz.key$
               FROM CLASS_TABLE zz
               START WITH zz.name = 'esDTTemplate'
               CONNECT BY PRIOR zz.key$ = zz.parentKey)
AND v.ESGROUP = 'SearchOperatorsMapping'
ORDER BY d.name
Now I noticed that the subquery is never used to seed the join: indexes, if any, are used; otherwise a full table scan is performed. In the example, if ESGROUP is indexed it is chosen to start the join evaluation; if not, a full table scan is performed. Is there any way to suggest to the optimizer that it use the subquery as a fallback when there are no indexes?
In the above example, where VERSION_TABLE contains nearly two million records, the no-index case takes 60 secs vs. less than 1 sec in the index case. Wrapping the hierarchical query in an inline view leads to the same result.
PS: the execution plan (without index) is:
-------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      |       |     9 (100)|          |
|   1 |  SORT ORDER BY       |      |    1 |   171 |     9  (23)| 00:00:01 |
|*  2 |   HASH JOIN SEMI     |      |    1 |   171 |     8  (13)| 00:00:01 |
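One hedged rewrite to try (a sketch; the MATERIALIZE hint is undocumented, and this assumes CLASS_TABLE.key$ is unique so the CONNECT BY returns each key at most once): factor the hierarchical subquery into a materialized WITH block, so it is evaluated once and can seed the join even without an index on ESGROUP.

WITH tmpl AS (
  SELECT /*+ materialize */ zz.key$
  FROM CLASS_TABLE zz
  START WITH zz.name = 'esDTTemplate'
  CONNECT BY PRIOR zz.key$ = zz.parentKey
)
SELECT v.key$
FROM VERSION_TABLE v, DOCUMENT_TABLE d, tmpl z
WHERE v.documentKey = d.key$
AND d.classKey = z.key$
AND v.ESGROUP = 'SearchOperatorsMapping'
ORDER BY d.name;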
I have to do the optimization of a query that has the following characteristics:
- Takes 3 hours to process
- Performs an inner join with 30 tables
- Produces an output of 280 million records with 450 fields
First of all, it is not feasible to run 30 updates (one per table) against 280 million records.
The best solution I have found so far is to create 3 temporary tables, each of which joins one third of the 30 tables, and at the end join the main table with these three temporary tables.
I know you will ask (or maybe not) for the query and samples, but it is impossible to post 30 examples.
How do you optimize this type of query, which joins many tables and produces a large output with (too) many columns? A rough sketch of the temporary-table approach is shown below.
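A minimal sketch of that approach, with hypothetical table and column names (main_table, pk_id, table01... are placeholders, not from the original post):

-- Stage 1: each staging table joins the driving key to a third of the tables.
CREATE TABLE tmp_part1 PARALLEL NOLOGGING AS
SELECT /*+ parallel(m) */ m.pk_id, t1.col_a, t2.col_b /* ... columns from tables 1-10 */
FROM main_table m
JOIN table01 t1 ON t1.fk_id = m.pk_id
JOIN table02 t2 ON t2.fk_id = m.pk_id
/* ... through table10 ... */ ;

-- Stage 2 (after building tmp_part2 and tmp_part3 the same way):
-- one wide hash join of four row sources instead of a 30-table join.
CREATE TABLE final_output PARALLEL NOLOGGING AS
SELECT /*+ use_hash(m p1 p2 p3) */
       m.pk_id, p1.col_a, p1.col_b, p2.col_c, p3.col_d /* ... all 450 fields ... */
FROM main_table m
JOIN tmp_part1 p1 ON p1.pk_id = m.pk_id
JOIN tmp_part2 p2 ON p2.pk_id = m.pk_id
JOIN tmp_part3 p3 ON p3.pk_id = m.pk_id;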
The product I work on requires a query to tell us what tables are dependent on certain types.
SELECT dba_tab_cols.owner, dba_tab_cols.table_name, dba_tab_cols.data_type_owner, dba_tab_cols.data_type
FROM dba_tab_cols
JOIN dba_types
  ON dba_types.owner = dba_tab_cols.data_type_owner
 AND dba_types.type_name = dba_tab_cols.data_type
WHERE (dba_types.owner IN ('SCHEMA1', 'SCHEMA2'......))
I find this query to be pretty slow. I think it is because data_type_owner in dba_tab_cols is not indexed. Adding an index is not an option because users expect our product to be read-only.
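One documented option that sometimes helps slow dictionary-view queries without touching the application schema (a suggestion, not from the original post) is to refresh the statistics Oracle keeps on its own dictionary and fixed objects:

-- Requires suitable privileges (e.g. SYSDBA or ANALYZE ANY DICTIONARY).
EXEC DBMS_STATS.GATHER_DICTIONARY_STATS;
EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;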
However, I was able to identify a poorly performing query that seemed to be maxing out our CPU. I have been trying to understand its explain plan. The plan below is from our test system, which has considerably less data in the tables than our PROD system.
I can see there are a bunch of table scans at the end, which may indicate missing indexes, but I am unclear on whether this is actually a problem, as the %CPU seems to be worse for the JOIN near the top of the plan.
-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time      | Inst |IN-OUT|
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      | 1870M | 3018G |       |   677M  (1)| 999:59:59 |      |      |
|   1 |  SORT ORDER BY   |      | 1870M | 3018G | 3567G |   677M  (1)| 999:59:59 |      |      |
Production system: 11.2.0.1 on Windows Server x64. Test system: 9.2.0.1 on Windows XP.
Problem preface: I need to get all unique CASEIDs that should be checked by the biometric system. What I need to find: all CASEs belonging to different PERSONs that share the same PHONE in at least one phone type (1..4). The real table contains a little more than 10 million records. I made test scripts.
Below is the DDL for test table creation:
------------------------------------------
-- Create CASEINFO test table
------------------------------------------
DROP TABLE CASEINFO;
CREATE TABLE CASEINFO
[code]...
Below I've put the SQL/DDL to generate test data; the number of records inserted is 2 million. PERSON_COUNT := #/8;
------------------------------------------
-- fill CASEINFO with sample data
------------------------------------------
DECLARE
I INTEGER;
[code]...
Below is a SQL select to check the data in the created table.
------------------------------------------
-- Check test data counters
------------------------------------------
SELECT 'TOTAL',count(*) from CASEINFO
UNION ALL
SELECT 'LEGAL',count(*) from CASEINFO where
[code]...
The PROBLEM is that I am experiencing HUGE performance problems on both the test and production systems with this query:
select distinct b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person)
  and (a.sex=b.sex)
  and ( (a.phone1=b.phone1) or (a.phone1=b.phone2) or (a.phone1=b.phone3) or
[code]...
This query takes almost 90 minutes to execute, and I do not know how to avoid this. A full SQL file to build the test case is attached.
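A hedged rewrite worth testing (a sketch against the column names visible in the post, assuming phone1..phone4; untested here): unpivot the four phone columns into one (person, phone) list, so the OR-chain becomes a single equijoin that can hash join instead of a nested self-join.

WITH phones AS (
  SELECT caseid, person, sex, phone1 AS phone FROM caseinfo WHERE phone1 IS NOT NULL
  UNION ALL
  SELECT caseid, person, sex, phone2 FROM caseinfo WHERE phone2 IS NOT NULL
  UNION ALL
  SELECT caseid, person, sex, phone3 FROM caseinfo WHERE phone3 IS NOT NULL
  UNION ALL
  SELECT caseid, person, sex, phone4 FROM caseinfo WHERE phone4 IS NOT NULL
)
SELECT DISTINCT b.caseid
FROM phones a
JOIN phones b
  ON b.phone = a.phone
 AND b.sex = a.sex
 AND b.person <> a.person;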
I want to make sure I am describing correctly what happens in a query where there is distributed database access and it is participating in a NESTED LOOPS JOIN. Below is an example query, the query plan output, and the remote SQL information for such a case. Of particular note are line#4 (NESTED LOOPS) and line#11 (REMOTE TABLE_0002).
What I want to know is more detail on how this NESTED LOOPS JOIN handles the remote operation. For example, for each row that comes out of line#5 and thus goes into the NESTED LOOPS JOIN operation @line#4, does the database jump across the network to do the remote lookup? Thus if there are 1 million rows, does that mean 1 million network hops? Does batch size play a role? For example, if the database batches in groups of 100, does that mean 10 thousand network hops?
I think each row that comes out of line#5 means a network hop to the remote database, but I do not know that for a fact. I have done some abbreviating in the plan in an attempt to make it fit on the page (line#7 TA = TABLE ACCESS).
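For the record, the standard way to change where a distributed join executes is the documented DRIVING_SITE hint; a sketch with made-up names (local_table, remote_table@dblink are placeholders):

-- If most of the data lives remotely, ask Oracle to run the join on the
-- remote site and ship the smaller local row source across once, instead
-- of doing a network round-trip per outer row.
SELECT /*+ driving_site(r) */ l.col1, r.col2
FROM local_table l, remote_table@dblink r
WHERE r.join_key = l.join_key;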
explain plan for
select count(*) from orders, lineitem where o_orderkey = l_orderkey;
The 10053 trace (as shown below) for this query shows a nested loop join with LINEITEM as the outer table and ORDERS as the inner table. It is effectively a join on the composite index (pk_lineitem) of LINEITEM and the unique index (pk_orderkey) of the ORDERS table. The cost calculation formula given in the book, "outer table cost + cardinality of outer table * inner table cost", fails here, and I am not able to understand why.
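As a sanity check of the formula itself (made-up numbers, not from this trace):

cost(NL) = cost(outer) + card(outer) * cost(inner single-row probe)
         = 100 + 5000 * 2
         = 10100

With a unique-index probe the per-row inner cost can round to a very small value, and CPU costing in 10g+ adjusts the arithmetic further, so the trace figures often won't match the textbook formula exactly.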
Looking to understand the difference between instance tuning and database tuning.
What is the difference between these two tuning exercises? I understand that an instance is the memory-based (logical) structures, whereas a database consists of the physical structures.
However, how does one tune the database's physical structure? Does it have to do with file placement/block sizes etc.? Would you agree that a lot of that is taken care of by ASM now in 11g? What tools are required/available (third party as well as Oracle-supplied) for these types of tuning scenarios?
I have two tables, with 113M records in DWH_BILL_DET and 103M in prd_rerate_chg_que, and I'm running the following MERGE query, which takes 13 hours to update the records, which is far too long.
SQL> explain plan for
MERGE /*+ parallel (rq, 16) */ INTO DWH_BILL_DET rq
USING (SELECT rated_que_rowid,
              detail_rerate_flag_code,
              rerate_sel_key,
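One thing to check (a general point, not visible in the fragment above): the parallel hint on the MERGE target only parallelizes the DML itself if parallel DML has been enabled in the session first; otherwise only the query side runs in parallel.

-- Without this, the MERGE's update/insert phase runs serially even with the hint.
ALTER SESSION ENABLE PARALLEL DML;

MERGE /*+ parallel(rq, 16) */ INTO DWH_BILL_DET rq
USING ( ... ) ... ;

COMMIT;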
How does column width affect index performance?
For example, suppose I had an IOT table emp_iot with columns: (id number, job varchar2(20), time date, plan number)
The table key consists of (id, job, time).
Column JOB has a fixed list of distinct values ('ANALYST', 'NIGHT_WORKED', etc...).
What performance increase could I expect if, in the "job" column, I stored not names but numbers identifying the job names? E.g. I would store 1 instead of 'ANALYST' and 2 instead of 'NIGHT_WORKED'.
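A hedged sketch of that design (hypothetical DDL; exact gains depend on the data and block size): replace the VARCHAR2 key column with a NUMBER code plus a small lookup table. A shorter key means more entries per leaf block and potentially a shallower B-tree, at the cost of a join for display.

CREATE TABLE job_dict (
  job_id   NUMBER PRIMARY KEY,
  job_name VARCHAR2(20) NOT NULL UNIQUE
);

CREATE TABLE emp_iot (
  id     NUMBER,
  job_id NUMBER REFERENCES job_dict,  -- replaces job VARCHAR2(20)
  time   DATE,
  plan   NUMBER,
  CONSTRAINT emp_iot_pk PRIMARY KEY (id, job_id, time)
) ORGANIZATION INDEX;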
I have a question about database fragmentation. I know that fragmentation can reduce performance in query times: the blocks are distributed over many extents, the scan process takes a long time, and the Oracle engine has to locate the address of the next extent.
I want to know if there is any system view in which you can check whether your table or index has high fragmentation. If it's needed, I will re-create, move or rebuild the table or index, but first I want to know whether the degree of fragmentation is high.
Is there any useful script or query to do this, or any interesting Oracle system view?
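A rough check to start with (my own sketch; assumes an 8K block size and fresh statistics): compare the blocks a table actually occupies below the high-water mark with what its row count suggests it needs.

SELECT table_name,
       blocks AS used_blocks,
       CEIL(num_rows * avg_row_len / 8192) AS est_needed_blocks,
       ROUND(blocks / NULLIF(CEIL(num_rows * avg_row_len / 8192), 0), 1) AS bloat_factor
FROM user_tables
WHERE num_rows > 0
ORDER BY 4 DESC NULLS LAST;

For indexes, ANALYZE INDEX ... VALIDATE STRUCTURE followed by a look at INDEX_STATS (del_lf_rows vs lf_rows) gives a comparable picture, at the cost of locking the index while it runs.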
There is a simple way to increase the performance of a query by reducing the row size of the table it hits. I used it in the past by dividing the table into smaller parts and querying the respective smaller table in each query.
What is this method called? I have simply forgotten the name and can't recall it. What is this type of row-reduction optimization called?
How many records could I have in a single table without performance degradation on Standard Edition, without partitioning, with a cutting-edge server (8 or 12 cores, 72 GB RAM, FC 4 Gbit, etc...) and good storage?
Are 300 million rows in a single table, with 500K transactions/day, too much?
Testing our 9i to 11g upgrade, we've imported the entire DB onto the new machine. We've found that certain procedures are really suffering performance problems. BUT we've also found that if we check out a production copy of the procedure from our source code control and reinstall it, the performance issue goes away. Just altering the procedure and recompiling does NOT work.
The new machine where the 11g database exists is slightly different than the source, but it's not like we have this problem with every procedure. It's only a couple.
Is there any possible reason why we'd have to re-install a procedure to correct a performance problem?
I need to check a package's performance and improve it.
1. How do I check the package's performance (each and every statement in the package)?
2. The package uses a DELETE statement to remove all records, and I've observed that the delete takes a long time to remove all the records in the table (7,000,000 records). This is like a staging table: daily we need to clean out the data before inserting new data into it. What can I use instead of DELETE?
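On point 2: if the clean-out is always the whole table, TRUNCATE is the usual answer (a general suggestion; staging_tab is a placeholder name):

-- DDL rather than DML: resets the high-water mark instead of generating undo
-- per row, so it is far faster than DELETE for a full wipe.
-- Caveats: it commits implicitly and cannot be rolled back.
TRUNCATE TABLE staging_tab REUSE STORAGE;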
Somewhere I read that we should not use hints in Oracle production environments, but that we can use hints in the development environment, and on achieving the desired execution plan we can adjust the 'statistics' so that the optimizer follows that plan without hints.
Q1. If this is true, what statistics do we adjust to influence the execution plan, and how?
For example, I have the following simple query:
select e.empid, e.ename, d.dname from emp e, dept d where e.deptno=d.deptno;
The emp.empid, emp.deptno and dept.deptno columns have indexes, and the tables have the standard structure found in the basic Oracle examples.
If I look at the execution plan of the above query, I see that the driving table is emp and the driven table is dept. Also, the type of join taking place is a nested loop.
Questions, with respect to the above query:
Q2. If I want to make dept the driving table and emp the driven table, how can I adjust the statistics to achieve that?
Q3. If I want a hash join instead of a nested loop join, how can I adjust the statistics to achieve that?
I could use the ordered and use_hash hints to effect this, but again I have heard that altering statistics is a more robust way to control an execution plan than hints.
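A minimal sketch of the statistics-adjustment approach (illustrative numbers only, and only sensible in a dev environment): use DBMS_STATS.SET_TABLE_STATS to fake a row count that makes the optimizer re-rank the join order or method.

BEGIN
  -- Pretend DEPT is large, so the optimizer stops treating it as the cheap inner table.
  DBMS_STATS.SET_TABLE_STATS(
    ownname => USER,
    tabname => 'DEPT',
    numrows => 1000000,
    numblks => 10000
  );
END;
/
-- Re-parse the query afterwards and recheck the plan; SET_INDEX_STATS and
-- SET_COLUMN_STATS allow finer-grained nudges in the same spirit.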
When I export a user with the expdp utility, the load on the server goes up to 5. The size of the database is 180GB. Below is the command that I use for the export.
The following query gets input parameters from the front-end application, which users query to get reports. There are many drop-down boxes, like LOB, FAMILY, BRAND etc., and the user may or may not select values from them.
If the user selects one or more values (against each drop-down box), it has to fetch all matching values from the DB. If the user doesn't select any values, it has to fetch all the records; in this case the application sends the value 'DEFAULT' (which is not a value in the DB) so that the DB will fetch all the records.
To achieve this I wrote a query like the one below using DECODE, which a colleague suggested will hamper performance. In the query below, all the V_ variables are defined in the procedure, which receives the values selected by the user as a comma-separated string; here V_SELLOB holds the selection and LOB_DESC is a column in the DB.
DECODE (V_SELLOB, 'DEFAULT', V_SELLOB, LOB_DESC) IN

OPEN v_refcursor FOR
SELECT /*+ FULL(a) PARALLEL(a, 5) */ *
FROM items a
WHERE a.sku_status = 'A'
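A common alternative to the DECODE wrapper (my sketch, ignoring the comma-separated multi-value handling): a plain OR expresses the same "match everything on DEFAULT" logic, and can often still use an index on LOB_DESC when a real value is passed, instead of evaluating DECODE per row.

OPEN v_refcursor FOR
SELECT *
FROM items a
WHERE a.sku_status = 'A'
AND (V_SELLOB = 'DEFAULT' OR a.LOB_DESC = V_SELLOB);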
What are the principal things to look at when we get different performance results for the same query? I have 2 different databases: the plan and the data are the same, but the performance results are very different.