Performance Tuning :: Join Condition In Index?
Mar 14, 2012
For a hash join statement, is it beneficial to have the join condition objects in the index as well as the objects in the where clause?
I have a huge table (about 60 GB), range partitioned. The index on this table is a global index created on 4 columns together. I have a query that runs very slowly. The explain plan shows that this global index is used. The explain plan does not show Pstart and Pstop because the index is global.
I have a query which had a join:
a.c1=b.c1 and a.c2=@var
where @var is user-supplied input at runtime. We had an index on a.c2, and the CBO would use this index to generate an optimised query plan. We found that some records from table "b" were being dropped because of the inner join, so we changed the join to something like:
a.c1(+)=b.c1 and nvl(a.c2,@var)=@var
This query no longer uses the index; instead it does a full table scan, which slows the query down. I have tried creating an index on nvl(a.c2,'31-dec-9999'),
but the CBO won't use it. Is there any way to create an index on this column so that the full table scan can be avoided?
How can I force an index if the query is not using it?
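One rewrite often suggested for this pattern moves the filter into the outer-join condition, so that a.c2 is compared directly instead of being wrapped in NVL. The following is a minimal sketch, assuming the intent is to keep every row of b and attach a row of a only when a.c2 matches. Here :var stands in for @var and the select list is hypothetical. It is not strictly equivalent to the NVL form: rows of b whose only matches in a carry a different c2 are kept (with NULLs for a) rather than dropped.

```sql
-- Sketch only: tables a and b and columns c1, c2 come from the post;
-- the select list and the bind name :var are assumptions.
SELECT b.c1,
       a.c2
FROM   b
       LEFT OUTER JOIN a
         ON  a.c1 = b.c1
         AND a.c2 = :var;   -- plain column comparison, so an index on a.c2 can be considered
```

Whether the optimizer actually chooses the index still depends on statistics and selectivity.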
How does column width affect index performance?
For example, if I had an IOT table emp_iot with columns:
(id   number,
job  varchar2(20),
time date,
plan number)
The table key consists of (id, job, time).
Column JOB has a fixed list of distinct values ('ANALYST', 'NIGHT_WORKED', etc.).
What performance increase could I expect if column "job" stored numbers identifying the job names instead of the names themselves?
E.g. I would store "1" instead of 'ANALYST' and "2" instead of 'NIGHT_WORKED'.
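The numeric-code variant described above could be laid out roughly as follows. This is a sketch under assumptions: the lookup table job_names, the column job_id and the constraint name are hypothetical, and nothing here claims a particular speed-up. A shorter key only means more entries fit in each IOT leaf block.

```sql
-- Hypothetical lookup table mapping a small numeric code to each job name.
CREATE TABLE job_names (
  job_id   NUMBER(3)    PRIMARY KEY,
  job_name VARCHAR2(20) NOT NULL UNIQUE
);

-- The IOT from the question, with the job name replaced by its code.
CREATE TABLE emp_iot (
  id     NUMBER,
  job_id NUMBER(3) REFERENCES job_names (job_id),
  time   DATE,
  plan   NUMBER,
  CONSTRAINT emp_iot_pk PRIMARY KEY (id, job_id, time)
) ORGANIZATION INDEX;

-- VSIZE reports the stored byte length, which allows a direct comparison.
SELECT VSIZE('NIGHT_WORKED') AS name_bytes,
       VSIZE(2)              AS code_bytes
FROM   dual;
```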
I have to optimize a query that has the following characteristics:
- Takes 3 hours to process
- Performs an inner join with 30 tables
- Produces an output of 280 million records with 450 fields
First of all, it is not feasible to run 30 updates (one for each table) against 280 million records.
The best solution I have found so far is to create 3 temporary tables, each joining 1/3 of the 30 tables, and at the end join the main table to these three temporary tables.
I know that you will ask (or maybe not) for the query and samples, but it is impossible to create 30 examples.
How can I optimize this type of query, which joins multiple tables and produces a large output with (too) many columns?
In SQL, almost everything that is possible with a join is also possible with a sub-query, and vice versa.
So when should I use a sub-query and when should I go for a join?
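A small illustration of where the two forms differ, using hypothetical tables orders(order_id, cust_id) and customers(cust_id, region) that do not appear in the thread. The two statements return the same rows only under the uniqueness assumption stated in the comments.

```sql
-- Subquery form: each order is returned at most once,
-- however many customer rows match.
SELECT o.order_id
FROM   orders o
WHERE  o.cust_id IN (SELECT c.cust_id
                     FROM   customers c
                     WHERE  c.region = 'EU');

-- Join form: returns the same rows only if customers.cust_id is unique;
-- otherwise an order is repeated once per matching customer row.
SELECT o.order_id
FROM   orders o
       JOIN customers c ON c.cust_id = o.cust_id
WHERE  c.region = 'EU';
```

As a rough guide, the subquery form suits existence checks, while the join form is needed when columns from both tables are selected.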
When I use nvl in one condition, the query takes a long time to execute, but when I remove the nvl function it executes in 2 min. The condition is given below:
(HOI2.ORG_INFORMATION1)=nvl(TO_CHAR(:p_set_of_books_id) , HOI2.ORG_INFORMATION1)
but when I write the same condition as below, the query executes in 2 min:
(HOI2.ORG_INFORMATION1)=TO_CHAR(:p_set_of_books_id)
My query is given below:
(SELECT   cust.customer_number cust_no, cust.customer_name customer,
                     cnv.item_no, SUM(wd.shipped_quantity) shp_qty_nos,
                    0 rtn_qty_nos,
                    ROUND(SUM(cnv.cnf * wd.shipped_quantity), 3) shp_qty_tons,
                    0 rtn_qty_tons, 0 net_shp_qty_nos, 0 net_shp_qty_tons
[code]...
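One way to spell out what the optional-parameter predicate means is to split it into two branches, one for a supplied value and one for a NULL bind. This is a sketch only: the table name org_info is hypothetical (the full query is elided above), and only the predicate and the bind name come from the post.

```sql
-- Branch 1: a value was supplied, so a plain equality is used.
SELECT hoi2.org_information1
FROM   org_info hoi2
WHERE  :p_set_of_books_id IS NOT NULL
AND    hoi2.org_information1 = TO_CHAR(:p_set_of_books_id)
UNION ALL
-- Branch 2: no value supplied. col = NVL(NULL, col) reduces to col = col,
-- which keeps every row whose column is not NULL.
SELECT hoi2.org_information1
FROM   org_info hoi2
WHERE  :p_set_of_books_id IS NULL
AND    hoi2.org_information1 IS NOT NULL;
```

Only one branch returns rows for any given bind value, so the selective branch can be planned separately from the scan-everything branch.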
I'm looking to see if there is a way (fully expecting it to be an underscore, or two...) to force the optimizer to keep churning until all permutations are exhausted. I'm aware that, to paraphrase, it cuts out when it has spent more time parsing than it would take just to run the statement based on its estimates.
I've got some irritating problems with XML rewrite, XML indexes and access paths/cardinalities etc., and I really need the entire thing considered as a one-off for debugging this. I've already cranked the maximum permutations up to the max but it's not enough; it shorts out after 5041 permutations (I'd set that to 80000 max).
I know you'd not want to do this in the real world, but I can't get the damned thing to run the plan I want in a 10053 so I can see the values it has there. I know I can hint it, but I'm trying to ascertain why it's not even considering it in a "normal" parse.
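For readers unfamiliar with the 10053 trace mentioned above, a minimal sketch of switching it on for one session follows. The trace-file tag and the traced statement are placeholders, not the poster's query.

```sql
-- Tag the trace file so it is easy to find (hypothetical tag).
ALTER SESSION SET tracefile_identifier = 'cbo_debug';

-- Switch the optimizer trace on for this session.
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

-- The trace is written only when the statement is hard parsed;
-- EXPLAIN PLAN is one simple way to get a fresh parse.
EXPLAIN PLAN FOR
SELECT /* cbo_debug_1 */ COUNT(*) FROM dual;   -- placeholder statement

-- Switch the trace off again.
ALTER SESSION SET EVENTS '10053 trace name context off';
```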
The product I work on requires a query to tell us what tables are dependent on certain types.
SELECT dba_tab_cols.owner,
dba_tab_cols.table_name,
dba_tab_cols.data_type_owner,
dba_tab_cols.data_type
FROM dba_tab_cols
JOIN dba_types
ON dba_types.owner      = dba_tab_cols.data_type_owner
AND dba_types.type_name = dba_tab_cols.data_type
WHERE (dba_types.owner IN ('SCHEMA1', 'SCHEMA2'......))
I find this query to be pretty slow. I think it is because data_type_owner in dba_tab_cols is not indexed. Adding an index is not an option because users expect our product to be read-only.
However, I was able to identify a poorly performing query that seemed to be maxing out our CPU.  I have been trying to understand the Explain Plan.  The plan below is from our test system, which has considerably less information in the tables than our PROD system.
I can see there are a bunch of table scans at the end which may indicate missing indexes, but I am unclear on whether this is actually a problem as the %CPU seems to be worse for the JOIN near the top of the plan.
-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                         |  1870M|  3018G|       |   677M  (1)|999:59:59 |        |      |
|   1 |  SORT ORDER BY              |                         |  1870M|  3018G|  3567G|   677M  (1)|999:59:59 |        |      |
[code]...
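Because the Rows column in a plan like this holds optimizer estimates, it can help to compare them with the row counts actually produced at run time. A minimal sketch, assuming a release that provides DBMS_XPLAN.DISPLAY_CURSOR (10g or later); the statement shown is a placeholder for the real query.

```sql
-- Run the statement once with row-source statistics collected.
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)
FROM   dual;   -- placeholder for the slow query

-- Show estimated (E-Rows) next to actual (A-Rows) figures for the
-- last statement executed in this session.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Steps where the estimated and actual row counts differ by orders of magnitude are usually the first place to look.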
Having production system: 11.2.0.1 on Windows Server x64
Test system: 9.2.0.1 on Windows XP
Problem preface: I need to get all unique CASEIDs that should be checked by the biometric system. What I should check: all CASEs for different PERSONs having the same PHONEs in at least one phone type (1..4). The real table contains a little more than 10 million records. I made test scripts.
Below the DDL for test table creation:
------------------------------------------
-- Create CASEINFO test table 
------------------------------------------
DROP TABLE CASEINFO;
CREATE TABLE CASEINFO
[code]...
Below is the SQL/DDL to generate test data. The number of records inserted is 2 million.
PERSON_COUNT := #/8;
------------------------------------------
-- fill CASEINFO with sample data
------------------------------------------
DECLARE
  I INTEGER;
 
[code]...
Below is the SQL select to check the data in the created table.
------------------------------------------
-- Check test data counters
------------------------------------------
SELECT 'TOTAL',count(*) from CASEINFO
UNION ALL
SELECT 'LEGAL',count(*) from CASEINFO where 
 
[code]...
The PROBLEM is that I am experiencing HUGE performance problems on both the test and production systems with this query:
select distinct b.caseid
from CASEINFO a, CASEINFO b 
where (a.person<>b.person) and (a.sex=b.sex) and 
(
      (a.phone1=b.phone1) or 
      (a.phone1=b.phone2) or 
      (a.phone1=b.phone3) or 
    
[code]...
This query takes almost 90 minutes to execute, and I do not know how to avoid this. The full SQL file to build the test is attached.
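A long chain of OR-ed equality tests between two copies of a table generally cannot be driven by a single hash join. One alternative sometimes tried is to turn the phone columns into rows first and then join on a single phone column. The sketch below assumes columns caseid, person, sex and phone1 to phone4 (phone4 is inferred from "phone type (1..4)"), and assumes that any phone of one case may match any phone of another, as the visible part of the query suggests.

```sql
WITH phones AS (
  -- One row per (case, phone); empty phone slots are dropped.
  SELECT caseid, person, sex, phone1 AS phone FROM caseinfo WHERE phone1 IS NOT NULL
  UNION ALL
  SELECT caseid, person, sex, phone2 FROM caseinfo WHERE phone2 IS NOT NULL
  UNION ALL
  SELECT caseid, person, sex, phone3 FROM caseinfo WHERE phone3 IS NOT NULL
  UNION ALL
  SELECT caseid, person, sex, phone4 FROM caseinfo WHERE phone4 IS NOT NULL
)
SELECT DISTINCT b.caseid
FROM   phones a
       JOIN phones b
         ON  b.phone = a.phone     -- single equi-join instead of many ORs
         AND b.sex   = a.sex
WHERE  a.person <> b.person;
```

The result should be checked against the original query on the test data before relying on it, since the full original predicate is not shown.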
I am posting the below query:
SELECT PEA.INCURRED_BY_PERSON_ID AS PERSON_ID,
PEA.EXPENDITURE_ENDING_DATE AS WEEK_END_DATE,
CASE
[Code].....
The explain is below:
SELECT STATEMENT  ALL_ROWS  Cost: 48,287  Bytes: 18,428,818  Cardinality: 297,239  
3 HASH JOIN  Cost: 48,287  Bytes: 18,428,818  Cardinality: 297,239  
1 TABLE ACCESS FULL TABLE PA.PA_EXPENDITURES_ALL Cost: 2,964  Bytes: 3,506,094  Cardinality: 194,783  
2 TABLE ACCESS FULL TABLE PA.PA_EXPENDITURE_ITEMS_ALL Cost: 43,425  Bytes: 26,637,468  Cardinality: 605,397  
  I want to make sure I am describing correctly what happens in a query where there is distributed database access and it is participating in a NESTED LOOPS JOIN.  Below is an example query, the query plan output, and the remote SQL information for such a case.  Of particular note are line#4 (NESTED LOOPS) and line#11 (REMOTE TABLE_0002).
What I want to know is more detail on how this NESTED LOOPS JOIN handles the remote operation.  For example, for each row that comes out of line#5 and thus goes into the NESTED LOOPS JOIN operation at line#4, does the database jump across the network to do the remote lookup?  Thus if there are 1 million rows, does that mean 1 million network hops?  Does batch size play a role?  For example, if the database batches in groups of 100, does that mean 10 thousand network hops?
I think each row that comes out of line#5 means a network hop to the remote database, but I do not know for a fact. I have done some abbreviating in the plan in an attempt to make it fit on the page (line#7 TA = TABLE ACCESS).
SELECT                     A.POLICY , 
                           F.MIN_MEMBER_ID, 
                           MIN(A.EFF_DATE) EFF_DATE, 
                           A.EXP_DATE , 
                           G.DESCRIPTION PROGRAM_NAME, 
                        
[code]...
Following is the query on TPC-H schema.
explain plan for select
count(*)
from
        orders,
        lineitem
where
 o_orderkey = l_orderkey;
The 10053 trace (as shown below) for this query shows a nested loop join with Lineitem as the outer table and Orders as the inner table. It is effectively a join on the composite index (pk_lineitem) of Lineitem and the unique index (Pk_orderkey) of the Orders table. The cost calculation formula given in the book, "outer table cost + cardinality of outer table * inner table cost", fails here. I am not able to understand this.
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: LINEITEM  Alias: LINEITEM
    #Rows: 6001215  #Blks:  109048  AvgRowLen:  124.00
  Column (#1): L_ORDERKEY(NUMBER)
    AvgLen: 6.00 NDV: 1500000 Nulls: 0 Density: 6.6667e-07 Min: 1 Max: 6000000
[code]....
I would like to understand how the cost has been calculated. It does not follow the traditional nested loop cost formula mentioned in the book.
We have a DELETE statement that does not use the index when it comes from the application, but uses the index when run from Toad or SQL*Plus as the same user. The explain plan also shows the index being used. I ran a query on v$sql; below is its output (I have attached the same as a txt file). All the stats are up to date, and the developer confirmed that the variable B1 uses the same datatype as column MAXMKY.
SQL_TEXT                       SQL_ID          DISK_READS   OPTIMIZER_HASH_VALUE
DELETE LOTA WHERE MAXMKY=:B1   2g2prrp3z56ah   19,099,189   1,846,735,884
DELETE LOTA WHERE MAXMKY=:B1   2g2prrp3z56ah   0            1,846,735,884
OPTIMIZER_COST  HASH_VALUE  PLAN_HASH_VALUE  MODULE  PARSING_SCHEMA_NAME
[code].....
I am working on a query for a feedback response system, targeted at the common case where the user only wants the most recent 10-20 rows in the feedback table.  My thought is to create an index on the date column, do a sort in an inner query and apply rownum <= in an outer query.  This works as I expect when I am only querying the main table (lookup by index with a stop key), but when I start joining the main table to attribute tables I end up with a full table scan of the main table with the stop key applied after all the joins are completed; the index is nowhere to be found.
CREATE TABLE attr1_tbl(attr1_id NUMBER NOT NULL, attr1 VARCHAR2(10) NOT NULL,
    CONSTRAINT attr1_pk PRIMARY KEY (attr1_id));
CREATE TABLE attr2_tbl(attr2_id NUMBER NOT NULL, attr2 VARCHAR2(10) NOT NULL,
    CONSTRAINT attr2_pk PRIMARY KEY (attr2_id));
CREATE TABLE attr3_tbl(attr3_id NUMBER NOT NULL, attr3 VARCHAR2(10) NOT NULL,
    CONSTRAINT attr3_pk PRIMARY KEY (attr3_id));
[code]....
One thing I noticed was that when no data is selected from the attribute tables, even if they are joined in the query, the CBO throws them out of the plan and only accesses the main table.  With the foreign keys this makes sense and really just disqualified my first thought that maybe I was missing a foreign key or not null constraint somewhere.
I also added the cardinality hint to overcome the chance that in my test case there was so little data that index access is not worth it.
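The "limit first, join afterwards" shape described above can be sketched as follows. The main table's DDL is elided in the post, so feedback_tbl, feedback_id, created_dt and the foreign-key column attr1_id on the main table are hypothetical; attr1_tbl is taken from the DDL shown.

```sql
SELECT t.feedback_id,
       t.created_dt,
       a1.attr1
FROM  (SELECT *
       FROM  (SELECT f.feedback_id, f.created_dt, f.attr1_id
              FROM   feedback_tbl f
              ORDER  BY f.created_dt DESC)   -- candidate for an index on created_dt
       WHERE  ROWNUM <= 20) t                -- stop key applied before any join
       JOIN attr1_tbl a1 ON a1.attr1_id = t.attr1_id
ORDER  BY t.created_dt DESC;                 -- the join does not guarantee order
```

Whether the optimizer keeps the inner view unmerged and uses the index still has to be confirmed from the plan.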
I am going through this scenario:
|* 35 | TABLE ACCESS BY INDEX ROWID | S_ORG_EXT        |  3064K|  2472M|       |     1   (0)| 00:00:01 |
|  36 |  INDEX FULL SCAN            | S_ORG_EXT_U1     |    14 |       |       |     1   (0)| 00:00:01 |
Predicate Information (identified by operation id):
---------------------------------------------------
35 - filter("T2"."ACCNT_FLG"<>'N' AND ("T2"."INT_ORG_FLG"<>'Y' OR "T2"."PRTNR_FLG"<>'N'))
The unselective index scan at step 36 of the plan returns 14 rows, but the optimizer selects 3064K rows from the table.
I tried creating a combined index on all 3 columns mentioned in the predicates for step 35, but it is not used.
How can I index this whole expression:
(ACCNT_FLG<>'N' AND (INT_ORG_FLG<>'Y' OR PRTNR_FLG<>'N'))
Something like CREATE INDEX XYZ on table((ACCNT_FLG<>'N' AND (INT_ORG_FLG<>'Y' OR PRTNR_FLG<>'N')) compute statistics ;
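A boolean expression cannot be indexed directly, but it can be wrapped in a CASE expression that returns a value. The following sketch assumes the query text can be changed to repeat that expression, which may not be possible for a packaged application. The index name is hypothetical and the select list is a placeholder.

```sql
-- Function-based index: only rows satisfying the condition get a
-- non-NULL key, so only those rows are stored in the index.
CREATE INDEX s_org_ext_flg_fbi ON s_org_ext (
  CASE
    WHEN accnt_flg <> 'N' AND (int_org_flg <> 'Y' OR prtnr_flg <> 'N')
    THEN 'Y'
  END
);

-- The predicate has to repeat the indexed expression for a match.
SELECT COUNT(*)
FROM   s_org_ext t2
WHERE  CASE
         WHEN t2.accnt_flg <> 'N' AND (t2.int_org_flg <> 'Y' OR t2.prtnr_flg <> 'N')
         THEN 'Y'
       END = 'Y';
```

If most rows satisfy the condition, as the 3064K estimate in the plan suggests, an index on it is unlikely to beat a full scan.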
Is there any way to reduce index creation time?
In my case, one index creation took 5 minutes and there are 5 indexes, so it took 25 minutes.
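Two options commonly used to shorten an index build are parallel execution and reduced redo logging. A sketch with hypothetical table, column and index names follows. It assumes an edition that supports parallel operations and that a backup follows the NOLOGGING build.

```sql
-- Build with 4 parallel slaves and minimal redo generation.
CREATE INDEX big_tbl_col1_idx ON big_tbl (col1)
  PARALLEL 4
  NOLOGGING;

-- Reset the attributes so later queries and DML are not affected.
ALTER INDEX big_tbl_col1_idx NOPARALLEL;
ALTER INDEX big_tbl_col1_idx LOGGING;
```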
I have the following problem. When I use fixed values in the IN list, e.g. 197321,197322,197323 ..., the index i_tab2_index works fine (index range scan).
But when I use a sub-select in the IN clause, the index i_tab2_index doesn't work well (fast full scan)! My indexes and the SELECT used:
CREATE INDEX i_tab1_index ON tab1 ( datum, flag_inst );
CREATE INDEX i_tab2_index ON tab2 ( tab2Idx, kontro );
SELECT count(epidx) as rowAnz
FROM tab2
WHERE tab2Idx IN ( SELECT tab1IDX FROM tab1
                   WHERE datum BETWEEN '20120117' AND '20120117'
                   AND flag_inst = '1' )
  AND kontro = '9876521'
[code]...
I want to find all the unused indexes in the system. If I put this query in a batch job, execute it every night for one month and store its results in a table, then after one month I will have all the used indexes, and whatever is left would be our unused indexes.
select
  distinct  p.object_name c1
   from
   dba_hist_sql_plan p,
   dba_hist_sqlstat s
[Code]....
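An alternative to mining the AWR plan history is Oracle's built-in index usage monitoring. A minimal sketch with a hypothetical index name follows. Note that v$object_usage lists only indexes owned by the connected user.

```sql
-- Start recording whether the index is used by any statement.
ALTER INDEX my_index MONITORING USAGE;

-- After a representative period, check the USED flag.
SELECT index_name,
       table_name,
       monitoring,
       used,
       start_monitoring,
       end_monitoring
FROM   v$object_usage;

-- Stop monitoring when done.
ALTER INDEX my_index NOMONITORING USAGE;
```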
I have a table whose size is 2.3 GB, and there are two indexes on it. One index is on a date column and its size is 900 MB; the other index consists of 5 columns including the date column, and its size is almost 2 GB. But when I query the table using the date column, it does a range scan on the second index, which is almost the same size as the table. Why is it not using the first index? What steps should I take so that it uses the first index without hints?
What is the difference between index rebuild and index rebuild online?
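The two forms differ only in the ONLINE keyword. A sketch with a hypothetical index name, with the commonly described behaviour noted in the comments:

```sql
-- Offline rebuild: DML against the underlying table is blocked
-- while the rebuild runs.
ALTER INDEX my_index REBUILD;

-- Online rebuild: DML on the table can continue during the rebuild
-- (availability of this option depends on the database edition).
ALTER INDEX my_index REBUILD ONLINE;
```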
Where are the intermediate rows that pass a filter stored before the join and GROUP BY operations?
Are the rows saved in the PGA private SQL area, or are blocks saved in the SGA buffer cache?
mbr has 60,000 rows and member has approx. 60,000 rows. The two tables have indexes on ssn and citi_no respectively.
PK of mbr : mbr_id
PK of member : mbr_id
Other columns are not part of the PK and have no index.
I'm wondering why the statement doesn't use an index when ssn and citi_no are indexed.
MERGE INTO mbr t
USING (SELECT mbr_id,citi_no
FROM member) a
ON (t.ssn = a.citi_no)
WHEN MATCHED THEN
UPDATE SET t.asis_mbr_id = a.mbr_id
where t.ssn not in(select ssn from mbr group by ssn having count(*) > 1)
I have to create indexes on foreign key columns. If a composite index that includes the foreign key column already exists, will that work, or will I have to create a single-column index?
I am just curious to know what the clustering factor of an index is and how important it is from a SQL tuning perspective.
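The clustering factor can be read from the data dictionary and compared with the table's block and row counts. A sketch with a hypothetical table name, assuming statistics have been gathered:

```sql
SELECT i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
FROM   user_indexes i
       JOIN user_tables t ON t.table_name = i.table_name
WHERE  i.table_name = 'MY_TABLE';
```

A clustering factor close to the number of table blocks means rows are stored roughly in index order, so range scans touch few blocks. A value close to the number of rows means the rows are scattered and range scans through the index are more expensive.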
I am trying to find out whether an index needs to be rebuilt. I have analyzed the index, but I don't know how to calculate the ratio. Could anyone give the steps to calculate the following ratio?
Run the ANALYZE INDEX command on the index to validate its structure and then calculate the ratio LF_BLK_LEN/(LF_BLK_LEN+BR_BLK_LEN); if it isn't near 1.0 (i.e. greater than 0.7 or so) then the index should be rebuilt. The same applies if the ratio BR_BLK_LEN/(LF_BLK_LEN+BR_BLK_LEN) is nearing 0.3.
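The two ratios quoted above can be computed as follows. This sketch uses a hypothetical index name. INDEX_STATS holds a single row for the index most recently validated in the current session, and VALIDATE STRUCTURE locks the table while it runs. The thresholds are those quoted in the post, not a recommendation.

```sql
-- Populate INDEX_STATS for this index.
ANALYZE INDEX my_index VALIDATE STRUCTURE;

-- Compute the leaf and branch ratios from the post.
SELECT name,
       lf_blk_len,
       br_blk_len,
       ROUND(lf_blk_len / (lf_blk_len + br_blk_len), 3) AS lf_ratio,
       ROUND(br_blk_len / (lf_blk_len + br_blk_len), 3) AS br_ratio
FROM   index_stats;
```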
I have the following query:
Select 
 tag0.TAG_VALUE pid, tag1.TAG_VALUE, tag2.TAG_VALUE  From TAGGER.TAGGABLE_RESOURCE r 
, TAGGER.TAG tag0
, TAGGER.TAG tag1
, TAGGER.TAG tag2
 where 1=1 
[code]....
This runs in about 400ms. Now I replace this:
AND tag0.TAG_TYPE in (4602, 5228)
AND tag1.TAG_TYPE in (4612, 5225)
AND tag2.TAG_TYPE in (4613, 5226)
with this:
AND tag0.TAG_TYPE in (select  COLUMN_VALUE from ( select  * from table( TAGGER.GET_IDS_OF_SIMILAR_TAG_TYPES('Patient ID') ) x1 ))
AND tag1.TAG_TYPE in (select  COLUMN_VALUE from ( select  * from table( TAGGER.GET_IDS_OF_SIMILAR_TAG_TYPES('Patients Sex') )x2 ))
AND tag2.TAG_TYPE in (select  COLUMN_VALUE from ( select  * from table( TAGGER.GET_IDS_OF_SIMILAR_TAG_TYPES('Patients Birth Date') ) x3 ))
So instead of hard coding the IDs there is a function that looks them up. The function itself is reporting that it runs in 0ms. But when I run the new query:
Select 
 tag0.TAG_VALUE pid, tag1.TAG_VALUE, tag2.TAG_VALUE  From TAGGER.TAGGABLE_RESOURCE r 
, TAGGER.TAG tag0
, TAGGER.TAG tag1
, TAGGER.TAG tag2
 where 1=1 
[code]....
it takes around 6s to run. I have looked at the explain plans and it seems as though the function-based approach is triggering a full table scan of 'TAG'.
I have tried it with query hints to use index, but it doesn't change the execution plan, or the query time.
The explain plan for the quick query is:
PLAN_TABLE_OUTPUT                                                                                   
--------------------------------------------------------------------------------------------------- 
Plan hash value: 1031492929                                                                         
--------------------------------------------------------------------------------------------------- 
| Id  | Operation                       | Name            | Rows  | Bytes | Cost (%CPU)| Time     | 
[code]....
And the slow one is:
PLAN_TABLE_OUTPUT                                                                                    
---------------------------------------------------------------------------------------------------- 
Plan hash value: 2741657371                                                                         
---------------------------------------------------------------------------------------------------- 
| Id  | Operation                                 | Name                         | Rows  | Bytes |Te 
---------------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT                          |                              |    20M|  1602M|   
[code]....
I have a table "NEWS_COMMENT" like this:
Name      Type           
-------   -------------- 
ID        NUMBER(8)
USERID    NUMBER(8)
SORT_TEXT VARCHAR2(100) 
TEXT      VARCHAR2(1000) 
DATE      DATE
VALID     VARCHAR2(1)
CODNEW    NUMBER(10)   
The table has a normal index for the userid column.
There is a query that looks for the different CODNEW values for a USERID, but CODNEW always has to be greater than 2248833:
select codnew from news_comment  where userid=2914655 and valid='N' and codnew>2248833
I have created a new index for this kind of query:
create index coment_new_IDX on news_comment 
 (CASE  WHEN codnew >2248833 and valid='N' THEN userid ELSE NULL END )
but Oracle doesn't use it. I have used a hint to force it, but that doesn't work either.
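A function-based index is generally considered only when the query contains the indexed expression itself, which the original WHERE clause does not. Two possible directions are sketched below. The composite index name is hypothetical, and option 2 assumes an additional index is acceptable.

```sql
-- Option 1: repeat the indexed CASE expression in the predicate.
-- It returns the same rows as the original query.
SELECT codnew
FROM   news_comment
WHERE  CASE
         WHEN codnew > 2248833 AND valid = 'N' THEN userid
         ELSE NULL
       END = 2914655;

-- Option 2: a plain composite index that matches the original
-- predicates (equality columns first, range column last).
CREATE INDEX news_comment_uvc_idx
  ON news_comment (userid, valid, codnew);
```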