Performance Tuning :: Does High Volume Of Data Can Change Plan
Jun 4, 2010
The prod stats has been implemented in development. The stats has been gathered 2 months back on dev while in production the stats has been gathered 2 weeks back.
My question shouldn't the high volume of data causes changes in plan in both the environment? My thinking is that plan can be different as the high volume of data are changing in prod it may lead to a different plan.
I need to warn readers that I am not a DBA but am heavily involved in application development. Whatever I know about database tuning is whatever I've managed to pick up via self-learning, and I must admit that the sum total of my knowledge isn't a lot.
Anyway, our "DBAs" recently did an upgrade to our 10g database, going from version 10.2.0.2.0 to 10.2.0.4.0. Immediately after the upgrade, a particular query has started to under-perform. The query itself was not altered in any way during the upgrade.
We have two explain plans for the query, a before and an after plan. The two plans are similar but not identical. The plans are too massive to post here, so I hope the following synopsis of the differences will do.
The 10.2.0.2.0 plan:
shows a HASH GROUP BY has a TempSpc column in the explain plan shows a particular table (EMP_HISTORY) as having ~1700 rows
The 10.2.0.4.0 plan:
shows SORT GROUP BY instead of HASH GROUP BY does not show a TempSpc column in the explain plan shows the EMP_HISTORY table as having only 25 rows
Other than these points, no other discernible differences can be noted. I'm wondering what would cause HASH to change to SORT. I'm told that stats are up-to-date.
Session 1 create table tab1 as select * from dba_objects where object_id is not null; alter session set events '10046 trace name context forever, level 12'; declare x number; begin for i in 1..4 loop
[code]....
Session 2
after "starting" the above pl/sql block from Session 1, I keep on querying tab2 from Session 2 And as soon as 2 records are inserted in tab2, I create index from Session 2
select * from tab2; select * from tab2; select * from tab2; N ---------- 1 2 create index i on tab1(object_id);
As I have tested from a single session (just before this test) such index is used for the sql statement
select count(1) into x from tab1 where object_id=2331;
However when I checked the trace file I am not geeting results as expected
I am expecting 4 execution plans - 2 FTS and 2 Index Access scans and for this I am issuing following command
SELECT COUNT(1) FROM TAB1 WHERE OBJECT_ID=2331 call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 1 0 0 Execute 4 0.00 0.00 0 2 0 0
[code]....
1) Why I am unable to see 4 execution plans - 2 with FTS and 2 with Index access when I mentioned 'aggregate=no'?
2) Whether the index i will be used for last 2 iterations after first 2 iterations of FTS?
If answer to above question 2) is 'No'
By which method I can force an ongoing sql statement in loop to take different execution path? Of course I can't hard parse sql in 'that' current session Will flushing Shared pool work in above case?
I am in the very early planning stages of a project the goal of which is to identify separate organizations which may in fact be the same organization.
Our first implementation of this task was a process designed to look for a few thousand organizations in a pool of a few hundred thousand organizations. To accomplish this we made heavy use of Oracle's Text index as well as a custom index type we created which utilized n-grams. This approach worked quite well for on-demand editing of the organizations, in which a user might log in and say in addition to what we already know about organization A we also know x, y and z does that change anything and worked acceptably well for the bulk processing we did on our "known" information once a week running for a couple of hours on the weekend.
We have now been tasked with reworking this initial implementation only now we want to look at a set consisting of several million organizations for potential matches which exist within the set. As in our initial implementation we will be breaking what we know about organizations into groupings so we aren't comparing a phone number to an email address and normalizing the data as much as we can so we ignore things like case and punctuation. Even after all this we are still talking about looking for similar values in a group which might be in the tens of millions (some types of data will have more than one value per organization).
My initial thought on the problem is to use n-grams though not in the way we did in the past. The basic idea here is that we break the search values up into all the substrings it is made of and look for other values which have a high number of those substrings in common.
SQL & PL/SQL was the best place for the question, but I could not think of a better one.
I am facing one performance issue, in which the query cost is very low compare to cpu cost and as a result the cpu always show the high graph.I am also attaching the gv$sql and gv$sql_plan data of this query.
This is the query:
SELECT PTLS.ITEMTYPE , PTLS.ITEMID , PTLS.STAGEID, TS.USERID, SUM(PREVIOUSHOURS) AS PREVIOUSHOURS, MIN(STARTDATE) AS STARTDATE, MAX(STARTDATE) AS ENDDATE FROM PROJECTTIMELOGSSTAGE PTLS, PROJECTTIMESHEETITEM PTSI, TIMESHEET TS WHERE PTLS.PROJECTID = :B2 AND TS.TIMESHEETID = PTSI.TIMESHEETID AND TS.USERID = :B1 AND PTSI.TIMESHEETID = PTLS.TIMESHEETID AND PTSI.ITEMTYPE = PTLS.ITEMTYPE AND PTSI.ITEMID = PTLS.ITEMID AND (PTSI.ISPWFITEM = 'N' OR PTSI.ISPWFITEM IS NULL) AND PTLS.ITEMTYPE NOT IN ('OtherTsk','NewTsk','Loc','Glb') AND (PTLS.ITEMTYPE, PTLS.ITEMID ) IN (SELECT ITEMTYPE, ITEMID FROM PROJECTTIMELOGSSTAGE PTLS1 WHERE PTLS1.PROJECTID = :B2 AND PTLS1.TIMESHEETID = :B3 ) GROUP BY PTLS.ITEMTYPE, PTLS.ITEMID, PTLS.STAGEID, TS.USERID
Why the query is behaving differently with the different database.(execution plan)
Whatever the production database is having same database instance replicated to a new schema. I tried both the queries running on both environment.In prod the index has been used but in newdev it is not. This case existing primary key index were not been used.
Is there any way to tune the following query using lot of CPU:-select description,time_stamp,user_id from bhi_tracking where description like 'Multilateral:%'The explain plan for this is query is:-
Bhi_tracking is used for reporting purpose and contain millions of records.Generally we keep one year data in this table and delete the remaining.Can I drop the table after taking export and then import it back or can i truncatethe table and then insert the rows into it to enhancethe performance.
what privilege is require for a user to execute explain plan? I get below error while try to execute explain plan.
SQL> explain plan for SELECT /*+ FULL(t) */ COUNT(*) FROM "DREAM"."CONSUMER.TAB" t WHERE ROWNUM <= 1000000; explain plan for SELECT /*+ FULL(t) */ COUNT(*) FROM "DREAM"."CONSUMER.TAB" t WHERE ROWNUM <= 1000000 * ERROR at line 1: ORA-01031: insufficient privileges
How can i check the avg time taken by an execution plan. Actually i have a very big query and it changes its execution plan very often, we would like to lock the best execution plan and to find it , i would like to know the Average Execution Time the query takes when it runs using different different execution plans.
I have queries on the execution plan of a sql statement
Following is the example
create table t1 as select s1.nextval id,a.* from dba_objects a; create table t2 as select s2.nextval id,a.* from dba_objects a; insert into t1 select s1.nextval id,a.* from dba_objects a; insert into t1 select s1.nextval id,a.* from dba_objects a; insert into t2 select s2.nextval id,a.* from dba_objects a; insert into t2 select s2.nextval id,a.* from dba_objects a; insert into t2 select s2.nextval id,a.* from dba_objects a; commit;
create index i1 on t1(id); create index i2 on t2(id); create index i11 on t1(object_type);
(1) First index on object_type is accessed to get rowids - t1.object_type='VIEW' (2) Then the filter on owner is applied - t1.owner='SYS' (3) Then the table T1 is accessed to fetch data from the rowids returned by the index I11 and filer application - TABLE ACCESS BY INDEX ROWID
Though I am unable to understand how filter can be applied to the rowids retrieved from index, we can see from the plan below that The rows accessed have reduced from 8550 to 1221 before we access the table...Thus filter "t1.owner='SYS'" is applied in between. Right?
another question is
Case 1 - do we retrieve a rowid from index for a given value, then retrieve required values from table for that rowid Thus row at a time in both ... in loop OR Case 2 - we first fetch all rowids from index and then retrieve values from table one row at a time from the collection of rowids fetched?
Suppose Case 1 is what is happening then can we say, both the steps mentioned by IDS 2,3 in plan below are executed exactly equal number of times and the filter "t1.owner='SYS'" is applied at some later stage? Of course in this case the values in ROWS stand misleading then
select * from t1,t2 where t1.id = t2.id and t1.object_type='VIEW' and t1.owner='SYS';
Execution Plan ---------------------------------------------------------- Plan hash value: 26873579 ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1221 | 233K| 915 (1)| 00:00:11 | |* 1 | HASH JOIN | | 1221 | 233K| 915 (1)| 00:00:11 | |* 2 | TABLE ACCESS BY INDEX ROWID| T1 | 1221 | 116K| 381 (1)| 00:00:05 | |* 3 | INDEX RANGE SCAN | I11 | 8550 | | 24 (0)| 00:00:01 | | 4 | TABLE ACCESS FULL | T2 | 161K| 15M| 533 (1)| 00:00:07 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("T1"."ID"="T2"."ID") 2 - filter("T1"."OWNER"='SYS') 3 - access("T1"."OBJECT_TYPE"='VIEW')
- Both of these databases run on different hardware (A is a VM, B is on a physical host)
- The 20 tables in A and B have exactly same number of rows and after preparing the data, the schemas were analysed using the same DBMS_STATS parameters
Despite this, the execution plans appear to be quite different for the same queries between A and B
I imagine there is something outside of the Oracle table rowcounts, table stats, column stats, index stats that's resulting in the different execution plans.
refere to below 2 queries and their execution plans:
First Query INSERT INTO temp_vendor(vendor_record_seq_no,checksum,rownumber,transaction_type,iu_flag) SELECT /*+ USE_NL ( vd1 ,vd2 ,vd3 ) leading ( vd1 ,vd2 ,vd3 , tvd) */ vd1.vendor_record_seq_no, tvr.checksum, tvr.rownumber, tvr.transaction_type, 'U' FROM vendor_data vd1,
[code]...
Second Query SELECT vd1.vendor_record_seq_no, tvr.checksum, tvr.rownumber, tvr.transaction_type, 'U' FROM ( select * from vendor_data vd1 where vd1.study_seq_no = 99903 AND vd1.control_column_seq_no = 435361232
[code]...
Both are to achieve same output but written in different ways. CAn I get same exectuion plan from 1st query as there is for 2nd using hints
I have 15 million of records as csv, want to load through sqlloader Is sqlloader is the right option to load high volume of data? I have loaded with 2.5 lac records which has taken 4 mins to load.
I have two tables with same columns(15 of them), I am trying to find difference between two tables using minus operator and then insert in stage table using below code
Issue is table1 has 50 million records table2 is empty
so when first time when we execute this v_collection1,v_collection2 collection will have 50 million records in it which will go in memory, I think this is not good, because going in memory will eat memory and resources while sorting and other activities ?
After fetching records in collection we are inserting that in stage table and then COMMIT so i think that wont be good because committing 50 million will generate large amount of redo?
below is snippet of my code
DECLARE type lst_collection1 IS TABLE OF table1.col1%type INDEX BY binary_integer; type lst_collection2 IS [code].......
One of our clients is using Rule Based Optimizer on Oracle 10.2.0.3.0
2-3 weeks backs, during performance issue in one of the sql queries, one of our team members executed tuning adviser for it, created SQL profile and the subsequent execution of the SQL did not took much time (less I/O). Now it took hardly a minute to execute
When this happened I checked that the SQL profile forced that particular query to use CBO (say plan_hash_value is PHV1 here). Yesterday the same query again took 15-20 minutes for execution. I checked that even for this execution the query used the same SQL profile but "this time" with different plan_hash_value - say PHV2.
Today again the query executed in less than a minute and used the plan_hash_value as PHV1.
select distinct plan_hash_value,timestamp from dba_hist_sql_plan where sql_id='mysqlid' order by 1,2;
I confirmed from awrsqrpt as well that different plans were used for different plan_hash_values and every time same SQL profile was used
SQL> select name,CATEGORY,SIGNATURE,CREATED,LAST_MODIFIED,TYPE,STATUS,FORCE_MATCHING from dba_sql_profiles;
NAME CATEGORY SIGNATURE CREATED LAST_MODIFIED TYPE STATUS FOR ------------------------------ ------------------------------ ---------- -------------------- -------------------- --------- -------- --- SYS_SQLPROF_015ffffcc3e1c5b000 DEFAULT 1.5512E+19 20-feb-2013 16:30:48 20-feb-2013 16:30:48 MANUAL ENABLED NO
I am unable to understand how execution plan and thus plan_hash_value is changing for the same SQL Profile. I read that SQL Profile (unlike stored outline) keeps up with increasing data volume and may not keep up with changing data distribution.
I checked that values for 4 bind variables out of 81 are different for execution between today and yesterdays' run(queried v$sql_bind_capture based on last_captured)
My questions are 1) does the different plan_hash_values with different execution plans for query using same SQL profile mean the query was hard parsed multiple times and still used the same SQL profile? 2) If that is the case why I never saw child_number = 1 in any of the views for the same sql_id. I tried it repeatedly over last 2 weeks and always found child_number=0 in v$sql (also loaded_versions=1) 3) Does the different values of bind variable are causing this flip-flop of the plans? How can I conclude this?
I have 2 plans with 2 different plan_hash_values. I know which would be better. How can I force the sql to use better plan in the two in this case where I am using Rule Based Optimizer and have SQL profile created If this is not possible then how can I create stored outline from the existing plan (not waiting for subsequent execution to take place).
I am facing a weird situation wherein the explain plan of same sql in SIT and PROD is different.In fact the explain plan is very costly in Prod.Also the DB version of both SIT and PROD is same.
Below is the sql and corresponding explain plan in Prod and SIT respectively.
Query: SELECT seq,CCN,ProcessorPart,root_item,comp_path,Item,comp_item,comp_item_type, lag(comp_item_type,1,'PART') over(PARTITION BY seq ORDER BY lvl)Nxt_comp_item_type,lvl,bom_qty, ROUND(CASE min(abs(bom_qty)) OVER (PARTITION BY seq ORDER BY lvl) WHEN 0 THEN 0 ELSE 1 END * EXP (SUM (LN (nullif(abs(bom_qty),0))) OVER (PARTITION BY seq ORDER BY lvl))) Ulti_qty, 'AMER'
[code]...
The tables referred in above query is small tables containing arnd 10k records.The above tables are partitioned on Region and not indexed.
I am not able to attribute why there is a huge change in Cost between SIT and Prod.Apparently the Job is going for 3-5 hours which used to get completed within 20mins in SIT.
however I was able to identify a poorly performing query that seemed to be maxing out our CPU. I have been trying to understand the Explain Plan. The plan below is from our test system which has considerably less information in the tables than our PROD system.
I can see there are a bunch of table scans at the end which may indicate missing indexes, but I am unclear on whether this is actually a problem as the %CPU seems to be worse for the JOIN near the top of the plan.
------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Inst |IN-OUT| ------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1870M| 3018G| | 677M (1)|999:59:59 | | | | 1 | SORT ORDER BY | | 1870M| 3018G| 3567G| 677M (1)|999:59:59 | | |
The types of query I refer to in the title are of this pseudo-code ilk:
select t.column_value from table1 o, xmltable('for $co in $data where $co/path1=$bind1 and $co/path2=$bind2 passing o.field as "data", :b1 as "bind1", :b2 as "bind2") t where o.field = :b3
They're querying a table with a (binary) xmltype with a path/domain index over this column.As those who have had the (mis)fortune to run into these will know, the queries are extensively rewritten under the covers to access to xml via the paths supplied.
getting a baseline to work with queries like this? I was suspicious because whilst I can hint it to pick a certain access path first (leading()), the plan hashes remain the same.
I'm not sure, however, if I'm simply "doing it wrong" or it is just not possible with the level of recursive rewriting going on.NB: I consider myself reasonably competent in applying baselines to "traditional" queries...
The below query is utilizing more than 17 Gb temp space. But still it is getting failed out due to insufficient temp space. is there any way to rewrite this query to reduce the temp utilization?
I have two Oracle instances that are setup identically.When I run a query on one of them, it takes around 3 seconds, on the other it takes around 200 seconds.
I have looked at the explain plans, and it has shown me what I think is the problem. On one instance, it does a join on two tables, then runs the other filter/access predicates. On the other instance it runs the filter/access predicated first, then does the expensice join. The one that does the join first is the one that takes around 200 seconds. How to tell Oracle to make this join after runnning the other predicates?
when i runnung the explain plan syntax , show error : running --- SELECT * FROM TABLE(dbms_xplan.display) ; ERROR: an uncaught error in function display has happened; please contact Oracle support Please provide also a DMP file of the used plan table PLAN_TABLE ORA-00904: "OTHER_TAG": 無效的 ID
In a 3-node RAC setup; one node is showing high CPU utilization around 40~50%. The CPU utilization was less than 20% 10 days back but from 9th oldest day it jumped and consistently shows the double figure. I ran AWR reports on all three nodes and found one node with high CPU utilization and shows below tops events-
EVENT WAITS TIME(S) AVG WAIT(MS) %TOTAL CALL TIME WAIT CLASS CPU time 5,802 34.9
RFS ping 15 5,118 33,671 30.8 Other
Log file sequential read 234,831 5,036 21 30.3 System I/O
Sql*Net more data from client 24,1711,08745 6.5 Network
Db file sequential read130,939 4533 2.7 User I/O
Findings:- On AWR report(file attached) for node= sipd207; we can see that "RFS PING" wait event takes 30% of the waits and "log file sequential read" wait event takes 30% of the waits that occurs in database.
1)Are these symptoms of undersized log buffer? 2)I feel Network wait can be reduced by tweaking SDU & TDU values based on MDU.
We are using the 11g AMM feature and Memory_Target set to 96GB and total RAM on the Server is 128GB Now the top and free shows up only 200MB memory free on the system.
There are 2 process dbw0 and dbw1 which consumes the top memory and this is 30GB per dbw.
Why is the dbw process taking up so much memory when there is not much load on the database.
How can I find out the particular oracle session which was consuming high memory in the past?
I can't get the data in v$sessstat Unable to get the information in AWR
dba_hist_active_session_history do not have field which indicate memory related information Shall I concetrate on EVENT in dba_hist_active_session_history which continuosly had sort, direct path read Or Locate sql_id from dba_hist_sqlstat with high SORTS_DELTA for snapshots belonging to problematic time period and then using the sql_id query dba_hist_active_session_history
which approach I shall take to find out the session which consumed most memory in the past?
since the optimizer (during explain plan) assumes all bind variable to be of varchar type, while checking plan for SQL statement using bind variable of numeric and date type shall we convert (typecast) it as following?
variable n_sal number variable dt_joining date exec n_sal:= 1000 exec dt_joining := '12-dec-2005' select first_name from emp_data where sal=to_number(n_sal) and joining=to_date(dt_joining);
I have an APP that truncates tables and loads data, which in turn makes the stats stale. I ran the query advisor (see attachment) and of course it ecommends running stats or accept a profile.I really don't want to do that as it may cause a load on my DB.
In turn, I would like to consider having my APP team change the query to pass a hint to use the best query plan.syntax to pass the hint to emulate good attached plan? Or is this a bad way to proceed?
select /* INDEX FAST FULL SCAN PK_PLACEMENT_REQUEST_QUEUE */ sum(lastshares) as "ROSEN" from nyeo.fix_exec_reports fer, nyeo.placement_request_queue q, nyeo.nyeo_block_control bc where fer.clordid = q.sequence_number and q.blockid = bc.blockid and upper(bc.deskname) like '%ROSEN%'