I have to optimize a query with the following characteristics:
- Takes 3 hours to process
- Performs inner joins with 30 tables
- Produces an output of 280 million records with 450 fields
First of all, it is not feasible to run 30 separate updates (one per table) against 280 million records.
The best solution I have found so far is to create 3 temporary tables, each joining one third of the 30 tables, and at the end join the main table with these three temporary tables.
I know you will ask (or maybe not) for the query and sample data, but it is impossible to create 30 examples.
How do I optimize this type of query, which joins multiple tables and produces a large output with (too) many columns?
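For what it's worth, a minimal sketch of the staged temp-table approach described above. All names are hypothetical (MAIN_TAB as the driving table, T1..T30 as the joined tables, ROW_KEY as the join-back key); it only illustrates the shape, one third of the joins per stage:

-- Stage 1 of 3: join the main table to the first ten tables
CREATE TABLE stage1 NOLOGGING PARALLEL AS
SELECT m.row_key,
       t1.col_a, t2.col_b           -- ...and so on for t3..t10
FROM   main_tab m
JOIN   t1 ON t1.id = m.t1_id
JOIN   t2 ON t2.id = m.t2_id;       -- ...t3..t10 joined the same way

-- stage2 and stage3 cover t11..t20 and t21..t30 in the same way.
-- Final pass: hash joins stitch the three stages back together.
CREATE TABLE final_result NOLOGGING PARALLEL AS
SELECT /*+ USE_HASH(s1 s2 s3) */
       s1.row_key,
       s1.col_a, s1.col_b,          -- columns carried from stage1
       s2.col_c,                    -- columns carried from stage2
       s3.col_d                     -- columns carried from stage3
FROM   stage1 s1
JOIN   stage2 s2 ON s2.row_key = s1.row_key
JOIN   stage3 s3 ON s3.row_key = s1.row_key;

The point of the final pass is that a few hash joins on one key tend to beat thirty row-by-row updates of a 280-million-row table.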
I'm looking to see if there is a way (fully expecting it to be an underscore parameter, or two...) to force the optimizer to keep churning until all join permutations are exhausted. I'm aware that, to paraphrase, it cuts out once it has spent more time parsing than it estimates the query would take to run.
I've got some irritating problems with XML rewrite, XML indexes, and access paths/cardinalities etc., and I really need the entire thing considered as a one-off for debugging this. I've already cranked the maximum permutations up as far as I can, but it's not enough: it shorts out after 5041 permutations (I'd set the maximum to 80000).
I know you'd not want to do this in the real world, but I can't get the damned thing to run the plan I want in a 10053 trace so I can see the values it has there. I know I can hint it, but I'm trying to ascertain why it's not even considering it in a "normal" parse.
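If it helps, the pair of hidden parameters I believe is in play is _optimizer_max_permutations together with _optimizer_search_limit (both hidden from 9i onwards, so unsupported; session level only, purely for this debugging exercise). A sketch of the session setup, assuming those are indeed the knobs:

-- hidden/unsupported parameters: test session only
ALTER SESSION SET "_optimizer_max_permutations" = 80000;
ALTER SESSION SET "_optimizer_search_limit" = 10;

-- capture the 10053 for a hard parse
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
-- run or EXPLAIN the problem statement here, textually altered
-- (e.g. an extra comment) so it is not reused from the shared pool
ALTER SESSION SET EVENTS '10053 trace name context off';

No promises that raising the search limit gets past the 5041 cut-off, but it is the other permutation-related parameter I know of.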
The product I work on requires a query to tell us what tables are dependent on certain types.
SELECT dba_tab_cols.owner,
       dba_tab_cols.table_name,
       dba_tab_cols.data_type_owner,
       dba_tab_cols.data_type
FROM   dba_tab_cols
JOIN   dba_types
       ON  dba_types.owner     = dba_tab_cols.data_type_owner
       AND dba_types.type_name = dba_tab_cols.data_type
WHERE  (dba_types.owner IN ('SCHEMA1', 'SCHEMA2'......))
I find this query to be pretty slow. I think it is because data_type_owner in dba_tab_cols is not indexed. Adding an index is not an option because users expect our product to be read-only.
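One rewrite worth testing, since the join condition makes dba_types.owner and dba_tab_cols.data_type_owner interchangeable: push the IN-list down onto dba_tab_cols itself and turn the join into a semi-join, so the filter can apply before the join (same names as the original query; whether it actually helps depends on the underlying dictionary plan):

SELECT c.owner, c.table_name, c.data_type_owner, c.data_type
FROM   dba_tab_cols c
WHERE  c.data_type_owner IN ('SCHEMA1', 'SCHEMA2')
AND    EXISTS (SELECT 1
               FROM   dba_types t
               WHERE  t.owner     = c.data_type_owner
               AND    t.type_name = c.data_type);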
However, I was able to identify a poorly performing query that seemed to be maxing out our CPU. I have been trying to understand its explain plan. The plan below is from our test system, which has considerably less data in its tables than our PROD system.
I can see there are a bunch of table scans at the end, which may indicate missing indexes, but I am unclear on whether this is actually a problem, as the %CPU seems to be worse for the JOIN near the top of the plan.
-------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time      | Inst  |IN-OUT|
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 1870M | 3018G |       |  677M   (1)| 999:59:59 |       |      |
|   1 |  SORT ORDER BY    |      | 1870M | 3018G | 3567G |  677M   (1)| 999:59:59 |       |      |
Production system: 11.2.0.1 on Windows Server x64. Test system: 9.2.0.1 on Windows XP.
Problem preface: I need to get all unique CASEIDs which should be checked by the biometric system. What I should check: all CASEs where different PERSONs share the same PHONE in at least one of the phone types (1..4). The real table contains a little more than 10 million records. I made test scripts.
Below is the DDL for the test table creation:

------------------------------------------
-- Create CASEINFO test table
------------------------------------------
DROP TABLE CASEINFO;
CREATE TABLE CASEINFO
[code]...
Below I've put the SQL/DDL used to make the test data; the number of records inserted is 2 million. PERSON_COUNT := #/8;

------------------------------------------
-- fill CASEINFO with sample data
------------------------------------------
DECLARE
I INTEGER;
[code]...
Below is the SQL select to check the data in the created table.

------------------------------------------
-- Check test data counters
------------------------------------------
SELECT 'TOTAL', count(*) from CASEINFO
UNION ALL
SELECT 'LEGAL', count(*) from CASEINFO where
[code]...
The PROBLEM is that I am experiencing HUGE performance problems on both the test and production systems with this query:
select distinct b.caseid
from CASEINFO a, CASEINFO b
where (a.person <> b.person)
  and (a.sex = b.sex)
  and ( (a.phone1 = b.phone1)
     or (a.phone1 = b.phone2)
     or (a.phone1 = b.phone3)
     or
[code]...
This query takes almost 90 minutes to execute, and I do not know how to avoid this. The full SQL file to build the test case is attached.
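One restructuring that is usually the big win here: the OR'ed join conditions prevent a single hash join, whereas unpivoting the four phone columns into one row apiece turns the whole thing into one equi-join. A sketch, assuming the elided predicates continue the visible pattern (each of a.phone1..4 compared with each of b.phone1..4):

WITH phones AS (
  SELECT caseid, person, sex, phone1 AS phone FROM caseinfo WHERE phone1 IS NOT NULL
  UNION ALL
  SELECT caseid, person, sex, phone2 FROM caseinfo WHERE phone2 IS NOT NULL
  UNION ALL
  SELECT caseid, person, sex, phone3 FROM caseinfo WHERE phone3 IS NOT NULL
  UNION ALL
  SELECT caseid, person, sex, phone4 FROM caseinfo WHERE phone4 IS NOT NULL
)
SELECT DISTINCT b.caseid
FROM   phones a, phones b
WHERE  a.phone  = b.phone      -- one equi-join replaces the 16 OR'ed comparisons
AND    a.sex    = b.sex
AND    a.person <> b.person;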
I want to make sure I am describing correctly what happens in a query where there is distributed database access and it is participating in a NESTED LOOPS JOIN. Below is an example query, the query plan output, and the remote SQL information for such a case. Of particular note are line#4 (NESTED LOOPS) and line#11 (REMOTE TABLE_0002).
What I want to know is more detail on how this NESTED LOOPS JOIN handles the remote operation. For example, for each row that comes out of line#5 and thus goes into the NESTED LOOPS JOIN operation @line#4, does the database jump across the network to do the remote lookup? Thus if there are 1 million rows, does that mean 1 million network hops? Does batch size play a role? For example, if the database batches in groups of 100, does that mean 10 thousand network hops?
I think each row that comes out of line#5 means a network hop to the remote database, but I do not know that for a fact. I have done some abbreviating in the plan in an attempt to make it fit on the page (line#7 TA = TABLE ACCESS).
explain plan for select count(*) from orders, lineitem where o_orderkey = l_orderkey;
The 10053 trace (as shown below) for this query shows a nested loop join with LINEITEM as the outer table and ORDERS as the inner table. It is effectively a join on the composite index (PK_LINEITEM) of LINEITEM and the unique index (PK_ORDERKEY) of ORDERS. The cost calculation formula given in the book, "outer table cost + cardinality of outer table * inner table cost", fails here, and I am not able to understand why.
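For reference, the textbook formula in the usual notation, which assumes the inner table is probed once per outer row at a fixed cost:

\[
\mathrm{cost}_{NL} \;=\; \mathrm{cost}_{outer} \;+\; \mathrm{card}_{outer} \times \mathrm{cost}_{inner\ probe}
\]

One reason a real 10053 may not reproduce it (an educated guess, not a statement about this particular trace): with a unique index probe, the inner cost per iteration comes from the index blevel, is scaled by optimizer_index_cost_adj, and can shrink further with caching adjustments, so the trace arithmetic rarely matches the book's idealised form.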
I want to know how the Oracle optimizer chooses join methods and applies them while executing a query, so that I can be confident about the optimizer's join choice before writing any query.
We have a procedure which truncates some of the tables. Most of the time it finishes in a short span of time, but for the last few days it has been taking much longer.
So our situation is pretty simple. We have 3 tables.
A, B and C
the model is A->>B->>C
Currently A, B, and C are range partitioned on a created_date key; however, typically only C is ever qualified on created_date. There is a foreign key from B -> A and from C -> B. We have many queries where the data is identified by state, which is indexed (currently non-partitioned) on columns in A. There are also indexes on the foreign keys that get from C -> B -> A; again, these are non-partitioned indexes at this time.
It is typical that we qualify A on either account or user or both, and there are (non-partitioned) indexes on these. We have a problem now because many of the queries use leading wildcards, i.e. account LIKE '%ACCOUNT' etc. This often results in large full table scans. Our solution has been to remove the leading wildcard.
We are wondering how we can benefit from partitioning and/or subpartitioning table A, since it's partitioned on created_date but rarely qualified by it. We are also wondering where and how we can benefit from either global partitioned indexes or local partitioned indexes on table A. We suspect that the index on the foreign key from C to B could be a local partitioned index.
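As a sketch of those last two ideas, with hypothetical column names (b_id standing in for C's foreign key to B, account for the qualifying column on A):

-- local index: equipartitioned with C, so pruning on C prunes the index too
CREATE INDEX c_b_fk_lix ON c (b_id) LOCAL;

-- global partitioned index on A's qualifying column, partitioned by that
-- column rather than by created_date
CREATE INDEX a_account_gix ON a (account)
GLOBAL PARTITION BY RANGE (account)
( PARTITION p_a_m VALUES LESS THAN ('N'),
  PARTITION p_max VALUES LESS THAN (MAXVALUE) );

Note that neither index helps the leading-wildcard LIKE '%ACCOUNT' predicates; those defeat any b-tree on account regardless of partitioning.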
I am using Oracle version 11.2.0.3.0. We are planning to move ~40 tables/indexes to new encrypted tablespaces as part of TDE (transparent data encryption). Currently three of the tables are ~30GB in size, one is ~800GB, and the others are <2GB, and the tables/indexes are spread across different tablespaces.
Should I create as many encrypted tablespaces as there were unencrypted ones before, or should I create one encrypted tablespace and move all the tables/indexes into it?
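Either way, the per-tablespace mechanics are the same. A minimal sketch with hypothetical names and paths (the wallet must be open, and MOVE/REBUILD rewrites the segments, so plan the 800GB table's window accordingly):

CREATE TABLESPACE enc_data
  DATAFILE '/u01/oradata/ORCL/enc_data01.dbf' SIZE 32G
  ENCRYPTION USING 'AES256'
  DEFAULT STORAGE (ENCRYPT);

ALTER TABLE app_owner.big_tab MOVE TABLESPACE enc_data;
ALTER INDEX app_owner.big_tab_pk REBUILD TABLESPACE enc_data ONLINE;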
I have tried a lot of alternative solutions, like rearranging the order of the tables in the join and moving the where conditions around, but with no success... It's a bottleneck, and I cannot add indexes on these tables in production... I want to change the approach in the subquery.
SELECT g.COLUMN1,
       g.COLUMN2,
       e.COLUMN3,
       g.COLUMN4,
       MIN(e.dat1) KEEP (DENSE_RANK FIRST ORDER BY date2 DESC) * -1,
       MIN(TO_CHAR(date3, 'dd-mm-yyyy'))
[code]....
What are ways of improving the performance of a table which holds millions of records in Oracle? Currently we have partitioning and indexing, but they don't seem to work.
SELECT department_id
FROM  (SELECT department_id FROM employees
       UNION
       SELECT department_id FROM employees_old)
WHERE  department_id = 100;
[code]....
An index has been created on department_id in both tables. The only difference between the two that I observed was the one recursive call for the 1st SQL, and also one additional view in the plan. There is a little difference in bytes sent over the network.
The scale of the tests that generate the following scenario is not huge right now: only 50 users are simulated (or you can think of them as independently running threads if you like). But here is the crunch: the queries generated (from a generic transaction layer) all run against a table that has 600 columns! We can't really control this right now, but it is causing massive amounts of IO (5GB per request), making requests queue for disk availability (the disks are set up as RAID 0/1); it's even noticeable for as few as 3 threads.
I have rendered the SQL on one occasion to execute in 13 seconds for a single user, but this appears short-lived, as when stats were freshly gathered it went back up to the normal 90-120 seconds. I've added the original query to the file; however, the findings here, along with our DBA (whom I trust implicitly), suggest that no amount of editing the query will improve the response times, increasing the PGA/SGA (currently 4/6GB respectively) will only delay the queuing for a bit, and compression won't work either. In short, it looks as though we've hit hardware restrictions already for this particular scenario.
As I can't really explain why my rendered query no longer takes 13 seconds, it's niggling me that we might be missing a trick. So I was hoping for some guidance on possible ways of optimising these types of queries against such wide tables; in other words, possibilities that we haven't considered...
I have a view over base tables holding historical data for the previous 60 months (one table per month), combined with UNION ALL operators. Will creating indexes on those base tables improve performance, or would creating a primary key with DISABLE NOVALIDATE improve data retrieval?
The view has around 8 million rows and is used as a fact table with 4 dimension tables. A DTS package on the MSSQL side refreshes an OLAP cube by retrieving data from these tables in Oracle.
We have a huge table in production with a LONG column, and we are trying to change its datatype to CLOB. The table has 120 million records and is 270 GB in size.
We tried the Oracle expdp/impdp option for the conversion in our perf environment. With 32 parallel workers, the export completed in 1.5 hrs; however, the import took 13 hrs.
I also tried the TO_LOB option using inserts; it ran for 20 hrs before I killed the process. Are there any ways to improve the performance of a LONG to CLOB conversion on huge tables?
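Two other routes that may be worth timing on the perf copy, since both avoid the export/import round trip (table and column names hypothetical):

-- 1) in-place conversion: supported since 9i, but it rewrites every row
--    in one transaction, so undo/redo volume is the thing to watch
ALTER TABLE big_tab MODIFY (long_col CLOB);

-- 2) online redefinition: the interim table BIG_TAB_INT is pre-created
--    with a CLOB column, and TO_LOB does the mapping
BEGIN
  DBMS_REDEFINITION.START_REDEF_TABLE(
    uname       => 'APP_OWNER',
    orig_table  => 'BIG_TAB',
    int_table   => 'BIG_TAB_INT',
    col_mapping => 'pk_col pk_col, TO_LOB(long_col) long_col');
  DBMS_REDEFINITION.FINISH_REDEF_TABLE('APP_OWNER', 'BIG_TAB', 'BIG_TAB_INT');
END;
/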
Create small functional indexes for special cases in very large tables.
When a column holds one value in 99% of the records and it is the other values that have to be searched for, it is possible to create a function-based index that maps the common value to NULL, so those rows are not indexed at all. The index will be small and the rebuild fast.
Example
create index vh_tst_decode_ind_if1 on vh_tst_decode_ind (decode(S,'I','I',null),style)
The index can be made more selective, when there are many records and enough entries to create more levels in the b-tree, by adding a second decoded column to the key:
create index vh_tst_decode_ind_if3 on vh_tst_decode_ind (decode(S,'I','I',null), decode(S,'I',style,null) )
The records can then be accessed like this:
SQL> select --+ index(vh_tst_decode_ind_if3)
  2  style, count(*)
  3  from vh_tst_decode_ind
  4  where
  5  decode(S,'I','I',null)='I'
  6  group by style
  7  ;
Looking to understand the difference between instance tuning and database tuning.
What is the difference between these two tuning exercises? I understand that an instance consists of memory-based (logical) structures, whereas a database consists of physical structures.
However, how does one tune the database, i.e. the physical structure? Does it have to do with file placements, block sizes, etc.? Would you agree that a lot of that is taken care of by ASM now in 11g? What tools are required/available (third party as well as Oracle-supplied) for these types of tuning scenarios?
I have two tables, with 113M records in DWH_BILL_DET and 103M in prd_rerate_chg_que, and I'm running the following merge query, which runs for 13 hrs to update the records; that is quite a long time.
SQL> explain plan for
MERGE /*+ parallel (rq, 16) */ INTO DWH_BILL_DET rq
USING (SELECT rated_que_rowid,
              detail_rerate_flag_code,
              rerate_sel_key,
I have 2 tables SEC_MASTER_HISTA and SEC_MASTER_HISTB.
Now, I need to compare the data of the two tables column-wise.
Ideally the 2 tables should have the same security_alias values, but in my case they do not, as the two tables belong to 2 different client models. There are, however, main SECURITY_MASTERA and SECURITY_MASTERB tables which have the security_alias recorded, plus a primary_asset_id column whose value can act as a link between SEC_MASTER_HISTA and SEC_MASTER_HISTB. But I have not been able to figure out the exact query which would be ideal.
Attached are the table structures and the data it contains.
Note: I need to compare the Coupon and Freq column values of SEC_MASTER_HISTA and SEC_MASTER_HISTB.
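As a starting point, a sketch of the comparison under the structure described: each HIST table joins to its own master on security_alias, the masters link on primary_asset_id, and the DECODE trick makes NULL compare equal to NULL (coupon/freq as literal column names is my assumption from the note above):

SELECT ma.primary_asset_id,
       ha.coupon AS coupon_a, hb.coupon AS coupon_b,
       ha.freq   AS freq_a,   hb.freq   AS freq_b
FROM   sec_master_hista ha
JOIN   security_mastera ma ON ma.security_alias   = ha.security_alias
JOIN   security_masterb mb ON mb.primary_asset_id = ma.primary_asset_id
JOIN   sec_master_histb hb ON hb.security_alias   = mb.security_alias
WHERE  DECODE(ha.coupon, hb.coupon, 0, 1) = 1    -- coupon differs (NULL-safe)
   OR  DECODE(ha.freq,   hb.freq,   0, 1) = 1;   -- freq differs (NULL-safe)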
How does the width of a column affect index performance?
For example, suppose I had an IOT table emp_iot with columns: (id number, job varchar2(20), time date, plan number).
The table key consists of (id, job, time).
Column JOB has a fixed list of distinct values ('ANALYST', 'NIGHT_WORKED', etc...).
What performance increase could I expect if in column "job" I stored not names but numeric codes identifying the job names, e.g. storing 1 instead of 'ANALYST' and 2 instead of 'NIGHT_WORKED'?
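A sketch of the numeric-code variant, with a hypothetical lookup table. The gain to expect is mostly from key width: 'NIGHT_WORKED' costs 12 bytes in every key entry, while a small NUMBER code costs 2-3 bytes, so the IOT's b-tree packs more entries per leaf block and may end up with fewer levels; number-to-number comparisons are also marginally cheaper than string comparisons.

CREATE TABLE job_ref (
  job_id   NUMBER(3)    PRIMARY KEY,   -- e.g. 1 = 'ANALYST', 2 = 'NIGHT_WORKED'
  job_name VARCHAR2(20) NOT NULL UNIQUE
);

CREATE TABLE emp_iot (
  id     NUMBER,
  job_id NUMBER(3) NOT NULL REFERENCES job_ref,
  time   DATE,
  plan   NUMBER,
  CONSTRAINT emp_iot_pk PRIMARY KEY (id, job_id, time)
) ORGANIZATION INDEX;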