I am in the very early planning stages of a project whose goal is to identify separate organizations that may in fact be the same organization.
Our first implementation of this task was a process designed to look for a few thousand organizations in a pool of a few hundred thousand organizations. To accomplish this we made heavy use of Oracle's Text index, as well as a custom index type we created that utilized n-grams. This approach worked quite well for on-demand editing of the organizations, where a user might log in and say, "In addition to what we already know about organization A, we also know x, y and z; does that change anything?" It also worked acceptably well for the bulk processing we did on our "known" information once a week, running for a couple of hours on the weekend.
We have now been tasked with reworking this initial implementation, only now we want to look at a set consisting of several million organizations for potential matches that exist within the set. As in our initial implementation, we will be breaking what we know about organizations into groupings so we aren't comparing a phone number to an email address, and normalizing the data as much as we can so we ignore things like case and punctuation. Even after all this, we are still talking about looking for similar values in a group that might number in the tens of millions (some types of data will have more than one value per organization).
My initial thought on the problem is to use n-grams, though not in the way we did in the past. The basic idea is to break each search value up into all the substrings it is made of and look for other values that have a high number of those substrings in common.
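A minimal sketch of that idea in plain SQL, assuming a hypothetical orgs table with an id and an already-normalized norm_name column (the table and column names are illustrative, not from the original design):

-- Generate the distinct trigrams of each name, then count how many
-- trigrams each pair of organizations shares.
WITH grams AS (
  SELECT DISTINCT id, SUBSTR(norm_name, LEVEL, 3) AS g
  FROM   orgs
  CONNECT BY LEVEL <= LENGTH(norm_name) - 2
         AND PRIOR id = id
         AND PRIOR SYS_GUID() IS NOT NULL  -- standard trick to generate rows per source row
)
SELECT a.id       AS org_a,
       b.id       AS org_b,
       COUNT(*)   AS shared_trigrams
FROM   grams a
JOIN   grams b ON b.g = a.g AND b.id > a.id
GROUP  BY a.id, b.id
HAVING COUNT(*) >= 5   -- similarity threshold; tune for the data
ORDER  BY shared_trigrams DESC;

At tens of millions of values the pairwise join would need pruning (for example, joining only on the rarer trigrams), but the shape of the computation stays the same.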
I wasn't sure SQL & PL/SQL was the best place for the question, but I could not think of a better one.
I have 15 million records as CSV that I want to load through SQL*Loader. Is SQL*Loader the right option for loading a high volume of data? I tested with 2.5 lakh (250,000) records, which took 4 minutes to load.
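For volumes like this a direct-path load is usually the first thing to try. A minimal control-file sketch, with a hypothetical target table and file names (all illustrative):

OPTIONS (DIRECT=TRUE, ERRORS=1000)
LOAD DATA
INFILE 'orgs.csv'
APPEND
INTO TABLE orgs_stage
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(org_id, org_name, phone)

Invoked as sqlldr control=orgs.ctl userid=..., DIRECT=TRUE bypasses conventional inserts and is typically much faster than the conventional path for bulk loads. An external table over the CSV is another common option when the data needs transforming with SQL as it loads.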
The prod stats have been imported into development. The stats were gathered two months back on dev, while in production the stats were gathered two weeks back.
My question is: shouldn't the high volume of data cause plan changes in both environments? My thinking is that the plans can differ, since the high volume of data changing in prod may lead to a different plan there.
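Plans are built from the statistics the optimizer sees, not from the live row counts, so the first thing to compare is what each database's stats actually say. A quick check, assuming the schema is called APP (illustrative):

SELECT table_name, num_rows, blocks, last_analyzed
FROM   dba_tab_statistics
WHERE  owner = 'APP'
ORDER  BY last_analyzed;

If dev's two-month-old stats and prod's two-week-old stats report materially different num_rows, different plans are to be expected; bind peeking and system statistics can also make the environments diverge.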
I have a query in an Oracle report in which I get this output: manager Arnav has 2 employees, Inder and Kaushal, whose salaries are 10000 and 20000 respectively, and another manager is Anjali, whose employees are Kavya and Inder with salaries 40000 and 10000 respectively. As Inder is repeated, I want the salary to become 0 in place of 10000 the second time. What should I do if I want to change that 10000 to 0?

Manager        Employee   Salary
Arnav Tiwari   Inder      10000
               Kaushal    20000
Anjali         Kavya      40000
               Inder      10000

What should I do in the salary formula, based on the employee name? That is, if the name already appeared, the salary value should be 0, and if it appears for the first time its actual value (e.g. 10000) should be printed.
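In plain SQL the usual trick is a ROW_NUMBER() per employee, so only the first occurrence keeps its salary; in a report the equivalent logic would go in the formula column. A sketch, assuming a hypothetical emp_report source with manager, employee and salary columns:

SELECT manager,
       employee,
       CASE
         WHEN ROW_NUMBER() OVER (PARTITION BY employee
                                 ORDER BY manager) = 1
         THEN salary
         ELSE 0
       END AS salary
FROM   emp_report;

The ORDER BY inside the window decides which occurrence "wins"; here it is arbitrary (by manager), so it should be adjusted to whatever ordering the report uses.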
I have a question about the ADD VOLUME command; I can't understand the difference between ADD DISK and ADD VOLUME. What is the difference between them? When should I use each one? How can I control the striping and mirroring (NORMAL and HIGH) when adding VOLUMEs to a DISKGROUP? Can I add a volume to a failgroup?
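For reference: ADD DISK adds storage to a disk group, while ADD VOLUME carves an ADVM volume out of an existing disk group (typically to put an ACFS file system on it). Hedged examples of both, with illustrative names:

-- Add a physical disk to the disk group (optionally into a failgroup):
ALTER DISKGROUP data ADD FAILGROUP fg2 DISK '/dev/asmdisk5';

-- Carve an ADVM volume out of the disk group; redundancy and striping
-- can be set per volume:
ALTER DISKGROUP data ADD VOLUME vol1 SIZE 10G
  MIRROR
  STRIPE_WIDTH 128K
  STRIPE_COLUMNS 4;

Volumes inherit the disk group's redundancy by default; a volume cannot be tied to a particular failgroup, because failgroups are a property of the disks, not of the volumes allocated across them.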
I was asked by my systems administrator if I could tell him how much redo log volume, on average, we generate in a day.
Just wondering how I might calculate this?
We have several production databases. If I wanted to calculate the above for one of them, would I take all the redo logs for a day and total up the size in bytes? Maybe take a 5-day work week and average over the 5 days?
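If the database runs in ARCHIVELOG mode, v$archived_log already holds what is needed; a sketch:

SELECT TRUNC(completion_time)                        AS day,
       ROUND(SUM(blocks * block_size) / 1024 / 1024) AS redo_mb
FROM   v$archived_log
GROUP  BY TRUNC(completion_time)
ORDER  BY day;

Averaging that over a working week gives the figure the sysadmin is after (with multiple archive destinations or RAC threads, dedupe by thread# and sequence# first). Without archiving, sampling the 'redo size' statistic in v$sysstat at the start and end of a day gives the same answer.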
We have root disk on all the Unix servers, mirrored via SVM using the internal disks. VxVM 5.0 MP3 is used as the volume manager to manage the SAN space allocated to the servers. In some instances, SVM is also used to allocate filesystems for applications and databases.
We are required to move all the files from SVM to VxVM.
As part of this we are copying our Oracle binaries from SVM to VxVM. I am planning to use the cpio command to do this. Once I have moved to VxVM (Veritas Volume Manager), I will bring down the SVM.
I need to calculate the redo log volume generated by certain tables. If I have 100 tables in the database, I need to know the redo log volume per day for only 25 of them. How can I calculate this? Is LogMiner useful for this?
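LogMiner can at least attribute changes to tables, which gets you a per-table approximation. A sketch, assuming the archived log's file name is known (the path is illustrative) and using the online catalog as the dictionary:

BEGIN
  DBMS_LOGMNR.add_logfile(
    logfilename => '/arch/arch_1_1234.arc',
    options     => DBMS_LOGMNR.new);
  DBMS_LOGMNR.start_logmnr(
    options => DBMS_LOGMNR.dict_from_online_catalog);
END;
/

SELECT seg_owner, table_name, operation, COUNT(*) AS changes
FROM   v$logmnr_contents
WHERE  table_name IN ('T1', 'T2')   -- the 25 tables of interest
GROUP  BY seg_owner, table_name, operation;

Note this counts changes rather than bytes; v$logmnr_contents does not expose redo bytes per row, so exact per-table redo volume can only be approximated.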
I am facing a problem with utl_http.write_text in my PL/SQL application. My requirement is to write data of size > 32K, so I used a CLOB variable with write_text, but it still raises a "numeric or value error" when the data size is above 8K.
I have read that chunked transfer encoding will work, but I couldn't find out how this is done.
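Chunked encoding amounts to setting the Transfer-Encoding header before the first write and then sending the CLOB in pieces small enough for write_text. A sketch against a hypothetical endpoint:

DECLARE
  l_req   UTL_HTTP.req;
  l_resp  UTL_HTTP.resp;
  l_clob  CLOB;
  l_len   PLS_INTEGER;
  l_off   PLS_INTEGER := 1;
  c_chunk CONSTANT PLS_INTEGER := 8000;  -- stays under the write_text limits
BEGIN
  -- Build a payload larger than 32K for the demonstration.
  DBMS_LOB.createtemporary(l_clob, TRUE);
  FOR i IN 1 .. 10 LOOP
    DBMS_LOB.writeappend(l_clob, c_chunk, RPAD('x', c_chunk, 'x'));
  END LOOP;

  l_req := UTL_HTTP.begin_request('http://example.com/endpoint',
                                  'POST', 'HTTP/1.1');
  UTL_HTTP.set_header(l_req, 'Content-Type', 'text/xml');
  -- Chunked encoding: no Content-Length header, so any payload size works.
  UTL_HTTP.set_header(l_req, 'Transfer-Encoding', 'chunked');

  l_len := DBMS_LOB.getlength(l_clob);
  WHILE l_off <= l_len LOOP
    UTL_HTTP.write_text(l_req, DBMS_LOB.substr(l_clob, c_chunk, l_off));
    l_off := l_off + c_chunk;
  END LOOP;

  l_resp := UTL_HTTP.get_response(l_req);
  UTL_HTTP.end_response(l_resp);
END;
/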
I have a requirement to get records group-wise. For example, for each department I need to get the employee details as a comma-separated list. The output must have the department name in the first column, and the second column must contain all the employees in that particular department, comma-separated.
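On 11gR2 and later this is exactly what LISTAGG does; a sketch against HR-style departments and employees tables (names assumed):

SELECT d.department_name,
       LISTAGG(e.last_name, ',')
         WITHIN GROUP (ORDER BY e.last_name) AS employees
FROM   departments d
JOIN   employees   e ON e.department_id = d.department_id
GROUP  BY d.department_name;

On older versions the same result takes a SYS_CONNECT_BY_PATH trick or the unsupported wm_concat.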
In the data below, a container is moving from one city to another. 1, 2, 3 can be any numbers; I want to generate them and use them as keys to group the cities. E.g. AUH, JEB, CIW belong to the same key = 2; SIN and IKT belong to a new group, 4. Wherever the difference between the Seq# of consecutive cities is greater than 1 (e.g. between S8W and AUH), a new group starts.
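This is the classic "start of group" pattern: flag each row where the gap from the previous Seq# exceeds 1, then a running sum of the flags yields the group key. A sketch, with a hypothetical container_moves(city, seq_no) table:

WITH flagged AS (
  SELECT city,
         seq_no,
         CASE
           WHEN seq_no - LAG(seq_no) OVER (ORDER BY seq_no) > 1
           THEN 1 ELSE 0
         END AS is_new_grp
  FROM   container_moves
)
SELECT city,
       seq_no,
       SUM(is_new_grp) OVER (ORDER BY seq_no) + 1 AS grp_key
FROM   flagged;

The keys come out as 1, 2, 3, ... rather than the exact 2 and 4 of the example, but since any numbers will do as keys, consecutive integers should serve.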
Name  Hours  Date
a     8      10/11/2011
a     5      10/12/2011
a     6      10/13/2011
a     7      10/14/2011
a     7      10/15/2011
a     8      10/16/2011
a     7      10/17/2011
a     8      10/18/2011
a     8      10/19/2011
a     7      10/20/2011
a     7      10/21/2011
If I want the sum of hours for each 3-day range, how should I do it?
E.g.:

Name  Hrs  StartDate   EndDate
a     19   10/11/2011  10/13/2011
a     22   10/14/2011  10/16/2011
a     23   10/17/2011  10/19/2011
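One way is to bucket each row by how many whole 3-day periods it lies from the person's first date, then group on that bucket. A sketch, assuming a work_hours(name, hours, work_date) table:

WITH bucketed AS (
  SELECT name,
         hours,
         work_date,
         TRUNC((work_date
                - MIN(work_date) OVER (PARTITION BY name)) / 3) AS bucket
  FROM   work_hours
)
SELECT name,
       SUM(hours)     AS hrs,
       MIN(work_date) AS startdate,
       MAX(work_date) AS enddate
FROM   bucketed
GROUP  BY name, bucket
ORDER  BY name, startdate;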
How can I get a grp_id for each unique combination of manager and department, with grp_id assigned in ascending order of manager_id? In this example manager_id 100 is the minimum, so it should be grp_id 1, and all the employees with that manager_id should be in grp_id 1; for manager_id 114, grp_id should be 2.
If there were a manager_id 117, it should create grp_id 3.
To get grp_num, I can use row_number() over (partition by department_id, manager_id order by employee_id) grp_num.
I am looking for an update statement for this issue.
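DENSE_RANK() (rather than ROW_NUMBER()) gives one number per distinct manager/department combination, and a MERGE turns it into an update. A sketch, assuming a grp_id column on an HR-style employees table:

MERGE INTO employees e
USING (
  SELECT employee_id,
         DENSE_RANK() OVER (ORDER BY manager_id, department_id) AS grp_id
  FROM   employees
) s
ON (e.employee_id = s.employee_id)
WHEN MATCHED THEN
  UPDATE SET e.grp_id = s.grp_id;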
And the problem is that when I use sutp_price_proc and pbk_price in the grouping, it splits my results by those rows. If I delete them from the grouping, SQL gives me the "not a single-group group function" error at line 1.
pas_code  pas_profile  sutp_id  sutp_price
2664      good stuff   3        10069   <- because pbk_price is like 67 from that period
2664      good stuff   3        10071   <- because pbk_price is like 50 from the other period
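Without the full query it is hard to be exact, but the usual fix for this pattern is to aggregate the price columns instead of grouping by them, for example picking the price of the latest period with KEEP. A hedged sketch (the prices table and period_start column are illustrative):

SELECT pas_code,
       pas_profile,
       sutp_id,
       MAX(sutp_price) KEEP (DENSE_RANK LAST ORDER BY period_start)
         AS sutp_price
FROM   prices
GROUP  BY pas_code, pas_profile, sutp_id;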
I thought I would make use of the following query, but I am not getting the proper results when applying it to a real table with more than 20 million records:
SELECT trunc(R_DUR/6)*5+1  AS range_start_rdur,
       trunc(R_DUR/6)*5+5  AS range_end_rdur,
       sum(noofan)         AS no_of_an,
       sum(sumofrdur)      AS sum_of_rdur,
       sum(sumofchdur)     AS sum_of_chdur,
       ...
I have a table where the codes are sometimes 6 characters long and sometimes 12. I need to produce summary sums of the amount based on the two different types of code: one sum adding up the amounts for the distinct codes that are 6 characters long, and the other sum based on substr(7) through to the end.
create table strings (strings_var varchar2(12), strings_amt number);
insert into strings (strings_var, strings_amt) values ('02.01', 10);
insert into strings (strings_var, strings_amt) values ('02.01_A11111', 15);
insert into strings (strings_var, strings_amt) values ('02.02_A11111', 15);
insert into strings (strings_var, strings_amt) values ('03.01_B11111', 15);
insert into strings (strings_var, strings_amt) values ('03.02_B11111', 15);
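If I read the requirement correctly (an assumption, since the sample short codes are actually 5 characters before the underscore), the two sums might look like this against the sample strings table:

-- Sum per short code: the part before the underscore, or the whole
-- value when there is no underscore.
SELECT SUBSTR(strings_var, 1, INSTR(strings_var || '_', '_') - 1) AS short_code,
       SUM(strings_amt) AS amt
FROM   strings
GROUP  BY SUBSTR(strings_var, 1, INSTR(strings_var || '_', '_') - 1);

-- Sum per suffix, i.e. from position 7 to the end, for the long codes only.
SELECT SUBSTR(strings_var, 7) AS suffix_code,
       SUM(strings_amt)       AS amt
FROM   strings
WHERE  LENGTH(strings_var) > 6
GROUP  BY SUBSTR(strings_var, 7);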
In my schema, the employees table has 55 rows for department_id 30.
How can I split the employees table into views, grouped by department_id, as follows (see the sketch after this list):
- one view with no more than 55 rows (this view will contain only one department)
- another view with more departments, whose row count is also not > 55 but which can contain 2 or more department_ids (e.g. departments 9 and 10, whose rows sum to 43; if I added another department the row count would exceed 55)
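Deciding which departments share a view is a greedy bin-packing problem, which a recursive subquery can express: walk the departments in order and start a new bucket whenever adding the next department would push the running count past 55. A sketch (the 55 threshold and the HR-style employees table as in the question):

WITH dept_counts AS (
  SELECT department_id,
         COUNT(*) AS cnt,
         ROW_NUMBER() OVER (ORDER BY department_id) AS rn
  FROM   employees
  GROUP  BY department_id
),
packed (rn, department_id, cnt, bucket, bucket_cnt) AS (
  SELECT rn, department_id, cnt, 1, cnt
  FROM   dept_counts
  WHERE  rn = 1
  UNION ALL
  SELECT d.rn, d.department_id, d.cnt,
         CASE WHEN p.bucket_cnt + d.cnt <= 55
              THEN p.bucket ELSE p.bucket + 1 END,
         CASE WHEN p.bucket_cnt + d.cnt <= 55
              THEN p.bucket_cnt + d.cnt ELSE d.cnt END
  FROM   dept_counts d
  JOIN   packed p ON d.rn = p.rn + 1
)
SELECT bucket, department_id, cnt
FROM   packed
ORDER  BY bucket, department_id;

The bucket column then says which departments belong in the same view; the views themselves would still be created with per-bucket CREATE VIEW statements (or generated dynamically from this result).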
Allow me to preface this by saying that I am not familiar with XML beyond its hierarchical structure, and am not familiar with what you can do with it using formatting.
As an example, let us say you have the following table:
Object_Type | Object_Name | Descriptor
------------|-------------|-----------
Fruit       | Apple       | Crunchy
Fruit       | Orange      | Sour
Utensil     | Pencil      | Wooden
Now let's say you want to query this table to return an XML format, which will be used in a web site to display the information, and you want to group the display by Object_Type, so that you want an XML format like this:
Is it possible to group it in a way so that Apple and Orange end up in the same <Object Group>? Or is that meaningless, and should such grouping be done on the web site itself by formatting the XML?
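The database can produce the grouped shape directly with the SQL/XML functions: GROUP BY Object_Type and nest the members with XMLAGG. A sketch (the element names and my_table are illustrative):

SELECT XMLELEMENT("ObjectGroup",
         XMLATTRIBUTES(object_type AS "type"),
         XMLAGG(
           XMLELEMENT("Object",
             XMLFOREST(object_name AS "Name",
                       descriptor  AS "Descriptor")))) AS grp_xml
FROM   my_table
GROUP  BY object_type;

Each row of the result is one <ObjectGroup> element containing the Fruit or Utensil members, so Apple and Orange land in the same group without any work on the web site.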
"When a GROUP BY clause contains multiple columns, which grouping is the most major grouping?" What puzzled me was that I never knew there was such a thing as a "most major grouping" in a GROUP BY clause. Anyway, the answer:
"the first column listed in the GROUP BY clause". What does this mean in practice? It must mean something different from your bog-standard "select sum(order_value) from sales group by city, country, region", because in that case I can't see how city has any more or less relevance to the query than region.
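For a plain GROUP BY the column order genuinely doesn't matter: both of the first two queries below return the same aggregate rows (at most in a different, unguaranteed order). "Most major" only becomes meaningful with ROLLUP, where the first column is the one kept longest in the subtotals. An illustration using the sales table from the quoted query:

-- Identical result sets for a plain GROUP BY:
SELECT city, country, region, SUM(order_value)
FROM   sales
GROUP  BY city, country, region;

SELECT city, country, region, SUM(order_value)
FROM   sales
GROUP  BY region, country, city;

-- Here order does matter: subtotals roll up city first, then country:
SELECT region, country, city, SUM(order_value)
FROM   sales
GROUP  BY ROLLUP (region, country, city);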
I am trying to write a single SELECT statement that groups at 2 levels of aggregation (using grouping sets) and assigns row numbers (to rank each item) partitioned at the correct level for each grouping set. I have the grouping sets figured out, but I can't find a way to make PARTITION BY match each level of aggregation.
What I am looking for (in a single SELECT statement) is logically equivalent to:
SELECT week,
       region,
       NULL AS country,
       item,
       SUM(qty),
       ROW_NUMBER() OVER (PARTITION BY week, region
                          ORDER BY SUM(qty) DESC) AS rn
FROM   base
GROUP  BY week, region, item
UNION ALL
SELECT week,
       NULL AS region,
       country,
       item,
       SUM(qty),
       ROW_NUMBER() OVER (PARTITION BY week, country
                          ORDER BY SUM(qty) DESC) AS rn
FROM   base
GROUP  BY week, country, item;
I hoped that I could do something like this:
SELECT week,
       region,
       country,
       item,
       SUM(qty),
       ROW_NUMBER() OVER (PARTITION BY week, GROUPING SETS (region, country)
                          ORDER BY SUM(qty) DESC) AS rn
FROM   base
GROUP  BY week, GROUPING SETS (region, country), item;
But it looks like I am not allowed to partition by grouping sets; I get the error ORA-00907: missing right parenthesis. I didn't expect it to work, but I am not sure how else to partition by multiple levels.
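One workaround I believe works is to partition by the GROUPING() flags together with both columns: the flags separate the two aggregation levels, and the NULLs in the inactive column are harmless. A sketch:

SELECT week,
       region,
       country,
       item,
       SUM(qty),
       ROW_NUMBER() OVER (
         PARTITION BY week,
                      GROUPING(region), GROUPING(country),
                      region, country
         ORDER BY SUM(qty) DESC) AS rn
FROM   base
GROUP  BY week, item, GROUPING SETS (region, country);

For the rows of the region grouping set, GROUPING(country) is 1 and country is NULL, so those rows partition by (week, region); the country rows partition by (week, country). That matches the two levels of the UNION ALL version.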
Let me know if I could have tagged my code or met other forum standards better.
I have a question about displaying data as per my requirement. I have created a table called sales; it has four columns:

create table sales (country varchar2(20), state varchar2(20), district varchar2(20), sales number);

and I am inserting some sample data:

insert into sales values ('india', 'TN', 'Chennai', 100);
insert into sales values ('india', 'TN', 'KPURAM', 120);
insert into sales values ('india', 'TN', 'Bangalore', 35);
insert into sales values ('india', 'ANDR', 'Guinder', 100);
insert into sales values ('india', 'ANDR', 'Nellai', 76);
insert into sales values ('london', 'city-a', 'xstreet', 89);
insert into sales values ('london', 'city-a', 'binroad', 100);
select * from sales;
country  state   district   sales
india    TN      Chennai    100
india    TN      KPURAM     120
india    TN      Bangalore  35
india    ANDR    Guinder    100
india    ANDR    Nellai     76
london   city-a  xstreet    89
london   city-a  binroad    100
The data is displayed in this format. How can I display the data the way I want?
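Assuming the goal is a grouped display with subtotals per state and country (purely a guess, since the target layout is not shown), GROUP BY ROLLUP gets close in one query:

SELECT country,
       state,
       district,
       SUM(sales) AS sales
FROM   sales
GROUP  BY ROLLUP (country, state, district);

The rows with NULL district carry the state subtotals, and the rows with NULL state carry the country totals.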
I have to get totals from a table using different criteria, which I do like this:
<QUERY>
SELECT SUM(CASE WHEN MYCONDITION1 THEN 1 ELSE 0 END) AS TOTAL1,
       SUM(CASE WHEN MYCONDITION2 THEN 1 ELSE 0 END) AS TOTAL2
FROM   TABLE1, TABLE2
WHERE  COMMON_CONDITION1
AND    COMMON_CONDITION2
AND    datevalue1 >= DATE1
AND    datevalue1 <= DATE2;
</QUERY>
This works fine and I get the intended result. Now I have to repeat this for every week of the last 12 months, excluding holiday periods. So I generate a set of date ranges to be used in the queries and repeat the above SQL statement for each date range, which is a lengthy process. How can I do this in a single shot and get all the totals for each date range?
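If the generated date ranges are put into their own table, one join computes every week at once. A sketch, assuming a hypothetical date_ranges(range_start, range_end) table holding the pre-generated ranges (the condition placeholders are as in the question):

SELECT r.range_start,
       r.range_end,
       SUM(CASE WHEN MYCONDITION1 THEN 1 ELSE 0 END) AS total1,
       SUM(CASE WHEN MYCONDITION2 THEN 1 ELSE 0 END) AS total2
FROM   date_ranges r
JOIN   table1 t1 ON t1.datevalue1 BETWEEN r.range_start AND r.range_end
JOIN   table2 t2 ON COMMON_CONDITION1   -- join/filter conditions as before
WHERE  COMMON_CONDITION2
GROUP  BY r.range_start, r.range_end
ORDER  BY r.range_start;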
The data in the table was imported from a CSV file, and there is a relationship between the rows: each combination of col1, col2 and col3 describes a full route of a journey. The row with an entry in col6 describes the full route, and the other rows describe each leg in the route.
For example, for R1 the route is AA to BB via CC. Another example: for R4 the route is FF to SS via XX, PP and OO.
What I would like to do is find routes with a missing leg. For example, the route for R3 is DD to EE via FF; there is an entry for DD to FF but the entry for FF to EE is missing.
The results should return the rows for these incomplete routes.
Here is what I have come up with, but it doesn't quite return the correct result:
select *
from   tableA a
where  not exists (
         select 1
         from   tableA b
         where  b.col1 = a.col1
         and    b.col2 = a.col2
         and    b.col3 = a.col3
         and    instr(b.col6, a.col4, 1) > 0
         and    instr(b.col6, a.col1, 1) > 0
       );
I have a one-column table that stores lists of elements:
create table test_table (c1 VARCHAR2(4000));
insert into test_table values ('1,23');
insert into test_table values ('1,2');
insert into test_table values ('3,4,5');
...
The output column would be something like this:

output_column
1,2,7,23
6,9,0
3,4,5
I'm grouping the values that have at least one element in common.
(1,23) and (1,2) merge into: (1,2,23)
(1,2,23) and (7,2) merge into: (1,2,7,23) --> output
(6,9) and (9,0) merge into: (6,9,0) --> output
(3,4,5) and (5,5) merge into: (3,4,5) --> output
I have implemented this logic using only PL/SQL, with loops and nested tables using the MEMBER OF operator, but I suppose there is a way to improve the performance using only SQL.
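In pure SQL this is a connected-components problem: explode each list into (row, element) pairs, link rows that share an element, and propagate a minimum label with a recursive subquery. A sketch against the test_table above (fine for modest data; the path enumeration can explode on large, densely connected sets, and the numeric ordering assumes the elements are numbers as in the sample):

WITH t AS (
  SELECT ROWNUM AS rid, c1 FROM test_table
),
elems AS (  -- one row per (list, element)
  SELECT DISTINCT rid,
         REGEXP_SUBSTR(c1, '[^,]+', 1, LEVEL) AS elem
  FROM   t
  CONNECT BY LEVEL <= REGEXP_COUNT(c1, ',') + 1
         AND PRIOR rid = rid
         AND PRIOR SYS_GUID() IS NOT NULL
),
pairs AS (  -- lists linked by a shared element (symmetric)
  SELECT DISTINCT a.rid AS rid_from, b.rid AS rid_to
  FROM   elems a JOIN elems b ON b.elem = a.elem
),
comp (rid, root) AS (  -- every list reachable from every starting list
  SELECT rid, rid FROM t
  UNION ALL
  SELECT p.rid_to, c.root
  FROM   comp c JOIN pairs p ON p.rid_from = c.rid
  WHERE  p.rid_to <> c.rid
)
CYCLE rid SET is_cycle TO '1' DEFAULT '0'
SELECT LISTAGG(elem, ',') WITHIN GROUP (ORDER BY TO_NUMBER(elem))
         AS output_column
FROM  (
        SELECT DISTINCT m.grp, e.elem
        FROM  (SELECT rid, MIN(root) AS grp FROM comp GROUP BY rid) m
        JOIN   elems e ON e.rid = m.rid
      )
GROUP BY grp;

MIN(root) gives each list the smallest row id in its component, so all transitively connected lists collapse into one group before the elements are deduplicated and re-aggregated.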