SQL & PL/SQL :: High Volume Similarity Grouping
Nov 15, 2010
I am in the very early planning stages of a project the goal of which is to identify separate organizations which may in fact be the same organization.
Our first implementation of this task was a process designed to look for a few thousand organizations in a pool of a few hundred thousand organizations. To accomplish this we made heavy use of Oracle Text indexes as well as a custom index type we created which utilized n-grams. This approach worked quite well for on-demand editing of the organizations, in which a user might log in and say, "In addition to what we already know about organization A we also know x, y and z; does that change anything?" It also worked acceptably well for the bulk processing we did on our "known" information once a week, running for a couple of hours on the weekend.
We have now been tasked with reworking this initial implementation, only now we want to look at a set consisting of several million organizations for potential matches which exist within the set. As in our initial implementation, we will be breaking what we know about organizations into groupings so we aren't comparing a phone number to an email address, and normalizing the data as much as we can so we ignore things like case and punctuation. Even after all this, we are still talking about looking for similar values in a group which might number in the tens of millions (some types of data will have more than one value per organization).
My initial thought on the problem is to use n-grams, though not in the way we did in the past. The basic idea here is that we break each search value up into all the substrings it is made of and look for other values which have a high number of those substrings in common. A rough sketch of what I have in mind is below.
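A rough sketch of the n-gram idea in plain SQL, assuming a hypothetical table org_values(org_id, val) holding one normalized string per row (the names, the gram length and the threshold are all placeholders, not our actual design):

-- Step 1: explode each value into 3-grams using a generated position list.
CREATE TABLE org_ngrams AS
SELECT o.org_id,
       o.val,
       SUBSTR(o.val, n.pos, 3) AS gram
FROM   org_values o
JOIN  (SELECT LEVEL AS pos FROM dual CONNECT BY LEVEL <= 4000) n
  ON   n.pos <= LENGTH(o.val) - 2;

-- Step 2: candidate pairs are values sharing many grams; the threshold of 5
-- is arbitrary, and COUNT(DISTINCT ...) avoids double-counting repeated grams.
SELECT a.org_id AS org_a,
       b.org_id AS org_b,
       COUNT(DISTINCT a.gram) AS shared_grams
FROM   org_ngrams a
JOIN   org_ngrams b
  ON   b.gram = a.gram
 AND   b.org_id > a.org_id
GROUP  BY a.org_id, b.org_id
HAVING COUNT(DISTINCT a.gram) >= 5;

At tens of millions of values the pair join is the expensive part, so anything that prunes candidates first (blocking on length, on a rare shared gram, etc.) matters more than the exact similarity formula.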
I was not sure SQL & PL/SQL was the best place for the question, but I could not think of a better one.
View 10 Replies
Jun 6, 2013
I have 15 million records as CSV that I want to load through SQL*Loader. Is SQL*Loader the right option to load this high a volume of data? I have loaded 2.5 lakh (250,000) records, which took 4 minutes to load. A hedged control-file sketch is below.
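For volumes like this, direct path loading is usually the deciding factor. A minimal, hedged sketch of a control file and invocation (the table, file and column names are made up):

LOAD DATA
INFILE 'orgs.csv'
APPEND
INTO TABLE organizations
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( org_id, org_name, phone, email )

-- invoked with direct path to bypass conventional inserts:
-- sqlldr userid=scott/tiger control=orgs.ctl log=orgs.log direct=true

With direct=true, 15 million rows is well within SQL*Loader's comfort zone; an external table is the other common option when the file lives on the database server.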
View 9 Replies
View Related
Jun 4, 2010
The production stats configuration has been implemented in development. The stats were gathered 2 months back on dev, while in production the stats were gathered 2 weeks back.
My question: shouldn't the high volume of data cause changes in the plan in both environments? My thinking is that the plans can be different because the high volume of data changing in prod may lead to a different plan. A hedged re-gather-and-compare sketch is below.
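As a sanity check, re-gathering stats in dev so both environments reflect their current volumes, then comparing the plans, is usually the first step. A minimal sketch, with the owner, table and query names made up:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'APP_OWNER',
    tabname          => 'BIG_TABLE',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);
END;
/

EXPLAIN PLAN FOR
  SELECT COUNT(*) FROM app_owner.big_table WHERE created_dt > SYSDATE - 7;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Plans can still legitimately differ between the two databases even with fresh stats, because the optimizer also reacts to differences in data volume, histograms and system statistics.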
View 6 Replies
View Related
Jun 3, 2013
I have a query in an Oracle report in which I am getting this output. Manager Arnav has 2 employees, Inder and Kaushal, whose salaries are 10000 and 20000 respectively.
Another manager is Anjali, whose employees are Kavya and Inder, with salaries of 40000 and 10000 respectively. As Inder is repeated, I want the salary to become 0 in place of 10000 the second time. I am in a dilemma about what I should do if I want to change that 10000 to 0.
Manager        Employee   Salary
Arnav Tiwari   Inder      10000
               kaushal    20000
Anjali         Kavya      40000
               Inder      10000
What should I do in the formula column for salary, based on the employee name? That is, if the name has already appeared then the salary value should be 0, and if it appears for the first time then its actual value (here 10000) should be printed. A hedged sketch of one approach is below.
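In plain SQL the usual trick is to number each employee's rows and zero out everything after the first one. A hedged sketch, with table and column names assumed:

SELECT manager,
       employee,
       CASE
         WHEN ROW_NUMBER() OVER (PARTITION BY employee ORDER BY manager) = 1
         THEN salary
         ELSE 0
       END AS salary_display
FROM   emp_report
ORDER  BY manager, employee;

Inside Oracle Reports the equivalent is a formula column that checks a PL/SQL collection (or package variable) of employee names already printed and returns 0 on a repeat.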
View 1 Replies
View Related
May 27, 2013
I have a question about the ADD VOLUME command; I can't understand the difference between ADD DISK and ADD VOLUME. What is the difference between them? When should I use each one? How can I control the striping and mirroring (NORMAL and HIGH redundancy) when adding VOLUMEs to a DISKGROUP? Can I add a volume to a failgroup? Hedged syntax sketches are below.
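Hedged syntax sketches (the disk path and names are made up). ADD DISK grows the diskgroup with a physical ASM disk, while ADD VOLUME carves an ADVM volume out of space the diskgroup already has, typically for an ACFS file system:

-- add a disk, optionally into a named failgroup:
ALTER DISKGROUP data ADD FAILGROUP fg2 DISK '/dev/mapper/asm_disk5';

-- create an ADVM volume inside the diskgroup; its redundancy and striping
-- follow the diskgroup's NORMAL/HIGH settings unless overridden:
ALTER DISKGROUP data ADD VOLUME vol1 SIZE 50G;

So striping and mirroring of a volume are controlled by the diskgroup's redundancy and the volume clauses, not per failgroup; it is disks, not volumes, that you place into failgroups.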
View 1 Replies
View Related
May 22, 2011
I was asked by my systems administrator if I could tell him how much redo log volume, on average, we generate in a day.
I'm just wondering how I might calculate this.
We have several production databases. If I wanted to calculate the above for one of them, would I take all the redo logs for a day and total up the size in bytes? Maybe take a 5-day work week and take the average over the 5 days? A hedged query sketch is below.
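A minimal query sketch against v$archived_log (assumes the database is in archivelog mode; the figures are approximate):

SELECT TRUNC(completion_time) AS day,
       ROUND(SUM(blocks * block_size) / 1024 / 1024) AS redo_mb
FROM   v$archived_log
WHERE  completion_time > SYSDATE - 7
GROUP  BY TRUNC(completion_time)
ORDER  BY day;

The cumulative "redo size" statistic in v$sysstat is another rough cross-check, and averaging over a 5-day work week as you describe is a perfectly reasonable way to present it.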
View 4 Replies
View Related
Dec 15, 2010
We have root disks on all the Unix servers, mirrored with SVM using the internal disks. VxVM 5.0 MP3 is used as the volume manager to manage the SAN space allocated to the servers. On some instances, SVM is also used for allocating filesystems for applications and databases.
We are required to move all the files from SVM to VxVM.
In the process we are copying our Oracle binaries from SVM to VxVM. I am planning to use the cpio command to do this. Once I move to VxVM (Veritas Volume Manager), I will decommission SVM.
View 2 Replies
View Related
Mar 15, 2013
I need to calculate the redo log volume generated by certain tables. If I have 100 tables in the database, I need to know the redo log volume per day for only 25 of those tables. How can I calculate this, and is LogMiner useful for this? A hedged LogMiner sketch is below.
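LogMiner can attribute changes to tables, though it reports change records rather than exact redo bytes, so treat the result as an approximation of per-table redo activity. A heavily hedged sketch (the file path and table names are made up):

BEGIN
  DBMS_LOGMNR.ADD_LOGFILE(logfilename => '/arch/1_1234_987654321.arc',
                          options     => DBMS_LOGMNR.NEW);
  DBMS_LOGMNR.START_LOGMNR(options => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
END;
/

SELECT seg_owner, table_name, COUNT(*) AS change_records
FROM   v$logmnr_contents
WHERE  table_name IN ('T1', 'T2')   -- the 25 tables of interest
GROUP  BY seg_owner, table_name;

EXEC DBMS_LOGMNR.END_LOGMNR;

You would repeat the ADD_LOGFILE call for every archived log generated during the day being measured.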
View 6 Replies
View Related
Oct 6, 2010
I am facing a problem with UTL_HTTP.WRITE_TEXT in my PL/SQL application. My requirement is to write data of size > 32K, so I used a CLOB variable with WRITE_TEXT, but it still raises a numeric or value error when the data size is above 8K.
I have read that chunked transfer encoding will work, but I couldn't find out how this is done. A hedged sketch is below.
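A minimal sketch (the URL is made up): declare the request as chunked, then push the CLOB through WRITE_TEXT in pieces small enough for a VARCHAR2.

DECLARE
  l_req    UTL_HTTP.REQ;
  l_resp   UTL_HTTP.RESP;
  l_clob   CLOB := TO_CLOB('...payload, larger than 32K in practice...');
  l_offset PLS_INTEGER := 1;
  l_amount PLS_INTEGER := 8000;
BEGIN
  l_req := UTL_HTTP.BEGIN_REQUEST('http://example.com/endpoint', 'POST');
  UTL_HTTP.SET_HEADER(l_req, 'Content-Type', 'text/plain');
  UTL_HTTP.SET_HEADER(l_req, 'Transfer-Encoding', 'chunked');
  WHILE l_offset <= DBMS_LOB.GETLENGTH(l_clob) LOOP
    UTL_HTTP.WRITE_TEXT(l_req, DBMS_LOB.SUBSTR(l_clob, l_amount, l_offset));
    l_offset := l_offset + l_amount;
  END LOOP;
  l_resp := UTL_HTTP.GET_RESPONSE(l_req);
  UTL_HTTP.END_RESPONSE(l_resp);
END;
/

Setting the Transfer-Encoding header to chunked tells UTL_HTTP not to require a Content-Length up front, which is what otherwise limits how much a request body can carry.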
View 5 Replies
View Related
Oct 6, 2009
I am having a problem inserting a long file (80000 bytes) into a CLOB column in an Oracle Database 10g.
ERROR at line 23:
ORA-06550: line 23, column 1:
PLS-00172: string literal too long
I am calling the Oracle stored procedure through a Unix shell script as follows:
#!/bin/ksh
cd /usr/home/dfusr/backup
# double up single quotes so the file content survives inside the SQL string literal
i=`cat pe_proxy_master_2160.Thu.log | sed "s/'/''/g"`
sqlplus username/password@db 1> temp.log <<!
DECLARE
BEGIN
Write_Text_To_CLOB(1,'$i');
COMMIT;
END;
/
exit
!
The file pe_proxy_master_2160.Thu.log is about 50000 B.
My stored procedure is as follows:
CREATE OR REPLACE PROCEDURE Write_Text_To_CLOB (
p_id IN NUMBER
, p_clob IN VARCHAR2
)
IS
[code]....
I also tried with ojdbc5.jar and ojdbc14.jar files in class path.
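The PLS-00172 error comes from embedding the entire file as a string literal inside the anonymous block; a PL/SQL string literal is limited to 32767 bytes, so the shell-variable approach cannot scale to 50000-80000 bytes. A hedged alternative that avoids the literal entirely is to let the database read the file itself through a directory object and DBMS_LOB.LOADCLOBFROMFILE; the directory path, table and column names below are assumptions:

CREATE OR REPLACE DIRECTORY log_dir AS '/usr/home/dfusr/backup';

DECLARE
  l_bfile   BFILE := BFILENAME('LOG_DIR', 'pe_proxy_master_2160.Thu.log');
  l_clob    CLOB;
  l_dest    INTEGER := 1;
  l_src     INTEGER := 1;
  l_lang    INTEGER := DBMS_LOB.DEFAULT_LANG_CTX;
  l_warning INTEGER;
BEGIN
  INSERT INTO my_clob_table (id, text_col)   -- table and columns assumed
  VALUES (1, EMPTY_CLOB())
  RETURNING text_col INTO l_clob;

  DBMS_LOB.FILEOPEN(l_bfile, DBMS_LOB.FILE_READONLY);
  DBMS_LOB.LOADCLOBFROMFILE(
    dest_lob     => l_clob,
    src_bfile    => l_bfile,
    amount       => DBMS_LOB.GETLENGTH(l_bfile),
    dest_offset  => l_dest,
    src_offset   => l_src,
    bfile_csid   => DBMS_LOB.DEFAULT_CSID,
    lang_context => l_lang,
    warning      => l_warning);
  DBMS_LOB.FILECLOSE(l_bfile);
  COMMIT;
END;
/

This requires the file to be visible from the database server, which appears to be the case here since sqlplus runs on the same host.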
View 1 Replies
View Related
Mar 4, 2013
The db is 11.2.0.3 on a linux machine.
Is it possible to create a raw device on a logical volume WITHOUT cluster software from an OS vendor, such as HACMP (AIX) or Service Guard (HP)?
The following is os information.
[aa@mm01(abc):/home/abc] uname -a
Linux aa 2.6.18-194.el5 #1 SMP Mon Mar 29 22:10:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[aa@mm01(abc):/home/abc]
View 3 Replies
View Related
Aug 19, 2013
I have a requirement to get the records group-wise. Example: for each department, I need to get the employee details as a comma-separated list. That means the output must have the department name in the first column, and the second column must contain all the employees in that particular department (comma-separated). A hedged LISTAGG sketch is below.
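On 11gR2 and later, LISTAGG does exactly this; on older releases the usual workarounds are a user-defined aggregate or SYS_CONNECT_BY_PATH. A minimal sketch against the standard HR tables (swap in your own):

SELECT d.department_name,
       LISTAGG(e.first_name || ' ' || e.last_name, ',')
         WITHIN GROUP (ORDER BY e.last_name) AS employees
FROM   departments d
JOIN   employees   e ON e.department_id = d.department_id
GROUP  BY d.department_name;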
View 3 Replies
View Related
Mar 27, 2013
In the data below, a container is moving from one city to another. 1, 2, 3 can be any number which I want to generate and use as a key to group the cities. E.g. AUH, JEB, CIW belong to the same key = 2; SIN, IKT belong to a new group 4. A new group starts at the city where the difference between the Seq# values is greater than 1 (e.g. between S8W and AUH). A hedged sketch of one approach follows the sample data.
Container #   City   Seq   I want this
---------------------------------------
Container1    S8W    525   1
Container1    S8W    526   1
Container1    AUH    536   2
Container1    AUH    537   2
Container1    JEB    538   2
Container1    JEB    539   2
[code]....
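A hedged sketch of the usual "start of group" technique (table and column names assumed): flag a row whenever the gap from the previous Seq exceeds 1, then a running sum of the flags gives the group key.

SELECT container_no,
       city,
       seq,
       1 + SUM(new_grp) OVER (PARTITION BY container_no ORDER BY seq) AS grp_key
FROM  (SELECT t.container_no,
              t.city,
              t.seq,
              CASE
                WHEN t.seq - LAG(t.seq, 1, t.seq - 1)
                               OVER (PARTITION BY t.container_no ORDER BY t.seq) > 1
                THEN 1
                ELSE 0
              END AS new_grp
       FROM   container_moves t);

With the sample rows the keys come out as 1 for the S8W rows and 2 for the AUH/JEB rows, since the jump from 526 to 536 is the only gap greater than 1.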
View 4 Replies
View Related
Oct 14, 2011
I have a table like this
Name   Hours   Date
a      8       10/11/2011
a      5       10/12/2011
a      6       10/13/2011
a      7       10/14/2011
a      7       10/15/2011
a      8       10/16/2011
a      7       10/17/2011
a      8       10/18/2011
a      8       10/19/2011
a      7       10/20/2011
a      7       10/21/2011
If I want the sum of hours for each 3-day range, how should I do it? A hedged sketch follows the expected output.
E.g. say
name   hrs   startdate    enddate
a      19    10/11/2011   10/13/2011
a      22    10/14/2011   10/16/2011
a      23    10/17/2011   10/19/2011
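A hedged sketch (table name assumed, and the date column renamed work_date since DATE is a reserved word): bucket each row into a 3-day window counted from the earliest date per name, then aggregate per bucket.

SELECT name,
       SUM(hours)     AS hrs,
       MIN(work_date) AS startdate,
       MAX(work_date) AS enddate
FROM  (SELECT t.name,
              t.hours,
              t.work_date,
              TRUNC((t.work_date - MIN(t.work_date) OVER (PARTITION BY t.name)) / 3) AS bucket
       FROM   work_hours t)
GROUP  BY name, bucket
ORDER  BY name, startdate;

Against the sample rows this yields 19, 22 and 23 hours for 10/11-10/13, 10/14-10/16 and 10/17-10/19 respectively, plus a final partial bucket for 10/20-10/21.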
View 11 Replies
View Related
Feb 10, 2013
How can I get the grp_id for each unique combination of manager and department? The grp_id should be assigned in ascending order of manager_id.
In this example manager_id 100 is the minimum, so it should be grp 1 and all the employees with that manager_id should be in grp_id 1; for manager_id 114 the grp_id should be 2.
If there were a manager_id 117, it should create grp_id 3.
To get grp_num, I can use row_number() over (partition by department_id, manager_id order by employee_id) grp_num.
I am looking for an update statement for this issue; a hedged sketch follows the expected result below.
Oracle version : Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
CREATE TABLE HR.EMPLOYEES_2
(
EMPLOYEE_ID NUMBER(6),
FIRST_NAME VARCHAR2(20 BYTE),
LAST_NAME VARCHAR2(25 BYTE),
EMAIL VARCHAR2(25 BYTE),
PHONE_NUMBER VARCHAR2(20 BYTE),
[Code]....
Expected result
----------------
EMPLOYEE_ID SALARY MANAGER_ID DEPARTMENT_ID GRP_NUM GRP_ID
114 11000 100 30 1 1
115 3100 100 30 2 1
116 2900 114 30 1 2
117 2800 114 30 2 2
118 2600 114 30 3 2
119 2500 114 30 4 2
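A hedged sketch: DENSE_RANK over manager_id produces a group id that follows ascending manager_id, and a MERGE applies it (this assumes GRP_ID and GRP_NUM columns exist on the table and that EMPLOYEE_ID is the key):

MERGE INTO hr.employees_2 e
USING (SELECT employee_id,
              DENSE_RANK() OVER (ORDER BY manager_id) AS grp_id,
              ROW_NUMBER() OVER (PARTITION BY department_id, manager_id
                                 ORDER BY employee_id) AS grp_num
       FROM   hr.employees_2) s
ON    (e.employee_id = s.employee_id)
WHEN MATCHED THEN
  UPDATE SET e.grp_id = s.grp_id, e.grp_num = s.grp_num;

If the grouping really has to be per manager-and-department combination rather than per manager, change the DENSE_RANK ordering to (manager_id, department_id).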
View 8 Replies
View Related
May 19, 2013
I am trying to break down the balance_date to display the following groupings:
7:00-17:30 CDT
18:00-4:30 CDT
I currently have the query set up to display by day instead of these time ranges. I would like the output to read:
19 May Day
19 May Night
20 May Day
20 May Night
I am fairly new to this, but how would I go about making this change? A hedged sketch follows my current query fragment.
SELECT
TO_CHAR(TRUNC(balance_date,'D') + 4,'YYYY') || '-' ||
TO_CHAR(TRUNC(balance_date,'D') + 4,'IW') as year_wk,
TO_CHAR(TRUNC(balance_date,'D') + 4,'IW')as wk,
[Code] ........
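A hedged sketch (table and column names assumed): derive a Day/Night label from the time of day and group on the date plus that label.

SELECT TO_CHAR(TRUNC(balance_date), 'DD Mon') AS day_label,
       CASE
         WHEN TO_CHAR(balance_date, 'HH24MI') BETWEEN '0700' AND '1730' THEN 'Day'
         ELSE 'Night'
       END AS shift,
       COUNT(*) AS rows_in_shift
FROM   balances
GROUP  BY TRUNC(balance_date),
          CASE
            WHEN TO_CHAR(balance_date, 'HH24MI') BETWEEN '0700' AND '1730' THEN 'Day'
            ELSE 'Night'
          END
ORDER  BY TRUNC(balance_date), shift;

One caveat: the 18:00-4:30 shift crosses midnight, so if rows stamped after midnight should count toward the previous day's night shift, subtract a few hours from balance_date (e.g. balance_date - 5/24) before the TRUNC.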
View 1 Replies
View Related
May 31, 2012
I am getting output like this in Oracle 9i:
S.No Column1 Column2 Column3 DateCol
1 A B C 10/2001
2 A B C 03/2001
3 B B C 02/2001
4 B B C 01/2001
5 A B C 03/2000
But my real scenario is that I need to populate the output in the structure below:
S.No Column1 Column2 Column3 DateCol
1 A B C 10/2001
A B C 03/2001
2 B B C 02/2001
B B C 01/2001
3 A B C 03/2000
I don't know how to form the query to retrieve this structure. A hedged sketch is below.
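A hedged sketch (table name assumed, and DateCol assumed to be a real DATE so it sorts correctly): a new serial number starts whenever Column1 changes from the previous row in descending date order, and continuation rows get a blank S.No.

SELECT CASE
         WHEN new_grp = 1
         THEN TO_CHAR(SUM(new_grp) OVER (ORDER BY datecol DESC ROWS UNBOUNDED PRECEDING))
       END AS s_no,
       column1, column2, column3, datecol
FROM  (SELECT t.column1, t.column2, t.column3, t.datecol,
              CASE
                WHEN t.column1 = LAG(t.column1) OVER (ORDER BY t.datecol DESC) THEN 0
                ELSE 1
              END AS new_grp
       FROM   my_output t)
ORDER  BY datecol DESC;

On the sample data this numbers the two leading A rows as 1, the two B rows as 2 and the trailing A row as 3, leaving S.No blank on the second row of each group.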
View 2 Replies
View Related
Jun 6, 2013
SELECT
pas_code,
pas_profile,
count(sutp_id),
sum(sutp_price),
[code]...
The problem is that when I use sutp_price_proc and pbk_price in the GROUP BY, it splits my results across those rows. If I delete them from the GROUP BY, SQL gives me an error about not a single-group group function at line 1.
pas_code   pas_profile   sutp_id   sutp_price   x
2664       good stuff    3         100          69   <- because pbk_price is like 67 from that period
2664       good stuff    3         100          71   <- because pbk_price is like 50 from other period
how to get all results in a single line like:
pas_code   pas_profile   sutp_id   sutp_price   x
2664       good stuff    6         200          140
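A hedged sketch of the fix: take sutp_price_proc and pbk_price out of the GROUP BY and aggregate them instead, so the periods collapse into one row (the FROM clause below is a stand-in for the original joins):

SELECT pas_code,
       pas_profile,
       COUNT(sutp_id)  AS sutp_cnt,
       SUM(sutp_price) AS sutp_price_total,
       SUM(pbk_price)  AS pbk_price_total   -- aggregated rather than grouped
FROM   sales_detail                          -- placeholder for the real FROM/joins
GROUP  BY pas_code, pas_profile;

If x is really a formula over pbk_price per period, compute it inside the SUM (for example SUM(sutp_price - pbk_price)) so the per-period values are combined rather than split.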
View 3 Replies
View Related
Feb 24, 2010
TASK_ID, TASK_STATUS, TASK_OWNER
================================
00001 , OPEN , ABC
00002 , OPEN , XYZ
00003 , WIP , ABC
00004 , CLOSED , XYZ
00005 , WIP , XYZ
00006 , CLOSED , XYZ
00007 , OPEN , XYZ
Output Required
Owner , Open , WIP, Closed
ABC 1 1 0
XYZ 2 1 2
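A hedged sketch (table name assumed) using conditional aggregation to pivot the status counts per owner:

SELECT task_owner AS owner,
       COUNT(CASE WHEN task_status = 'OPEN'   THEN 1 END) AS open_cnt,
       COUNT(CASE WHEN task_status = 'WIP'    THEN 1 END) AS wip_cnt,
       COUNT(CASE WHEN task_status = 'CLOSED' THEN 1 END) AS closed_cnt
FROM   tasks
GROUP  BY task_owner
ORDER  BY task_owner;

On 11g the PIVOT clause is the more declarative alternative, but the CASE form works on any version.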
View 9 Replies
View Related
Oct 21, 2010
I have this table,
Create table TBL_OK_HIST
(
DATE_KEY NUMBER,
A_N VARCHAR2(22 BYTE),
R_DUR VARCHAR2(8 BYTE),
CH_DUR VARCHAR2(8 BYTE),
REV VARCHAR2(20 BYTE)
)
insert into TBL_OK_HIST
values (20101010,123456768,5,20,2);
insert into TBL_OK_HIST
values (20101010,123496568,15,20,2);
insert into TBL_OK_HIST
values (20101012,122235768,25,25,3);
[Code] ......
Thus, applying the following would yield:
Select * from TBL_OK_HIST
DATE_KEY   A_N         R_DUR   CH_DUR   REV
20101010   123456768   5       20       2
20101010   123496568   15      20       2
20101012   122235768   25      25       3
20101011   234567681   9       20       2
20101012   345676812   52      52       7
20101013   234567681   36      36       5
20101010   567681234   11      20       2
20101011   346812567   17      20       2
20101010   681234567   55      55       9
I need to generate the following results:
range_start_rdur range_end_rdur no_of_an sum_of_rdur sum_of_chdur sum_of_rev
1 5 1 5 20 2
6 10 1 9 20 2
11 15 2 26 40 4
16 20 1 17 20 2
21 25 1 25 25 3
26 30 0 0 0 0
31 35 0 0 0 0
36 40 1 36 36 5
41 45 0 0 0 0
46 50 0 0 0 0
51 55 2 107 107 16
I thought I would make use of the following query, but I am not getting the proper results when applying it to a real table with more than 20 million records:
SELECT trunc(R_DUR/6)*5+1 as range_start_rdur,
trunc(R_DUR/6)*5+5 range_end_rdur,
sum(noofan) as no_of_an,
sum(sumofrdur) as sum_of_rdur,
sum(sumofchdur) as sum_of_chdur,
[Code] ...........
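A hedged guess at the bucket arithmetic: for 5-wide ranges starting at 1 (1-5, 6-10, ...), the expression should divide by 5, not 6, and anchor on R_DUR - 1. Since R_DUR is stored as VARCHAR2, an explicit TO_NUMBER keeps the conversion honest:

SELECT TRUNC((TO_NUMBER(r_dur) - 1) / 5) * 5 + 1 AS range_start_rdur,
       TRUNC((TO_NUMBER(r_dur) - 1) / 5) * 5 + 5 AS range_end_rdur,
       COUNT(DISTINCT a_n)                       AS no_of_an,
       SUM(TO_NUMBER(r_dur))                     AS sum_of_rdur,
       SUM(TO_NUMBER(ch_dur))                    AS sum_of_chdur,
       SUM(TO_NUMBER(rev))                       AS sum_of_rev
FROM   tbl_ok_hist
GROUP  BY TRUNC((TO_NUMBER(r_dur) - 1) / 5)
ORDER  BY range_start_rdur;

The zero-count ranges (26-30, 31-35, ...) in the expected output will not appear unless this result is outer-joined to a generated list of ranges (CONNECT BY LEVEL on dual is the usual trick).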
View 1 Replies
View Related
Aug 4, 2013
I have a table where the codes are sometimes 6 characters long and sometimes 12 characters long. I need to add up the amounts based on two different kinds of code: one sum adds up the amounts for the distinct short codes, and the other sum is keyed on SUBSTR from position 7 to the end.
create table strings ( strings_var varchar2(12),strings_amt number);
insert into strings (strings_var,strings_amt) values ('02.01',10 );
insert into strings (strings_var,strings_amt) values ('02.01_A11111',15);
insert into strings (strings_var,strings_amt) values ('02.02_A11111',15);
insert into strings (strings_var,strings_amt) values ('03.01_B11111',15);
insert into strings (strings_var,strings_amt) values ('03.02_B11111',15);
The output which I want is as below:
string value
'02.01' 10
'A11111' 30
'B11111' 30
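A hedged sketch: codes containing the '_' separator are reduced to the part from position 7 onward, short codes are kept whole, and the amounts are summed per resulting key:

SELECT CASE
         WHEN INSTR(strings_var, '_') > 0 THEN SUBSTR(strings_var, 7)
         ELSE strings_var
       END AS string_key,
       SUM(strings_amt) AS total_amt
FROM   strings
GROUP  BY CASE
            WHEN INSTR(strings_var, '_') > 0 THEN SUBSTR(strings_var, 7)
            ELSE strings_var
          END;

Against the sample rows this returns 02.01 = 10, A11111 = 30 and B11111 = 30.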
View 4 Replies
View Related
Aug 29, 2012
In my schema the employees table has 55 rows in department_id 30.
How can I split the employees table into views grouped by department_id as:
- one view with no more than 55 rows (this view will contain only one department)
- another view with more departments whose total number of rows is also not greater than 55 but which can contain 2 or more department_ids (e.g. departments 9 and 10, whose combined row count is 43, but bringing in another department would push the row count over 55)
A hedged sketch of the bucketing logic is below.
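A hedged sketch of just the bucketing logic (the view definitions themselves would then be written, or generated, around the department lists it produces): a running total of rows per department ordered by department_id shows where a 55-row bucket has to close.

SELECT department_id,
       dept_rows,
       SUM(dept_rows) OVER (ORDER BY department_id ROWS UNBOUNDED PRECEDING) AS running_rows
FROM  (SELECT department_id, COUNT(*) AS dept_rows
       FROM   employees
       GROUP  BY department_id)
ORDER  BY department_id;

True bin-packing (resetting the running total each time it would exceed 55) needs either the MODEL clause or a small PL/SQL loop, since a plain analytic sum cannot reset itself.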
View 5 Replies
View Related
May 14, 2013
Allow me to preface this with the notice that I am not familiar with XML outside of its hierarchical structure, and am not familiar with what you can do with it using formatting.
As an example, let us say you have the following table:
Object_Type | Object_Name | Descriptor |
------------------------------------------------------------
Fruit | Apple | Crunchy |
Fruit | Orange | Sour |
Utensil | Pencil | Wooden |
Now let's say you want to query this table to return an XML format, which will be used in a web site to display the information, and you want to group the display by Object_Type, so that you want an XML format like this:
<Object Group>
<Object Type>Fruit</Object Type>
<Object>
<Object Name>Apple</Object Name>
<Descriptor>Crunchy</Descriptor>
[code]........
However, from what I can tell, using the XMLELEMENT function, the closest I can get is the following:
SELECT XMLELEMENT("Object Group",
XMLELEMENT("Object Type", object_type),
XMLELEMENT("Object",
XMLELEMENT("Object Name", object_name),
XMLELEMENT("Descriptor", descriptor)
)
)
FROM object_tbl;
<Object Group>
<Object Type>Fruit</Object Type>
<Object>
<Object Name>Apple</Object Name>
[code].........
Is it possible to group it in a way so that Apple and Orange end up in the same <Object Group>? Or is that meaningless, and should such grouping be done on the web site itself by formatting the XML?
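It can be done in the query itself: group by Object_Type and wrap the per-row elements in XMLAGG so all objects of one type nest under a single group element. A hedged sketch (the element names here avoid spaces, which plain XML element names do not allow):

SELECT XMLELEMENT("ObjectGroup",
         XMLELEMENT("ObjectType", object_type),
         XMLAGG(
           XMLELEMENT("Object",
             XMLELEMENT("ObjectName", object_name),
             XMLELEMENT("Descriptor", descriptor))
           ORDER BY object_name))
FROM   object_tbl
GROUP  BY object_type;

This returns one XML fragment per Object_Type, with Apple and Orange both inside the Fruit group.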
View 5 Replies
View Related
Jun 4, 2013
When a GROUP BY clause contains multiple columns, which grouping is the most major grouping? What puzzled me was that I never knew there was such a thing as a "most major grouping" in a GROUP BY clause. Anyway, the answer given was:
the first column listed in the GROUP BY clause. What does this mean in practice? It must mean something different from your bog-standard "select sum(order_value) from sales group by city, country, region", because in that case I can't see how city has any more or less relevance to the query than region.
View 11 Replies
View Related
Jan 2, 2013
I am trying to write a single SELECT statement that groups at 2 levels of aggregation (using grouping sets) and assigns row numbers (to rank each item) that are partitioned at the correct level for each grouping set. I have the grouping sets figured out but I can't find a way to make Partition By match each level of aggregation.
What I am looking for (in a single SELECT statement) is logically equivalent to:
SELECT week
      ,region
      ,NULL as country
      ,item
      ,SUM(qty)
      ,ROW_NUMBER() OVER (PARTITION BY week, region ORDER BY SUM(qty) DESC) as rn
FROM base
GROUP BY week
        ,region
        ,item
UNION ALL
SELECT week
      ,NULL as region
      ,country
      ,item
      ,SUM(qty)
      ,ROW_NUMBER() OVER (PARTITION BY week, country ORDER BY SUM(qty) DESC) as rn
FROM base
GROUP BY week
        ,country
        ,item
I hoped that I could do something like this:
SELECT week
      ,region
      ,country
      ,item
      ,SUM(qty)
      ,ROW_NUMBER() OVER (PARTITION BY week, GROUPING SETS (region, country) ORDER BY SUM(qty) DESC) as rn
FROM base
GROUP BY week
        ,GROUPING SETS (region, country)
        ,item
But it looks like I am not allowed to partition by grouping sets -- I get the error ORA-00907: missing right parenthesis. I didn't expect it to work but I am not sure how else to partition by multiple levels.
let me know if I could have tagged my code or met other forum standards better.
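One hedged workaround is to partition by the GROUPING() flags plus the columns themselves: GROUPING(col) returns 1 when the column is rolled away in the current grouping set, so the combination separates the two aggregation levels.

SELECT week,
       region,
       country,
       item,
       SUM(qty) AS qty,
       ROW_NUMBER() OVER (
         PARTITION BY week,
                      GROUPING(region), GROUPING(country),
                      region, country
         ORDER BY SUM(qty) DESC) AS rn
FROM   base
GROUP  BY week,
          GROUPING SETS ((region), (country)),
          item;

For the region-level rows the partition key collapses to (week, region) because country is NULL and its GROUPING flag is 1, and symmetrically for the country-level rows, which is the behaviour the UNION ALL version produces.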
View 1 Replies
View Related
Oct 13, 2010
I have a query on displaying data as per my requirement. I have created a table called sales it has four columns
create table sales (country varchar2(30), state varchar2(30), district varchar2(30), sales number);  -- datatypes assumed; the original omitted them
and I am inserting some sample data:
insert into sales values ('india','TN','Chennai',100);
insert into sales values ('india','TN','KPURAM',120);
insert into sales values ('india','TN','Bangalore',35);
insert into sales values ('india','ANDR','Guinder',100);
insert into sales values ('india','ANDR','Nellai',76);
insert into sales values ('london','city-a','xstreet',89);
insert into sales values ('london','city-a','binroad',100);
select * from sales;
country state district sales
india TN chennai 100
india TN KPURAM 120
india TN Bangalore 35
india ANDR Guinder 100
india ANDR Nellai 76
london city-a xstreet 89
london city-a binroad 100
The data is displayed in this format; now I am trying to work out how to display the data the way I want it.
View 5 Replies
View Related
Feb 20, 2012
I need to query by fiscal year wise.
For example
1-Apr-2010 to 31-Mar-2011
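A hedged sketch (the table and column names are made up): shifting each date back three months makes its calendar year equal to the April-March fiscal year it belongs to, which can then be filtered or grouped on.

SELECT EXTRACT(YEAR FROM ADD_MONTHS(order_date, -3)) AS fiscal_year,
       SUM(amount)                                   AS total_amount
FROM   orders
GROUP  BY EXTRACT(YEAR FROM ADD_MONTHS(order_date, -3))
ORDER  BY fiscal_year;

For example, 1-Apr-2010 and 31-Mar-2011 both shift into calendar 2010, so they land in the same fiscal_year value of 2010.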
View 11 Replies
View Related
Oct 7, 2010
I have to get totals from a table using different criteria, which I do like this:
<QUERY>
SELECT DISTINCT
SUM(CASE WHEN MYCONDITION1 THEN 1 ELSE 0 END) AS TOTAL1,
SUM(CASE WHEN MYCONDITION2 THEN 1 ELSE 0 END) AS TOTAL2
FROM TABLE1, TABLE2
WHERE COMMON_CONDITION1 AND COMMON_CONDITION2
AND datevalue1 >= DATE1 AND datevalue1 <= DATE2;
</QUERY>
This works fine and I get the intended result. Now I have to repeat this for every week for the last 12 months, excluding holiday periods, so I generate a set of date ranges which are used in the queries. Repeating the above SQL statement for all the date ranges is a lengthy process. How can I do that in a single shot and get all the totals for each date range? A hedged sketch is below.
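A hedged sketch: put the generated ranges into a table (or a WITH subquery), join each row to the range it falls into, and group by range. Every name below is a placeholder for the corresponding piece of the original query.

SELECT r.range_start,
       r.range_end,
       SUM(CASE WHEN t1.status = 'A' THEN 1 ELSE 0 END) AS total1,   -- stands in for MYCONDITION1
       SUM(CASE WHEN t1.status = 'B' THEN 1 ELSE 0 END) AS total2    -- stands in for MYCONDITION2
FROM   date_ranges r
JOIN   table1 t1 ON t1.datevalue1 BETWEEN r.range_start AND r.range_end
JOIN   table2 t2 ON t2.id = t1.id                                    -- stands in for the common conditions
GROUP  BY r.range_start, r.range_end
ORDER  BY r.range_start;

One pass over the data then produces a row of totals per week instead of one execution per date range.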
View 4 Replies
View Related
Jul 9, 2010
OK, assume there is a table (TableA) in this format:
col1  col2    col3                col4  col5  col6
--------------------------------------------------
R1    route1  route1Description1  AA    BB    CC
R1    route1  route1Description1  AA    CC
R1    route1  route1Description1  CC    BB
R2    route2  route1Description2  GG    KK    LL
R2    route2  route1Description2  GG    LL
R2    route2  route1Description2  LL    KK
[Code]..
The data in the table was imported from a csv file and there is a relationship between the rows. Each combination of col1, col2 and col3 describes a full route of a journey. The row with an entry in col6 describes the full route and the other rows describes each leg in the route.
For example, for R1, the route is AA to BB via CC.
Another example for R4 the route is FF to SS via XX, PP, and OO.
What I would like to do is find routes which are missing a leg. For example, the route for R3 is DD to EE via FF. There is an entry for DD to FF but the entry for FF to EE is missing.
The results should return the following rows which are incomplete
R3    route3  route1Description3  DD    EE    FF
R3    route3  route1Description3  DD    FF
R5    route5  route5Description5  RR    TT    UU|VV
What is the best way to do this?
Here is what I have come up with, but it doesn't quite return the correct result:
select * from tableA a
where not exists (
        select 1 from tableA b
        where instr(col6, col4, 1) > 0
          and instr(col6, col1, 1) > 0
          and a.col1 = b.col1
          and a.col2 = b.col2
          and a.col3 = b.col3
      );
Is there an easier way to achieve this? A hedged alternative sketch is below.
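A hedged alternative (it assumes the full-route row is the one with col6 populated, that col6 lists the via points separated by '|', and that REGEXP_COUNT is available, i.e. 11g): a route is complete when its number of leg rows equals the number of via points plus one.

SELECT f.col1, f.col2, f.col3
FROM   tableA f
LEFT   JOIN (SELECT col1, col2, col3, COUNT(*) AS leg_cnt
             FROM   tableA
             WHERE  col6 IS NULL
             GROUP  BY col1, col2, col3) l
  ON   l.col1 = f.col1 AND l.col2 = f.col2 AND l.col3 = f.col3
WHERE  f.col6 IS NOT NULL
AND    NVL(l.leg_cnt, 0) < REGEXP_COUNT(f.col6, '[^|]+') + 1;

For R1 the expected leg count is 2 (one via point) and two leg rows exist, so it is excluded; R3 has only one leg row against an expected two, so it is reported.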
View 5 Replies
View Related
Oct 6, 2011
I have a one-column table that stores lists of elements:
create table test_table (c1 VARCHAR2(4000));
insert into test_table values ('1,23');
insert into test_table values ('1,2');
insert into test_table values ('3,4,5');
[code]...
The output column would be something like that:
output_column
1,2,7,23
6,9,0
3,4,5
I'm grouping rows that have at least one element in common.
(1,23) and (1,2) merge into : (1,2,23)
(1,2,23) and (7,2) merge into : (1,2,7,23) --> Output
(6,9) and (9,0) merge into : (6,9,0) --> Output
(3,4,5) and (5,5) merge into : (3,5,5) --> Output
I have implemented this logic using only PL/SQL, with loops and nested tables using the MEMBER OF operator, but I suppose there is a way to improve the performance using only SQL. A hedged sketch of the tokenizing step is below.
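A hedged sketch of the tokenizing step only (REGEXP_COUNT needs 11g): split each comma-separated list into one row per element. Once the lists are in rows, merging them is a connected-components problem, which pure SQL can approach with a recursive WITH clause (11gR2+), although an iterative PL/SQL pass like the existing one may still win on clarity.

SELECT t.rowid AS list_rid,
       REGEXP_SUBSTR(t.c1, '[^,]+', 1, n.pos) AS element
FROM   test_table t
JOIN  (SELECT LEVEL AS pos FROM dual CONNECT BY LEVEL <= 100) n
  ON   n.pos <= REGEXP_COUNT(t.c1, '[^,]+');

Self-joining this element list on matching elements gives the edges between rows, and the grouping step then only has to propagate a minimum row identifier across those edges.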
View 9 Replies
View Related