A new world ORDER

Often I have seen sites where you need to select a country for some reason or other.  And of course you have to wade through a big list to find yours.    A list like this one from iso.org.  Some will put the U.S.  or the home country at the top but is there a way make this easy to maintain?  Generally what they do is add a sort column into the table.   This is just cruft that becomes another source of maintenance .

So what to do?   Add a table that lets the sort be defined and different sorts for different needs.    First the tables:

CREATE TABLE COUNTRY_CODES (
                COUNTRY_CODE CHAR(2) NOT NULL,
                COUNTRY_CODE# NUMBER(22) NOT NULL,
                COUNTRY_NAME VARCHAR2(64),
                CONSTRAINT COUNTRY_CODES_PK PRIMARY KEY (COUNTRY_CODE)
);

CREATE TABLE COUNTRY_SORT (
                USAGE VARCHAR2(16) NOT NULL,
                COUNTRY_CODE CHAR(2) NOT NULL,
                POS NUMBER(22) NOT NULL,
                CONSTRAINT COUNTRY_SORT_PK PRIMARY KEY (USAGE, COUNTRY_CODE, POS)
);

ALTER TABLE COUNTRY_SORT ADD CONSTRAINT COUNTRY_SORT_COUNTRY_CODE_FK1
FOREIGN KEY (COUNTRY_CODE)
REFERENCES COUNTRY_CODES (COUNTRY_CODE);

I then populated country_codes from the CSV in the link above and use a sequence.   Then I inserted the following into the country_sort_table:

REM INSERTING into COUNTRY_SORT
SET DEFINE OFF;
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('AT',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('BE',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('BG',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('CY',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('CZ',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('DE',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('DK',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('EE',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('ES',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('FI',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('FR',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('GB',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('GR',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('HU',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('IE',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('IT',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('LT',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('LU',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('LV',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('MT',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('NL',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('PL',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('PT',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('RO',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('SE',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('SI',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('SK',27,'EU');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('US',1,'North America');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('CA',2,'North America');
Insert into COUNTRY_SORT (COUNTRY_CODE,POS,USAGE) values ('MX',3,'North America');

Now I have two sets of sorting for the country_codes table and have not had to touch the base table.  I can select and change the order in what I select not by changing the tables or the order by

SQL> var loc varchar2(32) 
SQL> exec :loc:='North America' 
SQL> select c.*, ( select pos from country_sort s 
                    where usage = :loc 
                      and s.country_code = c.country_code ) pos 
       from country_codes c 
      order by pos, country_code# ;

COUNTRY_CODE# COUNTRY_NAME                        CO       POS
-------- --------------------------------------- -- ------------
      236 UNITED STATES                           US         1
       40 CANADA                                  CA         2
      144 MEXICO                                  MX         3
        1 AFGHANISTAN                             AF
        2 ?LAND ISLANDS                           AX
        3 ALBANIA                                 AL
        4 ALGERIA                                 DZ
...........
 249  ZIMBABWE                                    ZW 

SQL> exec :loc:='EU' 
SQL> select c.*, ( select pos from country_sort s 
 where usage = :loc 
 and s.country_code = c.country_code ) pos 
 from country_codes c
    order by pos, country_code# ;

COUNTRY_CODE# COUNTRY_NAME                          CO       POS
------------- ------------------------------------- -- ------------
       15 AUSTRIA                                   AT        27
       22 BELGIUM                                   BE        27
       35 BULGARIA                                  BG        27
       59 CYPRUS                                    CY        27
       60 CZECH REPUBLIC                            CZ        27
       61 DENMARK                                   DK        27
       70 ESTONIA                                   EE        27
       75 FINLAND                                   FI        27
       76 FRANCE                                    FR        27
       83 GERMANY                                   DE        27
       86 GREECE                                    GR        27
      101 HUNGARY                                   HU        27
      107 IRELAND                                   IE        27
      110 ITALY                                     IT        27
      123 LATVIA                                    LV        27
      129 LITHUANIA                                 LT        27
      130 LUXEMBOURG                                LU        27
      138 MALTA                                     MT        27
      157 NETHERLANDS                               NL        27
      177 POLAND                                    PL        27
      178 PORTUGAL                                  PT        27
      182 ROMANIA                                   RO        27
      202 SLOVAKIA                                  SK        27
      203 SLOVENIA                                  SI        27
      209 SPAIN                                     ES        27
      215 SWEDEN                                    SE        27
      235 UNITED KINGDOM                            GB        27
        1 AFGHANISTAN                               AF
...........
     249  ZIMBABWE                                  ZW

So that is cool,  change a variable and change the order.  But looking at the output of the second select, I don’t want UK so far down the list:

insert into country_code ( select conutry_code, 27, 'EU GB First' 
                             from country_code );
update country_sort set pos = 1 where country_code='GB' 
                                  and usage = 'EU GB First' ;

/* now we can get */
exec :loc:='EU GB First'

SQL> select c.*, ( select pos from country_sort s 
                    where usage = :loc 
                      and s.country_code = c.country_code ) pos 
      from country_codes c
    order by pos, country_code# ;

COUNTRY_CODE# COUNTRY_NAME                          CO       POS
------------- ------------------------------------- -- ------------
      235 UNITED KINGDOM                            GB         1
       15 AUSTRIA                                   AT        27
       22 BELGIUM                                   BE        27
....

Now using what we learned about pipelined functions, in the previous article:

CREATE TYPE country_t VARCHAR2(64); 
CREATE OR REPLACE  function ordered_country_list( pOrder VARCHAR2 )
   RETURN country_t  deterministic pipelined AS
  lRet VARCHAR2(64) ;
BEGIN
     FOR  rec IN  (  SELECT  c.*, ( SELECT pos 
                                      FROM country_sort s
                                     WHERE usage =  pOrder
                                       AND  s.country_code = c.country_code ) pos
                       FROM country_codes c
                      ORDER BY pos, country_code# )
      LOOP
         PIPE ROW (rec.country_name);
      END LOOP;
      RETURN ;
END;
/

Calling the ordered_country_list function and passing in the parameter will list the results in the  requested order:

select column_value as country_name 
  from table(  ordered_country_list( 'North America' ) ) ;

This is a real world example.  I have also used this technique for things like prescriptions where brand name are listed above the generic, or  just below.   Control is in the sort table and any number of ways of sorting can be added and used.

 

 

Where is the duplicate

I have often said  ‘One test is worth a thousand speculations.’    I never assume that one method is right or even right in every case.   I look at what might work and then test.   So there is a forum I follow,  where there was a question on how to delete duplicates.   The general consensus was the way I have done it in the past:

DELETE
  FROM tab
 WHERE ROWID NOT IN
    (SELECT   MIN (ROWID)
       FROM tab
      GROUP BY <dup column list> )

But there were also some new ideas, to me at least,  on alternative solutions.  The example was how to delete duplicate firstname and lastname columns in the emp table.  The first suggestions were variations on the template above:

/* del_by_group.sql */

DELETE FROM emp WHERE ROWID NOT IN (
SELECT MIN (ROWID) FROM emp GROUP BY firstname , lastname)

The next was using an analytic function:

/* del_by_analytic.sql */ 
delete emp where rowid in  (
select lag(rowid) 
 over(partition by firstname, lastname order by empid) from emp ) 
/

the last was inspired by using analytics and is using TOP-N logic.

/* del_by_rownumber.sql */
DELETE
  FROM EMP t
 WHERE t.ROWID IN
    ( SELECT   dupl.RID
       FROM ( SELECT row_number() 
               OVER (PARTITION BY FirstName, 
                                  LastName 
                      ORDER BY EmpID) AS ROWNR,
            FirstName  ,
            LastName   ,
            ROWID AS RID
       FROM EMP
   ORDER BY FirstName,
            LastName ) dupl
  WHERE dupl.ROWNR > 1 ) /

So I created an emp table with empid, firstname, and lastname by building data from all_objects,  then tested  the various methods.  Here is the script to create the test data.

/* del_dup_test */
truncate table emp;
drop table emp;
create table emp as ( select object_id as empid, OWNER firstname , OBJECT_NAME lastname from all_objects ) ;
insert into  emp  ( select object_id+26000,  initcap( owner ) ,initcap( OBJECT_NAME ) from all_objects ) ;
insert into  emp  ( select object_id+52000,  initcap( owner ) , ( OBJECT_NAME ) from all_objects ) ;
insert into  emp  ( select object_id+78000,  initcap( owner ) , ( OBJECT_NAME ) from all_objects where owner <> 'SYS'  ) ;
insert into  emp  ( select object_id+104000,  initcap( owner ) , initcap( OBJECT_NAME ) from all_objects where owner <> 'SYS'  ) ;
commit;
select count(*) from emp ; 
@del_by_group
select count(*) from emp ;
roll
/
@del_by_analytic
select count(*) from emp ;
roll
/
@del_by_rownumber
select count(*) from emp ;
roll
/
exec dbms_output.put_line( ' del_by_group');
set autotrace traceonly exp stat ;
@del_by_group
SET autotrace off ;
roll
exec dbms_output.put_line( ' del_by_analytic');
set autotrace traceonly exp stat ;
@del_by_analytic
SET autotrace off ;
roll
exec dbms_output.put_line( ' del_by_rownumber')
set autotrace traceonly exp stat ;
@del_by_rownumber
SET autotrace off ;
roll

If you try this you may need to change the multiple of 26000 to prevent duplicate  empid’s.   I create three rows for every row in all_objects, the base row, a row where both first and last names are initcap’ed and the third row where only the first is initcap’ed.   Then I create duplicates for those extra row excluding the ‘SYS’ schema.

The rest of the script calls the other three scripts, first to show they all produce the same result and then calls them again with tracing on (there are other ways this can be evaluated but this gives you the idea).

Now here’s the spooled output from running the script.  This is with feed and timing off to reduce the clutter.   The times were all sub second.

SQL> @del_dup_test

    COUNT(*)
------------
       87865

    COUNT(*)
------------
       58239

    COUNT(*)
------------
       87865

    COUNT(*)
------------
       58239

    COUNT(*)
------------
       87865

    COUNT(*)
------------
       58239

    COUNT(*)
------------
       87865
 del_by_group

Execution Plan
----------------------------------------------------------
Plan hash value: 3239269824

------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |          | 46243 |  1083K|       |   366   (3)| 00:00:05 |
|   1 |  DELETE               | EMP      |       |       |       |            |          |
|*  2 |   HASH JOIN ANTI      |          | 46243 |  1083K|  1088K|   366   (3)| 00:00:05 |
|   3 |    TABLE ACCESS FULL  | EMP      | 46243 |   541K|       |   127   (1)| 00:00:02 |
|   4 |    VIEW               | VW_NSO_1 | 46243 |   541K|       |   131   (4)| 00:00:02 |
|   5 |     SORT GROUP BY     |          | 46243 |  2077K|       |   131   (4)| 00:00:02 |
|   6 |      TABLE ACCESS FULL| EMP      | 46243 |  2077K|       |   127   (1)| 00:00:02 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(ROWID="MIN(ROWID)")

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
         36  recursive calls
      30884  db block gets
        940  consistent gets
          0  physical reads
    8308824  redo size
       1141  bytes sent via SQL*Net to client
       1346  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      29626  rows processed

 del_by_analytic

Execution Plan
----------------------------------------------------------
Plan hash value: 1245347917

-------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT       |          |    38M|   873M|       |  2501  (15)| 00:00:31 |
|   1 |  DELETE                | EMP      |       |       |       |            |          |
|*  2 |   HASH JOIN            |          |    38M|   873M|  1448K|  2501  (15)| 00:00:31 |
|   3 |    VIEW                | VW_NSO_1 | 61779 |   723K|       |  1892   (1)| 00:00:23 |
|   4 |     SORT UNIQUE        |          | 61779 |  3559K|  4384K|  1892   (1)| 00:00:23 |
|   5 |      WINDOW SORT       |          | 61779 |  3559K|  4384K|  1892   (1)| 00:00:23 |
|   6 |       TABLE ACCESS FULL| EMP      | 61779 |  3559K|       |   127   (1)| 00:00:02 |
|   7 |    TABLE ACCESS FULL   | EMP      | 61779 |   723K|       |   127   (1)| 00:00:02 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(ROWID="$kkqu_col_1")

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
         36  recursive calls
      30863  db block gets
        940  consistent gets
          0  physical reads
    8303216  redo size
       1141  bytes sent via SQL*Net to client
       1362  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
      29626  rows processed

 del_by_rownumber

Execution Plan
----------------------------------------------------------
Plan hash value: 3671588261

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT             |          |     1 |    24 |   342   (2)| 00:00:05 |
|   1 |  DELETE                      | EMP      |       |       |            |          |
|   2 |   NESTED LOOPS               |          |     1 |    24 |   342   (2)| 00:00:05 |
|   3 |    VIEW                      | VW_NSO_1 | 46251 |   542K|   127   (1)| 00:00:02 |
|   4 |     SORT UNIQUE              |          |     1 |  1129K|            |          |
|*  5 |      VIEW                    |          | 46251 |  1129K|   127   (1)| 00:00:02 |
|   6 |       WINDOW SORT            |          | 46251 |  2664K|   127   (1)| 00:00:02 |
|   7 |        TABLE ACCESS FULL     | EMP      | 46251 |  2664K|   127   (1)| 00:00:02 |
|   8 |    TABLE ACCESS BY USER ROWID| EMP      |     1 |    12 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("DUPL"."ROWNR">1)

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
         36  recursive calls
      30883  db block gets
        940  consistent gets
          0  physical reads
    8309716  redo size
       1141  bytes sent via SQL*Net to client
       1499  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      29626  rows processed

So there were some surprises for me.   First that the tried and true way did so well.   But also the one that looked the most complex produced the best trace.    But even so, I’ll stick with the way I have always done delete of dupes as I can remember how to do it off the top of my head,  and my databases do not have duplicates very often.   In regards to the group by, changing min to max will easily delete different entries.   This is also not a complete test.  The duplicates are based on only two columns, and the percentage of duplicates is fairly high.   Sometimes there are duplicates and other columns  have a bearing on which records are to be removed.