Tag Archives: comparison

Oracle – UTL_MATCH.EDIT_DISTANCE_SIMILARITY string comparison

Oracle provides the procedure UTL_MATCH to compare the difference between to two sets of strings. In this article we will examine the function EDIT_DISTANCE_SIMILARITY which returns the percentage of matching strings: 0 meaning no similarity and 100 meaning complete similarity.

1. Logon to your Oracle database server as the Oracle software owner.

2. Logon to SQLPLUS with SYSDBA privileges.

mylinux:> sqlplus ‘/ as sysdba’

SQL*Plus: Release 10.2.0.4.0 – Production on Tues May 25 19:32:21 ge2010

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>

3. Comparing the same strings: ‘The First Dog’ and ‘The First Dog’

select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’The First Dog’) from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’The First Dog’) from dual;

UTL_MATCH.EDIT_DISTANCE_SIMILARITY(‘THEFIRSTDOG’,’THEFIRSTDOG’)
—————————————————————
100

SQL>

The stings are a 100% match.

4. Comparing strings with no simularity: ‘The First Dog’,’1234567890123′

select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’1234567890123′) from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’1234567890123′) from dual;

UTL_MATCH.EDIT_DISTANCE_SIMILARITY(‘THEFIRSTDOG’,’1234567890123′)
—————————————————————–
0

SQL>

The strings are a 0% match.

5. Comparing strings of varying case: ‘The First Dog’,’tHE fIRST dOG’

select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’tHE fIRST dOG’) from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’tHE fIRST dOG’) from dual;

UTL_MATCH.EDIT_DISTANCE_SIMILARITY(‘The First Dog ‘, ‘tHE fIRST dOG’)
—————————————————————-
16

The strings have a 16% match, due to function being case sensitive.

6. Comparing strings with an off-set: ‘The First Dog’ and ‘-The First Dog’

select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’-The First Dog’) from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’-The First Dog’)
from dual;

UTL_MATCH.EDIT_DISTANCE_SIMILARITY(‘THEFIRSTDOG’,’-THEFIRSTDOG’)
—————————————————————-
93

SQL>

The strings have a 93% match due to offsetting having no affect on comparison operation.

This completes the use of EDIT_DISTANCE function for the Oracle RDBMS procedure UTL_MATCH.

Larry J. Catt, OCP 9i, 10g
oracle@allcompute.com
www.allcompute.com

Oracle – UTL_MATCH.EDIT_DISTANCE string comparison

Oracle – UTL_MATCH.EDIT_DISTANCE

Oracle provides the procedure UTL_MATCH to compare the difference between to two sets of strings. In this article we will examine the function EDIT_DISTANCE which returns the number of changes required in a string comparison to make the strings identical.

1. Logon to your Oracle database server as the Oracle software owner.

2. Logon to SQLPLUS with SYSDBA privileges.

mylinux:> sqlplus ‘/ as sysdba’

SQL*Plus: Release 10.2.0.4.0 – Production on Mon May 24 21:41:18 2010

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>

3. Comparing the same strings: ‘The First Dog’ and ‘The First Dog’

select utl_match.edit_distance(‘The First Dog’,’The First Dog’) from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance(‘The First Dog’,’The First Dog’) from dual;

UTL_MATCH.EDIT_DISTANCE(‘THEFIRSTDOG’,’THEFIRSTDOG’)
—————————————————-
0

SQL>

The comparison returns a zero meaning no changes required to make the two strings match.

4. Comparing strings with no simularity: ‘The First Dog’ and ‘1234567890123’

select utl_match.edit_distance(‘The First Dog’,’1234567890123′) from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance(‘The First Dog’,’1234567890123′) from dual;

UTL_MATCH.EDIT_DISTANCE(‘THEFIRSTDOG’,’1234567890123′)
——————————————————
13

SQL>

It would take 13 changes to make the strings match.

5. Comparing strings of variaring case: ‘The First Dog’ and ‘tHE fIRST dOG’

select utl_match.edit_distance(‘The First Dog’,’tHE fIRST dOG’) from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance(‘The First Dog’,’tHE fIRST dOG’) from dual;

UTL_MATCH.EDIT_DISTANCE(‘THEFIRSTDOG’,’THEFIRSTDOG’)
—————————————————-
11

SQL>

The case of the charactors is critical, thus 11 changes are required to make the strings match.

6. Comparing strings with an off-set: ‘The First Dog’ and ‘-The First Dog’

select utl_match.edit_distance(‘The First Dog’,’Off Set Text The First Dog’) from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance(‘The First Dog’,’Off Set Text The First Dog’) from dual;

UTL_MATCH.EDIT_DISTANCE(‘THEFIRSTDOG’,’OFFSETTEXTTHEFIRSTDOG’)
————————————————————–
13

SQL>

Off setting the text will still result in matches, thus in the example above, it would only take 13 changes to make the strings match as oppose to 26 changes.

This completes the use of EDIT_DISTANCE function for the Oracle RDBMS procedure UTL_MATCH.

Larry J. Catt, OCP 9i, 10g
oracle@allcompute.com
www.allcompute.com