Oracle provides the procedure UTL_MATCH to compare the difference between to two sets of strings. In this article we will examine the function EDIT_DISTANCE_SIMILARITY which returns the percentage of matching strings: 0 meaning no similarity and 100 meaning complete similarity.
1. Logon to your Oracle database server as the Oracle software owner.
2. Logon to SQLPLUS with SYSDBA privileges.
mylinux:> sqlplus ‘/ as sysdba’
SQL*Plus: Release 10.2.0.4.0 – Production on Tues May 25 19:32:21 ge2010
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL>
3. Comparing the same strings: ‘The First Dog’ and ‘The First Dog’
select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’The First Dog’) from dual;
SQL*PLUS Output:
SQL> select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’The First Dog’) from dual;
UTL_MATCH.EDIT_DISTANCE_SIMILARITY(‘THEFIRSTDOG’,’THEFIRSTDOG’)
—————————————————————
100
SQL>
The stings are a 100% match.
4. Comparing strings with no simularity: ‘The First Dog’,’1234567890123′
select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’1234567890123′) from dual;
SQL*PLUS Output:
SQL> select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’1234567890123′) from dual;
UTL_MATCH.EDIT_DISTANCE_SIMILARITY(‘THEFIRSTDOG’,’1234567890123′)
—————————————————————–
0
SQL>
The strings are a 0% match.
5. Comparing strings of varying case: ‘The First Dog’,’tHE fIRST dOG’
select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’tHE fIRST dOG’) from dual;
SQL*PLUS Output:
SQL> select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’tHE fIRST dOG’) from dual;
UTL_MATCH.EDIT_DISTANCE_SIMILARITY(‘The First Dog ‘, ‘tHE fIRST dOG’)
—————————————————————-
16
The strings have a 16% match, due to function being case sensitive.
6. Comparing strings with an off-set: ‘The First Dog’ and ‘-The First Dog’
select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’-The First Dog’) from dual;
SQL*PLUS Output:
SQL> select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’-The First Dog’)
from dual;
UTL_MATCH.EDIT_DISTANCE_SIMILARITY(‘THEFIRSTDOG’,’-THEFIRSTDOG’)
—————————————————————-
93
SQL>
The strings have a 93% match due to offsetting having no affect on comparison operation.
This completes the use of EDIT_DISTANCE function for the Oracle RDBMS procedure UTL_MATCH.
Larry J. Catt, OCP 9i, 10g
oracle@allcompute.com
www.allcompute.com