Oracle – UTL_MATCH.EDIT_DISTANCE_SIMILARITY string comparison

Oracle provides the procedure UTL_MATCH to compare the difference between to two sets of strings. In this article we will examine the function EDIT_DISTANCE_SIMILARITY which returns the percentage of matching strings: 0 meaning no similarity and 100 meaning complete similarity.

1. Logon to your Oracle database server as the Oracle software owner.

2. Logon to SQLPLUS with SYSDBA privileges.

mylinux:> sqlplus ‘/ as sysdba’

SQL*Plus: Release 10.2.0.4.0 – Production on Tues May 25 19:32:21 ge2010

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>

3. Comparing the same strings: ‘The First Dog’ and ‘The First Dog’

select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’The First Dog’) from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’The First Dog’) from dual;

UTL_MATCH.EDIT_DISTANCE_SIMILARITY(‘THEFIRSTDOG’,’THEFIRSTDOG’)
—————————————————————
100

SQL>

The stings are a 100% match.

4. Comparing strings with no simularity: ‘The First Dog’,’1234567890123′

select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’1234567890123′) from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’1234567890123′) from dual;

UTL_MATCH.EDIT_DISTANCE_SIMILARITY(‘THEFIRSTDOG’,’1234567890123′)
—————————————————————–
0

SQL>

The strings are a 0% match.

5. Comparing strings of varying case: ‘The First Dog’,’tHE fIRST dOG’

select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’tHE fIRST dOG’) from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’tHE fIRST dOG’) from dual;

UTL_MATCH.EDIT_DISTANCE_SIMILARITY(‘The First Dog ‘, ‘tHE fIRST dOG’)
—————————————————————-
16

The strings have a 16% match, due to function being case sensitive.

6. Comparing strings with an off-set: ‘The First Dog’ and ‘-The First Dog’

select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’-The First Dog’) from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance_SIMILARITY(‘The First Dog’,’-The First Dog’)
from dual;

UTL_MATCH.EDIT_DISTANCE_SIMILARITY(‘THEFIRSTDOG’,’-THEFIRSTDOG’)
—————————————————————-
93

SQL>

The strings have a 93% match due to offsetting having no affect on comparison operation.

This completes the use of EDIT_DISTANCE function for the Oracle RDBMS procedure UTL_MATCH.

Larry J. Catt, OCP 9i, 10g
oracle@allcompute.com
www.allcompute.com

Leave a Reply