Oracle – UTL_MATCH.EDIT_DISTANCE_SIMILARITY string comparison

Oracle provides the package UTL_MATCH to measure the difference between two strings. In this article we will examine the function EDIT_DISTANCE_SIMILARITY, which returns a similarity score between 0 and 100: 0 means no similarity and 100 means the strings are identical.

1. Logon to your Oracle database server as the Oracle software owner.

2. Logon to SQLPLUS with SYSDBA privileges.

mylinux:> sqlplus '/ as sysdba'

SQL*Plus: Release 10.2.0.4.0 - Production on Tue May 25 19:32:21 2010

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>

3. Comparing the same strings: ‘The First Dog’ and ‘The First Dog’

select utl_match.edit_distance_SIMILARITY('The First Dog','The First Dog') from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance_SIMILARITY('The First Dog','The First Dog') from dual;

UTL_MATCH.EDIT_DISTANCE_SIMILARITY('THEFIRSTDOG','THEFIRSTDOG')
---------------------------------------------------------------
100

SQL>

The strings are a 100% match.

4. Comparing strings with no similarity: 'The First Dog' and '1234567890123'

select utl_match.edit_distance_SIMILARITY('The First Dog','1234567890123') from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance_SIMILARITY('The First Dog','1234567890123') from dual;

UTL_MATCH.EDIT_DISTANCE_SIMILARITY('THEFIRSTDOG','1234567890123')
-----------------------------------------------------------------
0

SQL>

The strings are a 0% match.

5. Comparing strings of varying case: 'The First Dog' and 'tHE fIRST dOG'

select utl_match.edit_distance_SIMILARITY('The First Dog','tHE fIRST dOG') from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance_SIMILARITY('The First Dog','tHE fIRST dOG') from dual;

UTL_MATCH.EDIT_DISTANCE_SIMILARITY('The First Dog','tHE fIRST dOG')
-------------------------------------------------------------------
16

The strings score only 16 because the function is case sensitive: 11 of the 13 characters differ in case.

6. Comparing strings with an offset: 'The First Dog' and '-The First Dog'

select utl_match.edit_distance_SIMILARITY('The First Dog','-The First Dog') from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance_SIMILARITY('The First Dog','-The First Dog') from dual;

UTL_MATCH.EDIT_DISTANCE_SIMILARITY('THEFIRSTDOG','-THEFIRSTDOG')
----------------------------------------------------------------
93

SQL>

The strings score 93: the leading hyphen costs a single insertion, so offsetting the text by one character has little effect on the comparison.
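
One way to read the four scores above is as the edit distance scaled by the length of the longer string. The sketch below is a hypothetical shell helper, similarity_score, which assumes the score is 100 minus the whole-number part of (100 x distance / longer length). That formula reproduces the results 100, 0, 16 and 93 from distances of 0, 13, 11 and 1, but it is an assumption inferred from these outputs and not taken from Oracle's implementation.

```shell
# Hypothetical helper: derive a 0-100 similarity score from an edit distance.
# ASSUMPTION: score = 100 - floor(100 * distance / longer_length). This matches
# the four results in this article; it is not Oracle's published formula.
similarity_score() {
  distance=$1
  len_a=$2
  len_b=$3
  # Normalize by the longer of the two string lengths.
  if [ "$len_a" -ge "$len_b" ]; then max_len=$len_a; else max_len=$len_b; fi
  # Two empty strings are identical; this also avoids dividing by zero.
  if [ "$max_len" -eq 0 ]; then echo 100; return; fi
  # Shell arithmetic truncates the division, which gives the floor here.
  echo $(( 100 - (100 * distance) / max_len ))
}

similarity_score 0 13 13    # identical strings           -> 100
similarity_score 13 13 13   # nothing in common           -> 0
similarity_score 11 13 13   # only the case differs       -> 16
similarity_score 1 13 14    # one leading character added -> 93
```

The arguments are the edit distance followed by the lengths of the two strings, so the helper can be checked against any result returned by the database.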

This completes the use of the EDIT_DISTANCE_SIMILARITY function in the Oracle RDBMS package UTL_MATCH.

Larry J. Catt, OCP 9i, 10g
oracle@allcompute.com
www.allcompute.com

Oracle – UTL_MATCH.EDIT_DISTANCE string comparison

Oracle provides the package UTL_MATCH to measure the difference between two strings. In this article we will examine the function EDIT_DISTANCE, which returns the number of single-character changes required to make two strings identical.

1. Logon to your Oracle database server as the Oracle software owner.

2. Logon to SQLPLUS with SYSDBA privileges.

mylinux:> sqlplus '/ as sysdba'

SQL*Plus: Release 10.2.0.4.0 - Production on Mon May 24 21:41:18 2010

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>

3. Comparing the same strings: ‘The First Dog’ and ‘The First Dog’

select utl_match.edit_distance('The First Dog','The First Dog') from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance('The First Dog','The First Dog') from dual;

UTL_MATCH.EDIT_DISTANCE('THEFIRSTDOG','THEFIRSTDOG')
----------------------------------------------------
0

SQL>

The comparison returns zero, meaning no changes are required to make the two strings match.

4. Comparing strings with no similarity: 'The First Dog' and '1234567890123'

select utl_match.edit_distance('The First Dog','1234567890123') from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance('The First Dog','1234567890123') from dual;

UTL_MATCH.EDIT_DISTANCE('THEFIRSTDOG','1234567890123')
------------------------------------------------------
13

SQL>

It would take 13 changes to make the strings match.

5. Comparing strings of varying case: 'The First Dog' and 'tHE fIRST dOG'

select utl_match.edit_distance('The First Dog','tHE fIRST dOG') from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance('The First Dog','tHE fIRST dOG') from dual;

UTL_MATCH.EDIT_DISTANCE('THEFIRSTDOG','THEFIRSTDOG')
----------------------------------------------------
11

SQL>

The case of the characters matters: all 11 letters differ in case, so 11 changes are required to make the strings match.

6. Comparing strings with an offset: 'The First Dog' and 'Off Set Text The First Dog'

select utl_match.edit_distance('The First Dog','Off Set Text The First Dog') from dual;

SQL*PLUS Output:

SQL> select utl_match.edit_distance('The First Dog','Off Set Text The First Dog') from dual;

UTL_MATCH.EDIT_DISTANCE('THEFIRSTDOG','OFFSETTEXTTHEFIRSTDOG')
--------------------------------------------------------------
13

SQL>

Offsetting the text still lets the shared portion match. In the example above only the 13 characters of the prefix 'Off Set Text ' have to be inserted, so it takes 13 changes to make the strings match, as opposed to 26.
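
Oracle documents EDIT_DISTANCE as the Levenshtein distance: the smallest number of single-character insertions, deletions and substitutions that turns one string into the other. To illustrate how such a count can be reached, the sketch below computes it outside the database. The function name edit_distance and the choice of awk are assumptions made for this illustration; this is not Oracle's code, only the textbook dynamic-programming method.

```shell
# Hypothetical helper: case-sensitive Levenshtein distance of two strings.
# Caveat: awk -v interprets backslash escapes, so avoid backslashes in input.
edit_distance() {
  awk -v s="$1" -v t="$2" 'BEGIN {
    m = length(s); n = length(t)
    # d[i, j] = edits needed to turn the first i chars of s
    #           into the first j chars of t.
    for (i = 0; i <= m; i++) d[i, 0] = i   # i deletions
    for (j = 0; j <= n; j++) d[0, j] = j   # j insertions
    for (i = 1; i <= m; i++) {
      for (j = 1; j <= n; j++) {
        cost = (substr(s, i, 1) == substr(t, j, 1)) ? 0 : 1
        best = d[i-1, j] + 1                                      # delete
        if (d[i, j-1] + 1 < best) best = d[i, j-1] + 1            # insert
        if (d[i-1, j-1] + cost < best) best = d[i-1, j-1] + cost  # substitute
        d[i, j] = best
      }
    }
    print d[m, n]
  }'
}

edit_distance 'The First Dog' 'The First Dog'               # prints 0
edit_distance 'The First Dog' '1234567890123'               # prints 13
edit_distance 'The First Dog' 'tHE fIRST dOG'               # prints 11
edit_distance 'The First Dog' 'Off Set Text The First Dog'  # prints 13
```

The four calls repeat the comparisons from steps 3 through 6 and print the same counts that the database returned.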

This completes the use of the EDIT_DISTANCE function in the Oracle RDBMS package UTL_MATCH.

Larry J. Catt, OCP 9i, 10g
oracle@allcompute.com
www.allcompute.com

Shell script to perform string replacement in multiple files for UNIX and LINUX:

As a DBA, regardless of RDBMS type, you will come across the need to replace text strings in dozens if not hundreds of files. In this article we will cover the use of bash and perl to perform text replacement across multiple files within a UNIX or LINUX environment.

1. Logon to your UNIX or LINUX server as the owner of the files you want to update or a user which has permission to update these files.

2. In this procedure we will create a file named files.txt containing a listing of all files we wish to update.

mylinux:> more files.txt
./test1.txt
./test2.txt
./test3.txt
./test4.txt

3. Next create a file called update.sh with the following text.

dt=`date "+%m%d%Y"` # Gets the current date.
cat ./files.txt | while read -r line # Reads files.txt one line at a time.
do # Opens the loop.
cp "$line" "$line$dt" # Copies the original file to a backup named file_name+date.
perl -pi -e 's/{old_string}/{new_string}/g' "$line" # Replaces every old_string with new_string.
done # Ends the loop.

4. Change the permissions on update.sh to 770 so that it can be executed.

mylinux:> chmod 770 update.sh
mylinux:>

5. View the contents of the files listed in your files.txt file.

mylinux:> cat test*
one
one
one
one
mylinux:>

6. In this example, all of the files contain the text “one” which we will replace with the string “two”. Thus your update.sh file will look like the example below.

dt=`date "+%m%d%Y"`
cat ./files.txt | while read -r line
do
cp "$line" "$line$dt"
perl -pi -e 's/one/two/g' "$line"
done

7. Execute the update.sh file with the command: ./update.sh.

mylinux:> ./update.sh
mylinux:>

8. Now cat all files named test*

mylinux:> cat test*
two
two
two
two
mylinux:>

As you can see, every occurrence of “one” has been replaced with the string “two”. This completes replacement of strings in UNIX and LINUX.
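
When the same replacement has to be repeated with different strings, editing update.sh each time becomes tedious. The sketch below wraps the same steps (dated backup, then an in-place perl substitution) in a function that takes the list file and both strings as arguments. The function name replace_in_files and the environment variables OLD and NEW are hypothetical names chosen for this example. Unlike the script above, it treats the search text literally rather than as a regular expression, and it skips list entries that are not existing files.

```shell
# Hypothetical helper: replace a literal string in every file named in a list
# file, keeping a dated backup of each file first (file_name + MMDDYYYY).
replace_in_files() {
  list_file=$1
  old=$2
  new=$3
  stamp=$(date "+%m%d%Y")
  # The "|| [ -n ... ]" part also handles a last line with no trailing newline.
  while IFS= read -r file || [ -n "$file" ]; do
    # Skip blank lines and entries that are not regular files.
    [ -f "$file" ] || continue
    cp "$file" "$file$stamp"
    # Pass the strings through the environment so shell quoting cannot break
    # the perl code; \Q...\E makes perl match the search text literally.
    OLD="$old" NEW="$new" perl -pi -e 's/\Q$ENV{OLD}\E/$ENV{NEW}/g' "$file"
  done < "$list_file"
}

# Example call, equivalent to step 6 above:
#   replace_in_files ./files.txt one two
```

Because the search text is matched literally, a string such as a.b matches only those three characters and not "a", any character, then "b".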

Larry Catt, OCP 9i, 10g
oracle@allcompute.com
www.allcompute.com