NLS Overview-Datatypes-2
On 24/07/2024 by Robert CorvinoNotice how in the 7-bit session I received the letter “a” three times with no diacritical marks. However, the DUMP function is showing me that in the database there are, in fact, three separate distinct characters, not just the letter “a.” The data in the database hasn’t changed—just the values this client received. If this client were to retrieve that data into host variables as follows:
SQL> variable d varchar2(1)
SQL> variable r varchar2(20)
And then next, do nothing whatsoever with it, just send it back to the database:
SQL> update t set data = :d where rowid = chartorowid(:r);1 row updated.
SQL> commit;Commit complete.
I would observe in the original 8-bit session that I have lost one of the original characters. It has been replaced with the lowly 7-bit a, not the fancy à I had previously.
SQL> select data, dump(data) dump from t;D DUMP
a Typ=1 Len=1: 97
á Typ=1 Len=1: 225
â Typ=1 Len=1: 226
SQL DUMP FUNCTION
The Oracle SQL DUMP function allows you to display the datatype code, length in bytes, and the internal representation of a data value (also optionally the character set name). Its syntax is as follows:
DUMP( expression [,return_format] [8,start_position] [,length] )
The default return_format is 10 (decimal) and can be any of the following: 8, 10, 16, 17, 1008, 1010, 1016, or 1017, where 8 is octal notation, 10 is decimal, 16 is hexadecimal, 17 is single characters, 1008 is octal with the character set name, 1010 is decimal with character set name, 1016 is hexadecimal with the character set name, and 1017 is single characters with the character set name.
The following example dumps information regarding the “a” character:
SQL> select dump(‘a’), dump(‘a’,8), dump(‘a’,16) from dual;
DUMP(‘A’) DUMP(‘A’,8) DUMP(‘A’,16)
Typ=96 Len=1: 97 Typ=96 Len=1: 141 Typ=96 Len=1: 61
where 97, 141, and 61 are the corresponding ASCII codes for the “a” character in decimal, octal, and hexadecimal notations. The returned datatype code of Typ=96 indicates a CHAR datatype (see the Oracle Database SQL Language Reference manual for a complete list of Oracle datatype codes and meanings).
This demonstrates the immediate impact of an environment with a heterogeneous character set, whereby the clients and database use different NLS settings. It is something to be aware of because it comes into play in many circumstances. For example, if the DBA uses the deprecated legacy EXP tool to extract information, they may observe the following warning:
Export done in US7ASCII character set and UTF8 NCHAR character set server uses WE8MSWIN1252 character set (possible charset conversion) About to export specified tables via Conventional Path …
Such warnings should be treated very seriously. If you were exporting this table with the goal of dropping the table and then using IMP to re-create it, you would find that all of your data in that table was now lowly 7-bit data! Beware the unintentional character set conversion.
Note The problem of unintentional character set conversion does not affect every tool, nor does it affect every tool in the same ways. For example, if you were to use the recommended Data Pump export/import process, you would discover that the export is always done in the character set of the database containing the data, regardless of the client’s NLS settings. This is because Data Pump runs in the database server itself; it is not a client-side tool at all. Similarly, Data Pump import will always convert the data in the file to be imported from the source database’s character set into the destination database’s character set—meaning that character set conversion is still possible with Data Pump (if the source and target databases have different character sets) but not in the same fashion as with the legacy EXP/IMP tools!
But also be aware that, in general, character set conversions are necessary. If clients are expecting data in a specific character set, it would be disastrous to send them the information in a different character set.
Note I highly encourage everyone to read through the Oracle Database Globalization Support Guide document. It covers NLS-related issues to a depth we will not here. Anyone creating applications that will be used around the globe (or even across international boundaries) needs to master the information contained in that document.
Now that we have a cursory understanding of character sets and the impact they will have on us, let’s take a look at the character string types provided by Oracle.
Archives
- July 2024
- June 2024
- May 2024
- April 2024
- March 2024
- February 2024
- January 2024
- December 2023
- November 2023
- October 2023
- September 2023
- August 2023
- July 2023
- May 2023
- April 2023
- March 2023
- February 2023
- January 2023
- December 2022
- November 2022
- October 2022
- May 2022
- April 2022
- March 2022
- February 2022
- January 2022
- December 2021
- November 2021
Calendar
M | T | W | T | F | S | S |
---|---|---|---|---|---|---|
1 | 2 | 3 | ||||
4 | 5 | 6 | 7 | 8 | 9 | 10 |
11 | 12 | 13 | 14 | 15 | 16 | 17 |
18 | 19 | 20 | 21 | 22 | 23 | 24 |
25 | 26 | 27 | 28 | 29 | 30 |
Leave a Reply