UNICODE/Wide Character Handling in C++

I was bitten again.

Life was never meant to be easy, and it gets tougher when you have to deal with wide characters in C++ through wfstream, wcout or any of the other WIDE versions of the standard I/O facilities.

Two rules of thumb:

#1 Unicode files must be opened as binary

Example:

std::wifstream xmlFile(m_FileName, std::ios::binary);   // when reading

std::wofstream xmlFile(m_FileName, std::ios::binary);   // when writing
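
The reason, as far as I can tell, is that text mode works on bytes: on Windows it turns every 0x0A byte into 0x0D 0x0A on output (and back on input), and a 16-bit character such as U+010A contains a 0x0A byte that has nothing to do with a newline. Looking at the raw bytes makes this easier to see. Below is a minimal sketch, assuming the file is UTF-16LE; it uses a plain std::ifstream instead of wifstream, and the helper names read_bytes and decode_utf16le are my own, not part of any library.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Read a whole file as raw bytes. std::ios::binary switches off newline
// translation, so every byte arrives exactly as it is stored on disk.
std::string read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Decode UTF-16LE bytes into 16-bit code units, skipping a leading
// byte order mark (FF FE) if present. A trailing odd byte is ignored.
std::u16string decode_utf16le(const std::string& bytes) {
    std::u16string out;
    std::size_t i = 0;
    if (bytes.size() >= 2 &&
        static_cast<unsigned char>(bytes[0]) == 0xFF &&
        static_cast<unsigned char>(bytes[1]) == 0xFE) {
        i = 2;
    }
    for (; i + 1 < bytes.size(); i += 2) {
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}
```

Calling decode_utf16le(read_bytes(path)) on a file that was written in text mode is a quick way to spot the extra 0x0D bytes, because every character after the first stray byte comes out shifted.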

#2 When working with languages other than English, wifstream/wofstream must be imbued with a non-default facet to read from or write to a real UNICODE file. Otherwise the default codecvt facet converts the wide characters to narrow ones, and wofstream ends up writing an ANSI file.

An explanation is available here.

Example:

  #include <fstream>
  #include <string>
  using namespace std;

  wstring ws(L"this is a wide string");

  // Imbue before opening, so the file buffer uses the facet from the start.
  wofstream of_imbued;
  IMBUE_NULL_CODECVT(of_imbued);
  of_imbued.open(L"c:\\imbued.txt", ios::binary);
  of_imbued << ws;

  // The same write with the default facet, for comparison.
  wofstream of_not_imbued;
  of_not_imbued.open(L"c:\\not_imbued.txt", ios::binary);
  of_not_imbued << ws;

Outputs of the above code: imbued.txt contains the wide characters as written, while not_imbued.txt ends up as an ANSI file.

Two imbue facilities are available:

Boost Library

imbue_null_codecvt (the one used in the above example)
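
To give an idea of what "imbuing a non-default facet" involves, here is a rough sketch of a null-conversion facet. I have not reproduced the real macro; the class name NullCodecvt and the function imbue_null_codecvt below are my own illustrative stand-ins. The facet only tells the stream that no conversion is needed between wchar_t and char. Whether the file buffer then writes the full wchar_t bytes is up to the standard library implementation; the trick is associated with Microsoft's library, and other implementations may well behave differently.

```cpp
#include <cstddef>
#include <cwchar>   // std::mbstate_t
#include <fstream>
#include <locale>

// Illustrative facet (not the real macro's definition): it reports that
// wchar_t <-> char needs no conversion at all.
class NullCodecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
public:
    explicit NullCodecvt(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}

protected:
    result do_out(state_type&, const intern_type* from, const intern_type*,
                  const intern_type*& from_next, extern_type* to,
                  extern_type*, extern_type*& to_next) const override {
        from_next = from;   // nothing consumed
        to_next = to;       // nothing produced
        return noconv;
    }
    result do_in(state_type&, const extern_type* from, const extern_type*,
                 const extern_type*& from_next, intern_type* to,
                 intern_type*, intern_type*& to_next) const override {
        from_next = from;
        to_next = to;
        return noconv;
    }
    result do_unshift(state_type&, extern_type* to, extern_type*,
                      extern_type*& to_next) const override {
        to_next = to;
        return noconv;
    }
    bool do_always_noconv() const noexcept override { return true; }
    int do_encoding() const noexcept override {
        return static_cast<int>(sizeof(wchar_t));
    }
    int do_max_length() const noexcept override {
        return static_cast<int>(sizeof(wchar_t));
    }
};

// Install the facet on a wide file stream; call this before open().
// The locale takes ownership of the facet, so it is not deleted here.
template <class WideFileStream>
void imbue_null_codecvt(WideFileStream& stream) {
    stream.imbue(std::locale(std::locale::classic(), new NullCodecvt));
}
```

Usage would mirror the example above: declare the wofstream, call imbue_null_codecvt on it, then open it with ios::binary.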

There’s also a classic C way to write UNICODE files:

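  /* myFile is assumed to be a FILE* opened in binary mode, e.g. with "wb". */
  wchar_t myWString[] = L"Some strange characters.";
  /* The element count includes the terminating L'\0', so write one fewer. */
  fwrite(myWString, sizeof(wchar_t), sizeof(myWString)/sizeof(wchar_t) - 1,
         myFile);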

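However, it is not portable: it dumps the wchar_t array exactly as it sits in memory, so the size of each character and the byte order depend on the platform.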
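
One way around that is to stop relying on the memory layout and write the bytes in a fixed order yourself. The following is only a sketch under my own assumptions: the target encoding is UTF-16LE with a byte order mark, and the helper names append_unit, to_utf16le and write_utf16le are made up for this example.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Append one 16-bit code unit in little-endian byte order.
void append_unit(std::string& out, std::uint32_t unit) {
    out.push_back(static_cast<char>(unit & 0xFF));
    out.push_back(static_cast<char>((unit >> 8) & 0xFF));
}

// Encode a wide string as UTF-16LE bytes, preceded by a byte order mark.
// The result does not depend on sizeof(wchar_t) or on host endianness.
std::string to_utf16le(const std::wstring& ws) {
    std::string out;
    append_unit(out, 0xFEFF);  // BOM, stored as FF FE
    for (wchar_t wc : ws) {
        std::uint32_t cp = static_cast<std::uint32_t>(wc);
        if (cp >= 0x10000) {
            // Only reachable where wchar_t is 32 bits: split the code
            // point into a surrogate pair.
            cp -= 0x10000;
            append_unit(out, 0xD800 + (cp >> 10));
            append_unit(out, 0xDC00 + (cp & 0x3FF));
        } else {
            append_unit(out, cp);
        }
    }
    return out;
}

// Write the encoded bytes through a narrow stream opened in binary mode.
bool write_utf16le(const std::string& path, const std::wstring& ws) {
    std::ofstream file(path, std::ios::binary);
    const std::string bytes = to_utf16le(ws);
    file.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(file);
}
```

Because the output goes through a narrow ofstream, no facet is involved at all, which sidesteps rule #2 at the cost of doing the encoding by hand.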

References:

  1. Unicode Implementation

http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/ffe0912d1462d7a5/7601a62008fdd25a?lnk=st&q=wfstream+fstream+cout+wcout&rnum=6&hl=en#7601a62008fdd25a

  2. Unicode in C++

http://groups.google.com/group/comp.lang.c++/browse_thread/thread/f4a6a434b0453187/1edc2bc1f4187597?lnk=st&q=wfstream+fstream+cout+wcout&rnum=3&hl=en#1edc2bc1f4187597

  3. how to read a Unicode file with fstream?

http://groups.google.com/group/microsoft.public.vc.stl/browse_thread/thread/45d7520ec3ad3f51/d57b41e9abb20117?lnk=st&q=wfstream+fstream+cout+wcout&rnum=2&hl=en#

  4. A very puzzling problem: cout vs. wcout, fstream vs. wfstream

http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/37c3e24861ca09e3/78fe0aeed7b728de?lnk=st&q=wfstream+fstream+cout+wcout&rnum=1&hl=en#78fe0aeed7b728de

  5. Upgrading an STL-based application to use Unicode

http://www.codeproject.com/vcpp/stl/upgradingstlappstounicode.asp
