UNICODE, The C++ Way / Handling UNICODE in C++

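Dealing with UNICODE files using the C++ standard library is not as easy as one might think: wchar_t alone is not enough, because a wide file stream still converts every wide character to narrow bytes through the codecvt facet of its locale before writing.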
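
To see where that conversion comes from, one can ask a locale directly whether its wchar_t facet is a pass-through. The sketch below is only an illustration: the function name wide_streams_convert is made up here, and the result depends on the standard library in use. On the common implementations the classic "C" locale does convert, which is what Solution 1 below works around.

```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::locale, std::codecvt, std::use_facet

// Hypothetical helper: reports whether the given locale converts wide
// characters to narrow bytes when a wide stream writes to a file.
bool wide_streams_convert(const std::locale& loc = std::locale::classic())
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> WideFacet;
    const WideFacet& facet = std::use_facet<WideFacet>(loc);
    // always_noconv() is true only when the facet is a pure pass-through.
    return !facet.always_noconv();
}
```

If this returns true for the locale imbued in a std::wofstream, characters that the locale cannot represent as narrow bytes will make the write fail, even though they fit in a wchar_t.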

  • Standard C++ (portable way, preferred)

Here are two solutions I found by searching the internet.

Solution 1: use codecvt (Upgrading an STL-based application to use Unicode)

 

 1. imbue_null_codecvt.h source code:

#ifndef IMBUE_NULL_CODECVT_H_INCLUDED
#define IMBUE_NULL_CODECVT_H_INCLUDED

#include <cstddef>  // std::size_t
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::codecvt, std::locale

typedef std::codecvt < wchar_t , char , std::mbstate_t > NullCodecvtBase ;

/**
 * \brief
 * a MACRO to facilitate imbuing a locale with facet NullCodecvt
 *
 * The locale takes ownership of the facet, so it must not be deleted by hand.
 */
#define IMBUE_NULL_CODECVT( outputFile ) \
{ \
    (outputFile).imbue( std::locale(std::locale::classic(), new NullCodecvt )) ; \
}

/**
 *  imbue_null_codecvt.h
 *  codecvt facet for std::wofstream to write wchar_t to file in UNICODE
 *
 * This code was originally written by Taka Muraoka and published at
 * http://www.codeproject.com/vcpp/stl/upgradingstlappstounicode.asp
 * and freely available.
 */
class NullCodecvt
    : public NullCodecvtBase
{
public:
    typedef wchar_t        InternT ;   // character type used in memory
    typedef char           ExternT ;   // character type used in the file
    typedef std::mbstate_t StateT ;

    explicit NullCodecvt( std::size_t refs = 0 ) : NullCodecvtBase(refs) { }

protected:
    // Every conversion hook reports "no conversion needed" and leaves
    // the output pointers at the start of their ranges.
    virtual result do_in( StateT& ,
                   const ExternT* from , const ExternT* , const ExternT*& from_next ,
                   InternT* to , InternT* , InternT*& to_next
                   ) const noexcept override
    {
        from_next = from ; to_next = to ;
        return noconv ;
    }
    virtual result do_out( StateT& ,
                   const InternT* from , const InternT* , const InternT*& from_next ,
                   ExternT* to , ExternT* , ExternT*& to_next
                   ) const noexcept override
    {
        from_next = from ; to_next = to ;
        return noconv ;
    }
    virtual result do_unshift( StateT& ,
            ExternT* to , ExternT* , ExternT*& to_next ) const noexcept override
    {
        to_next = to ;
        return noconv ;
    }
    virtual int do_length( StateT& , const ExternT* from ,
           const ExternT* end , std::size_t max ) const noexcept override
    {
        const std::size_t avail = static_cast<std::size_t>(end - from) ;
        return static_cast<int>( (max < avail) ? max : avail ) ;
    }
    virtual bool do_always_noconv() const noexcept override
    {
        return true ;
    }
    // Number of bytes per wchar_t (2 on Windows, which the original hard-coded).
    virtual int do_max_length() const noexcept override
    {
        return static_cast<int>( sizeof(wchar_t) ) ;
    }
    virtual int do_encoding() const noexcept override
    {
        return static_cast<int>( sizeof(wchar_t) ) ;
    }
} ;

#endif // IMBUE_NULL_CODECVT_H_INCLUDED


 2. Usage, just a macro:

#include "imbue_null_codecvt.h"
#include <fstream>      // std::wofstream
using std::wofstream;

 // some code …
wofstream fileLng;
IMBUE_NULL_CODECVT(fileLng);    // imbue before open(): prevents conversion from UNICODE to MBCS while writing to the file
// in | out opens an existing file without truncating it; it does not create a missing file
fileLng.open("unicode_file", std::ios::binary | std::ios::in | std::ios::out);
if(fileLng.is_open()){…}
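
The facet above leaves the byte layout of the file to the platform: the intent is that wchar_t is written as it sits in memory, so the result is UTF-16 little-endian only where wchar_t is a 16-bit little-endian type, as on Windows. As a contrasting sketch, not part of the original article, the code below makes the byte layout explicit by encoding code points to UTF-16LE by hand and writing them through a plain binary std::ofstream. The names append_unit_le, encode_utf16le and write_utf16le_file are hypothetical, and the code assumes C++11 for char32_t.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: appends one 16-bit code unit, low byte first.
static void append_unit_le(std::string& out, char16_t unit)
{
    out.push_back(static_cast<char>(unit & 0xFF));
    out.push_back(static_cast<char>((unit >> 8) & 0xFF));
}

// Hypothetical helper: encodes code points as UTF-16 little-endian bytes.
std::string encode_utf16le(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        // Surrogates and values above U+10FFFF are not valid code points.
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x10000) {
            append_unit_le(out, static_cast<char16_t>(cp));
        } else {
            const char32_t v = cp - 0x10000;
            append_unit_le(out, static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
            append_unit_le(out, static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
        }
    }
    return out;
}

// Hypothetical helper: writes a byte order mark followed by the encoded text.
bool write_utf16le_file(const char* path, const std::u32string& text)
{
    std::ofstream file(path, std::ios::binary | std::ios::trunc);
    if (!file) return false;
    file.write("\xFF\xFE", 2);  // UTF-16LE byte order mark
    const std::string bytes = encode_utf16le(text);
    file.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(file);
}
```

The trade-off is that this bypasses wide streams entirely, so formatted output with operator<< is lost; in exchange the file content no longer depends on sizeof(wchar_t), byte order, or the library's handling of custom facets.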

Solution 2: Standard C/C++: Multibyte, by P.J. Plauger

  • Windows specific

Paul DiLascia posted a Windows-specific solution and explained it in his Q&A column in the August 2004 issue of MSDN Magazine.

 
