Monthly Archives: April 2007

Powered by Python: Rename files

I have too many files whose naming schemes I want to unify, by replacing all spaces (" ") with periods (".") and capitalizing each word, e.g. "this is a file.txt" -> "This.Is.A.File.txt".

Python comes to the rescue and makes it a breeze.

from os import walk
from os.path import basename, join
from string import capwords
import datetime
import os
import sys

def GoRename(path):
    """Walk through all files under the given directory (subdirectories
    included) and rename each file by replacing all spaces with dots
    and capitalizing each word.
    path: path to a directory
    Copyright @ Tomgee, 2007
    """

    print(sys.getdefaultencoding())
    print(sys.getfilesystemencoding())
    logfile = join(path, 'log.txt')
    print("Going through %s to rename files, \nsubdirectory included..." % path)
    print("a log will be saved to %s" % logfile)
    f = open(logfile, 'a') # open a log file
    sys.stdout = f # comment this line out if console output is preferred
    now = datetime.datetime.now()
    print("\n***********************************************")
    print("Timestamp: %s" % now.strftime("%y-%m-%d, %H:%M:%S"))

    totalCount = 0
    for root, dirs, files in walk(path):
        for file in files:
            # skip this script and the log file; only names with spaces need work
            if file != basename(sys.argv[0]) and file != "log.txt" and " " in file:
                newfile = ".".join(capwords(file).split(" "))
                # use full paths, because walk() also descends into subdirectories
                os.rename(join(root, file), join(root, newfile))
                totalCount += 1
                print("File #%d: %s --> %s" % (totalCount, file, newfile))
    print("Totally %d files under %s have been renamed" % (totalCount, path))
    print('********* The End *********')
    sys.stdout = sys.__stdout__ # restore console output
    f.close()

def main():
    argCurDir = os.getcwd()
    GoRename(argCurDir)

if __name__ == '__main__': main()
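The renaming rule itself can be tried out without touching any files. The following is a minimal Python 3 sketch, not part of the original script: `dotted_name` is a hypothetical helper that applies the same `capwords`/`split`/`join` steps to a single name.

```python
from string import capwords

def dotted_name(name):
    """Return name with each word capitalized and spaces replaced by dots.

    Hypothetical helper for illustration; it mirrors the expression used
    in GoRename but performs no renaming on disk.
    """
    # capwords() splits on whitespace, capitalizes every word (lowercasing
    # the remaining letters) and joins the words with single spaces.
    return ".".join(capwords(name).split(" "))

if __name__ == "__main__":
    for sample in ["this is a file.txt", "my FILE.TXT", "two  spaces.doc"]:
        print("%s --> %s" % (sample, dotted_name(sample)))
```

One side effect worth knowing: because `capwords` lowercases everything after the first letter of each word, an upper-case extension such as ".TXT" comes out as ".txt", and runs of several spaces collapse into a single dot.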

C++ Code: A lightweight Version Comparison class

This class is used for comparing version info in the form of "Major.Minor.Revision.Build".


// VersionInfo.h - Declaration of CVersionInfo

#include <string>

class CVersionInfo
{
public:
    CVersionInfo(std::wstring s);
    CVersionInfo(unsigned int major,
                 unsigned int minor,
                 unsigned int revision,
                 unsigned int build);

public:
    std::wstring to_string() const;
    bool operator< (const CVersionInfo& vi);
    bool operator> (const CVersionInfo& vi);
    bool operator== (const CVersionInfo& vi);

private:
    unsigned int m_major;
    unsigned int m_minor;
    unsigned int m_revision;
    unsigned int m_build;
};


// VersionInfo.cpp - implementation of CVersionInfo

#include <sstream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
#include "VersionInfo.h"

using namespace std;

CVersionInfo::CVersionInfo(unsigned int major,
        unsigned int minor, 
        unsigned int revision,
        unsigned int build)
        :m_major(major)
        ,m_minor(minor)
        ,m_revision(revision)
        ,m_build(build)
{} 

CVersionInfo::CVersionInfo(std::wstring s)
        :m_major(0)
        ,m_minor(0)
        ,m_revision(0)
        ,m_build(0)
{
    vector<wstring> vS;
    wstringstream ss;
    boost::algorithm::split(vS, s, boost::algorithm::is_any_of(L".")); 

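    // The cases fall through on purpose: a string with N fields fills
    // the first N members and leaves the rest at 0.
    // Note: each field is parsed as hexadecimal (std::hex), whereas
    // to_string() writes decimal.
    switch(vS.size())
    {
        case 4:
            ss.str(vS[3]); ss>>std::hex>>m_build; ss.clear();
        case 3:
            ss.str(vS[2]); ss>>std::hex>>m_revision; ss.clear();
        case 2:
            ss.str(vS[1]); ss>>std::hex>>m_minor; ss.clear();
        case 1:
            ss.str(vS[0]); ss>>std::hex>>m_major; ss.clear();
        default:
            ;
    }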
} 

bool CVersionInfo::operator< (const CVersionInfo& vi)
{
    if(m_major != vi.m_major) return m_major < vi.m_major; 
    if(m_minor != vi.m_minor) return m_minor < vi.m_minor; 
    if(m_revision != vi.m_revision) return m_revision < vi.m_revision; 
    if(m_build != vi.m_build) return m_build < vi.m_build; 

    return false;
} 

bool CVersionInfo::operator> (const CVersionInfo& vi)
{
    if(m_major != vi.m_major) return m_major > vi.m_major; 
    if(m_minor != vi.m_minor) return m_minor > vi.m_minor; 
    if(m_revision != vi.m_revision) return m_revision > vi.m_revision; 
    if(m_build != vi.m_build) return m_build > vi.m_build; 

    return false;
} 

bool CVersionInfo::operator== (const CVersionInfo& vi)
{
    if((m_major == vi.m_major)
        && (m_minor == vi.m_minor)
        && (m_revision == vi.m_revision)
        && (m_build == vi.m_build))
        return true;

    return false;
} 

wstring CVersionInfo::to_string() const
{
    wstringstream ss; 

    ss << m_major
       << L"."
       << m_minor
       << L"."
       << m_revision
       << L"."
       << m_build; 

    return ss.str();
}
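For comparison, the same field-by-field ordering can be sketched in a few lines of Python, where tuples already compare element by element. This is only an illustration of the idea, not a port of the class: `parse_version` is a hypothetical helper, and it assumes decimal fields with missing trailing fields counted as 0.

```python
def parse_version(text):
    """Turn 'Major.Minor.Revision.Build' into a 4-tuple of ints.

    Assumption: fields are decimal, and missing trailing fields count as 0.
    """
    fields = [int(part) for part in text.split(".")]
    if len(fields) > 4:
        raise ValueError("expected at most 4 fields: %r" % text)
    return tuple(fields + [0] * (4 - len(fields)))

if __name__ == "__main__":
    a = parse_version("1.9.0.0")
    b = parse_version("1.10")
    print(a < b)                # numeric, field-by-field comparison
    print("1.9.0.0" < "1.10")   # plain string comparison, for contrast
```

The contrast in the last two lines is the reason a dedicated comparison is needed at all: compared as plain strings, "1.10" sorts before "1.9.0.0", while compared field by field it is the newer version.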

Link: Build Incrementer Add-In for Visual Studio .NET (C++)

I finally came across one here.

The add-in works with C++ projects in Visual Studio .NET 2003 and 2005, and should also work with the 2002 version (I have no compiler at hand to make sure, though).

UNICODE/Wide Characters handling in C++

I was bitten again.

Life was never meant to be easy, and it gets tougher when you have to deal with wide characters in C++ using wfstream, wcout or any of the other wide versions of the standard I/O facilities.

Two rules of thumb:

#1 Unicode files must be opened as binary

Example:

std::wifstream xmlFile(m_FileName, ios::binary);

std::wofstream xmlFile(m_FileName, ios::binary);

#2 When working with languages other than English, wifstream/wofstream must be imbued with a non-default facet in order to read from or write to a real Unicode file; otherwise wofstream ends up writing an ANSI file.

An explanation is available here.

Example:

wstring ws(L"this is a wide string");

// imbued with the null codecvt facet: written as a real Unicode file
wofstream of_imbued;
IMBUE_NULL_CODECVT(of_imbued);
of_imbued.open(L"c:\\imbued.txt", ios::binary);
of_imbued<<ws.c_str();

// default facet: ends up as an ANSI file
wofstream of_not_imbued;
of_not_imbued.open(L"c:\\not_imbued.txt", ios::binary);
of_not_imbued<<ws.c_str();

Output of the above code: imbued.txt is written as a real Unicode file, while not_imbued.txt ends up as an ANSI file.

Two imbue facilities are available:

  1. Boost Library

  2. imbue_null_codecvt (the one used in the above example)
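To see at the byte level what the two rules guard against, here is a small Python sketch. It is only an illustration under stated assumptions: "ANSI" is taken to mean the cp1252 code page, a real Unicode file is taken to mean UTF-16 little-endian, and `simulate_text_mode` is a hypothetical helper that mimics the LF to CR LF translation Windows text mode applies on write.

```python
text = "this is a wide string"

ansi_bytes = text.encode("cp1252")      # one byte per character ("ANSI")
utf16_bytes = text.encode("utf-16-le")  # two bytes per character

def to_ansi(s):
    """Return s encoded as cp1252, or None if it does not fit."""
    try:
        return s.encode("cp1252")
    except UnicodeEncodeError:
        return None

def simulate_text_mode(data):
    """Mimic the LF -> CR LF translation of Windows text mode (assumption)."""
    return data.replace(b"\n", b"\r\n")

two_lines = "a\nb".encode("utf-16-le")
damaged = simulate_text_mode(two_lines)

if __name__ == "__main__":
    print(len(ansi_bytes), len(utf16_bytes))
    print(to_ansi("\u4e2d\u6587"))        # Chinese text has no cp1252 encoding
    print(len(two_lines), len(damaged))   # an odd length is no longer valid UTF-16
```

The first comparison shows why the two output files differ in size, the second why an ANSI file cannot hold text in most other languages, and the third why a byte-wise newline translation breaks two-byte characters unless the file is opened as binary.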

There’s also a classic C way to write Unicode files:

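/* myFile is a FILE* opened in binary mode ("wb") */
wchar_t myWString[] = L"Some strange characters.";
/* the "- 1" keeps the terminating null out of the file */
fwrite(myWString, sizeof(wchar_t), sizeof(myWString)/sizeof(wchar_t) - 1, myFile);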

However, it is not portable.
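The portability problem comes from wchar_t itself: its size and byte order depend on the platform, so a raw dump of the array means different things on different systems. A rough Python illustration, assuming a 2-byte little-endian wchar_t (as on Windows) versus a 4-byte little-endian wchar_t (as is typical on Linux):

```python
text = "Hi"

# What a raw dump of the wchar_t array would contain, assuming:
windows_dump = text.encode("utf-16-le")  # 2-byte wchar_t, little-endian
linux_dump = text.encode("utf-32-le")    # 4-byte wchar_t, little-endian

if __name__ == "__main__":
    print(windows_dump)
    print(linux_dump)
    # Reading the 4-byte dump as if it were 2-byte data yields extra NULs.
    print(repr(linux_dump.decode("utf-16-le")))
```

A file written this way can only be read back reliably by a program that makes the same assumptions about wchar_t as the one that wrote it.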

References:

  1. Unicode Implementation

http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/ffe0912d1462d7a5/7601a62008fdd25a?lnk=st&q=wfstream+fstream+cout+wcout&rnum=6&hl=en#7601a62008fdd25a

  2. Unicode in C++

http://groups.google.com/group/comp.lang.c++/browse_thread/thread/f4a6a434b0453187/1edc2bc1f4187597?lnk=st&q=wfstream+fstream+cout+wcout&rnum=3&hl=en#1edc2bc1f4187597

  3. How to read a Unicode file with fstream?

http://groups.google.com/group/microsoft.public.vc.stl/browse_thread/thread/45d7520ec3ad3f51/d57b41e9abb20117?lnk=st&q=wfstream+fstream+cout+wcout&rnum=2&hl=en#

  4. A very puzzling problem: cout vs. wcout, fstream vs. wfstream

http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/37c3e24861ca09e3/78fe0aeed7b728de?lnk=st&q=wfstream+fstream+cout+wcout&rnum=1&hl=en#78fe0aeed7b728de

  5. Upgrading an STL-based application to use Unicode

http://www.codeproject.com/vcpp/stl/upgradingstlappstounicode.asp

How to Write a Spelling Corrector

All you need is 20 lines of Python 2.5 code. 

import re, string, collections

def words(text): return re.findall('[a-z]+', text.lower())

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

NWORDS = train(words(file('Documents/holmes.txt').read()))

def edits1(word):
    n = len(word)
    return set([word[0:i]+word[i+1:] for i in range(n)] + ## deletion
               [word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] + ## transposition
               [word[0:i]+c+word[i+1:] for i in range(n) for c in string.lowercase] + ## alteration
               [word[0:i]+c+word[i:] for i in range(n+1) for c in string.lowercase]) ## insertion

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    return max(known([word]) or known(edits1(word)) or known_edits2(word) or [word],
               key=lambda w: NWORDS[w])

A complete analysis is available here: http://norvig.com/spell-correct.html
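To try the idea without a corpus file, here is a Python 3 sketch of the same approach. The differences from the listing are assumptions made for illustration: `SAMPLE_TEXT` is a tiny made-up corpus standing in for the text file, and `string.ascii_lowercase` replaces `string.lowercase`, which no longer exists in Python 3.

```python
import collections
import re
import string

def words(text):
    return re.findall('[a-z]+', text.lower())

def train(features):
    # every word starts at 1, so unseen words are never counted as 0
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

# Hypothetical stand-in for the corpus file.
SAMPLE_TEXT = "the spelling of a word is corrected by its most frequent known neighbour"
NWORDS = train(words(SAMPLE_TEXT))

def edits1(word):
    """All strings that are one edit away from word."""
    n = len(word)
    letters = string.ascii_lowercase
    return set([word[0:i] + word[i+1:] for i in range(n)] +                          # deletion
               [word[0:i] + word[i+1] + word[i] + word[i+2:] for i in range(n-1)] +  # transposition
               [word[0:i] + c + word[i+1:] for i in range(n) for c in letters] +     # alteration
               [word[0:i] + c + word[i:] for i in range(n+1) for c in letters])      # insertion

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(candidates):
    return set(w for w in candidates if w in NWORDS)

def correct(word):
    # prefer the word itself, then known words one edit away, then two edits away
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=lambda w: NWORDS[w])

if __name__ == "__main__":
    for typo in ["speling", "teh", "word"]:
        print("%s --> %s" % (typo, correct(typo)))
```

With a corpus this small the corrector only knows a dozen words, so it is useful for seeing how the candidate sets are built rather than for real corrections.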

driver code for Serial Flash memories