5

If the registered trademark symbol does not appear at the end of a file or folder name, strip cannot be used. Why doesn't replace work?

I have some old files and folders named with a registered trademark symbol that I want to remove.

The files don't have an extension.

  • folder: "\data\originals\Word Finder®"
  • file 1: "\data\originals\Word Finder® DA"
  • file 2: "\data\originals\Word Finder® Thesaurus"

For the folder, os.rename(p, p.strip('®')) works. However, os.rename(p, p.replace('®', '')) does not work on either the folder or the files.

Replace works on strings fed to it directly, e.g. print 'Registered® Trademark®'.replace('®',''). Is there a reason the paths don't follow the same logic?

note:

  • I'm using os.walk() to get the folder and file names
skas
  • 11
    Worst.. filename.. ever. – Mike Christensen Jul 30 '14 at 17:03
  • In file 1 and file 2, did you forget the path separator after the `Word Finder ®` directory? – Bill Lynch Jul 30 '14 at 17:03
  • 5
    What python version are you using? – skrrgwasme Jul 30 '14 at 17:04
  • 1
    And BTW, the `# -*- coding: utf-8 -*-` tag has *nothing* to do with how your script handles input. It tells the interpreter what encoding your script is in. – skrrgwasme Jul 30 '14 at 17:05
  • 1
    Additionally, does `p.replace()` and `p.strip()` return the filename that you would expect? – Bill Lynch Jul 30 '14 at 17:05
  • You're using byte strings. Use Unicode strings `u'®'` instead (the same Unicode characters can be represented in many different encodings, and often multiple ways within a single encoding). – Cameron Jul 30 '14 at 17:07
  • @ScottLawson From the `#!/usr/bin/env python` it would appear that this is Python 2. – Lily Chung Jul 30 '14 at 17:09
  • @IstvanChung Not necessarily. Some systems may have "python" open a Python3 interpreter and require "python2" for a 2.x interpreter. Admittedly, the other way around is more common, but since Unicode is handled differently in 2.x vs. 3.x, I didn't want to make the assumption. It really depends on which version [appears first in your path](http://stackoverflow.com/a/2429517/2615940). – skrrgwasme Jul 30 '14 at 17:14
  • Works fine for me, too. Python 2.7 on Mac OS X. – nofinator Jul 30 '14 at 17:27
  • @Cameron When I do, I get UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 11: ordinal not in range(128). How do I fix this problem? – skas Jul 30 '14 at 18:23
  • @ScottLawson Python 2.7.5, Windows 7 – skas Jul 30 '14 at 18:24
  • @skas: Hmm, looks like `os.rename` doesn't work with Unicode strings. Too bad. But, you can still give it the bytes in the [correct encoding for the OS](https://docs.python.org/2/howto/unicode.html#unicode-filenames): `u'®'.encode('mbcs')` – Cameron Jul 30 '14 at 18:55

2 Answers

4

I have been unable to recreate your issue, so I'm not sure why it isn't working for you. Here is a workaround though: instead of using the registered character in your source code with the string methods, try being more explicit with something like this:

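import os

# Python 2: os.walk() on a byte-string path yields byte-string names.
for root, folders, files in os.walk(os.getcwd()):
    for fi in files:
        oldpath = os.path.join(root, fi)
        # Decode first so the search is for the code point U+00AE itself.
        # This assumes the names are UTF-8 encoded; adjust the codec if they are not.
        newname = fi.decode("utf-8").replace(u'\u00AE', u'').encode("utf-8")
        newpath = os.path.join(root, newname)
        if newpath != oldpath:  # skip names that contain no symbol
            os.rename(oldpath, newpath)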

Explicitly specifying the Unicode code point you're looking for can help reduce the number of places your code could be going wrong. The interpreter no longer has to worry about the encoding of your source code itself.
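One possible explanation for the strip/replace difference, which nobody in this thread confirmed, is a byte-level mismatch: in a UTF-8 source file '®' is two bytes, while the filename coming back from Windows holds a single byte. The sketch below uses Python 3 bytes objects to imitate Python 2 byte strings. The cp1252 codec is an assumption standing in for the Windows mbcs encoding, chosen because the UnicodeDecodeError quoted in the comments reports byte 0xae.

```python
# Assumption: the source file is UTF-8 and the OS hands back cp1252 bytes.
symbol_utf8 = "\u00ae".encode("utf-8")              # b'\xc2\xae', two bytes
name_cp1252 = "Word Finder\u00ae".encode("cp1252")  # ends in the single byte b'\xae'

# strip() treats its argument as a set of bytes, so the trailing b'\xae' goes.
stripped = name_cp1252.strip(symbol_utf8)

# replace() searches for the exact two-byte sequence, which never occurs.
replaced = name_cp1252.replace(symbol_utf8, b"")

# Decoding with the right codec first makes replace() find the symbol.
fixed = name_cp1252.decode("cp1252").replace("\u00ae", "")
```

If this is what is happening, it would also account for strip only helping when the symbol sits at the very end of the name.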

skrrgwasme
1

My original question 'Registered Trademark: Why does strip remove ® but replace can't find it?' is no longer applicable. The problem isn't strip or replace, but how os.rename() deals with unicode characters. So, I added to my question.

Going off what Cameron said, it seems that os.rename() doesn't work with Unicode characters (please correct me if this is wrong; I don't know much about this). shutil.move() ultimately gives the result that os.rename() should have.

Despite ScottLawson's suggestion to use u'\u00AE' instead of '®', I could not get it to work.

Basically, use shutil.move(old_name,new_name) instead.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import shutil
import os

# from this answer: https://stackoverflow.com/q/1033424/3889452
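def remove(value):
    # In Python 2 with a UTF-8 source file, '®' is the two bytes '\xc2\xae',
    # so this loop removes each of those bytes individually.
    deletechars = '®'
    for c in deletechars:
        value = value.replace(c,'')
    return value

# Walk bottom-up so a folder is renamed only after its contents have been visited.
for root, folders, files in os.walk(r'C:\Users\myname\da\data\originals\Word_4_0', topdown=False):
    for f in files:
        rename = remove(f)
        shutil.move(os.path.join(root,f),os.path.join(root,rename))
    for folder in folders:
        rename = remove(folder)
        shutil.move(os.path.join(root,folder),os.path.join(root,rename))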

The following also works for a single directory (based off of this). It drops every character that isn't in string.printable, so it catches more symbols, and ® doesn't have to appear in the Python code.

import shutil
import os
import string

directory_path = r'C:\Users\myname\da\data\originals\Word_4_0'
for file_name in os.listdir(directory_path):
    new_file_name = ''.join(c for c in file_name if c in string.printable)
    shutil.move(os.path.join(directory_path,file_name),os.path.join(directory_path,new_file_name))
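For readers on Python 3, where paths are Unicode strings by default, the same cleanup might look like the sketch below. The function name strip_symbol and its parameters are hypothetical and not part of the code above. It also assumes that no two names collide once the symbol is removed, since it does no collision handling.

```python
import os

def strip_symbol(top, symbol="\u00ae"):
    """Remove `symbol` from every file and folder name under `top`.

    Returns a list of (old_name, new_name) pairs for the entries renamed.
    """
    renamed = []
    # topdown=False visits children before their parents, so renaming a
    # folder never invalidates a path that still has to be walked.
    for root, folders, files in os.walk(top, topdown=False):
        for name in files + folders:
            if symbol in name:
                new_name = name.replace(symbol, "")
                os.rename(os.path.join(root, name),
                          os.path.join(root, new_name))
                renamed.append((name, new_name))
    return renamed
```

Calling strip_symbol on the directory that holds the affected files and folders renames everything beneath it; the directory passed in is not itself renamed.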
skas