1

Is it possible to assign the Python Easter egg this to a variable?

Ideally, I would like to try out the various string methods on the text, such as counting how many times the word "the" appears.

I have tried the following:

import this

long_text = this

print long_text.count('the')

This prints out the Zen message but I see an error where the count method has been used:

...    
Namespaces are one honking great idea -- let's do more of those!
Traceback (most recent call last):
  File "count.py", line 5, in <module>
    print long_text.count('the')
AttributeError: 'module' object has no attribute 'count'

Thanks in advance.

Ian Carpenter
  • Perhaps this question (and the source of the this module) can help: http://stackoverflow.com/questions/5855758/can-anyone-explain-me-the-source-code-of-python-import-this?lq=1 – Ben Sep 15 '14 at 20:32
  • @everyone: Blimey! Thanks for the rapid responses, as you can tell I am pretty new to Python, it is going to take me a while to properly digest all this information. Thanks - it's really appreciated. – Ian Carpenter Sep 15 '14 at 20:40

4 Answers

6

The code in the this module is intentionally obfuscated, as a joke,* so the plain text of the Zen doesn't appear anywhere in it as-is.

However, if you look at the source (e.g. print inspect.getsource(this), or looking in the source repo), you'll see that the last line is:

print "".join([d.get(c, c) for c in s])

… which means the answer is:

text = "".join([this.d.get(c, c) for c in this.s])

Of course that's relying on undocumented, implementation-specific details, but then the this module itself is undocumented, implementation-specific details, as can be seen by looking at the library reference.
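As a rough sketch of where that one-liner leads, here is the original question answered end to end. It is written in Python 3 syntax (the snippets above are Python 2) and assumes this.s and this.d keep the meaning they have in CPython's this.py, which nothing guarantees. The redirect is only there to keep the import from printing the Zen.

```python
import contextlib
import io

# Importing `this` prints the Zen as a side effect; swallow that output here.
with contextlib.redirect_stdout(io.StringIO()):
    import this

# this.s is the rot13-encoded text, this.d maps each letter to its decoded form.
# Assumption: both attributes exist, as they do in CPython's this.py.
text = "".join(this.d.get(c, c) for c in this.s)

print(text.count("the"))       # 5
print(text.splitlines()[0])    # The Zen of Python, by Tim Peters
print(text.upper().count("THE"))  # case-insensitive count, for comparison
```

Once the text is an ordinary str, any string method works on it, which is all the question was after.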


However, if you want something that works with any Python implementation that has a this module that prints things out even if it's implemented differently (which would be not a single Python implementation that exists, as far as I know…), you could always do this:

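import sys
import StringIO  # Python 2 module; in Python 3 use io.StringIO instead

old_stdout = sys.stdout
sys.stdout = StringIO.StringIO()  # temporarily capture everything printed
try:
    import this
    text = sys.stdout.getvalue()
finally:
    sys.stdout = old_stdout  # always restore the real stdout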

However, keep in mind that import only runs the code on first import, so if you do this twice in the same session, you're going to end up with '' the second time.
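One way around the first-import limitation, sketched in Python 3 syntax: reload the module when it is already in sys.modules, so its body runs (and prints) again. The helper name capture_zen is made up for this example, and the approach assumes the module prints the Zen at import time, as CPython's does.

```python
import contextlib
import importlib
import io
import sys

def capture_zen():
    """Return whatever the `this` module prints, even on repeated calls."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        if "this" in sys.modules:
            # Already imported once: re-execute the module body so it prints again.
            importlib.reload(sys.modules["this"])
        else:
            import this  # first import runs the module and prints the Zen
    return buffer.getvalue()

first = capture_zen()
second = capture_zen()
print(first == second)  # both calls capture the same text
```

Without the reload branch, the second call would return an empty string, for the reason given above.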


Another fun option is to download PEP 20 and parse it with your favorite HTML parser to extract the text. But I'll leave that as an exercise for the reader.


* In case you're wondering, the joke is that obfuscating source code, reimplementing rot13 from scratch, even though it's built into the stdlib, doing it with a nested loop, looping over 65 to 97 instead of the characters, looping over range(26) for a reason that you have to read the code multiple times to understand… all of this violates the Zen as badly as possible. See Barry Warsaw's post for a bit more background on their state of mind at the time they implemented it.

abarnert
  • You can also check the value of `this.__file__` which should guide you to the directory in your system where `this.py` can be found. – chepner Sep 15 '14 at 20:35
  • what are `this.i`, `this.c` etc..? – Padraic Cunningham Sep 15 '14 at 20:35
  • @PadraicCunningham Variables created in the for loop. – Ashwini Chaudhary Sep 15 '14 at 20:37
  • @PadraicCunningham: `this.i` and `this.c` are basically useless; they're the last two values in the loops used in the module. `this.d` and `this.s`, on the other hand, are the actual exported values from the module that you need. – abarnert Sep 15 '14 at 20:38
  • @chepner: But why, when `inspect.getsourcefile` or `inspect.getsource` does that for you, more robustly (e.g., when your stdlib is crammed in a zipfile) and more simply? – abarnert Sep 15 '14 at 21:07
  • I hadn't thought of `inspect` when I was working on my abandoned answer; it is the better way to get the source. I just thought it was interesting to see that `this.py` was (or as you point out, only might be) an actual file on disk, rather than something that the interpreter generated on the fly. – chepner Sep 15 '14 at 21:13
  • @chepner: If they'd had more time at IPC #10, it probably would have been generated on the fly, or at least a builtin… – abarnert Sep 15 '14 at 21:18
2

Just because no-one else has mentioned it yet:

from codecs import decode
from this import s

zen = decode(s, "rot13")
Zero Piraeus
  • You don't need to `import` anything; just do `s.decode('rot13')`. – abarnert Sep 15 '14 at 21:04
  • @abarnert this version works across Python 2 and 3 unaltered, though :-) – Zero Piraeus Sep 15 '14 at 21:57
  • Are you sure? I'm pretty sure that `rot13` was one of the codecs removed in 3.0, added back in to 3.2 or 3.3 but without its aliases (so you can't call it `rot13`, you have to know the real name), and only fully restored in 3.4? – abarnert Sep 15 '14 at 23:05
  • Ah ... possibly; I only tested in 2.7 and 3.4. No point going overboard for something this trivial, though, and for the purposes of not having to edit I'm going to consider the situation you describe for 3.0—3.3 a bug ;-) – Zero Piraeus Sep 15 '14 at 23:17
  • It was only starting late in the 3.3 schedule that making it easier to write single-codebase apps that run in both 2.7 and 3.latest was even a goal… but it's amazing how many little things turned out to be harmless and useful to add back in, from `u` prefixes to codec aliases to `bytes.__mod__`. If only that had been obvious a few years earlier… – abarnert Sep 15 '14 at 23:41
1

You are currently assigning the module this itself to long_text, and a module has no count method. To get the text, you can do:

>>> import this
>>> text = "".join([this.d.get(c, c) for c in this.s])
>>> print text.count('the')
5

Actual source code:

>>> this.__file__
'/usr/lib/python2.7/this.py'
>>> !cat /usr/lib/python2.7/this.py
s = """Gur Mra bs Clguba, ol Gvz Crgref

Ornhgvshy vf orggre guna htyl.
Rkcyvpvg vf orggre guna vzcyvpvg.
Fvzcyr vf orggre guna pbzcyrk.
Pbzcyrk vf orggre guna pbzcyvpngrq.
Syng vf orggre guna arfgrq.
Fcnefr vf orggre guna qrafr.
Ernqnovyvgl pbhagf.
Fcrpvny pnfrf nera'g fcrpvny rabhtu gb oernx gur ehyrf.
Nygubhtu cenpgvpnyvgl orngf chevgl.
Reebef fubhyq arire cnff fvyragyl.
Hayrff rkcyvpvgyl fvyraprq.
Va gur snpr bs nzovthvgl, ershfr gur grzcgngvba gb thrff.
Gurer fubhyq or bar-- naq cersrenoyl bayl bar --boivbhf jnl gb qb vg.
Nygubhtu gung jnl znl abg or boivbhf ng svefg hayrff lbh'er Qhgpu.
Abj vf orggre guna arire.
Nygubhtu arire vf bsgra orggre guna *evtug* abj.
Vs gur vzcyrzragngvba vf uneq gb rkcynva, vg'f n onq vqrn.
Vs gur vzcyrzragngvba vf rnfl gb rkcynva, vg znl or n tbbq vqrn.
Anzrfcnprf ner bar ubaxvat terng vqrn -- yrg'f qb zber bs gubfr!"""

d = {}
for c in (65, 97):
    for i in range(26):
        d[chr(i+c)] = chr((i+13) % 26 + c)

print "".join([d.get(c, c) for c in s])
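For comparison, the nested loop above builds an ordinary rot13 table. A more readable way to build the same mapping might look like the sketch below (Python 3 syntax). The function name rot13_table is invented for this example and is not part of the this module; the sample line is copied from the encoded string above.

```python
import codecs
import string

def rot13_table():
    """Build a str.translate table that rotates ASCII letters by 13 places."""
    upper, lower = string.ascii_uppercase, string.ascii_lowercase
    return str.maketrans(
        upper + lower,
        upper[13:] + upper[:13] + lower[13:] + lower[:13],
    )

line = "Ernqnovyvgl pbhagf."  # one line taken from the encoded string s
decoded = line.translate(rot13_table())
print(decoded)  # Readability counts.

# The stdlib codec gives the same result (the "rot13" alias needs Python 3.4+).
print(codecs.decode(line, "rot13") == decoded)  # True
```

Non-letters such as spaces and punctuation are absent from the table, so translate leaves them alone, just as d.get(c, c) does in the original.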
Ashwini Chaudhary
1

I wasn't going to write it, but… what the hell. Here is what's probably the only officially documented way to get the Zen—which will work in any Python implementation, even without the easter egg:

#!/usr/bin/env python

import HTMLParser
import urllib2

class ZenParser(HTMLParser.HTMLParser):
    def __init__(self, *args, **kwargs):
        HTMLParser.HTMLParser.__init__(self, *args, **kwargs)
        self.state = 'START'
    def handle_data(self, data):
        if self.state == 'CHECKING HEADER':
            if data.strip() == 'The Zen of Python':
                self.state = 'AWAITING ZEN'
            else:
                self.state = 'START'
        elif self.state == 'GATHERING ZEN':
            self.zen = data
            self.state = 'ENLIGHTENED BODDHISATVA'
    def handle_starttag(self, tag, attrs):
        if self.state == 'START' and tag == 'h3':
            self.state = 'CHECKING HEADER'
        elif self.state == 'AWAITING ZEN' and tag == 'pre':
            self.state = 'GATHERING ZEN'

def zen():
    r = urllib2.urlopen('http://legacy.python.org/dev/peps/pep-0020/')
    parser = ZenParser()
    parser.feed(r.read())
    return '\n'.join(line.lstrip() for line in parser.zen.splitlines())

print zen()
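For anyone on Python 3, the same idea can be sketched with html.parser. To keep it runnable offline, this version parses a made-up HTML fragment instead of fetching the PEP; the SAMPLE markup and the class name PreAfterHeading are assumptions for illustration, and the real page's layout may differ from both this fragment and the one the code above was written against.

```python
from html.parser import HTMLParser

class PreAfterHeading(HTMLParser):
    """Collect the text of the first <pre> that follows a given <h3> heading."""

    def __init__(self, heading):
        super().__init__()
        self.heading = heading
        self.in_h3 = False
        self.heading_seen = False
        self.in_pre = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
        elif tag == "pre" and self.heading_seen and not self.done:
            self.in_pre = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False
        elif tag == "pre" and self.in_pre:
            self.in_pre = False
            self.done = True  # ignore any later <pre> blocks

    def handle_data(self, data):
        if self.in_h3 and data.strip() == self.heading:
            self.heading_seen = True
        elif self.in_pre:
            # Text may arrive in several pieces, so collect rather than assign.
            self.chunks.append(data)

# Hypothetical fragment standing in for the downloaded page.
SAMPLE = """
<h3>Abstract</h3>
<pre>not this one</pre>
<h3>The Zen of Python</h3>
<pre>
    Beautiful is better than ugly.
    Explicit is better than implicit.
</pre>
"""

parser = PreAfterHeading("The Zen of Python")
parser.feed(SAMPLE)
parser.close()
body = "".join(parser.chunks).strip()
zen = "\n".join(line.strip() for line in body.splitlines())
print(zen)
```

To run it against the live page, the string passed to feed would come from urllib.request.urlopen(...).read().decode(...) instead of SAMPLE.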

Or, if you're willing to use third-party libraries, BeautifulSoup (as usual) makes it a lot easier:

import urllib2
import bs4

def zen():
    r = urllib2.urlopen('http://legacy.python.org/dev/peps/pep-0020/')
    soup = bs4.BeautifulSoup(r.read())
    zen = soup.find('h3', text='The Zen of Python').find_next_sibling('pre').string
    return '\n'.join(line.lstrip() for line in zen.splitlines())

print zen()
abarnert