Following the very nice explanations in this blog post about "Logging in With Requests" and the code snippet from this answer to the SO question 'How to “log in” to a website using Python's Requests module?', I have the following code (*) to log in to and navigate through a website that requires authentication:
import requests
import lxml.html

logurl = 'http://www.somesite.fr/subsite/'
url2 = 'http://www.somesite.fr/subsite/anotherpath/1135'

with requests.session() as s:
    # fetch the login page and collect the hidden form fields (e.g. the CSRF token)
    login = s.get(logurl)
    login_html = lxml.html.fromstring(login.text)
    hidden_inputs = login_html.xpath(r'//form//input[@type="hidden"]')
    form = {x.attrib["name"]: x.attrib["value"] for x in hidden_inputs}
    # add the credentials and submit the form
    form['email'] = 'myemail'
    form['password'] = 'mypassword'
    response = s.post(logurl, data=form)
    # request a page that should only be reachable once logged in
    r2 = s.get(url2)
If I print form:
{'form_action': 'connexion',
'CSRFGuard_token': '762bd944c74e4194db5248279a80bc3eba8e417f0439af2701364e39c0e4b67376c0afc19ba05f2b8fd98ce3b14ac9625d59827b19f2134b4da98c43bef2b57a',
'password': 'mypassword',
'email': 'myemail'}
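For comparison, here is a minimal sketch of the same hidden-field extraction using only the standard library's html.parser instead of lxml (a swapped-in technique, not what the code above does). The names LoginFormParser and build_login_request, the sample HTML, and the login.php action are all made up for illustration. The sketch also records the form's action attribute, because a login form may post to a different URL than the page it is served from, and it tolerates hidden inputs that have no value attribute, for which x.attrib["value"] would raise a KeyError.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collect the action and hidden inputs of the first <form> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.action = None    # raw action attribute of the first form, if any
        self.hidden = {}      # name -> value for hidden inputs inside that form
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form:
            is_hidden = (attrs.get("type") or "").lower() == "hidden"
            if is_hidden and attrs.get("name"):
                # a missing value attribute becomes an empty string
                self.hidden[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_login_request(page_url, html, email, password):
    """Return (post_url, payload) for the first form found in html."""
    parser = LoginFormParser()
    parser.feed(html)
    payload = dict(parser.hidden)
    payload["email"] = email
    payload["password"] = password
    # resolve a relative action against the page URL; an empty action means "same URL"
    post_url = urljoin(page_url, parser.action or "")
    return post_url, payload


# made-up login page, only to exercise the helper offline
sample_html = """
<form action="login.php" method="post">
  <input type="hidden" name="form_action" value="connexion">
  <input type="hidden" name="CSRFGuard_token" value="abc123">
  <input type="hidden" name="novalue">
  <input type="text" name="email">
  <input type="password" name="password">
</form>
"""
post_url, payload = build_login_request(
    "http://www.somesite.fr/subsite/", sample_html, "myemail", "mypassword"
)
print(post_url)
print(payload)
```

In the real session, the returned pair would be used as s.post(post_url, data=payload). Whether the actual site uses a separate action URL is unknown here; printing post_url is just a way to find out.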
With r2 = s.get(url2), I am trying to navigate within the website after authentication. url2 is the URL I reach when I navigate "manually" after logging in at logurl, and the HTML (and appearance) of the two pages is clearly different. But if I print response.text and r2.text, I get exactly the same HTML, namely that of the login page. I conclude that the login was not successful, or that the session does not keep the logged-in state...
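Rather than comparing the two pages by eye, the check can be made explicit. The following is a rough sketch, assuming that the login page is the only page containing a password input; that heuristic and the helper name looks_like_login_page are assumptions for illustration, not something the site guarantees.

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Set self.found to True if any <input type="password"> is seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        input_type = (dict(attrs).get("type") or "").lower()
        if tag == "input" and input_type == "password":
            self.found = True


def looks_like_login_page(html):
    """Heuristic: a page with a password field is probably still the login form."""
    finder = _PasswordFieldFinder()
    finder.feed(html)
    return finder.found


# offline examples with made-up HTML
print(looks_like_login_page('<form><input type="password" name="password"></form>'))
print(looks_like_login_page('<p>Welcome back</p>'))
```

Applied to the code above, this would be looks_like_login_page(response.text) and looks_like_login_page(r2.text). Printing response.url and response.history alongside would additionally show whether the server redirected either request back to the login page.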
What am I doing wrong? Thanks!
EDIT
Running the code suggested by Brian M. Sheldon:
import logging
import requests

# enable debug logging with basic logging config
logging.basicConfig(level=logging.DEBUG)

# logurl is defined as in the first snippet
with requests.session() as s:
    s.headers['user-agent'] = 'myapp'  # use non-default user-agent
    response = s.post(logurl, data={'email': 'myemail', 'password': 'mypassword'})
    print(response.headers)
DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): www.somesite.fr
DEBUG:requests.packages.urllib3.connectionpool:http://www.somesite.fr:80 "POST /subsite/ HTTP/1.1" 200 1415
and response.headers is:
{'Content-Length': '1415', 'Content-Encoding': 'gzip', 'Set-Cookie': 'PHPSESSID=741q7fj6pnkdl1ho4pr6s35cl1; path=/', 'Expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'Vary': 'Accept-Encoding,Origin', 'Keep-Alive': 'timeout=5, max=100', 'Server': 'Apache', 'Connection': 'Keep-Alive', 'Pragma': 'no-cache', 'Cache-Control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'Date': 'Tue, 25 Apr 2017 14:57:52 GMT', 'Content-Type': 'text/html; charset=UTF-8'}
s.cookies is:
<RequestsCookieJar[<Cookie PHPSESSID=t9t9gvt7enp70v5mb2viebr8v0 for www.somesite.fr/>]>
and s.get(url2) gives:
DEBUG:requests.packages.urllib3.connectionpool:http://www.somesite.fr:80 "GET /subsite/anotherpath/1135 HTTP/1.1" 200 1378
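One detail worth checking in the output above: the PHPSESSID value in the Set-Cookie header differs from the one held in s.cookies. That may simply mean the two printouts come from different runs, but if it also happens within a single run, the session would not be sending the cookie the server just issued. A minimal sketch of that comparison, using the standard library's http.cookies to parse the header; the helper name cookie_value is hypothetical, and the two values are copied from the output above.

```python
from http.cookies import SimpleCookie


def cookie_value(set_cookie_header, name="PHPSESSID"):
    """Return the value of the named cookie in a Set-Cookie header, or None."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    morsel = jar.get(name)
    return morsel.value if morsel is not None else None


# value taken from response.headers['Set-Cookie'] as printed above
header_value = cookie_value("PHPSESSID=741q7fj6pnkdl1ho4pr6s35cl1; path=/")
# value taken from s.cookies as printed above
jar_value = "t9t9gvt7enp70v5mb2viebr8v0"

same_session = header_value == jar_value
print(header_value, jar_value, same_session)
```

Within one run, the equivalent check would compare cookie_value(response.headers['Set-Cookie']) with s.cookies.get('PHPSESSID').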
Does this help to identify what I am doing wrong?
PS: apparently this field has been moving fast in recent years, and some answers from a few years ago already appear obsolete or have been superseded by better options. From what I have read, I think Requests is the best tool for what I want to achieve, but other solutions are welcome too. And if I forgot some useful information, please let me know and I'll edit.
(*) I am sorry, but my problem concerns a website that requires authentication, so I cannot give a reproducible example.