Uploading Files
>>> r = requests.post('http://httpbin.org/post', files=files)
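For this to run, files must be defined as a dict mapping a form field name to an open file object. A minimal sketch (the file name report.txt is just a placeholder):

import requests

# 'report.txt' is a hypothetical local file; use any file you want to upload
files = {'file': open('report.txt', 'rb')}
r = requests.post('http://httpbin.org/post', files=files)
print(r.text)   # httpbin echoes the uploaded content under the "files" key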
Getting Cookies
Individual page loads are stateless and unrelated to one another; cookies are what tie one page to the next. For example, when I log in, the browser stores my login information in a cookie. When I visit a second page, that saved cookie is sent along, so the server knows what I did and browsed before. Why are the ads you see online so often for products you are interested in? Why does Taobao recommend items similar to what you have already bought once you log in? That is the work of cookies: they let the server know your personal preferences.
So in most cases, each time you log in you get a cookie recording the fact that you are logged in, which makes cookies important here, and passing them along correctly matters just as much. For example, I send my user information to the site with requests.post plus a payload, and the returned response r contains the cookies that were generated. When I then request a page that requires login with requests.get, I pass those cookies into the get call, so the page is fetched as a logged-in user.
>>> r = requests.get('http://www.baidu.com')
>>> print(r.cookies)
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
>>> for key, value in r.cookies.items():
...     print(key + '=' + value)
BDORZ=27315
Using Cookies
cookies = r.cookies
r = requests.get('http://www.baidu.com', cookies=cookies)
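Putting this together with the login flow described at the start of this section: post the credentials, take the cookies from the response, and pass them into the next get. This is only a sketch; the URL and form field names are hypothetical and depend on the site you are logging in to.

import requests

# hypothetical login endpoint and form fields
payload = {'username': 'user', 'password': '123'}
r = requests.post('http://example.com/login', data=payload)

# reuse the cookies issued by the login response for a page that requires login
r2 = requests.get('http://example.com/profile', cookies=r.cookies)
print(r2.status_code)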
Session Persistence (Simulating Login)
Calling requests.get twice is effectively like opening two separate browsers: the cookie set by the first request does not carry over to the second. (Note that in the transcript below, r still refers to the earlier Baidu response until it is reassigned.)
requests.get('http://httpbin.org/cookies/set/number/1234')
Out[6]: <Response [200]>
r.cookies
Out[7]: <RequestsCookieJar[Cookie(version=0, name='BDORZ', value='27315', port=None, port_specified=False, domain='.baidu.com', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=1548904980, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False)]>
r = requests.get('http://httpbin.org/cookies')
print(r.text)
{
  "cookies": {}
}
To behave like a single browser, use a Session to store the cookies and keep the login session alive. Here we make two requests: one to set the cookies and one to read them back.
s = requests.Session()
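A minimal sketch of the two-request flow just described, reusing the cookie name and value from the earlier example:

import requests

s = requests.Session()
# first request: ask httpbin to set a cookie on this session
s.get('http://httpbin.org/cookies/set/number/1234')
# second request: the same session sends the stored cookie back
r = s.get('http://httpbin.org/cookies')
print(r.text)   # should show {"cookies": {"number": "1234"}}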
Since a Session carries settings that apply to every request it makes, we can also use it for global configuration, such as default headers:
import requests
s = requests.Session()
s.headers.update({'x-test': 'true'})
r = s.get('http://httpbin.org/headers', headers={'x-test2': 'true'})
print(r.text)
Here s.headers.update sets a session-level header, and the get call passes another header of its own; both are sent with the request.
Result:
{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.9.1",
    "X-Test": "true",
    "X-Test2": "true"
  }
}
If the headers passed to the get call also contain x-test:
r = s.get('http://httpbin.org/headers', headers={'x-test': 'true'})
it overrides the session-level value:
{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.9.1",
    "X-Test": "true"
  }
}
If you do not want one of the session-level headers for a particular request, set it to None:
r = s.get('http://httpbin.org/headers', headers={'x-test': None})
Result:
{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.9.1"
  }
}
SSL Certificate Verification
You can pass verify=False to skip certificate verification, but this produces a warning:
r = requests.get('https://www.12306.cn', verify=False)
C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\urllib3\connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
You can import urllib3 and disable the warnings:
from requests.packages import urllib3
urllib3.disable_warnings()
r = requests.get('https://www.12306.cn',verify=False)
Or supply a local client certificate manually:
r = requests.get('https://www.12306.cn',cert=('/path/server.crt','/path/key'))
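Note that cert= presents a client-side certificate to the server. If the goal is instead to verify the server against a specific CA bundle, the path goes to verify=. A sketch with a placeholder path:

# verify the server certificate against a local CA bundle (hypothetical path)
r = requests.get('https://www.12306.cn', verify='/path/ca-bundle.crt')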
Proxy Settings
proxies = {'http': 'http://127.0.0.1:1080', 'https': 'https://127.0.0.1:1080'}
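The dict by itself does nothing; it has to be passed to the request. A minimal sketch, assuming a proxy is actually listening on 127.0.0.1:1080:

import requests

# hypothetical local proxy; the key is matched against the scheme of the target URL
proxies = {'http': 'http://127.0.0.1:1080', 'https': 'https://127.0.0.1:1080'}
r = requests.get('http://httpbin.org/get', proxies=proxies)
print(r.status_code)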
If the proxy requires a username and password, you can use
proxies={'http':'http://user:password@127.0.0.1:1080'}
to embed the credentials in the proxy URL.
Timeout Settings
The timeout argument means we want a response within the given number of seconds (0.2 here); if none arrives in time, an exception is raised:
r = requests.get('http://httpbin.org/get', timeout=0.2)
Traceback (most recent call last):
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\urllib3\connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\urllib3\util\connection.py", line 83, in create_connection
raise err
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\urllib3\util\connection.py", line 73, in create_connection
sock.connect(sa)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\urllib3\connectionpool.py", line 601, in urlopen
chunked=chunked)
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\urllib3\connectionpool.py", line 357, in _make_request
conn.request(method, url, **httplib_request_kw)
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 1026, in _send_output
self.send(msg)
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 964, in send
self.connect()
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\urllib3\connection.py", line 166, in connect
conn = self._new_conn()
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\urllib3\connection.py", line 146, in _new_conn
(self.host, self.timeout))
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPConnection object at 0x00000205DE410DA0>, 'Connection to httpbin.org timed out. (connect timeout=0.2)')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\adapters.py", line 440, in send
timeout=timeout
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\urllib3\connectionpool.py", line 639, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\urllib3\util\retry.py", line 388, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='httpbin.org', port=80): Max retries exceeded with url: /get (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x00000205DE410DA0>, 'Connection to httpbin.org timed out. (connect timeout=0.2)'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\IPython\core\interactiveshell.py", line 2963, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-43-4b2c974088fc>", line 1, in <module>
r = requests.get('http://httpbin.org/get',timeout=0.2)
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "C:\Users\34924\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\adapters.py", line 496, in send
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='httpbin.org', port=80): Max retries exceeded with url: /get (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x00000205DE410DA0>, 'Connection to httpbin.org timed out. (connect timeout=0.2)'))
This can be handled with try/except. Note that the traceback above ends in a ConnectTimeout, which requests.exceptions.ReadTimeout would not catch, so it is safer to catch the parent Timeout class:
from requests.exceptions import Timeout
try:
    r = requests.get('http://httpbin.org/get', timeout=0.1)
except Timeout:
    print("timeout!")
Authentication
For pages that require authentication before they can be accessed, you can use basic auth:
from requests.auth import HTTPBasicAuth
r = requests.get('http://xx.xx.xx.xx',auth=HTTPBasicAuth('user','123'))
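requests also accepts a plain (user, password) tuple as a shorthand for HTTPBasicAuth; this is equivalent to the call above:

r = requests.get('http://xx.xx.xx.xx', auth=('user', '123'))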
Exception Handling
from requests.exceptions import ReadTimeout, HTTPError, RequestException
All errors raised by the requests package can be caught with RequestException, the base class of the library's own exceptions.
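A sketch of the usual pattern: catch the specific exceptions first and fall back to RequestException; the timeout value and URL are only examples.

import requests
from requests.exceptions import ReadTimeout, HTTPError, RequestException

try:
    r = requests.get('http://httpbin.org/get', timeout=0.5)
    r.raise_for_status()        # raises HTTPError for 4xx/5xx status codes
except ReadTimeout:
    print('read timed out')
except HTTPError:
    print('bad HTTP status')
except RequestException:
    print('some other requests error')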