
The requests Library


Introduction to the requests Library

Is the urllib package good? It is indeed powerful: it implements many of the features a crawler needs, and it is entirely possible to build a crawler on urllib alone.

But urllib is still not convenient enough! As we saw earlier when studying Handler and Opener, urllib code can get quite verbose. So it is time to bring out our mighty...

requests library!!

The requests library provides methods that read much more like plain language. Let's take a closer look.

Basic Usage

GET Requests

A Basic Example

How do we build a GET request with requests? Very simple:

import requests

url = 'http://httpbin.org/get'
r = requests.get(url=url)
print(r.text)

# Output
'''
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.32.3", 
    "X-Amzn-Trace-Id": "Root=1-678520af-723f207302193b7b37a75067"
  }, 
  "origin": "112.49.107.93", 
  "url": "http://httpbin.org/get"
}
'''

What if we want to add query parameters?

import requests

url = 'http://httpbin.org/get'
data = {
    'name': 'Doris',
    'Age': 18
}

# Turns the URL into 'http://httpbin.org/get?name=Doris&Age=18'
r = requests.get(url=url, params=data)
print(r.text)

# Output
'''
{
  "args": {
    "Age": "18", 
    "name": "Doris"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.32.3", 
    "X-Amzn-Trace-Id": "Root=1-678522ec-4a08c035250d32e35d219dae"
  }, 
  "origin": "112.49.107.93", 
  "url": "http://httpbin.org/get?name=Doris&Age=18"
}
'''

Oh my, folks, this is just too convenient!

So what exactly does requests.get() give us back? Let's check the type of r.text with type():

import requests

url = 'http://httpbin.org/get'

r = requests.get(url=url)
print(type(r.text))

# Output
'''
<class 'str'>
'''

The page content returned by requests.get() is a str; here it is in fact a JSON-formatted string, so we can call the json() method to parse it:

import requests

url = 'http://httpbin.org/get'

r = requests.get(url=url)
print(type(r.json()))
print(r.json())

# Output
'''
<class 'dict'>
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.32.3', 'X-Amzn-Trace-Id': 'Root=1-67852670-55fe9b1d17773c7e7be87d28'}, 'origin': '112.49.107.93', 'url': 'http://httpbin.org/get'}
'''

Fetching Binary Data

Images, audio, and video files are all stored as raw binary data. To scrape them, we need to obtain the binary content and handle it in the appropriate way.

Tips:
If you read the response directly via the .text attribute, the raw binary data gets decoded straight into a str, which garbles files such as images.

Let's look at what an image file actually contains, using GitHub's mysterious little loading GIF as an example:

import requests

r = requests.get('https://github.githubassets.com/assets/mona-loading-dark-7701a7b97370.gif')

print(r.content)

# Output
'''
b'GIF89a\x80\x01\x80\x01\x91\x02\x00\xc9\xd1...
'''

If we want to save the fetched file, continuing with the example above:

import requests

r = requests.get('https://github.githubassets.com/assets/mona-loading-dark-7701a7b97370.gif')

with open('favicon.gif', 'wb') as f:
    f.write(r.content)

This writes the corresponding file into our working directory.

Adding headers

Just as with urllib.request, we can pass a headers parameter to requests.get(), for example:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
}

r = requests.get('https://www.zhihu.com/explore', headers=headers)
print(r.text)

POST Requests

If we want to submit data to a server, we need a POST request. When studying urllib earlier, we saw how to send POST requests with urllib.request; does the requests library offer an equivalent?

Yes, my friend, it does. And it couldn't be simpler:

import requests

data = {
    'name': 'Doris',
    'age': 19
}

r = requests.post('http://httpbin.org/post', data=data)
print(r.text)

# Output
'''
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "age": "19", 
    "name": "Doris"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "17", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.32.3", 
    "X-Amzn-Trace-Id": "Root=1-6785f96d-52756a5739089f23414c9fb3"
  }, 
  "json": null, 
  "origin": "112.49.107.93", 
  "url": "http://httpbin.org/post"
}
'''

As you can see, the data submitted in our POST request appears in the form section of the response.

With requests.post(), data is passed in as a dict; there is no need to convert it into a byte stream the way urllib requires.
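
For comparison, here is a minimal sketch of the same POST done with urllib, where the form data has to be URL-encoded and converted to bytes before sending:

from urllib import parse, request

data = {
    'name': 'Doris',
    'age': 19
}

# urllib needs the dict URL-encoded and turned into bytes first
payload = parse.urlencode(data).encode('utf-8')
req = request.Request('http://httpbin.org/post', data=payload, method='POST')

with request.urlopen(req) as response:
    print(response.read().decode('utf-8'))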

Responses

After sending a request, the server's reply is called the "response". We have already used text and content to retrieve the response body. Of course, there are many more attributes and methods for extracting information from the response, such as the status code, response headers, Cookies, and so on. For example:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
}

r = requests.get('http://www.jianshu.com', headers=headers)
print(type(r.status_code), r.status_code)
print(type(r.headers), r.headers)
print(type(r.cookies), r.cookies)
print(type(r.url), r.url)
print(type(r.history), r.history)

# Output
'''
<class 'int'> 200
<class 'requests.structures.CaseInsensitiveDict'> {'Date': 'Tue, 14 Jan 2025 05:51:34 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'ETag': 'W/"c739cde420f00db8555adaabe9ddf8f7"', 'Cache-Control': 'max-age=0, private, must-revalidate', 'Set-Cookie': 'locale=zh-CN; path=/', 'X-Request-Id': '23c966cb-8ac2-4c3d-bee0-3464b9aea031', 'X-Runtime': '0.004685', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Content-Encoding': 'gzip'}
<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[<Cookie locale=zh-CN for www.jianshu.com/>]>
<class 'str'> https://www.jianshu.com/
<class 'list'> [<Response [302]>]
'''

In the example above, we used status_code to get the status code, headers for the response headers, cookies for the Cookies, url for the URL, and history for the request history (useful for tracking redirects).

Since the status code is an important indicator of how a request went, requests also provides a built-in status-code lookup object, requests.codes. For example:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
}

r = requests.get('http://www.jianshu.com', headers=headers)
if r.status_code == requests.codes.ok:
    print('Request Successfully!')
else:
    print('Something Wrong!')

# Output
'''
Request Successfully!
'''

The underlying definition behind codes (the _codes dict) is:

_codes = {
    # Informational.
    100: ("continue",),
    101: ("switching_protocols",),
    102: ("processing", "early-hints"),
    103: ("checkpoint",),
    122: ("uri_too_long", "request_uri_too_long"),
    200: ("ok", "okay", "all_ok", "all_okay", "all_good", "\\o/", "✓"),
    201: ("created",),
    202: ("accepted",),
    203: ("non_authoritative_info", "non_authoritative_information"),
    204: ("no_content",),
    205: ("reset_content", "reset"),
    206: ("partial_content", "partial"),
    207: ("multi_status", "multiple_status", "multi_stati", "multiple_stati"),
    208: ("already_reported",),
    226: ("im_used",),
    # Redirection.
    300: ("multiple_choices",),
    301: ("moved_permanently", "moved", "\\o-"),
    302: ("found",),
    303: ("see_other", "other"),
    304: ("not_modified",),
    305: ("use_proxy",),
    306: ("switch_proxy",),
    307: ("temporary_redirect", "temporary_moved", "temporary"),
    308: (
        "permanent_redirect",
        "resume_incomplete",
        "resume",
    ),  # "resume" and "resume_incomplete" to be removed in 3.0
    # Client Error.
    400: ("bad_request", "bad"),
    401: ("unauthorized",),
    402: ("payment_required", "payment"),
    403: ("forbidden",),
    404: ("not_found", "-o-"),
    405: ("method_not_allowed", "not_allowed"),
    406: ("not_acceptable",),
    407: ("proxy_authentication_required", "proxy_auth", "proxy_authentication"),
    408: ("request_timeout", "timeout"),
    409: ("conflict",),
    410: ("gone",),
    411: ("length_required",),
    412: ("precondition_failed", "precondition"),
    413: ("request_entity_too_large", "content_too_large"),
    414: ("request_uri_too_large", "uri_too_long"),
    415: ("unsupported_media_type", "unsupported_media", "media_type"),
    416: (
        "requested_range_not_satisfiable",
        "requested_range",
        "range_not_satisfiable",
    ),
    417: ("expectation_failed",),
    418: ("im_a_teapot", "teapot", "i_am_a_teapot"),
    421: ("misdirected_request",),
    422: ("unprocessable_entity", "unprocessable", "unprocessable_content"),
    423: ("locked",),
    424: ("failed_dependency", "dependency"),
    425: ("unordered_collection", "unordered", "too_early"),
    426: ("upgrade_required", "upgrade"),
    428: ("precondition_required", "precondition"),
    429: ("too_many_requests", "too_many"),
    431: ("header_fields_too_large", "fields_too_large"),
    444: ("no_response", "none"),
    449: ("retry_with", "retry"),
    450: ("blocked_by_windows_parental_controls", "parental_controls"),
    451: ("unavailable_for_legal_reasons", "legal_reasons"),
    499: ("client_closed_request",),
    # Server Error.
    500: ("internal_server_error", "server_error", "/o\\", "✗"),
    501: ("not_implemented",),
    502: ("bad_gateway",),
    503: ("service_unavailable", "unavailable"),
    504: ("gateway_timeout",),
    505: ("http_version_not_supported", "http_version"),
    506: ("variant_also_negotiates",),
    507: ("insufficient_storage",),
    509: ("bandwidth_limit_exceeded", "bandwidth"),
    510: ("not_extended",),
    511: ("network_authentication_required", "network_auth", "network_authentication"),
}

We can pick whichever of the keywords above we need for status comparisons; for example, requests.codes.not_found corresponds to 404.
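
A quick check of a few of these aliases (each keyword in the table is just another name for the numeric code):

import requests

print(requests.codes.ok)                 # 200
print(requests.codes.not_found)          # 404
print(requests.codes.teapot)             # 418
print(requests.codes.not_found == 404)   # True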

Advanced Usage

Let's go big with the requests library!

File Upload

So far the data we have POSTed has just been simple dicts. What if we want to submit a file to a site?

Fortunately, requests.post() provides a files parameter that accepts a dict, so we only need to open the file beforehand to submit it easily. An example:

import requests

files = {'github_cats': open('favicon.gif', 'rb')}
r = requests.post('http://httpbin.org/post', files=files)
print(r.text)

# Output
'''
{
  "args": {}, 
  "data": "", 
  "files": {
    "github_cats": "data:application/octet-stream;base64,R0lGODlhgAGAAZECAMnR2UhPWP...
  }, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "18432", 
    "Content-Type": "multipart/form-data; boundary=25c11f231093593e0cde9a84f337e9be", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.32.3", 
    "X-Amzn-Trace-Id": "Root=1-6786020a-4abb40c84d91afea10617b50"
  }, 
  "json": null, 
  "origin": "112.49.107.93", 
  "url": "http://httpbin.org/post"
}
'''

Since the returned files content is very long, part of it is omitted here.

Compared with the earlier POST request, the form section here is empty, which shows that uploaded files are reported separately under files.

Cookies

Getting Cookies with urllib is, if not outright confusing, at least a pile of red tape. With requests, grabbing Cookies becomes effortless!

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
}

r = requests.get('http://www.baidu.com', headers=headers)
print(r.cookies)

for item in r.cookies:
    print(item.name + '=' + item.value)

# Output
'''
<RequestsCookieJar[<Cookie H_PS_PSSID=61027_61219_60853_61362_61609_61543_61736_61780_61815 for .baidu.com/>, <Cookie BAIDUID_BFESS=26A2BE5FD831B08BB2D76E4B8614DB85:FG=1 for .baidu.com/>, <Cookie BDSVRTM=4 for www.baidu.com/>, <Cookie BD_HOME=1 for www.baidu.com/>]>
H_PS_PSSID=61027_61219_60853_61362_61609_61543_61736_61780_61815
BAIDUID_BFESS=26A2BE5FD831B08BB2D76E4B8614DB85:FG=1
BDSVRTM=4
BD_HOME=1
'''

Of course, the cookies attribute also provides an items() method, for example:

for name, value in r.cookies.items():
    print(name + '=' + value)

The items() method turns the retrieved Cookies into (name, value) tuples.

Once we have Cookies, we can do a lot with them, such as obtaining and maintaining a logged-in state:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    'Cookie': "buvid3=A62121A8-26DD-28CE-A621-73F2AB02B23D18869infoc; b_nut=1728225418; ...
}

r = requests.get('https://www.bilibili.com', headers=headers)
print(r.text)

# Output
'''<!DOCTYPE html>
<html lang="zh-CN" class="gray">
  <head>
    <meta charset="UTF-8" />
    <title>哔哩哔哩 (゜-゜)つロ 干杯~-bilibili</title>
    <meta
      name="description"
      content="哔哩哔哩(bilibili.com)是国内知名的视频弹幕网站,这里有及时的动漫新番,活跃的ACG氛围,有创意的Up主。大家可以在这里找到许多欢乐。"
    />
    <meta
      name="keywords"
      content="bilibili,哔哩哔哩,哔哩哔哩动画,哔哩哔哩弹幕网,弹幕视频,B站,弹幕,字幕,AMV,MAD,MTV,ANIME,动漫,动漫音乐,游戏,游戏解说,二次元,游戏视频,ACG,galgame,动画,番组,新番,初音,洛天依,vocaloid,日本动漫,国产动漫,手机游戏,网络游戏,电子竞技,ACG燃曲,ACG神曲,追新番,新番动漫,新番吐槽,巡音,镜音双子,千本樱,初音MIKU,舞蹈MMD,MIKUMIKUDANCE,洛天依原创曲,洛天依翻唱曲,洛天依投食歌,洛天依MMD,vocaloid家族,OST,BGM,动漫歌曲,日本动漫音乐,宫崎骏动漫音乐,动漫音乐推荐,燃系mad,治愈系mad,MAD MOVIE,MAD高燃"
    />
    <meta name="renderer" content="webkit" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="spm_prefix" content="333.1007" />
    <meta name="referrer" content="no-referrer-when-downgrade" />
    <meta name="applicable-device" content="pc">
    <meta http-equiv="Cache-Control" content="no-transform" />...
'''

The Cookies used and the HTML retrieved are both very long, so they are abridged here.

In the example above, by copying the Cookies from a logged-in session and adding them to the headers, the crawler acquires the same "logged-in identity" as the browser the Cookies came from.

BTW, this account is even a premium member right now ()

Of course, you can also set them via the cookies parameter, although it is rather tedious. An example:

import requests

cookies = "buvid3=A62121A8-26DD-28CE-A621-73F2AB02B23D18869infoc; b_nut=1728225418; ..."
jar = requests.cookies.RequestsCookieJar()
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
}

# Split the Cookie string into name=value pairs and load them into the RequestsCookieJar
for cookie in cookies.split(';'):
    name, value = cookie.strip().split('=', 1)
    jar.set(name=name, value=value)

r = requests.get('https://www.bilibili.com', cookies=jar, headers=headers)
print(r.text)

The approach here is somewhat similar to what we did with urllib: first instantiate a RequestsCookieJar object, then parse the Cookie string and feed the pairs into it to build the jar we need.

Maintaining a Session

In requests, when you simulate page requests with get(), post(), and the other module-level methods, each call is a separate "session" even if the URL is the same. That means data we just submitted may be unreachable on the very next request, because the session has changed.

Of course, reusing the same Cookies does tell the server to keep treating us as the same session, but fetching and loading the latest Cookies every time is tedious.

That is not the requests style!

Enter the Session object.

With a Session object we can stay in the same session without setting Cookies each time. Leave all of that to Session; it will take care of it:

import requests

with requests.Session() as web:
    web.get('http://httpbin.org/cookies/set/number/12345')
    r = web.get('http://httpbin.org/cookies')
    print(r.text)

# Output
'''
{
  "cookies": {
    "number": "12345"
  }
}
'''

If instead we make the requests the ordinary way:

import requests

requests.get('http://httpbin.org/cookies/set/number/12345')
r = requests.get('http://httpbin.org/cookies')
print(r.text)

# Output
'''
{
  "cookies": {}
}
'''

As you can see, across multiple visits to the same URL, Session successfully maintained the session state.

Because Session makes maintaining session state so convenient, it is typically used for the steps that follow a successful simulated login, so that the logged-in state is preserved.
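
A minimal sketch of that pattern, assuming a hypothetical site with a form-based login endpoint (the URLs and field names below are placeholders, not a real API):

import requests

with requests.Session() as web:
    # Hypothetical login endpoint; the Session stores whatever Cookies the server sets
    web.post('https://example.com/login', data={'username': 'Doris', 'password': '******'})

    # Later requests on the same Session carry those Cookies automatically
    r = web.get('https://example.com/dashboard')
    print(r.status_code)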

SSL Certificate Verification

Certificates are an HTTPS mechanism for ensuring that the pages we visit are safe and trustworthy; such a certificate, which proves the server's identity, is normally issued by a trusted third-party authority. Browsers check a site's certificate, so if a certificate is not signed by an official CA, or has expired, the browser warns us that the page may be risky.

Like 12306 back in the day (snicker).

requests also provides certificate verification. When sending an HTTPS request, if the verify parameter is True (the default), it checks the validity of the site's certificate.

Let's demonstrate with a site built specifically for this kind of teaching:

import requests

try:
    r = requests.get('https://untrusted-root.badssl.com/')
except requests.exceptions.SSLError as e:
    print(e)
else:
    print('pass SSL!')

# Output
'''
HTTPSConnectionPool(host='untrusted-root.badssl.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1000)')))
'''

If we set the verify parameter to False:

import requests

try:
    r = requests.get('https://untrusted-root.badssl.com/', verify=False)
except requests.exceptions.SSLError as e:
    print(e)
else:
    print('pass SSL!')

# Output
'''
/home/yangshu233/python projects/crawler/ForCrawler/lib/python3.12/site-packages/urllib3/connectionpool.py:1097: InsecureRequestWarning: Unverified HTTPS request is being made to host 'untrusted-root.badssl.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
pass SSL!
'''

Even though the request succeeds, Python still issues a warning. We can filter it out either by disabling the warning or by capturing it through logging.

# Disable the warning
import requests.packages.urllib3
requests.packages.urllib3.disable_warnings()

# Or capture the warning via the logging module
import logging
logging.captureWarnings(True)

Of course, if we have a local certificate to use as a client certificate, we can supply it via the cert parameter.
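
A minimal sketch of the cert parameter; the paths below are placeholders, and the private key is assumed to be unencrypted (requests cannot prompt for a passphrase):

import requests

# Hypothetical paths to a client certificate and its (unencrypted) private key
r = requests.get('https://example.com/', cert=('/path/to/client.crt', '/path/to/client.key'))
print(r.status_code)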

Proxy Settings

As we learned when studying urllib, setting up a proxy can be critically important for keeping a crawler running normally.

requests provides a very convenient way to configure proxies, for example:

# To be explored
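
For now, a minimal sketch of the proxies parameter (the proxy address below is a placeholder; a SOCKS proxy would additionally require installing the requests[socks] extra):

import requests

# Hypothetical local proxy; replace with a proxy you actually have access to
proxies = {
    'http': 'http://127.0.0.1:7890',
    'https': 'http://127.0.0.1:7890',
}

r = requests.get('http://httpbin.org/get', proxies=proxies)
print(r.text)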

Timeout Settings

If the local network is poor, the server responds too slowly, or the network is congested, our crawler may spend a long time waiting for a response and may eventually error out. To guard against this, setting a sensible timeout is important.

To set a timeout in requests, use the timeout parameter; the duration covers the time from sending the request to receiving the server's response:

# Timeout setting

import requests

try:
    r = requests.get('http://www.baidu.com', timeout=1)
    print(r.status_code)
# Timeout covers both ConnectTimeout and ReadTimeout
except requests.exceptions.Timeout as e:
    print(e)
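
timeout also accepts a (connect, read) tuple to control the two phases separately, and None (the default) to wait indefinitely; a quick sketch:

import requests

# Allow 3 seconds to connect and 10 seconds to receive the response
r = requests.get('http://httpbin.org/get', timeout=(3, 10))

# timeout=None (the default) waits indefinitely
r = requests.get('http://httpbin.org/get', timeout=None)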

Authentication

Some sites inevitably require authentication, and that's where the authentication support built into requests comes in!

import requests

from requests.auth import HTTPBasicAuth

auth = HTTPBasicAuth('admin', 'admin')

# auth can also be passed the tuple ('admin', 'admin') directly; HTTPBasicAuth is not required
r = requests.get('http://192.168.0.1', auth=auth)
print(r.status_code)
print(r.text)

# Output
'''
200
<script type="text/javascript">
var framePara=new Array(
0,
0,0 );
</script>
<META http-equiv="Pragma" content="no-cache">
<META http-equiv="Expires" content="wed, 26 Feb 1997 08:21:57 GMT">
<link href="/dynaform/css_main.css" rel="stylesheet" />
<script src="/dynaform/common.js" type="text/javascript"></script>
<script type="text/javascript"><!--
function Click(){return false;}
document.oncontextmenu=Click;
function doPrev(){history.go(-1);}
//--></script>

<HTML><HEAD><TITLE>TL-WR842N</TITLE>
<META http-equiv=Content-Type content="text/html; charset=gb2312">
<META http-equiv=pragma content=no-cache>
<META http-equiv=expires content="wed, 26 Feb 1997 08:21:57 GMT">
<META content="MSHTML 6.00.2900.2912" name=GENERATOR>
<script language="JavaScript">
if(window != window.parent)
{
    window.parent.location.href = "/userRpm/Index.htm";
}
</script>
</HEAD>
<script language="JavaScript"><!--
document.write('<FRAMESET border="0" rows="94,*" cols="*" frameBorder="NO" frameSpacing="0" >');
document.write('<FRAMESET border="0" rows="94" cols="185,*" frameBorder="NO" frameSpacing="0" >'); 
document.write('<FRAME name="topLeftFrame" src="../frames/logo.htm" scrolling="no" noResize></FRAME>');
document.write('<FRAME name="topRightFrame"  src="../frames/banner.htm" scrolling="no" noResize></FRAME>');
document.write('</FRAMESET>');
document.write('<FRAMESET border="0" rows="*" cols="145,40,*" frameSpacing="0">');
document.write('<FRAME name="bottomLeftFrame"  src="../userRpm/MenuRpm.htm" noResize scrolling="auto"></FRAME>');<!--设置菜单树可以滚动,因为有些机型功能比较多,如果不可滚动的话导致某些菜单条目显示不全-->
document.write('<FRAME name="arcFrame" src="../frames/arc.htm"  noResize frameborder="NO"></FRAME>');
var url = "../userRpm/StatusRpm.htm";
if (framePara[0] == 1)
{
        url = "../userRpm/WzdStartRpm.htm";
}
else if (framePara[0] == 255)
{
        url = "../userRpm/WlanSecurityRpm.htm";
}
document.write('<FRAME name="mainFrame" src="' + url + '" frameborder="NO"></FRAME>');
document.write('</FRAMESET>');
document.write('</FRAMESET>');
--></script>
<noframes><body>对不起,您的浏览器不支持框架!</body></noframes>
</html>
'''

Of course, requests also supports other authentication schemes, such as OAuth, but that first requires installing the requests-oauthlib package; see its documentation for more about OAuth.
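
A minimal sketch of OAuth 1 with requests-oauthlib (pip install requests-oauthlib); the credentials and endpoint below are placeholders for whatever the API provider actually issues:

import requests
from requests_oauthlib import OAuth1

# Hypothetical credentials issued by the API provider
auth = OAuth1('YOUR_APP_KEY', 'YOUR_APP_SECRET',
              'USER_OAUTH_TOKEN', 'USER_OAUTH_TOKEN_SECRET')

r = requests.get('https://api.example.com/user/profile', auth=auth)
print(r.status_code)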

Prepared Request

Earlier, when studying urllib, we used a Request object to structure our request data; the same can be done in requests:

from requests import Request, Session

url = 'http://httpbin.org/post'
data = {
    'name': 'Doris'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
}

web = Session()
req = Request(method='POST', url=url, headers=headers, data=data)
s = web.prepare_request(req)
r = web.send(s)
print(r.text)

# Output
'''
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "name": "Doris"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "10", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36", 
    "X-Amzn-Trace-Id": "Root=1-678683bb-342c30456c73fb61329b4172"
  }, 
  "json": null, 
  "origin": "112.49.107.93", 
  "url": "http://httpbin.org/post"
}
'''

Here we instantiate a Request object, use the Session's prepare_request() to turn it into a PreparedRequest object, and then send it with send().