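The urllib.parse module defines a standard interface for breaking Uniform Resource Locator (URL) strings up into components (scheme, network location, path, and so on), for combining the components back into a URL string, and for converting a “relative URL” into an absolute URL given a “base URL”.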
urlparse() parses a URL
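>>> from urllib import parse
>>> url = 'http://xinwen.eastday.com/a/n181106070849091.html?qid=news.baidu.com'
>>> parse_res = parse.urlparse(url)
>>> parse_res
ParseResult(scheme='http', netloc='xinwen.eastday.com', path='/a/n181106070849091.html', params='', query='qid=news.baidu.com', fragment='')

urlparse() splits a URL into six components and returns them as a 6-item named tuple; each item is a string. scheme is the protocol, netloc is the network location (host and optional port), path is the hierarchical path, params holds the parameters of the last path segment, query is the query string, and fragment is the fragment identifier. urlparse() recognizes a netloc only when it is introduced by “//”; otherwise the input is treated as a relative URL and that text ends up in path.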
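Because the result is a tuple with named fields, the components can be read either by attribute or by position. The following is a minimal sketch of that; the URL (example.com) is made up for illustration, and hostname and port are convenience attributes of the result object rather than items of the tuple.

```python
from urllib import parse

# Hypothetical URL, chosen only for illustration.
res = parse.urlparse("http://example.com:8080/docs/index.html?lang=en#intro")

# Named access and positional access refer to the same six items.
scheme, netloc, path, params, query, fragment = res
print(res.scheme == scheme == res[0])

print(res.netloc)    # host and port together
print(res.hostname)  # host only
print(res.port)      # port as an int, or None if the URL has no port
print(res.query, res.fragment)
```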
The calls below show the effect of the leading “//”:

>>> parse.urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', params='', query='', fragment='')
>>> parse.urlparse('www.cwi.nl/%7Eguido/Python.html')
ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html', params='', query='', fragment='')
>>> parse.urlparse('help/Python.html')
ParseResult(scheme='', netloc='', path='help/Python.html', params='', query='', fragment='')
>>> # Get the path of the URL parsed earlier
>>> parse_res.path
'/a/n181106070849091.html'

urlunparse() combines the components back into a URL
>>> parse.urlunparse(('http', 'xinwen.eastday.com', '/a/n181106070849091.html', '', 'qid=news.baidu.com', ''))
'http://xinwen.eastday.com/a/n181106070849091.html?qid=news.baidu.com'

urljoin() combines an absolute base URL with a relative path
base = "http://spam.egg/my/little/pony"
for path in "/index", "goldfish", "../black/cat":
    print(path, "=>", parse.urljoin(base, path))

Running it produces:
/index => http://spam.egg/index
goldfish => http://spam.egg/my/little/goldfish
../black/cat => http://spam.egg/my/black/cat

parse_qs() parses query parameters into a dictionary
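>>> parse.parse_qs(parse.urlparse('http://www.cwi.nl:80/%7Eguido/Python.html?wd=w').query)
{'wd': ['w']}
>>> parse.parse_qs(parse.urlparse('http://www.cwi.nl:80/%7Eguido/Python.html?wd=w&wd=p').query)
{'wd': ['w', 'p']}

parse_qs() expects only the query string, so the query component is extracted with urlparse() first. If the full URL were passed instead, everything before the first “=” would become part of the key.

parse_qsl() parses query parameters into a list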
>>> parse.parse_qsl(parse.urlparse('http://www.cwi.nl:80/%7Eguido/Python.html?wd=w&wd=p').query)
[('wd', 'w'), ('wd', 'p')]

urlencode() converts a list of pairs into a query string
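>>> qs = [('wd', 'w'), ('wd', 'p')]
>>> parse.urlencode(qs)
'wd=w&wd=p'
>>> qs = {'wd': ['w', 'p']}
>>> parse.urlencode(qs)
'wd=%5B%27w%27%2C+%27p%27%5D'

In the second call the list value is converted with str() and percent-encoded as a whole. Passing doseq=True makes urlencode() emit one key=value pair per list element, giving 'wd=w&wd=p'.

urlsplit() works much like urlparse()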
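One difference between the two functions is that urlsplit() returns five components and leaves any “;parameters” inside the path, whereas urlparse() splits them off into params. The sketch below illustrates this with a made-up URL (example.com and the ";type=x" parameter are assumptions for the example).

```python
from urllib import parse

# Hypothetical URL with a ";type=x" parameter on the last path segment.
url = "http://example.com/a/b;type=x?q=1"

parsed = parse.urlparse(url)  # six components, params split out of the path
split = parse.urlsplit(url)   # five components, params stay in the path

print(parsed.path, parsed.params)
print(split.path)
print(len(parsed), len(split))
```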
>>> parse.urlsplit('http://www.cwi.nl:80/%7Eguido/Python.html?wd=w&wd=p')
SplitResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', query='wd=w&wd=p', fragment='')

urlunsplit() combines the tuple returned by urlsplit()
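>>> parse.urlunsplit(parse.urlsplit('http://www.cwi.nl:80/%7Eguido/Python.html?wd=w&wd=p/#'))
'http://www.cwi.nl:80/%7Eguido/Python.html?wd=w&wd=p/'

Note that the result can differ slightly from the input: here the trailing “#”, which introduces an empty fragment, is not reproduced.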
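The functions above are often used together. As a rough sketch of one way to do that, the following splits a URL, edits its query parameters as a dictionary, and reassembles the result. set_query_param is a hypothetical helper written for this example, not part of urllib.parse.

```python
from urllib import parse


def set_query_param(url, key, value):
    """Return url with query parameter `key` set to `value`.

    Hypothetical helper for illustration only. Note that parse_qs()
    drops parameters with blank values by default.
    """
    parts = parse.urlsplit(url)
    params = parse.parse_qs(parts.query)  # e.g. {'wd': ['w', 'p']}
    params[key] = [value]                 # replace any existing values
    new_query = parse.urlencode(params, doseq=True)
    # SplitResult is a named tuple, so _replace() returns a modified copy.
    return parse.urlunsplit(parts._replace(query=new_query))


url = "http://www.cwi.nl:80/%7Eguido/Python.html?wd=w&wd=p"
print(set_query_param(url, "wd", "x"))
print(set_query_param(url, "page", "2"))
```

doseq=True is used so that list values produced by parse_qs() are written back as repeated keys rather than as the string form of a list.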