Summary of Python urlparse Methods

2023-11-05

Python urlparse usage



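The urllib.parse module defines a standard interface for breaking a Uniform Resource Locator (URL) string into its components (scheme, network location, path, and so on), for combining the components back into a URL string, and for converting a "relative URL" into an absolute URL given a "base URL".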

1. Importing the library

from urllib import parse

2. Method summary

urlparse(): parse a URL

>>> url = 'http://xinwen.eastday.com/a/n181106070849091.html?qid=news.baidu.com'
>>> parse_res = parse.urlparse(url)
>>> parse_res
ParseResult(scheme='http', netloc='xinwen.eastday.com', path='/a/n181106070849091.html', params='', query='qid=news.baidu.com', fragment='')

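urlparse splits a URL into six parts and returns them as a 6-item named tuple (ParseResult) in which every item is a string. scheme is the protocol, netloc is the network location (host name plus optional port), path is the hierarchical path, params holds the parameters of the last path segment, query is the query string, and fragment is the fragment identifier. urlparse recognizes a netloc only when it is introduced by "//". Without the "//", the input is treated as a relative URL and the would-be netloc becomes part of path.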

>>> parse.urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', params='', query='', fragment='')
>>> parse.urlparse('www.cwi.nl/%7Eguido/Python.html')
ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html', params='', query='', fragment='')
>>> parse.urlparse('help/Python.html')
ParseResult(scheme='', netloc='', path='help/Python.html', params='', query='', fragment='')
>>> # get the path of the result parsed earlier
>>> parse_res.path
'/a/n181106070849091.html'
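Besides tuple indexing, the result exposes its fields by name, plus a few derived attributes such as hostname and port. The following is a minimal sketch; the URL is a made-up example, not one taken from the text above.

```python
from urllib import parse

# Hypothetical URL, chosen so that every component is non-empty.
url = "https://user@www.example.com:8443/docs/index.html;v=1?lang=en#top"
res = parse.urlparse(url)

# Fields can be read by name or by tuple position.
scheme, netloc = res.scheme, res[1]

# hostname drops the user info and port; port is returned as an int.
host, port = res.hostname, res.port

print(scheme, netloc, host, port)
print(res.path, res.params, res.query, res.fragment)
```

Reading hostname and port from the result is usually less error-prone than splitting netloc by hand.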

urlunparse(): combine the components back into a URL

>>> parse.urlunparse(('http', 'xinwen.eastday.com', '/a/n181106070849091.html', '', 'qid=news.baidu.com', ''))
'http://xinwen.eastday.com/a/n181106070849091.html?qid=news.baidu.com'
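A common use of urlunparse is to change one component of an existing URL and rebuild it. The sketch below relies on ParseResult being a named tuple, so its _replace method returns a modified copy; the choice of which fields to change is only for illustration.

```python
from urllib import parse

url = "http://xinwen.eastday.com/a/n181106070849091.html?qid=news.baidu.com"
parts = parse.urlparse(url)

# Parsing and unparsing an unmodified result gives back the same URL.
round_trip = parse.urlunparse(parts)

# _replace returns a new ParseResult; here the scheme is switched
# and the query string is dropped.
secure = parts._replace(scheme="https", query="")
rebuilt = parse.urlunparse(secure)

print(round_trip)
print(rebuilt)
```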

urljoin(): combine a base URL with a relative URL

base = "http://spam.egg/my/little/pony"
for path in "/index", "goldfish", "../black/cat":
    print(path, "=>", parse.urljoin(base, path))

The output is:

/index => http://spam.egg/index
goldfish => http://spam.egg/my/little/goldfish
../black/cat => http://spam.egg/my/black/cat
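Two further details of urljoin are easy to overlook: a trailing slash on the base changes how a relative path is resolved, and a second argument that is already absolute (or starts with "//") replaces the host of the base. A small sketch, with other.example and cdn.example as made-up host names:

```python
from urllib import parse

base = "http://spam.egg/my/little/pony"

# Without a trailing slash the last segment ("pony") is replaced;
# with one, the relative path is appended below it.
no_slash = parse.urljoin(base, "goldfish")
with_slash = parse.urljoin(base + "/", "goldfish")

# An absolute URL as the second argument wins over the base entirely.
absolute = parse.urljoin(base, "http://other.example/x")

# A scheme-relative URL keeps only the scheme of the base.
scheme_relative = parse.urljoin(base, "//cdn.example/x.js")

for result in (no_slash, with_slash, absolute, scheme_relative):
    print(result)
```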

parse_qs(): parse a query string into a dictionary

>>> # parse_qs expects only the query string, so extract it with urlparse first
>>> parse.parse_qs(parse.urlparse('http://www.cwi.nl:80/%7Eguido/Python.html?wd=w').query)
{'wd': ['w']}
>>> parse.parse_qs(parse.urlparse('http://www.cwi.nl:80/%7Eguido/Python.html?wd=w&wd=p').query)
{'wd': ['w', 'p']}

parse_qsl(): parse a query string into a list of (name, value) pairs

>>> # parse_qsl also expects only the query string
>>> parse.parse_qsl(parse.urlparse('http://www.cwi.nl:80/%7Eguido/Python.html?wd=w&wd=p').query)
[('wd', 'w'), ('wd', 'p')]
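parse_qs and parse_qsl operate on the query string only; given a full URL, everything before the first "=" ends up in the first key. The sketch below wraps the extraction step in a helper (query_params is a hypothetical name, not part of urllib) and also shows the keep_blank_values option.

```python
from urllib import parse


def query_params(url):
    """Hypothetical helper: return the query parameters of a full URL."""
    return parse.parse_qs(parse.urlsplit(url).query)


url = "http://www.cwi.nl:80/%7Eguido/Python.html?wd=w&wd=p&lang=en"
params = query_params(url)
pairs = parse.parse_qsl(parse.urlsplit(url).query)

# Passing the whole URL does not raise, but the first key is wrong.
wrong = parse.parse_qs(url)

# Parameters with empty values are dropped unless keep_blank_values is set.
dropped = parse.parse_qs("a=&b=1")
kept = parse.parse_qs("a=&b=1", keep_blank_values=True)

print(params)
print(pairs)
print("wd" in wrong)
print(dropped, kept)
```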

urlencode(): convert a list of pairs or a dictionary into a query string

>>> qs = [('wd', 'w'), ('wd', 'p')]
>>> parse.urlencode(qs)
'wd=w&wd=p'
>>> # without doseq=True, a list value is converted to a string and quoted as a whole
>>> qs = {'wd': ['w', 'p']}
>>> parse.urlencode(qs)
'wd=%5B%27w%27%2C+%27p%27%5D'
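When the values are lists, as in the dictionaries returned by parse_qs, passing doseq=True makes urlencode emit one name=value pair per list element. A minimal sketch, with the "q" parameter added only for illustration:

```python
from urllib import parse

qs = {"wd": ["w", "p"], "q": "hello world"}

# doseq=True expands each list into repeated parameters;
# plain strings are still encoded as a single value.
encoded = parse.urlencode(qs, doseq=True)

# parse_qs reverses the operation, returning every value as a list.
decoded = parse.parse_qs(encoded)

print(encoded)
print(decoded)
```

This makes urlencode(..., doseq=True) and parse_qs a matching pair for round-tripping query strings.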

urlsplit(): similar to urlparse(), but it does not split params out of the path, so it returns five components instead of six

>>> parse.urlsplit('http://www.cwi.nl:80/%7Eguido/Python.html?wd=w&wd=p')
SplitResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', query='wd=w&wd=p', fragment='')
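The difference between the two functions only shows up when the path carries ";" parameters. The sketch below uses a made-up URL to compare them side by side.

```python
from urllib import parse

# Hypothetical URL with a ";type=a" parameter on the last path segment.
url = "http://www.example.com/path;type=a?x=1#frag"

parsed = parse.urlparse(url)
split = parse.urlsplit(url)

# urlparse separates the parameter; urlsplit leaves it in the path.
print(len(parsed), parsed.path, parsed.params)
print(len(split), split.path)
```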

urlunsplit(): combine the tuple returned by urlsplit() back into a URL

>>> # the empty fragment (the trailing '#') is dropped when the URL is rebuilt
>>> parse.urlunsplit(parse.urlsplit('http://www.cwi.nl:80/%7Eguido/Python.html?wd=w&wd=p/#'))
'http://www.cwi.nl:80/%7Eguido/Python.html?wd=w&wd=p/'
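The functions above can be combined to add parameters to an existing URL. The following is a sketch under the assumption that existing parameters should be kept and new ones appended; with_query is a hypothetical helper name, not part of urllib.

```python
from urllib import parse


def with_query(url, **params):
    """Hypothetical helper: append query parameters to a URL."""
    parts = parse.urlsplit(url)
    # Keep the existing pairs in order, then add the new ones.
    merged = parse.parse_qsl(parts.query) + list(params.items())
    return parse.urlunsplit(parts._replace(query=parse.urlencode(merged)))


result = with_query("http://www.cwi.nl:80/%7Eguido/Python.html?wd=w", page=2)
print(result)
```

Working on the split components avoids string concatenation mistakes such as a doubled "?" or a missing "&".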