Python web scraping: using a regular expression to match the content between multiple given strings

As stated in the title.

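Answer 1  2019-11-29
Your regular expression uses greedy matching, (.*), which consumes as much text as possible and therefore runs from the first link all the way to the last one. Use non-greedy matching, (.*?), instead, which stops at the first occurrence of the text that follows it. The regular expression should be:
<a href="/(.*?)-desktop-wallpapers.html
The complete Python program is as follows: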
#!/usr/bin/python3

import re

# Sample HTML from the question (the middle of the link list is omitted)
a = '<html><body><p>[<a href="/aero-desktop-wallpapers.html" title="Aero HD Wallpapers">Aero</a>, <a href="/animals-desktop-wallpapers.html" title="Animals HD Wallpapers">Animals</a>, <a href="/architecture-desktop-wallpapers.html" title="Architecture HD Wallpapers">Architecture</a>,Wallpapers">Artistic</a>, ........(omitted)......... <a href="/vintage-desktop-wallpapers.html" title="Vintage HD Wallpapers">Vintage</a>]</p></body></html>'
# Non-greedy (.*?) captures only the category name of each link.
# Raw string; the dot before "html" is escaped so it matches a literal "."
titles = re.findall(r'<a href="/(.*?)-desktop-wallpapers\.html', a)
print(titles)
Output:
['aero', 'animals', 'architecture', 'vintage']
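
To make the difference between the two modes concrete, here is a minimal sketch that runs both patterns on the same input. The two-link string `html` is a hypothetical, shortened sample rather than the page from the question, and the variable names `greedy` and `lazy` are only illustrative.

```python
import re

# Hypothetical two-link sample, shortened from the HTML in the answer.
html = ('<a href="/aero-desktop-wallpapers.html">Aero</a>, '
        '<a href="/animals-desktop-wallpapers.html">Animals</a>')

# Greedy: (.*) expands as far as it can, so the match runs from the
# first link up to the LAST "-desktop-wallpapers.html" in the string.
greedy = re.findall(r'<a href="/(.*)-desktop-wallpapers\.html', html)

# Non-greedy: (.*?) stops at the FIRST "-desktop-wallpapers.html"
# after each link start, giving one short capture per link.
lazy = re.findall(r'<a href="/(.*?)-desktop-wallpapers\.html', html)

print(len(greedy), greedy)
print(len(lazy), lazy)
```

With the greedy pattern the whole string collapses into a single long capture that begins at "aero" and ends at "animals"; with the non-greedy pattern each link yields its own capture.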