Python: Looking Up the Weather (4)

yfeer • November 10, 2017, 21:00 • Python

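The city-code data on the weather site has a fairly complicated structure: the codes are spread, level by level, across many files with an .xml extension. These so-called "xml" files do not actually conform to the XML format, so a browser cannot display them, which makes scraping a little harder.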

First, fetch the list of provinces:

import urllib2  # Python 2 standard library

# Fetch the top-level list of provinces
url1 = 'http://m.weather.com.cn/data5/city.xml'
content1 = urllib2.urlopen(url1).read()
provinces = content1.split(',')

Printing content1 shows all the province codes:

01|北京,02|上海,03|天津,...
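Since every level of the hierarchy uses the same `code|name,code|name,...` layout, the splitting can be gathered into one helper. The following is only a minimal sketch, written for Python 3; the name `parse_pairs` is hypothetical, and the sample string is a shortened copy of the output shown above.

```python
def parse_pairs(content):
    """Split 'code|name,code|name,...' into a list of (code, name) tuples."""
    pairs = []
    for item in content.split(','):
        item = item.strip()
        if '|' not in item:
            continue  # skip empty or malformed entries
        code, name = item.split('|', 1)
        pairs.append((code, name))
    return pairs


# Shortened sample of the province list shown above
sample = '01|北京,02|上海,03|天津'
for code, name in parse_pairs(sample):
    print(code, name)
```

Skipping entries without a `|` is a design choice made here so that a trailing comma or an empty response does not raise an IndexError.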

For each province, fetch its list of cities:

url = 'http://m.weather.com.cn/data3/city%s.xml'
for p in provinces:
    p_code = p.split('|')[0]
    url2 = url % p_code
    content2 = urllib2.urlopen(url2).read()
    cities = content2.split(',')

Printing content2 shows all the city codes under that province:
1901|南京,1902|无锡,1903|镇江,...
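The same URL template is reused for every level below the province list, so the request step could be wrapped in a small function. This is a sketch under stated assumptions: it targets Python 3 (where `urllib2` has become `urllib.request`), the names `build_url` and `fetch_text` are hypothetical, and decoding the response as UTF-8 is an assumption about the server rather than something the article states.

```python
import urllib.request

URL_TEMPLATE = 'http://m.weather.com.cn/data3/city%s.xml'


def build_url(code):
    """Fill a province, city, or district code into the URL template."""
    return URL_TEMPLATE % code


def fetch_text(code, timeout=10):
    """Download the list for one code and return it as text.

    Assumes the response is UTF-8 encoded (not confirmed by the article).
    """
    with urllib.request.urlopen(build_url(code), timeout=timeout) as resp:
        return resp.read().decode('utf-8')


# Example (performs a network request, so it is left commented out):
# cities = fetch_text('19').split(',')
```

The timeout keeps a single stalled request from hanging the whole crawl, which makes many requests in a row.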

Then, for each city, fetch its list of districts:

# runs inside the province loop; cities[:3] takes only the first three cities
for c in cities[:3]:
    c_code = c.split('|')[0]
    url3 = url % c_code
    content3 = urllib2.urlopen(url3).read()
    districts = content3.split(',')

content3 holds all the district codes under that city:
190101|南京,190102|溧水,190103|高淳,...

Finally, for each district, we record its name and then send one more request to get its final code:

# runs inside the city loop
for d in districts:
    d_pair = d.split('|')
    d_code = d_pair[0]
    name = d_pair[1]
    url4 = url % d_code
    content4 = urllib2.urlopen(url4).read()
    code = content4.split('|')[1]
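Put together, the three loops are nested: provinces, then cities, then districts, with one extra request per district. The sketch below (Python 3) shows that nesting with the network access passed in as functions, so it can be tried offline. The names `split_pairs`, `crawl`, `fetch_provinces`, and `fetch_children` are hypothetical, and the `FAKE` table is made-up stand-in data: the `FINAL-...` values and the province entry are placeholders, not real responses from the site.

```python
def split_pairs(content):
    """Turn 'code|name,code|name' into [[code, name], ...], skipping bad items."""
    return [item.split('|', 1) for item in content.split(',') if '|' in item]


def crawl(fetch_provinces, fetch_children):
    """Walk provinces -> cities -> districts and return {district_name: final_code}."""
    result = {}
    for p_code, _ in split_pairs(fetch_provinces()):
        for c_code, _ in split_pairs(fetch_children(p_code)):
            for d_code, name in split_pairs(fetch_children(c_code)):
                final = fetch_children(d_code).split('|')
                if len(final) < 2:
                    continue  # unexpected response, skip this district
                result[name] = final[1]
    return result


# Made-up stand-in data; the FINAL-... values are placeholders, not real codes.
FAKE = {
    '19': '1901|南京',
    '1901': '190101|南京,190102|溧水,190103|高淳',
    '190101': '190101|FINAL-190101',
    '190102': '190102|FINAL-190102',
    '190103': '190103|FINAL-190103',
}

codes = crawl(lambda: '19|example-province', FAKE.__getitem__)
print(codes)
```

Collecting the pairs in a dict rather than a string is a swapped-in choice for this sketch; note that it keeps only the last code when two districts share a name.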

name and code are the city-code information we are ultimately after. We format them into a string, which is eventually saved to a file:

# inside the district loop; result must start as an empty string before the loops
line = " '%s': '%s',\n" % (name, code)
result += line
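The article does not show the step that writes the file. One possible way to do it, sketched here for Python 3, is to wrap the formatted lines in a dictionary literal and write them out as UTF-8. The function names, the `city = {...}` wrapper, the four-space indent, and the output path are all assumptions made for illustration.

```python
def format_city_codes(codes):
    """Render {name: code} as the text of a Python dict assignment."""
    lines = ["    '%s': '%s',\n" % (name, code)
             for name, code in sorted(codes.items())]
    return 'city = {\n' + ''.join(lines) + '}\n'


def save_city_codes(codes, path):
    """Write the formatted dictionary to `path` as UTF-8 text."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(format_city_codes(codes))


# Example with placeholder values (not real codes):
print(format_city_codes({'b': '2', 'a': '1'}))
```

Sorting the entries makes the output stable between runs, which helps when comparing two crawls.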

You can also print them to monitor progress while scraping:

print name + ':' + code

Copyright notice:
Author: yfeer
Link: https://www.yfeer.com/282.html
Source: 个人编程学习网
Copyright belongs to the author; do not repost without permission.
