使用Python解析Excel
使用Python解析Excel
解析Excel文件常用的module有pyExcelerator与xlrd,目前这两个模块只能支持excel2004以下版本生成的文件。
我写了一个从excel文件读取工资,并给员工发送工资条邮件的程序放在了github上,里面有一些解析excel文件细节的代码,有兴趣的同学可以下回去看看。
代码存放地址: https://github.com/mlf4aiur/payslip
pyExcelerator范例
查看pyExcelerator的tools中xls2txt.py了解到pyExcelerator使用parse_xls方法解析excel文件。
调用parse_xls方法后,生成的结果为list,list的结构示例如下:
[(u'Sheet1', {(0, 0): 111, (0, 1): 112, (1, 0): 121, (1, 1): 122}),
(u'Sheet2', {(0, 0): 211, (0, 1): 212, (1, 0): 221, (1, 1): 222}),
(u'Sheet3', {})]
安装pyExcelerator
sudo pip install pyExcelerator
localhost:Share 4aiur$ ipython
Leopard libedit detected.
Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51)
Type "copyright", "credits" or "license" for more information.
IPython 0.10 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.
In [1]: from pyExcelerator import *
In [2]: l = parse_xls('foo.xls')
In [3]: l
Out[3]:
[(u'Sheet1', {(0, 0): 111, (0, 1): 112, (1, 0): 121, (1, 1): 122}),
(u'Sheet2', {(0, 0): 211, (0, 1): 212, (1, 0): 221, (1, 1): 222}),
(u'Sheet3', {})]
In [4]:
了解到解析的数据结构后,既可根据自己的需求进行数据分析。
localhost:Share 4aiur$ cat > foo.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from pyExcelerator import *
l = parse_xls('foo.xls', 'utf-8') # parse_xls(arg) -- default encoding
for sheet_name, values in l:
print 'Sheet = "%s"' % sheet_name.encode('utf-8', 'backslashreplace')
for row_idx, col_idx in sorted(values.keys()):
v = values[(row_idx, col_idx)]
if isinstance(v, unicode):
v = v.encode('utf-8', 'backslashreplace')
else:
v = str(v)
print '(%d, %d) =' % (row_idx, col_idx), v
localhost:Share 4aiur$ chmod +x foo.py
localhost:Share 4aiur$ ./foo.py
Sheet = "Sheet1"
(0, 0) = 111
(0, 1) = 112
(1, 0) = 121
(1, 1) = 122
Sheet = "Sheet2"
(0, 0) = 211
(0, 1) = 212
(1, 0) = 221
(1, 1) = 222
Sheet = "Sheet3"
localhost:Share 4aiur$
安装xlrd
sudo pip install xlrd
xlrd范例
localhost:Share 4aiur$ cat > bar.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from xlrd import open_workbook
workbook = open_workbook("foo.xls")
for sheet in workbook.sheets():
print("Sheet = \"%s\"" % (sheet.name))
for row_idx in range(sheet.nrows):
col_idx = 0
for row in sheet.row(row_idx):
print("(%d, %d) = %s" % (row_idx, col_idx, str(row.value)))
col_idx += 1
[mlf4aiur@4aiur excel]$ chmod +x bar.py
[mlf4aiur@4aiur excel]$ ./bar.py
Sheet = "Sheet1"
(0, 0) = 111.0
(0, 1) = 112.0
(1, 0) = 121.0
(1, 1) = 122.0
Sheet = "Sheet2"
(0, 0) = 211.0
(0, 1) = 212.0
(1, 0) = 221.0
(1, 1) = 222.0
Sheet = "Sheet3"
localhost:Share 4aiur$